Search | arXiv e-print repository

On the Nystrom Approximation for Preconditioning in Kernel Machines

Authors: Amirhesam Abedsoltan, Parthe Pandit, Luis Rademacher, Mikhail Belkin

Abstract: Kernel methods are a popular class of nonlinear predictive models in machine learning. Scalable algorithms for learning kernel models need to be iterative in nature, but convergence can be slow due to poor conditioning. Spectral preconditioning is an important tool to speed-up the convergence of such iterative algorithms for training kernel models. However computing and storing a spectral precondi… ▽ More Kernel methods are a popular class of nonlinear predictive models in machine learning. Scalable algorithms for learning kernel models need to be iterative in nature, but convergence can be slow due to poor conditioning. Spectral preconditioning is an important tool to speed-up the convergence of such iterative algorithms for training kernel models. However computing and storing a spectral preconditioner can be expensive which can lead to large computational and storage overheads, precluding the application of kernel methods to problems with large datasets. A Nystrom approximation of the spectral preconditioner is often cheaper to compute and store, and has demonstrated success in practical applications. In this paper we analyze the trade-offs of using such an approximated preconditioner. Specifically, we show that a sample of logarithmic size (as a function of the size of the dataset) enables the Nystrom-based approximated preconditioner to accelerate gradient descent nearly as well as the exact preconditioner, while also reducing the computational and storage overheads. △ Less

Submitted 24 January, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

arXiv:2309.00570 [pdf, other]

Mechanism of feature learning in convolutional neural networks

Authors: Daniel Beaglehole, Adityanarayanan Radhakrishnan, Parthe Pandit, Mikhail Belkin

Abstract: Understanding the mechanism of how convolutional neural networks learn features from image data is a fundamental problem in machine learning and computer vision. In this work, we identify such a mechanism. We posit the Convolutional Neural Feature Ansatz, which states that covariances of filters in any convolutional layer are proportional to the average gradient outer product (AGOP) taken with res… ▽ More Understanding the mechanism of how convolutional neural networks learn features from image data is a fundamental problem in machine learning and computer vision. In this work, we identify such a mechanism. We posit the Convolutional Neural Feature Ansatz, which states that covariances of filters in any convolutional layer are proportional to the average gradient outer product (AGOP) taken with respect to patches of the input to that layer. We present extensive empirical evidence for our ansatz, including identifying high correlation between covariances of filters and patch-based AGOPs for convolutional layers in standard neural architectures, such as AlexNet, VGG, and ResNets pre-trained on ImageNet. We also provide supporting theoretical evidence. We then demonstrate the generality of our result by using the patch-based AGOP to enable deep feature learning in convolutional kernel machines. We refer to the resulting algorithm as (Deep) ConvRFM and show that our algorithm recovers similar features to deep convolutional networks including the notable emergence of edge detectors. Moreover, we find that Deep ConvRFM overcomes previously identified limitations of convolutional kernels, such as their inability to adapt to local signals in images and, as a result, leads to sizable performance improvement over fixed convolutional kernels. △ Less

Submitted 1 September, 2023; originally announced September 2023.

arXiv:2305.08277 [pdf, other]

Local Convergence of Gradient Descent-Ascent for Training Generative Adversarial Networks

Authors: Evan Becker, Parthe Pandit, Sundeep Rangan, Alyson K. Fletcher

Abstract: Generative Adversarial Networks (GANs) are a popular formulation to train generative models for complex high dimensional data. The standard method for training GANs involves a gradient descent-ascent (GDA) procedure on a minimax optimization problem. This procedure is hard to analyze in general due to the nonlinear nature of the dynamics. We study the local dynamics of GDA for training a GAN with… ▽ More Generative Adversarial Networks (GANs) are a popular formulation to train generative models for complex high dimensional data. The standard method for training GANs involves a gradient descent-ascent (GDA) procedure on a minimax optimization problem. This procedure is hard to analyze in general due to the nonlinear nature of the dynamics. We study the local dynamics of GDA for training a GAN with a kernel-based discriminator. This convergence analysis is based on a linearization of a non-linear dynamical system that describes the GDA iterations, under an \textit{isolated points model} assumption from [Becker et al. 2022]. Our analysis brings out the effect of the learning rates, regularization, and the bandwidth of the kernel discriminator, on the local convergence rate of GDA. Importantly, we show phase transitions that indicate when the system converges, oscillates, or diverges. We also provide numerical simulations that verify our claims. △ Less

Submitted 29 May, 2023; v1 submitted 14 May, 2023; originally announced May 2023.

arXiv:2302.02605 [pdf, other]

Toward Large Kernel Models

Authors: Amirhesam Abedsoltan, Mikhail Belkin, Parthe Pandit

Abstract: Recent studies indicate that kernel machines can often perform similarly or better than deep neural networks (DNNs) on small datasets. The interest in kernel machines has been additionally bolstered by the discovery of their equivalence to wide neural networks in certain regimes. However, a key feature of DNNs is their ability to scale the model size and training data size independently, whereas i… ▽ More Recent studies indicate that kernel machines can often perform similarly or better than deep neural networks (DNNs) on small datasets. The interest in kernel machines has been additionally bolstered by the discovery of their equivalence to wide neural networks in certain regimes. However, a key feature of DNNs is their ability to scale the model size and training data size independently, whereas in traditional kernel machines model size is tied to data size. Because of this coupling, scaling kernel machines to large data has been computationally challenging. In this paper, we provide a way forward for constructing large-scale general kernel models, which are a generalization of kernel machines that decouples the model and data, allowing training on large datasets. Specifically, we introduce EigenPro 3.0, an algorithm based on projected dual preconditioned SGD and show scaling to model and data sizes which have not been possible with existing kernel methods. △ Less

Submitted 19 June, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

Comments: Code is available at github.com/EigenPro/EigenPro3

arXiv:2212.13881 [pdf, other]

Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features

Authors: Adityanarayanan Radhakrishnan, Daniel Beaglehole, Parthe Pandit, Mikhail Belkin

Abstract: In recent years neural networks have achieved impressive results on many technological and scientific tasks. Yet, the mechanism through which these models automatically select features, or patterns in data, for prediction remains unclear. Identifying such a mechanism is key to advancing performance and interpretability of neural networks and promoting reliable adoption of these models in scientifi… ▽ More In recent years neural networks have achieved impressive results on many technological and scientific tasks. Yet, the mechanism through which these models automatically select features, or patterns in data, for prediction remains unclear. Identifying such a mechanism is key to advancing performance and interpretability of neural networks and promoting reliable adoption of these models in scientific applications. In this paper, we identify and characterize the mechanism through which deep fully connected neural networks learn features. We posit the Deep Neural Feature Ansatz, which states that neural feature learning occurs by implementing the average gradient outer product to up-weight features strongly related to model output. Our ansatz sheds light on various deep learning phenomena including emergence of spurious features and simplicity biases and how pruning networks can increase performance, the "lottery ticket hypothesis." Moreover, the mechanism identified in our work leads to a backpropagation-free method for feature learning with any machine learning model. To demonstrate the effectiveness of this feature learning mechanism, we use it to enable feature learning in classical, non-feature learning models known as kernel machines and show that the resulting models, which we refer to as Recursive Feature Machines, achieve state-of-the-art performance on tabular data. △ Less

Submitted 9 May, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

arXiv:2208.09938 [pdf, other]

Instability and Local Minima in GAN Training with Kernel Discriminators

Authors: Evan Becker, Parthe Pandit, Sundeep Rangan, Alyson K. Fletcher

Abstract: Generative Adversarial Networks (GANs) are a widely-used tool for generative modeling of complex data. Despite their empirical success, the training of GANs is not fully understood due to the min-max optimization of the generator and discriminator. This paper analyzes these joint dynamics when the true samples, as well as the generated samples, are discrete, finite sets, and the discriminator is k… ▽ More Generative Adversarial Networks (GANs) are a widely-used tool for generative modeling of complex data. Despite their empirical success, the training of GANs is not fully understood due to the min-max optimization of the generator and discriminator. This paper analyzes these joint dynamics when the true samples, as well as the generated samples, are discrete, finite sets, and the discriminator is kernel-based. A simple yet expressive framework for analyzing training called the $\textit{Isolated Points Model}$ is introduced. In the proposed model, the distance between true samples greatly exceeds the kernel width, so each generated point is influenced by at most one true point. Our model enables precise characterization of the conditions for convergence, both to good and bad minima. In particular, the analysis explains two common failure modes: (i) an approximate mode collapse and (ii) divergence. Numerical simulations are provided that predictably replicate these behaviors. △ Less

Submitted 21 August, 2022; originally announced August 2022.

arXiv:2207.06569 [pdf, other]

Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting

Authors: Neil Mallinar, James B. Simon, Amirhesam Abedsoltan, Parthe Pandit, Mikhail Belkin, Preetum Nakkiran

Abstract: The practical success of overparameterized neural networks has motivated the recent scientific study of interpolating methods, which perfectly fit their training data. Certain interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance, in defiance of standard intuitions from statistical learning theory. Aiming to explain this, a body… ▽ More The practical success of overparameterized neural networks has motivated the recent scientific study of interpolating methods, which perfectly fit their training data. Certain interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance, in defiance of standard intuitions from statistical learning theory. Aiming to explain this, a body of recent work has studied benign overfitting, a phenomenon where some interpolating methods approach Bayes optimality, even in the presence of noise. In this work we argue that while benign overfitting has been instructive and fruitful to study, many real interpolating methods like neural networks do not fit benignly: modest noise in the training set causes nonzero (but non-infinite) excess risk at test time, implying these models are neither benign nor catastrophic but rather fall in an intermediate regime. We call this intermediate regime tempered overfitting, and we initiate its systematic study. We first explore this phenomenon in the context of kernel (ridge) regression (KR) by obtaining conditions on the ridge parameter and kernel eigenspectrum under which KR exhibits each of the three behaviors. We find that kernels with powerlaw spectra, including Laplace kernels and ReLU neural tangent kernels, exhibit tempered overfitting. We then empirically study deep neural networks through the lens of our taxonomy, and find that those trained to interpolation are tempered, while those stopped early are benign. We hope our work leads to a more refined understanding of overfitting in modern learning. △ Less

Submitted 20 October, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

Comments: NM and JS co-first authors

arXiv:2206.15058 [pdf, other]

A note on Linear Bottleneck networks and their Transition to Multilinearity

Authors: Libin Zhu, Parthe Pandit, Mikhail Belkin

Abstract: Randomly initialized wide neural networks transition to linear functions of weights as the width grows, in a ball of radius $O(1)$ around initialization. A necessary condition for this result is that all layers of the network are wide enough, i.e., all widths tend to infinity. However, the transition to linearity breaks down when this infinite width assumption is violated. In this work we show tha… ▽ More Randomly initialized wide neural networks transition to linear functions of weights as the width grows, in a ball of radius $O(1)$ around initialization. A necessary condition for this result is that all layers of the network are wide enough, i.e., all widths tend to infinity. However, the transition to linearity breaks down when this infinite width assumption is violated. In this work we show that linear networks with a bottleneck layer learn bilinear functions of the weights, in a ball of radius $O(1)$ around initialization. In general, for $B-1$ bottleneck layers, the network is a degree $B$ multilinear function of weights. Importantly, the degree only depends on the number of bottlenecks and not the total depth of the network. △ Less

Submitted 30 June, 2022; originally announced June 2022.

arXiv:2205.13525 [pdf, other]

On the Inconsistency of Kernel Ridgeless Regression in Fixed Dimensions

Authors: Daniel Beaglehole, Mikhail Belkin, Parthe Pandit

Abstract: ``Benign overfitting'', the ability of certain algorithms to interpolate noisy training data and yet perform well out-of-sample, has been a topic of considerable recent interest. We show, using a fixed design setup, that an important class of predictors, kernel machines with translation-invariant kernels, does not exhibit benign overfitting in fixed dimensions. In particular, the estimated predict… ▽ More ``Benign overfitting'', the ability of certain algorithms to interpolate noisy training data and yet perform well out-of-sample, has been a topic of considerable recent interest. We show, using a fixed design setup, that an important class of predictors, kernel machines with translation-invariant kernels, does not exhibit benign overfitting in fixed dimensions. In particular, the estimated predictor does not converge to the ground truth with increasing sample size, for any non-zero regression function and any (even adaptive) bandwidth selection. To prove these results, we give exact expressions for the generalization error, and its decomposition in terms of an approximation error and an estimation error that elicits a trade-off based on the selection of the kernel bandwidth. Our results apply to commonly used translation-invariant kernels such as Gaussian, Laplace, and Cauchy. △ Less

Submitted 12 April, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

arXiv:2201.08082 [pdf, other]

Kernel Methods and Multi-layer Perceptrons Learn Linear Models in High Dimensions

Authors: Mojtaba Sahraee-Ardakan, Melikasadat Emami, Parthe Pandit, Sundeep Rangan, Alyson K. Fletcher

Abstract: Empirical observation of high dimensional phenomena, such as the double descent behaviour, has attracted a lot of interest in understanding classical techniques such as kernel methods, and their implications to explain generalization properties of neural networks. Many recent works analyze such models in a certain high-dimensional regime where the covariates are independent and the number of sampl… ▽ More Empirical observation of high dimensional phenomena, such as the double descent behaviour, has attracted a lot of interest in understanding classical techniques such as kernel methods, and their implications to explain generalization properties of neural networks. Many recent works analyze such models in a certain high-dimensional regime where the covariates are independent and the number of samples and the number of covariates grow at a fixed ratio (i.e. proportional asymptotics). In this work we show that for a large class of kernels, including the neural tangent kernel of fully connected networks, kernel methods can only perform as well as linear models in this regime. More surprisingly, when the data is generated by a kernel model where the relationship between input and the response could be very nonlinear, we show that linear models are in fact optimal, i.e. linear models achieve the minimum risk among all models, linear or nonlinear. These results suggest that more complex models for the data other than independent features are needed for high-dimensional analysis. △ Less

Submitted 20 January, 2022; originally announced January 2022.

arXiv:2108.03760 [pdf]

Symptom based Hierarchical Classification of Diabetes and Thyroid disorders using Fuzzy Cognitive Maps

Authors: Anand M. Shukla, Pooja D. Pandit, Vasudev M. Purandare, Anuradha Srinivasaraghavan

Abstract: Fuzzy Cognitive Maps (FCMs) are soft computing technique that follows an approach similar to human reasoning and human decision-making process, making them a valuable modeling and simulation methodology. Medical Decision Systems are complex systems consisting of many factors that may be complementary, contradictory, and competitive; these factors influence each other and determine the overall diag… ▽ More Fuzzy Cognitive Maps (FCMs) are soft computing technique that follows an approach similar to human reasoning and human decision-making process, making them a valuable modeling and simulation methodology. Medical Decision Systems are complex systems consisting of many factors that may be complementary, contradictory, and competitive; these factors influence each other and determine the overall diagnosis with a different degree. Thus, FCMs are suitable to model Medical Decision Support Systems. The proposed work therefore uses FCMs arranged in hierarchical structure to classify between Diabetes, Thyroid disorders and their subtypes. Subtypes include type 1 and type 2 for diabetes and hyperthyroidism and hypothyroidism for thyroid. △ Less

Submitted 8 August, 2021; originally announced August 2021.

arXiv:2101.07833 [pdf, ps, other]

Implicit Bias of Linear RNNs

Authors: Melikasadat Emami, Mojtaba Sahraee-Ardakan, Parthe Pandit, Sundeep Rangan, Alyson K. Fletcher

Abstract: Contemporary wisdom based on empirical studies suggests that standard recurrent neural networks (RNNs) do not perform well on tasks requiring long-term memory. However, precise reasoning for this behavior is still unknown. This paper provides a rigorous explanation of this property in the special case of linear RNNs. Although this work is limited to linear RNNs, even these systems have traditional… ▽ More Contemporary wisdom based on empirical studies suggests that standard recurrent neural networks (RNNs) do not perform well on tasks requiring long-term memory. However, precise reasoning for this behavior is still unknown. This paper provides a rigorous explanation of this property in the special case of linear RNNs. Although this work is limited to linear RNNs, even these systems have traditionally been difficult to analyze due to their non-linear parameterization. Using recently-developed kernel regime analysis, our main result shows that linear RNNs learned from random initializations are functionally equivalent to a certain weighted 1D-convolutional network. Importantly, the weightings in the equivalent model cause an implicit bias to elements with smaller time lags in the convolution and hence, shorter memory. The degree of this bias depends on the variance of the transition kernel matrix at initialization and is related to the classic exploding and vanishing gradients problem. The theory is validated in both synthetic and real data experiments. △ Less

Submitted 19 January, 2021; originally announced January 2021.

Comments: 30 pages, 4 figures

arXiv:2005.05053 [pdf, other]

Low-Rank Nonlinear Decoding of $μ$-ECoG from the Primary Auditory Cortex

Authors: Melikasadat Emami, Mojtaba Sahraee-Ardakan, Parthe Pandit, Alyson K. Fletcher, Sundeep Rangan, Michael Trumpis, Brinnae Bent, Chia-Han Chiang, Jonathan Viventi

Abstract: This paper considers the problem of neural decoding from parallel neural measurements systems such as micro-electrocorticography ($μ$-ECoG). In systems with large numbers of array elements at very high sampling rates, the dimension of the raw measurement data may be large. Learning neural decoders for this high-dimensional data can be challenging, particularly when the number of training samples i… ▽ More This paper considers the problem of neural decoding from parallel neural measurements systems such as micro-electrocorticography ($μ$-ECoG). In systems with large numbers of array elements at very high sampling rates, the dimension of the raw measurement data may be large. Learning neural decoders for this high-dimensional data can be challenging, particularly when the number of training samples is limited. To address this challenge, this work presents a novel neural network decoder with a low-rank structure in the first hidden layer. The low-rank constraints dramatically reduce the number of parameters in the decoder while still enabling a rich class of nonlinear decoder maps. The low-rank decoder is illustrated on $μ$-ECoG data from the primary auditory cortex (A1) of awake rats. This decoding problem is particularly challenging due to the complexity of neural responses in the auditory cortex and the presence of confounding signals in awake animals. It is shown that the proposed low-rank decoder significantly outperforms models using standard dimensionality reduction techniques such as principal component analysis (PCA). △ Less

Submitted 6 May, 2020; originally announced May 2020.

Comments: 4 pages, 3 figures

arXiv:2005.00180 [pdf, other]

Generalization Error of Generalized Linear Models in High Dimensions

Authors: Melikasadat Emami, Mojtaba Sahraee-Ardakan, Parthe Pandit, Sundeep Rangan, Alyson K. Fletcher

Abstract: At the heart of machine learning lies the question of generalizability of learned rules over previously unseen data. While over-parameterized models based on neural networks are now ubiquitous in machine learning applications, our understanding of their generalization capabilities is incomplete. This task is made harder by the non-convexity of the underlying learning problems. We provide a general… ▽ More At the heart of machine learning lies the question of generalizability of learned rules over previously unseen data. While over-parameterized models based on neural networks are now ubiquitous in machine learning applications, our understanding of their generalization capabilities is incomplete. This task is made harder by the non-convexity of the underlying learning problems. We provide a general framework to characterize the asymptotic generalization error for single-layer neural networks (i.e., generalized linear models) with arbitrary non-linearities, making it applicable to regression as well as classification problems. This framework enables analyzing the effect of (i) over-parameterization and non-linearity during modeling; and (ii) choices of loss function, initialization, and regularizer during learning. Our model also captures mismatch between training and test distributions. As examples, we analyze a few special cases, namely linear regression and logistic regression. We are also able to rigorously and analytically explain the \emph{double descent} phenomenon in generalized linear models. △ Less

Submitted 30 April, 2020; originally announced May 2020.

Comments: 20 pages, 4 figures

arXiv:2001.09396 [pdf, other]

Inference in Multi-Layer Networks with Matrix-Valued Unknowns

Authors: Parthe Pandit, Mojtaba Sahraee-Ardakan, Sundeep Rangan, Philip Schniter, Alyson K. Fletcher

Abstract: We consider the problem of inferring the input and hidden variables of a stochastic multi-layer neural network from an observation of the output. The hidden variables in each layer are represented as matrices. This problem applies to signal recovery via deep generative prior models, multi-task and mixed regression and learning certain classes of two-layer neural networks. A unified approximation a… ▽ More We consider the problem of inferring the input and hidden variables of a stochastic multi-layer neural network from an observation of the output. The hidden variables in each layer are represented as matrices. This problem applies to signal recovery via deep generative prior models, multi-task and mixed regression and learning certain classes of two-layer neural networks. A unified approximation algorithm for both MAP and MMSE inference is proposed by extending a recently-developed Multi-Layer Vector Approximate Message Passing (ML-VAMP) algorithm to handle matrix-valued unknowns. It is shown that the performance of the proposed Multi-Layer Matrix VAMP (ML-Mat-VAMP) algorithm can be exactly predicted in a certain random large-system limit, where the dimensions $N\times d$ of the unknown quantities grow as $N\rightarrow\infty$ with $d$ fixed. In the two-layer neural-network learning problem, this scaling corresponds to the case where the number of input features and training samples grow to infinity but the number of hidden nodes stays fixed. The analysis enables a precise prediction of the parameter and test error of the learning. △ Less

Submitted 25 January, 2020; originally announced January 2020.

Comments: 3 figures, 6 pages (two-column) + Appendix. arXiv admin note: text overlap with arXiv:1911.03409

arXiv:1911.03409 [pdf, other]

Inference with Deep Generative Priors in High Dimensions

Authors: Parthe Pandit, Mojtaba Sahraee-Ardakan, Sundeep Rangan, Philip Schniter, Alyson K. Fletcher

Abstract: Deep generative priors offer powerful models for complex-structured data, such as images, audio, and text. Using these priors in inverse problems typically requires estimating the input and/or hidden signals in a multi-layer deep neural network from observation of its output. While these approaches have been successful in practice, rigorous performance analysis is complicated by the non-convex nat… ▽ More Deep generative priors offer powerful models for complex-structured data, such as images, audio, and text. Using these priors in inverse problems typically requires estimating the input and/or hidden signals in a multi-layer deep neural network from observation of its output. While these approaches have been successful in practice, rigorous performance analysis is complicated by the non-convex nature of the underlying optimization problems. This paper presents a novel algorithm, Multi-Layer Vector Approximate Message Passing (ML-VAMP), for inference in multi-layer stochastic neural networks. ML-VAMP can be configured to compute maximum a priori (MAP) or approximate minimum mean-squared error (MMSE) estimates for these networks. We show that the performance of ML-VAMP can be exactly predicted in a certain high-dimensional random limit. Furthermore, under certain conditions, ML-VAMP yields estimates that achieve the minimum (i.e., Bayes-optimal) MSE as predicted by the replica method. In this way, ML-VAMP provides a computationally efficient method for multi-layer inference with an exact performance characterization and testable conditions for optimality in the large-system limit. △ Less

Submitted 8 November, 2019; originally announced November 2019.

Comments: 50 pages, double-spaced

arXiv:1903.09631 [pdf, other]

High-Dimensional Bernoulli Autoregressive Process with Long-Range Dependence

Authors: Parthe Pandit, Mojtaba Sahraee-Ardakan, Arash A. Amini, Sundeep Rangan, Alyson K. Fletcher

Abstract: We consider the problem of estimating the parameters of a multivariate Bernoulli process with auto-regressive feedback in the high-dimensional setting where the number of samples available is much less than the number of parameters. This problem arises in learning interconnections of networks of dynamical systems with spiking or binary-valued data. We allow the process to depend on its past up to… ▽ More We consider the problem of estimating the parameters of a multivariate Bernoulli process with auto-regressive feedback in the high-dimensional setting where the number of samples available is much less than the number of parameters. This problem arises in learning interconnections of networks of dynamical systems with spiking or binary-valued data. We allow the process to depend on its past up to a lag $p$, for a general $p \ge 1$, allowing for more realistic modeling in many applications. We propose and analyze an $\ell_1$-regularized maximum likelihood estimator (MLE) under the assumption that the parameter tensor is approximately sparse. Rigorous analysis of such estimators is made challenging by the dependent and non-Gaussian nature of the process as well as the presence of the nonlinearities and multi-level feedback. We derive precise upper bounds on the mean-squared estimation error in terms of the number of samples, dimensions of the process, the lag $p$ and other key statistical properties of the model. The ideas presented can be used in the high-dimensional analysis of regularized $M$-estimators for other sparse nonlinear and non-Gaussian processes with long-range dependence. △ Less

Submitted 19 March, 2019; originally announced March 2019.

Comments: To appear at AISTATS 2019 titled "Sparse Multivariate Bernoulli Processes in High Dimensions"

Journal ref: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) 2019, Naha, Okinawa, Japan. PMLR: Volume 89

arXiv:1903.01293 [pdf, other]

Asymptotics of MAP Inference in Deep Networks

Authors: Parthe Pandit, Mojtaba Sahraee, Sundeep Rangan, Alyson K. Fletcher

Abstract: Deep generative priors are a powerful tool for reconstruction problems with complex data such as images and text. Inverse problems using such models require solving an inference problem of estimating the input and hidden units of the multi-layer network from its output. Maximum a priori (MAP) estimation is a widely-used inference method as it is straightforward to implement, and has been successfu… ▽ More Deep generative priors are a powerful tool for reconstruction problems with complex data such as images and text. Inverse problems using such models require solving an inference problem of estimating the input and hidden units of the multi-layer network from its output. Maximum a priori (MAP) estimation is a widely-used inference method as it is straightforward to implement, and has been successful in practice. However, rigorous analysis of MAP inference in multi-layer networks is difficult. This work considers a recently-developed method, multi-layer vector approximate message passing (ML-VAMP), to study MAP inference in deep networks. It is shown that the mean squared error of the ML-VAMP estimate can be exactly and rigorously characterized in a certain high-dimensional random limit. The proposed method thus provides a tractable method for MAP inference with exact performance guarantees. △ Less

Submitted 1 March, 2019; originally announced March 2019.

Comments: 11 pages. arXiv admin note: text overlap with arXiv:1706.06549

arXiv:1608.06627 [pdf]

Artificial Neural Networks for Detection of Malaria in RBCs

Authors: Purnima Pandit, A. Anand

Abstract: Malaria is one of the most common diseases caused by mosquitoes and is a great public health problem worldwide. Currently, for malaria diagnosis the standard technique is microscopic examination of a stained blood film. We propose use of Artificial Neural Networks (ANN) for the diagnosis of the disease in the red blood cell. For this purpose features / parameters are computed from the data obtaine… ▽ More Malaria is one of the most common diseases caused by mosquitoes and is a great public health problem worldwide. Currently, for malaria diagnosis the standard technique is microscopic examination of a stained blood film. We propose use of Artificial Neural Networks (ANN) for the diagnosis of the disease in the red blood cell. For this purpose features / parameters are computed from the data obtained by the digital holographic images of the blood cells and is given as input to ANN which classifies the cell as the infected one or otherwise. △ Less

Submitted 23 August, 2016; originally announced August 2016.

MSC Class: 62M45

arXiv:1607.02037 [pdf, other]

doi 10.1016/j.jmateco.2018.04.002

Refinement of the Equilibrium of Public Goods Games over Networks: Efficiency and Effort of Specialized Equilibria

Authors: Parthe Pandit, Ankur A. Kulkarni

Abstract: Recently Bramoulle and Kranton presented a model for the provision of public goods over a network and showed the existence of a class of Nash equilibria called specialized equilibria wherein some agents exert maximum effort while other agents free ride. We examine the efficiency, effort and cost of specialized equilibria in comparison to other equilibria. Our main results show that the welfare of… ▽ More Recently Bramoulle and Kranton presented a model for the provision of public goods over a network and showed the existence of a class of Nash equilibria called specialized equilibria wherein some agents exert maximum effort while other agents free ride. We examine the efficiency, effort and cost of specialized equilibria in comparison to other equilibria. Our main results show that the welfare of a particular specialized equilibrium approaches the maximum welfare amongst all equilibria as the concavity of the benefit function tends to unity. For forest networks a similar result also holds as the concavity approaches zero. Moreover, without any such concavity conditions, there exists for any network a specialized equilibrium that requires the maximum weighted effort amongst all equilibria. When the network is a forest, a specialized equilibrium also incurs the minimum total cost amongst all equilibria. For well-covered forest networks we show that all welfare maximizing equilibria are specialized and all equilibria incur the same total cost. Thus we argue that specialized equilibria may be considered as a refinement of the equilibrium of the public goods game. We show several results on the structure and efficiency of equilibria that highlight the role of dependants in the network. △ Less

Submitted 23 January, 2022; v1 submitted 7 July, 2016; originally announced July 2016.

MSC Class: 91A43; 05C57; 91D30; 90C35

Journal ref: Journal of Mathematical Economics, Available online 16 April 2018

arXiv:1603.05075 [pdf, ps, other]

doi 10.1016/j.dam.2018.02.022

A linear complementarity based characterization of the weighted independence number and the independent domination number in graphs

Authors: Parthe Pandit, Ankur A. Kulkarni

Abstract: The linear complementarity problem is a continuous optimization problem that generalizes convex quadratic programming, Nash equilibria of bimatrix games and several such problems. This paper presents a continuous optimization formulation for the weighted independence number of a graph by characterizing it as the maximum weighted $\ell_1$ norm over the solution set of a linear complementarity probl… ▽ More The linear complementarity problem is a continuous optimization problem that generalizes convex quadratic programming, Nash equilibria of bimatrix games and several such problems. This paper presents a continuous optimization formulation for the weighted independence number of a graph by characterizing it as the maximum weighted $\ell_1$ norm over the solution set of a linear complementarity problem (LCP). The minimum $\ell_1$ norm of solutions of this LCP is a lower bound on the independent domination number of the graph. Unlike the case of the maximum $\ell_1$ norm, this lower bound is in general weak, but we show it to be tight if the graph is a forest. Using methods from the theory of LCPs, we obtain a few graph theoretic results. In particular, we provide a stronger variant of the Lovász theta of a graph. We then provide sufficient conditions for a graph to be well-covered, i.e., for all maximal independent sets to also be maximum. This condition is also shown to be necessary for well-coveredness if the graph is a forest. Finally, the reduction of the maximum independent set problem to a linear program with (linear) complementarity constraints (LPCC) shows that LPCCs are hard to approximate. △ Less

Submitted 16 March, 2016; originally announced March 2016.

Comments: 16 pages

MSC Class: 05C69; 68R10; 90C33; 90C27; 90C26

Journal ref: Discrete Applied Mathematics, Volume 244, 31 July 2018, Pages 155-169

Showing 1–21 of 21 results for author: Pandit, P