Search | arXiv e-print repository

Stochastic Variance-Reduced Majorization-Minimization Algorithms

Authors: Duy-Nhat Phan, Sedi Bartz, Nilabja Guha, Hung M. Phan

Abstract: We study a class of nonconvex nonsmooth optimization problems in which the objective is a sum of two functions: one function is the average of a large number of differentiable functions, while the other is proper, lower semicontinuous, and has a surrogate function that satisfies standard assumptions. Such problems arise in machine learning and regularized empirical risk minimization applications. However, nonconvexity and the large-sum structure make the design of new algorithms challenging. Consequently, effective algorithms for such scenarios are scarce. We introduce and study three stochastic variance-reduced majorization-minimization (MM) algorithms, combining the general MM principle with new variance-reduced techniques. We establish almost sure subsequential convergence of the generated sequence to a stationary point. We further show that our algorithms possess the best-known complexity bounds in terms of gradient evaluations. We demonstrate the effectiveness of our algorithms on sparse binary classification problems, sparse multi-class logistic regression, and neural networks, using several widely used, publicly available data sets.

Submitted 11 May, 2023; originally announced May 2023.

MSC Class: 90C26; 65K05
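The abstract does not spell out the three algorithms, so the following is only a minimal sketch of the general idea it describes: an MM step on a surrogate of the objective, driven by a variance-reduced gradient estimate. Several assumptions are involved. The smooth part is taken to be L1-free logistic loss, the nonsmooth part is g(x) = lam * ||x||_1, and the surrogate of the smooth part is the standard quadratic upper bound (a majorizer when the exact gradient is used and `step` <= 1/L). The gradient is then replaced by an SVRG-type estimator, which is a well-known technique and not necessarily one of the paper's new estimators. All function names (`svrg_mm`, `logistic_grad`, `soft_threshold`, `objective`) are hypothetical.

```python
import numpy as np

def logistic_grad(A, b, x):
    """Gradient of mean_i log(1 + exp(-b_i * a_i^T x)) over the rows of A."""
    margins = b * (A @ x)
    # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m)), chained with dm/dx = b_i * a_i
    coef = -b / (1.0 + np.exp(margins))
    return A.T @ coef / A.shape[0]

def objective(A, b, x, lam):
    """Smooth average logistic loss plus the nonsmooth L1 term."""
    return np.mean(np.logaddexp(0.0, -b * (A @ x))) + lam * np.abs(x).sum()

def soft_threshold(u, tau):
    """Minimiser of 0.5 * ||y - u||^2 + tau * ||y||_1 (prox of the L1 norm)."""
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def svrg_mm(A, b, lam, step, epochs=20, inner=None, batch=1, seed=0):
    """Hypothetical variance-reduced MM loop for L1-regularised logistic regression."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    inner = inner or n // batch
    x = np.zeros(d)
    for _ in range(epochs):
        snap = x.copy()
        full_grad = logistic_grad(A, b, snap)  # one full gradient per epoch
        for _ in range(inner):
            idx = rng.integers(0, n, size=batch)
            # SVRG-type estimator: unbiased for the full gradient at x
            v = (logistic_grad(A[idx], b[idx], x)
                 - logistic_grad(A[idx], b[idx], snap) + full_grad)
            # Exact minimiser of the surrogate
            #   <v, y - x> + ||y - x||^2 / (2 * step) + lam * ||y||_1
            x = soft_threshold(x - step * v, step * lam)
    return x

# Usage on synthetic sparse classification data (illustrative only).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]
b = np.sign(A @ x_true + 0.1 * rng.standard_normal(200))
step = 1.0 / np.max(np.sum(A ** 2, axis=1))  # conservative step size (assumption)
x_hat = svrg_mm(A, b, lam=0.05, step=step)
print(objective(A, b, x_hat, 0.05), np.count_nonzero(x_hat))
```

The design choice here is that the surrogate for the L1 term is the term itself, so each MM subproblem has a closed-form solution (soft-thresholding); other regularisers would need their own surrogate minimiser.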

arXiv:2102.12938

On Posterior consistency of Bayesian Changepoint models

Authors: Nilabja Guha, Jyotishka Datta

Abstract: While there have been many recent developments in Bayesian model selection and variable selection for high-dimensional linear models, there is, unlike in the frequentist setting, little work in the literature that accounts for the presence of a change point. We consider a hierarchical Bayesian linear model in which the active set of covariates that affects the observations through a mean model can vary between different time segments. Such structure may arise in the social and economic sciences, for example, a sudden change in house prices driven by an external economic factor, or changes in crime rates driven by social and built-environment factors. Using an appropriate adaptive prior, we outline the development of a hierarchical Bayesian methodology that can select the true change point as well as the true covariates with high probability. We provide the first detailed theoretical analysis of posterior consistency, with or without covariates, under suitable conditions. Gibbs sampling techniques provide an efficient computational strategy. We also present a small-sample simulation study as well as an application to crime forecasting.

Submitted 25 February, 2021; originally announced February 2021.
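To make the idea of a posterior over change point locations concrete, here is a deliberately simplified sketch, not the paper's hierarchical model. It assumes no covariates, a single change point, a known noise variance `sigma2`, independent N(0, `tau2`) priors on the two segment means, and a uniform prior on the location. Under these assumptions the segment means integrate out in closed form, so the posterior can be enumerated exactly instead of being sampled with Gibbs steps. The names `log_marginal` and `changepoint_posterior` are hypothetical.

```python
import numpy as np

def log_marginal(y, sigma2, tau2):
    """log p(y) for y_j = mu + e_j, e_j ~ N(0, sigma2) i.i.d., mu ~ N(0, tau2) integrated out."""
    n = y.size
    s, ss = y.sum(), np.sum(y ** 2)
    # Covariance is sigma2 * I + tau2 * 11^T; quadratic form and log-determinant in closed form.
    quad = (ss - tau2 * s ** 2 / (sigma2 + n * tau2)) / sigma2
    logdet = n * np.log(sigma2) + np.log1p(n * tau2 / sigma2)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

def changepoint_posterior(y, sigma2=1.0, tau2=10.0):
    """Posterior of a single change point k (segments y[:k], y[k:]) under a uniform prior on k."""
    ks = np.arange(1, y.size)
    logp = np.array([log_marginal(y[:k], sigma2, tau2) + log_marginal(y[k:], sigma2, tau2)
                     for k in ks])
    post = np.exp(logp - logp.max())  # subtract the max for numerical stability
    return ks, post / post.sum()

# Synthetic series whose mean shifts after observation 50 (illustrative only).
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(0.0, 1.0, 50), rng.normal(5.0, 1.0, 50)])
ks, post = changepoint_posterior(y)
k_hat = ks[np.argmax(post)]
print(k_hat, post.max())
```

With covariates and a varying active set, as in the abstract, exact enumeration is no longer feasible over the joint model space, which is where sampling-based strategies such as Gibbs sampling become useful.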

arXiv:2010.14638

Bayesian Variable Selection in Multivariate Nonlinear Regression with Graph Structures

Authors: Yabo Niu, Nilabja Guha, Debkumar De, Anindya Bhadra, Veerabhadran Baladandayuthapani, Bani K. Mallick

Abstract: Gaussian graphical models (GGMs) are well-established tools for probabilistic exploration of dependence structures using precision matrices. We develop a Bayesian method to incorporate covariate information into this GGM setup in a nonlinear seemingly unrelated regression framework. We propose a joint predictor and graph selection model and develop an efficient collapsed Gibbs sampler to search the joint model space. Furthermore, we investigate its theoretical variable selection properties. We demonstrate our method on a variety of simulated data, concluding with a real data set from the TCPA project.

Submitted 30 July, 2021; v1 submitted 27 October, 2020; originally announced October 2020.
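The phrase "dependence structures using precision matrices" rests on a standard GGM fact: for a multivariate Gaussian with precision matrix Omega, a zero entry Omega_ij means X_i and X_j are conditionally independent given all other variables, and the partial correlation is -Omega_ij / sqrt(Omega_ii * Omega_jj). The sketch below illustrates only this fact, not the paper's Bayesian graph selection or its collapsed Gibbs sampler; `partial_correlations` and `graph_edges` are hypothetical helper names.

```python
import numpy as np

def partial_correlations(omega):
    """Partial correlations rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)."""
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

def graph_edges(omega, tol=1e-10):
    """Edges (i, j), i < j, of the GGM: pairs with a nonzero precision entry."""
    p = omega.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p) if abs(omega[i, j]) > tol]

# A chain X0 - X1 - X2: X0 and X2 are conditionally independent given X1.
omega = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
rho = partial_correlations(omega)
print(graph_edges(omega))          # [(0, 1), (1, 2)]
print(np.linalg.inv(omega)[0, 2])  # nonzero: X0 and X2 are still marginally correlated
```

The example shows why the precision matrix, rather than the covariance matrix, encodes the graph: the covariance entry for (X0, X2) is nonzero even though there is no edge between them.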

arXiv:2006.14734

Stochastic Approximation Algorithm for Estimating Mixing Distribution for Dependent Observations

Authors: Nilabja Guha, Anindya Roy

Abstract: Estimating the mixing density of a mixture distribution remains an interesting problem in the statistics literature. Using a stochastic approximation method, Newton and Zhang (1999) introduced a fast recursive algorithm for estimating the mixing density of a mixture. Under suitably chosen weights, the stochastic approximation estimator converges to the true solution. Tokdar et al. (2009) established the consistency of this recursive estimation method. However, their proof of consistency assumed independence among the observations. Here, we extend the investigation of the performance of Newton's algorithm to several dependent scenarios. We prove that, under certain conditions, the original algorithm remains consistent even when the observations arise from a weakly dependent stationary process with the target mixture as the marginal density. We show consistency under a decay condition on the dependence among observations, when the dependence is characterized by a quantity similar to the mutual information between the observations.

Submitted 26 March, 2022; v1 submitted 25 June, 2020; originally announced June 2020.
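Newton's recursive algorithm updates the current estimate f_{i-1} of the mixing density after each observation x_i as f_i(theta) = (1 - w_i) f_{i-1}(theta) + w_i p(x_i | theta) f_{i-1}(theta) / integral p(x_i | t) f_{i-1}(t) dt. The sketch below implements that recursion on a uniform grid, with integrals approximated by Riemann sums. Several choices are assumptions for illustration: the weights w_i = (i + 1)^(-gamma) with `gamma` = 0.67, a unit-variance normal kernel, a uniform initial guess, and i.i.d. simulated data (the paper's point is that the same, unchanged recursion remains consistent under weak dependence). `predictive_recursion` and `normal_kernel` are hypothetical names.

```python
import numpy as np

def normal_kernel(xi, theta):
    """p(x | theta) for a N(theta, 1) component (assumed kernel)."""
    return np.exp(-0.5 * (xi - theta) ** 2) / np.sqrt(2 * np.pi)

def predictive_recursion(x, grid, f0, kernel, gamma=0.67):
    """Newton's recursive estimate of a mixing density on a uniform grid."""
    dtheta = grid[1] - grid[0]
    f = f0 / (f0.sum() * dtheta)  # normalise the initial guess
    for i, xi in enumerate(x, start=1):
        w = (i + 1) ** (-gamma)  # decreasing weights
        post = kernel(xi, grid) * f
        post /= post.sum() * dtheta  # "posterior" of theta given x_i under f
        f = (1 - w) * f + w * post  # convex combination keeps total mass 1
    return f

# Data from a two-point mixing distribution at -2 and 2 (illustrative only).
rng = np.random.default_rng(0)
theta = rng.choice([-2.0, 2.0], size=2000)
x = theta + rng.standard_normal(2000)

grid = np.linspace(-6.0, 6.0, 601)
dtheta = grid[1] - grid[0]
f_hat = predictive_recursion(x, grid, np.ones_like(grid), normal_kernel)
near_modes = (np.abs(grid) >= 1.0) & (np.abs(grid) <= 3.0)
mass_near_modes = f_hat[near_modes].sum() * dtheta
print(mass_near_modes)
```

Because each update is a convex combination of two densities that both integrate to 1 on the grid, the estimate stays a valid density at every step, which is part of what makes the recursion fast and simple.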

arXiv:1806.05832

Dynamic Data-driven Bayesian GMsFEM

Authors: Siu Wun Cheung, Nilabja Guha

Abstract: In this paper, we propose a Bayesian approach for multiscale problems when dynamic observational data are available. Our method selects important degrees of freedom probabilistically in a Generalized Multiscale Finite Element Method (GMsFEM) framework. Due to scale disparity in many multiscale applications, computational models cannot resolve all scales. Dominant modes in the GMsFEM are used as "permanent" basis functions, which we use to compute an inexpensive multiscale solution and the associated uncertainties. Through our Bayesian framework, we can model approximate solutions by selecting the unresolved scales probabilistically. We consider parabolic equations in heterogeneous media. The temporal domain is partitioned into subintervals. Using residual information and the given dynamic data, we design an appropriate prior distribution for modeling missing subgrid information. The likelihood is designed to minimize the residual in the underlying PDE problem and the mismatch with the observational data. Using the resulting posterior distribution, the sampling process identifies important degrees of freedom beyond the permanent basis functions. The method adds important degrees of freedom to resolve subgrid information and to ensure accuracy with respect to the observations.

Submitted 15 June, 2018; originally announced June 2018.

arXiv:1702.02973

Bayesian Multiscale Finite Element Methods. Modeling missing subgrid information probabilistically

Authors: Y. Efendiev, W. T. Leung, S. W. Cheung, N. Guha, V. H. Hoang, B. Mallick

Abstract: In this paper, we develop a Bayesian multiscale approach based on a multiscale finite element method. Because of scale disparity in many multiscale applications, computational models cannot resolve all scales. Various subgrid models have been proposed to represent unresolved scales. Here, we consider a probabilistic approach for modeling unresolved scales using the Multiscale Finite Element Method (cf. [1, 2]). By representing dominant modes using the Generalized Multiscale Finite Element Method, we propose a Bayesian framework that provides multiple inexpensive (computable) solutions for a deterministic problem. These approximate probabilistic solutions may not be very close to the exact solutions and, thus, many realizations are needed. In this way, we obtain a rigorous probabilistic description of approximate solutions. In the paper, we consider parabolic and wave equations in heterogeneous media. In each time interval, the domain is divided into subregions. Using residual information, we design appropriate prior and posterior distributions. The likelihood is based on residual minimization. To sample from the resulting posterior distribution, we consider several sampling strategies. The sampling involves identifying important regions and important degrees of freedom beyond permanent basis functions, which are used in residual computation. Numerical results are presented. We consider two sampling algorithms. The first algorithm uses sequential sampling and is inexpensive. In the second algorithm, we perform full sampling using the Gibbs sampling algorithm, which is more accurate than sequential sampling. The main novel ingredients of our approach are: defining appropriate permanent basis functions and the corresponding residual; setting up a proper posterior distribution; and sampling the posteriors.

Submitted 9 February, 2017; originally announced February 2017.

Showing 1–6 of 6 results for author: Guha, N