
Generalization error of min-norm interpolators in transfer learning

Authors: Yanke Song, Sohom Bhattacharya, Pragya Sur

Abstract: This paper establishes the generalization error of pooled min-$\ell_2$-norm interpolation in transfer learning where data from diverse distributions are available. Min-norm interpolators emerge naturally as implicitly regularized limits of modern machine learning algorithms. Previous work characterized their out-of-distribution risk when samples from the test distribution are unavailable during training. However, in many applications, a limited amount of test data may be available during training, yet properties of min-norm interpolation in this setting are not well understood. We address this gap by characterizing the bias and variance of pooled min-$\ell_2$-norm interpolation under covariate and model shifts. The pooled interpolator captures both early fusion and a form of intermediate fusion. Our results have several implications: under model shift, for low signal-to-noise ratio (SNR), adding data always hurts. For higher SNR, transfer learning helps as long as the shift-to-signal ratio (SSR) lies below a threshold that we characterize explicitly. By consistently estimating these ratios, we provide a data-driven method to determine: (i) when the pooled interpolator outperforms the target-based interpolator, and (ii) the optimal number of target samples that minimizes the generalization error. Under covariate shift, if the source sample size is small relative to the dimension, heterogeneity between domains improves the risk, and vice versa.
We establish a novel anisotropic local law to achieve these characterizations, which may be of independent interest in random matrix theory. We supplement our theoretical characterizations with comprehensive simulations that demonstrate the finite-sample efficacy of our results.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 53 pages, 2 figures
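A minimal NumPy sketch of pooled min-$\ell_2$-norm interpolation, assuming "pooled" means stacking source and target samples into one design matrix (early fusion) and taking the minimum-norm solution of $X\beta = y$ through the Moore-Penrose pseudoinverse. The Gaussian designs, dimensions, noise level and model shift are hypothetical choices for illustration, not the paper's setup. With isotropic test covariates the excess risk reduces to $\|\hat\beta - \beta_{\text{target}}\|_2^2$.

```python
import numpy as np

def min_norm_interpolator(X, y):
    """Minimum-l2-norm solution of X @ beta = y (exact interpolation when p > n, full row rank)."""
    return np.linalg.pinv(X) @ y

def pooled_min_norm_interpolator(X_src, y_src, X_tgt, y_tgt):
    """Early fusion: stack source and target samples, then interpolate all of them jointly."""
    X = np.vstack([X_src, X_tgt])
    y = np.concatenate([y_src, y_tgt])
    return min_norm_interpolator(X, y)

rng = np.random.default_rng(0)
p, n_src, n_tgt = 200, 60, 40                 # overparameterized: p > n_src + n_tgt
beta_tgt = rng.normal(size=p) / np.sqrt(p)
beta_src = beta_tgt + 0.5 * rng.normal(size=p) / np.sqrt(p)   # hypothetical model shift
X_src = rng.normal(size=(n_src, p))           # isotropic covariates (no covariate shift here)
X_tgt = rng.normal(size=(n_tgt, p))
y_src = X_src @ beta_src + 0.1 * rng.normal(size=n_src)
y_tgt = X_tgt @ beta_tgt + 0.1 * rng.normal(size=n_tgt)

beta_pooled = pooled_min_norm_interpolator(X_src, y_src, X_tgt, y_tgt)
beta_target_only = min_norm_interpolator(X_tgt, y_tgt)

# For isotropic test covariates, excess risk = ||beta_hat - beta_tgt||^2.
risk_pooled = np.sum((beta_pooled - beta_tgt) ** 2)
risk_target = np.sum((beta_target_only - beta_tgt) ** 2)
print(f"pooled: {risk_pooled:.3f}, target-only: {risk_target:.3f}")
```

Because the target-only interpolator satisfies fewer constraints, its norm can never exceed that of the pooled interpolator; whether pooling lowers the risk depends on quantities such as the SNR and the size of the shift.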

arXiv:2308.14988

Inferences on Mixing Probabilities and Ranking in Mixed-Membership Models

Authors: Sohom Bhattacharya, Jianqing Fan, Jikai Hou

Abstract: Network data is prevalent in numerous big data applications, including economics and health networks, where it is of prime importance to understand the latent structure of the network. In this paper, we model the network using the Degree-Corrected Mixed Membership (DCMM) model. In the DCMM model, for each node $i$, there exists a membership vector $\boldsymbol{\pi}_i = (\boldsymbol{\pi}_i(1), \boldsymbol{\pi}_i(2), \ldots, \boldsymbol{\pi}_i(K))$, where $\boldsymbol{\pi}_i(k)$ denotes the weight that node $i$ puts in community $k$. We derive a novel finite-sample expansion for the $\boldsymbol{\pi}_i(k)$s, which allows us to obtain asymptotic distributions and confidence intervals for the membership mixing probabilities and other related population quantities. This fills an important gap in uncertainty quantification for the membership profile. We further develop a ranking scheme for the vertices based on the membership mixing probabilities on certain communities and perform relevant statistical inference. A multiplier bootstrap method is proposed for ranking inference of an individual member's profile with respect to a given community. The validity of our theoretical results is further demonstrated via numerical experiments on both real and synthetic data examples.

Submitted 28 August, 2023; originally announced August 2023.
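For readers unfamiliar with DCMM, the sketch below generates a network from one common formulation of the model, $\Omega = \Theta \Pi P \Pi^\top \Theta$ with $\Theta = \mathrm{diag}(\theta)$ and the rows of $\Pi$ equal to the membership vectors $\boldsymbol{\pi}_i$. The Dirichlet memberships, degree parameters and connectivity matrix $P$ are hypothetical. The paper's finite-sample expansion, confidence intervals and bootstrap ranking inference are not implemented; the last line only shows what ranking on a community means, using the true weights rather than estimates.

```python
import numpy as np

def dcmm_edge_probabilities(theta, Pi, P):
    """Expected adjacency Omega = Theta Pi P Pi^T Theta, with Theta = diag(theta)."""
    return np.outer(theta, theta) * (Pi @ P @ Pi.T)

def sample_dcmm_adjacency(theta, Pi, P, rng):
    """Draw a symmetric 0/1 adjacency matrix with independent edges for i < j and no self-loops."""
    Omega = dcmm_edge_probabilities(theta, Pi, P)
    n = Omega.shape[0]
    upper = np.triu(rng.random((n, n)) < Omega, k=1)
    return (upper | upper.T).astype(int)

rng = np.random.default_rng(0)
n, K = 300, 3
Pi = rng.dirichlet(np.ones(K), size=n)        # row i is the membership vector pi_i (sums to 1)
theta = rng.uniform(0.3, 0.9, size=n)         # hypothetical degree heterogeneity parameters
P = np.full((K, K), 0.2) + 0.8 * np.eye(K)    # hypothetical connectivity with unit diagonal
A = sample_dcmm_adjacency(theta, Pi, P, rng)

# Rank nodes by their weight on community k = 0 (true weights here, estimates in practice).
ranking = np.argsort(-Pi[:, 0])
```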

arXiv:2308.09104

A comprehensive study of spike and slab shrinkage priors for structurally sparse Bayesian neural networks

Authors: Sanket Jantre, Shrijita Bhattacharya, Tapabrata Maiti

Abstract: Network complexity and computational efficiency have become increasingly significant aspects of deep learning. Sparse deep learning addresses these challenges by recovering a sparse representation of the underlying target function by reducing heavily over-parameterized deep neural networks. Specifically, deep neural architectures compressed via structured sparsity (e.g. node sparsity) provide low latency inference, higher data throughput, and reduced energy consumption. In this paper, we explore two well-established shrinkage techniques, Lasso and Horseshoe, for model compression in Bayesian neural networks. To this end, we propose structurally sparse Bayesian neural networks which systematically prune excessive nodes with (i) Spike-and-Slab Group Lasso (SS-GL), and (ii) Spike-and-Slab Group Horseshoe (SS-GHS) priors, and develop computationally tractable variational inference including continuous relaxation of Bernoulli variables. We establish the contraction rates of the variational posterior of our proposed models as a function of the network topology, layer-wise node cardinalities, and bounds on the network weights. We empirically demonstrate the competitive performance of our models compared to the baseline models in prediction accuracy, model compression, and inference latency.

Submitted 17 August, 2023; originally announced August 2023.
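The abstract mentions a continuous relaxation of Bernoulli variables without naming the relaxation in this listing. A minimal sketch, assuming a binary Concrete (relaxed Bernoulli) relaxation of per-node inclusion indicators that gate a layer's outputs; the helper names, logits and ReLU layer are hypothetical illustrations, not the authors' SS-GL or SS-GHS implementation.

```python
import numpy as np

def stable_sigmoid(x):
    # sigmoid(x) = (1 + tanh(x / 2)) / 2, which avoids overflow in exp for large |x|
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def relaxed_bernoulli(logit_p, temperature, rng):
    """Binary Concrete sample in [0, 1]; approaches a Bernoulli(sigmoid(logit_p)) draw as temperature -> 0."""
    u = rng.uniform(1e-12, 1.0 - 1e-12, size=np.shape(logit_p))
    logistic_noise = np.log(u) - np.log1p(-u)
    return stable_sigmoid((logit_p + logistic_noise) / temperature)

def masked_layer(x, W, b, node_logits, temperature, rng):
    """Dense ReLU layer whose output nodes are gated by relaxed inclusion indicators."""
    z = relaxed_bernoulli(node_logits, temperature, rng)   # one gate per output node
    return np.maximum(x @ W + b, 0.0) * z

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
W, b = rng.normal(size=(4, 3)), np.zeros(3)
node_logits = np.array([2.0, 0.0, -2.0])                   # hypothetical inclusion log-odds per node
h = masked_layer(x, W, b, node_logits, temperature=0.5, rng=rng)
```

Thresholding a sample at 0.5 yields an exact Bernoulli draw with probability sigmoid(logit) for any temperature, while lower temperatures push the relaxed values themselves toward 0 or 1.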

arXiv:2302.05851

Deep Neural Networks for Nonparametric Interaction Models with Diverging Dimension

Authors: Sohom Bhattacharya, Jianqing Fan, Debarghya Mukherjee

Abstract: Deep neural networks have achieved tremendous success due to their representation power and adaptation to low-dimensional structures. Their potential for estimating structured regression functions has been recently established in the literature. However, most of the studies require the input dimension to be fixed and consequently ignore the effect of dimension on the rate of convergence, which hampers their application to modern big data with high dimensionality. In this paper, we bridge this gap by analyzing a $k^{th}$ order nonparametric interaction model in both the growing-dimension scenario ($d$ grows with $n$ but at a slower rate) and in high dimension ($d \gtrsim n$). In the latter case, sparsity assumptions and associated regularization are required in order to obtain optimal rates of convergence. A new challenge in the diverging-dimension setting is that, in calculating the mean-square error, the covariance terms among estimated additive components are an order of magnitude larger than the variances, and they can deteriorate statistical properties without proper care. We introduce a critical debiasing technique to amend the problem. We show that under certain standard assumptions, debiased deep neural networks achieve a minimax optimal rate in terms of both $n$ and $d$. Our proof techniques rely crucially on a novel debiasing technique that makes the covariances of additive components negligible in the mean-square error calculation. In addition, we establish the matching lower bounds.

Submitted 11 February, 2023; originally announced February 2023.

Comments: 46 pages, 2 figures

arXiv:2211.13478

A New Spatio-Temporal Model Exploiting Hamiltonian Equations

Authors: Satyaki Mazumder, Sayantan Banerjee, Sourabh Bhattacharya

Abstract: The solutions of Hamiltonian equations are known to describe the underlying phase space of a mechanical system. In Bayesian statistics, Hamiltonian Monte Carlo is the only method that uses the properties of solutions to the Hamiltonian equations. In this article, we propose a novel spatio-temporal model using a strategic modification of the Hamiltonian equations, incorporating appropriate stochasticity via Gaussian processes. The resultant spatio-temporal process, continuously varying with time, turns out to be nonparametric, nonstationary, nonseparable and non-Gaussian. Additionally, as the spatio-temporal lag goes to infinity, the lagged correlations converge to zero. We investigate the theoretical properties of the new spatio-temporal process, including its continuity and smoothness properties. In the Bayesian paradigm, we derive methods for complete Bayesian inference using MCMC techniques. The performance of our method has been compared with that of a non-stationary Gaussian process (GP) using two simulation studies, where our method shows a significant improvement over the non-stationary GP. Further, application of our new model to two real data sets revealed encouraging performance.

Submitted 23 November, 2023; v1 submitted 24 November, 2022; originally announced November 2022.

Comments: An updated version, demonstrating superiority of our ideas over existing ones

arXiv:2206.09233

IID Sampling from Posterior Dirichlet Process Mixtures

Authors: Sourabh Bhattacharya

Abstract: The influence of the Dirichlet process mixture is ubiquitous in the Bayesian nonparametrics literature. But sampling from its posterior distribution remains a challenge, despite the advent of various Markov chain Monte Carlo methods. The primary challenge is the infinite-dimensional setup, and even if the infinite-dimensional random measure is integrated out, high dimensionality and discreteness still remain difficult issues to deal with. In this article, exploiting the key ideas proposed in Bhattacharya (2021b), we propose a novel methodology for drawing iid realizations from posteriors of Dirichlet process mixtures. We focus in particular on the more general and flexible model of Bhattacharya (2008), so that the methods developed here are also directly applicable to the traditional Dirichlet process mixture. We illustrate our ideas on the well-known enzyme, acidity and galaxy datasets, which are usually considered benchmark datasets for mixture applications. Generating 10,000 iid realizations from the Dirichlet process mixture posterior of Bhattacharya (2008) given these datasets took 19 minutes, 8 minutes and 5 minutes, respectively, in our parallel implementation.

Submitted 18 June, 2022; originally announced June 2022.

arXiv:2206.01446

Modified Bivariate Weibull Distribution Allowing Instantaneous and Early Failures

Authors: Sumangal Bhattacharya, Ishapathik Das, Muralidharan Kunnummal

Abstract: In reliability and life data analysis, the Weibull distribution is widely used to accommodate more data characteristics by changing the values of its parameters. We frequently observe many zeros or close-to-zero data points in reliability and life testing experiments. We call this phenomenon a nearly instantaneous failure. Many researchers have modified commonly used univariate parametric models, such as the exponential, gamma, Weibull, and log-normal distributions, to appropriately fit such data with instantaneous failure observations. Researchers also encounter bivariate correlated life testing data with many observations near a particular point, while the remaining observations follow some continuous distribution. We refer to this situation as early failure of the bivariate responses. If the point is the origin, then we call the situation a nearly instantaneous failure for the responses. Here, we propose a modified bivariate Weibull distribution that allows early failure by combining a bivariate uniform distribution and a bivariate Weibull distribution. The bivariate Weibull distribution is constructed using a 2-dimensional copula, taking the two marginal distributions to be parametric Weibull distributions. We derive some properties of this modified bivariate Weibull distribution, mainly the joint probability density function, the survival (reliability) function, and the hazard (failure rate) function.
The model's unknown parameters are estimated using the Maximum Likelihood Estimation (MLE) technique combined with a machine learning clustering algorithm. Numerical examples are provided using simulated data to illustrate and test the performance of the proposed methodologies. The method is also applied to real data and compared with existing approaches to modeling such data in the literature.

Submitted 3 June, 2022; originally announced June 2022.

Comments: 27 pages, 6 figures, 7 tables
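A sampling sketch of the kind of mixture the abstract describes, under several assumptions not stated there: the early-failure component is taken to be uniform on $[0, c]^2$ for a small hypothetical $c$, the copula is assumed to be a Clayton copula (the abstract only says a 2-dimensional copula), and the Weibull marginals use a shape/scale parameterization. All parameter values are hypothetical, and the MLE-plus-clustering estimation step is not shown.

```python
import numpy as np

def weibull_quantile(u, shape, scale):
    """Inverse CDF of a Weibull(shape k, scale lam) law: F(x) = 1 - exp(-(x / lam)^k)."""
    return scale * (-np.log1p(-u)) ** (1.0 / shape)

def clayton_pair(n, theta, rng):
    """Sample (u, v) from a Clayton copula with theta > 0 by inverting the conditional CDF of v given u."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u, v

def sample_early_failure_weibull(n, p_early, early_limit, theta, shapes, scales, rng):
    """Mixture: uniform on [0, early_limit]^2 with prob. p_early, else copula-based bivariate Weibull."""
    early = rng.uniform(size=n) < p_early
    u, v = clayton_pair(n, theta, rng)
    x = weibull_quantile(u, shapes[0], scales[0])
    y = weibull_quantile(v, shapes[1], scales[1])
    x[early] = rng.uniform(0.0, early_limit, size=early.sum())
    y[early] = rng.uniform(0.0, early_limit, size=early.sum())
    return np.column_stack([x, y]), early

rng = np.random.default_rng(0)
data, early = sample_early_failure_weibull(
    5000, p_early=0.1, early_limit=0.05, theta=2.0,
    shapes=(1.5, 2.0), scales=(10.0, 8.0), rng=rng)
```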

arXiv:2206.00794

Sequential Bayesian Neural Subnetwork Ensembles

Authors: Sanket Jantre, Sandeep Madireddy, Shrijita Bhattacharya, Tapabrata Maiti, Prasanna Balaprakash

Abstract: Deep neural network ensembles that appeal to model diversity have been used successfully to improve predictive performance and model robustness in several applications. Meanwhile, it has recently been shown that sparse subnetworks of dense models can match the performance of their dense counterparts and increase their robustness while effectively decreasing the model complexity. However, most ensembling techniques require multiple parallel and costly evaluations and have been proposed primarily with deterministic models, whereas sparsity induction has been mostly done through ad-hoc pruning. We propose sequential ensembling of dynamic Bayesian neural subnetworks that systematically reduce model complexity through sparsity-inducing priors and generate diverse ensembles in a single forward pass of the model. The ensembling strategy consists of an exploration phase that finds high-performing regions of the parameter space and multiple exploitation phases that effectively exploit the compactness of the sparse model to quickly converge to different minima in the energy landscape corresponding to high-performing subnetworks, yielding diverse ensembles. We empirically demonstrate that our proposed approach surpasses the baselines of the dense frequentist and Bayesian ensemble models in prediction accuracy, uncertainty estimation, and out-of-distribution (OoD) robustness on the CIFAR10 and CIFAR100 datasets and their out-of-distribution variants, CIFAR10-C and CIFAR100-C, induced by corruptions.
Furthermore, we found that our approach produced the most diverse ensembles compared to the approaches with a single forward pass, and in some cases even compared to the approaches with multiple forward passes.

Submitted 1 June, 2022; originally announced June 2022.

arXiv:2112.07939

IID Sampling from Doubly Intractable Distributions

Authors: Sourabh Bhattacharya

Abstract: Intractable posterior distributions of parameters with intractable normalizing constants depending upon the parameters are known as doubly intractable posterior distributions. The terminology itself indicates that obtaining Bayesian inference from such posteriors is doubly difficult compared to traditional intractable posteriors, where the normalizing constants are tractable and admit traditional Markov Chain Monte Carlo (MCMC) solutions. As can be anticipated, a plethora of MCMC-based methods have originated in the literature to deal with doubly intractable distributions. Yet, it remains very much unclear if any of the methods can satisfactorily sample from such posteriors, particularly in high-dimensional setups. In this article, we consider efficient Monte Carlo and importance sampling approximations of the intractable normalizing constant for a few values of the parameters, and Gaussian process interpolations for the remaining values of the parameters, using the approximations. We then incorporate this strategy within the exact iid sampling framework developed in Bhattacharya (2021a) and Bhattacharya (2021b), and illustrate the methodology with simulation experiments comprising a two-dimensional normal-gamma posterior, a two-dimensional Ising model posterior, a two-dimensional Strauss process posterior and a 100-dimensional autologistic model posterior.
In each case we demonstrate great accuracy of our methodology, which is also computationally extremely efficient, often taking only a few minutes to generate 10,000 iid realizations on 80 processors.

Submitted 15 December, 2021; originally announced December 2021.
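A toy sketch of the two ingredients named in the abstract, assuming a one-dimensional family with a known answer so the result can be checked: the unnormalized density $\exp(-\theta x^2/2)$ has $Z(\theta) = \sqrt{2\pi/\theta}$. Here $\log Z$ is estimated by importance sampling at a few design values of $\theta$ and then interpolated to new values with a Gaussian process posterior mean (RBF kernel, constant mean, small nugget). The proposal, kernel length scale, nugget and design grid are hypothetical choices; the step that plugs these approximations into the iid sampler of Bhattacharya (2021a, 2021b) is not shown.

```python
import numpy as np

def unnormalized_density(x, theta):
    return np.exp(-0.5 * theta * x ** 2)

def importance_sampling_log_z(theta, n_samples, rng, proposal_sd=2.0):
    """log of the IS estimate mean(f(x) / q(x)) with x ~ q = N(0, proposal_sd^2)."""
    x = rng.normal(0.0, proposal_sd, size=n_samples)
    q = np.exp(-0.5 * (x / proposal_sd) ** 2) / (proposal_sd * np.sqrt(2.0 * np.pi))
    return np.log(np.mean(unnormalized_density(x, theta) / q))

def rbf_kernel(a, b, length_scale):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)

def gp_interpolate(train_x, train_y, test_x, length_scale=1.0, nugget=1e-4):
    """GP posterior mean with a constant prior mean equal to the average training value."""
    mean = train_y.mean()
    K = rbf_kernel(train_x, train_x, length_scale) + nugget * np.eye(len(train_x))
    alpha = np.linalg.solve(K, train_y - mean)
    return mean + rbf_kernel(test_x, train_x, length_scale) @ alpha

rng = np.random.default_rng(1)
theta_design = np.linspace(0.5, 4.0, 8)                     # "a few values of the parameters"
log_z_design = np.array([importance_sampling_log_z(t, 200_000, rng) for t in theta_design])
theta_new = np.array([0.8, 1.7, 3.3])                       # remaining values, via interpolation
log_z_pred = gp_interpolate(theta_design, log_z_design, theta_new)
log_z_true = 0.5 * np.log(2.0 * np.pi / theta_new)
```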

arXiv:2111.12664

MIO : Mutual Information Optimization using Self-Supervised Binary Contrastive Learning

Authors: Siladittya Manna, Umapada Pal, Saumik Bhattacharya

Abstract: Self-supervised contrastive learning frameworks have progressed rapidly over the last few years. In this paper, we propose a novel mutual information optimization-based loss function for contrastive learning. We model our pre-training task as a binary classification problem to induce an implicit contrastive effect and predict whether a pair is positive or negative. We further improve the naïve loss function using the Majorize-Minimizer principle, and this improvement makes the problem mathematically tractable. Unlike the existing methods, the proposed loss function optimizes the mutual information in both positive and negative pairs. We also present a closed-form expression for the parameter gradient flow and compare the behavior of the proposed loss function using its Hessian eigen-spectrum to analytically study the convergence of SSL frameworks. The proposed method outperforms the SOTA contrastive self-supervised frameworks on benchmark datasets like CIFAR-10, CIFAR-100, STL-10, and Tiny-ImageNet. After 200 epochs of pre-training with ResNet-18 as the backbone, the proposed model achieves an accuracy of 86.2%, 58.18%, 77.49%, and 30.87% on the CIFAR-10, CIFAR-100, STL-10, and Tiny-ImageNet datasets, respectively, and surpasses the SOTA contrastive baseline by 1.23%, 3.57%, 2.00%, and 0.33%, respectively.

Submitted 9 March, 2023; v1 submitted 24 November, 2021; originally announced November 2021.
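As a generic illustration of casting contrastive pre-training as binary classification of pairs, the sketch below scores each pair by temperature-scaled cosine similarity and applies binary cross-entropy with label 1 for the two views of the same sample and 0 for all other pairs. This is an assumed baseline formulation, not the MIO loss itself: the Majorize-Minimizer refinement and the mutual information terms described in the abstract are not reproduced, and the temperature is hypothetical.

```python
import numpy as np

def l2_normalize(z):
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def binary_contrastive_loss(z_a, z_b, temperature=0.5):
    """z_a[i] and z_b[i] embed two views of sample i (positive pair); pairs with i != j are negatives."""
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = (z_a @ z_b.T) / temperature          # pairwise cosine similarities, scaled
    labels = np.eye(len(z_a))                     # 1 on the diagonal (positives), 0 elsewhere
    # Stable binary cross-entropy with logits:
    # -[y log sigmoid(s) + (1 - y) log(1 - sigmoid(s))] = softplus(s) - y * s
    per_pair = np.logaddexp(0.0, logits) - labels * logits
    return per_pair.mean()

z = np.eye(4)                                     # toy embeddings
aligned = binary_contrastive_loss(z, z)           # each view matches its own partner
shuffled = binary_contrastive_loss(z, z[[1, 2, 3, 0]])
```

With a batch of $N$ samples there are $N$ positive and $N(N-1)$ negative pairs, so practical variants usually reweight the two classes.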

arXiv:2109.12633

IID Sampling from Intractable Multimodal and Variable-Dimensional Distributions

Authors: Sourabh Bhattacharya

Abstract: Bhattacharya (2021b) has introduced a novel methodology for generating iid realizations from any target distribution on the Euclidean space, irrespective of dimensionality. In this article, our purpose is two-fold. We first extend the method to obtain iid realizations from general multimodal distributions, and illustrate it with a mixture of two 50-dimensional normal distributions. Then we extend the iid sampling method for fixed-dimensional distributions to variable-dimensional situations and illustrate it with a variable-dimensional normal mixture model of the well-known "acidity data", with further demonstration of the applicability of the iid sampling method developed for multimodal distributions.

Submitted 15 December, 2021; v1 submitted 26 September, 2021; originally announced September 2021.

Comments: An updated version after fixing some typos in the paper and code

arXiv:2109.01548

Variational Bayes algorithm and posterior consistency of Ising model parameter estimation

Authors: Minwoo Kim, Shrijita Bhattacharya, Tapabrata Maiti

Abstract: Ising models originated in statistical physics and are widely used in modeling spatial data and computer vision problems. However, statistical inference for this model remains challenging due to the intractable nature of the normalizing constant in the likelihood. Here, we use a pseudo-likelihood instead to study the Bayesian estimation of the two-parameter (inverse temperature and magnetization) Ising model with a fully specified coupling matrix. We develop a computationally efficient variational Bayes procedure for model estimation. Under the Gaussian mean-field variational family, we derive posterior contraction rates of the variational posterior obtained under the pseudo-likelihood. We also discuss the loss incurred by using the variational posterior instead of the true posterior for the pseudo-likelihood approach. Extensive simulation studies validate the efficacy of mean-field Gaussian and bivariate Gaussian families as possible choices of the variational family for inference of Ising model parameters.

Submitted 3 September, 2021; originally announced September 2021.

Comments: 26 pages
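A minimal sketch of the pseudo-likelihood that replaces the intractable likelihood, assuming spins in $\{-1,+1\}^n$, a known symmetric coupling matrix $J$ with zero diagonal, and the parameterization $P(x) \propto \exp(\tfrac{\beta}{2} x^\top J x + B \sum_i x_i)$ with inverse temperature $\beta$ and magnetization $B$. Under these assumptions each full conditional is available in closed form, $P(x_i \mid x_{-i}) = e^{x_i a_i} / (2\cosh a_i)$ with $a_i = \beta (Jx)_i + B$, so no normalizing constant is needed. The variational Bayes procedure built on top of it is not shown.

```python
import numpy as np

def ising_log_pseudo_likelihood(x, J, beta, B):
    """Sum over i of log P(x_i | x_{-i}) for spins x in {-1, +1}^n.

    Assumes P(x) is proportional to exp(beta/2 * x^T J x + B * sum(x)), with J symmetric, zero diagonal.
    """
    a = beta * (J @ x) + B                  # local fields a_i = beta * (J x)_i + B
    # log(2 cosh(a)) computed stably as logaddexp(a, -a)
    return np.sum(x * a - np.logaddexp(a, -a))
```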

arXiv:2108.11000

Layer Adaptive Node Selection in Bayesian Neural Networks: Statistical Guarantees and Implementation Details

Authors: Sanket Jantre, Shrijita Bhattacharya, Tapabrata Maiti

Abstract: Sparse deep neural networks have proven to be efficient for predictive model building in large-scale studies. Although several works have studied theoretical and numerical properties of sparse neural architectures, they have primarily focused on edge selection. Sparsity through edge selection might be intuitively appealing; however, it does not necessarily reduce the structural complexity of a network. Instead, pruning excessive nodes leads to a structurally sparse network with significant computational speedup during inference. To this end, we propose a Bayesian sparse solution using spike-and-slab Gaussian priors to allow for automatic node selection during training. The use of a spike-and-slab prior alleviates the need for an ad-hoc thresholding rule for pruning. In addition, we adopt a variational Bayes approach to circumvent the computational challenges of traditional Markov Chain Monte Carlo (MCMC) implementation. In the context of node selection, we establish the fundamental result of variational posterior consistency together with the characterization of prior parameters. In contrast to previous works, our theoretical development relaxes the assumptions of an equal number of nodes and uniform bounds on all network weights, thereby accommodating sparse networks with layer-dependent node structures or coefficient bounds. With a layer-wise characterization of prior inclusion probabilities, we discuss the optimal contraction rates of the variational posterior.
We empirically demonstrate that our proposed approach outperforms the edge selection method in computational complexity with similar or better predictive performance. Our experimental evidence further substantiates that our theoretical work facilitates layer-wise optimal node recovery.

Submitted 8 July, 2022; v1 submitted 24 August, 2021; originally announced August 2021.

arXiv:2107.05956

IID Sampling from Intractable Distributions

Authors: Sourabh Bhattacharya

Abstract: We propose a novel methodology for drawing iid realizations from any target distribution on the Euclidean space with arbitrary dimension. No assumption of compact support is necessary for the validity of our theory and method. Our idea is to construct an appropriate infinite sequence of concentric closed ellipsoids, represent the target distribution as an infinite mixture on the central ellipsoid and the ellipsoidal annuli, and to construct efficient perfect samplers for the mixture components. In contrast with most of the existing works on perfect sampling, ours is not only a theoretically valid method but is also practically applicable to all target distributions on any dimensional Euclidean space, and very much amenable to parallel computation. We validate the practicality and usefulness of our methodology by generating 10,000 iid realizations from standard distributions such as the normal, Student's t with 5 degrees of freedom and Cauchy, for dimensions d = 1, 5, 10, 50, 100, as well as from a 50-dimensional mixture normal distribution. The implementation time in all cases is very reasonable, often less than a minute in our parallel implementation. The results turned out to be highly accurate. We also apply our method to draw 10,000 iid realizations from the posterior distributions associated with the well-known Challenger data, a Salmonella data set and the 160-dimensional challenging spatial example of the radionuclide count data on Rongelap Island. Again, we are able to obtain quite encouraging results with very reasonable computing time.

Submitted 15 December, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

Comments: An updated version with some typos in the paper and code fixed. Now the iid and TMCMC results are in close agreement for the Challenger and the Salmonella examples
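The decomposition step can be sketched as follows, assuming ellipsoids of the form $\{x : (x-\mu)^\top \Sigma^{-1} (x-\mu) \le r_k^2\}$ with a common center $\mu$ and shape matrix $\Sigma$, and an increasing, truncated sequence of radii $r_k$ (the method itself uses an infinite sequence). The code only assigns points to the central ellipsoid or to an annulus and estimates the mixture weights by Monte Carlo for a standard normal target; the perfect samplers for the mixture components are not implemented, and all values are hypothetical.

```python
import numpy as np

def mahalanobis_radius(X, center, cov):
    """Radius sqrt((x - c)^T cov^{-1} (x - c)) for each row of X, via a Cholesky factor of cov."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - center).T)    # solves L z = (x - c) column by column
    return np.sqrt(np.sum(z ** 2, axis=0))

def annulus_index(X, center, cov, radii):
    """0 = central ellipsoid (r <= radii[0]); k = annulus radii[k-1] < r <= radii[k];
    len(radii) = everything beyond the last (truncated) ellipsoid."""
    return np.searchsorted(radii, mahalanobis_radius(X, center, cov), side="left")

rng = np.random.default_rng(0)
d = 5
center, cov = np.zeros(d), np.eye(d)
radii = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical truncated sequence of radii
samples = rng.standard_normal((100_000, d))
idx = annulus_index(samples, center, cov, radii)
mixture_weights = np.bincount(idx, minlength=len(radii) + 1) / len(samples)
```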

arXiv:2107.01480

Assessing contribution of treatment phases through tipping point analyses via counterfactual elicitation using rank preserving structural failure time models

Authors: Sudipta Bhattacharya, Jyotirmoy Dey

Abstract: This article provides a novel approach to assess the importance of specific treatment phases within a treatment regimen through tipping point analyses (TPA) of a time-to-event endpoint using rank-preserving-structural-failure-time (RPSFT) modelling. In oncology clinical research, an experimental treatment is often added to the standard of care therapy in multiple treatment phases to improve patient outcomes. When the resulting new regimen provides a meaningful benefit over standard of care, gaining insights into the contribution of each treatment phase becomes important to properly guide clinical practice. New statistical approaches are needed since traditional methods are inadequate in answering such questions. RPSFT modelling is an approach for causal inference, typically used to adjust for treatment switching in randomized clinical trials with time-to-event endpoints. A tipping-point analysis is commonly used in situations where a statistically significant treatment effect is suspected to be an artifact of missing or unobserved data rather than a real treatment difference. The methodology proposed in this article is an amalgamation of these two ideas to investigate the contribution of a specific component of a regimen comprising multiple treatment phases. We provide different variants of the method and construct indices of contribution of a treatment phase to the overall benefit of a regimen that facilitate interpretation of results. The proposed approaches are illustrated with findings from a recently concluded, real-life phase 3 cancer clinical trial.
We conclude with several considerations and recommendations for practical implementation of this new methodology.

Submitted 3 July, 2021; originally announced July 2021.

Comments: 38 pages, 6 figures, 3 tables. arXiv admin note: text overlap with arXiv:2011.09070
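A simplified sketch of the counterfactual idea, assuming the usual RPSFT form $U = T_{\text{other}} + e^{\psi} T_{\text{phase}}$, where $T_{\text{phase}}$ is the time spent in the treatment phase of interest and $e^{\psi}$ acts as an acceleration factor. A tipping-point search then scans $\psi$ until the treated arm's benefit over control disappears. The data are hypothetical and uncensored, and comparing medians is an assumed stand-in for a proper time-to-event analysis; real analyses must handle censoring (e.g., recensoring), and this is not the article's exact procedure.

```python
import numpy as np

def counterfactual_times(t_other, t_phase, psi):
    """RPSFT-style counterfactual time U = T_other + exp(psi) * T_phase.

    psi = 0 leaves times unchanged; psi < 0 shrinks the time attributed to the phase of interest.
    """
    return t_other + np.exp(psi) * t_phase

def tipping_point(t_other, t_phase, control_times, psi_grid):
    """First psi on the grid at which the treated-arm median no longer exceeds the control median."""
    control_median = np.median(control_times)
    for psi in psi_grid:
        if np.median(counterfactual_times(t_other, t_phase, psi)) <= control_median:
            return psi
    return None

# Hypothetical, uncensored toy data (arbitrary time units).
t_other = np.array([2.0, 3.0, 4.0, 5.0])
t_phase = np.array([2.0, 2.0, 2.0, 2.0])
control = np.array([3.0, 4.0, 5.0, 6.0])
psi_grid = np.linspace(0.0, -2.0, 201)        # step of -0.01
psi_star = tipping_point(t_other, t_phase, control, psi_grid)
```

For this toy data the benefit vanishes once $e^{\psi} \le 0.5$, so the returned grid value is $\psi = -0.70$, the first grid point below $\log 0.5 \approx -0.693$.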

arXiv:2106.12652

Black Box Variational Bayesian Model Averaging

Authors: Vojtech Kejzlar, Shrijita Bhattacharya, Mookyong Son, Tapabrata Maiti

Abstract: For many decades now, Bayesian Model Averaging (BMA) has been a popular framework to systematically account for model uncertainty that arises in situations when multiple competing models are available to describe the same or similar physical process. The implementation of this framework, however, comes with a multitude of practical challenges including posterior approximation via Markov Chain Monte Carlo and numerical integration. We present a Variational Bayesian Inference approach to BMA as a viable alternative to the standard solutions which avoids many of the aforementioned pitfalls. The proposed method is "black box" in the sense that it can be readily applied to many models with little to no model-specific derivation. We illustrate the utility of our variational approach on a suite of examples and discuss all the necessary implementation details. Fully documented Python code with all the examples is provided as well.

Submitted 28 March, 2022; v1 submitted 23 June, 2021; originally announced June 2021.
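Bayesian model averaging combines model-specific predictions using posterior model probabilities $p(M_k \mid y) \propto p(y \mid M_k)\,p(M_k)$. The sketch below shows only this final averaging step. In a variational treatment, one might plug in each model's evidence lower bound (ELBO) in place of the intractable log evidence. That substitution is an assumption here, not a description of the paper's exact scheme, and the function names are hypothetical.

```python
import math

def bma_weights(log_evidences, prior_probs=None):
    """Posterior model probabilities from (approximate) log marginal likelihoods."""
    k = len(log_evidences)
    if prior_probs is None:
        prior_probs = [1.0 / k] * k  # uniform prior over models
    logs = [le + math.log(p) for le, p in zip(log_evidences, prior_probs)]
    m = max(logs)  # subtract the max before exponentiating for numerical stability
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def bma_predict(predictions, weights):
    """Model-averaged prediction: sum_k p(M_k | y) * prediction_k."""
    return sum(w * p for w, p in zip(weights, predictions))

w = bma_weights([0.0, math.log(3.0)])  # model 2 has 3x the evidence
print(w, bma_predict([1.0, 5.0], w))
```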

arXiv:2106.02290 [pdf, other]

Matrix completion with data-dependent missingness probabilities

Authors: Sohom Bhattacharya, Sourav Chatterjee

Abstract: The problem of completing a large matrix with many missing entries has received widespread attention in the last couple of decades. Two popular approaches to the matrix completion problem are based on singular value thresholding and nuclear norm minimization. Most past work on this subject assumes that there is a single number $p$ such that each entry of the matrix is available independently with probability $p$ and missing otherwise. This assumption may not be realistic for many applications. In this work, we replace it with the assumption that the probability that an entry is available is an unknown function $f$ of the entry itself. For example, if the entry is the rating given to a movie by a viewer, then it seems plausible that high value entries have greater probability of being available than low value entries. We propose two new estimators, based on singular value thresholding and nuclear norm minimization, to recover the matrix under this assumption. The estimators involve no tuning parameters, and are shown to be consistent under a low rank assumption. We also provide a consistent estimator of the unknown function $f$.

Submitted 22 April, 2022; v1 submitted 4 June, 2021; originally announced June 2021.

Comments: 28 pages, 9 figures. To appear in IEEE Trans. Inf. Theory
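For context, the classical uniform-missingness baseline that this entry moves away from can be sketched as singular value thresholding. Missing entries are filled with zero, the result is rescaled by the estimated observation probability $\hat p$, and only singular values above a threshold are kept. This is not the authors' estimator, which lets the observation probability depend on the entry through an unknown $f$ and needs no tuning. Here the threshold is a user-chosen assumption.

```python
import numpy as np

def svt_complete(Y, mask, threshold):
    """Uniform-p singular value thresholding (classical baseline, not the paper's method).

    Y: observed matrix (values at unobserved positions are ignored)
    mask: boolean array, True where an entry is observed
    threshold: singular values below this are discarded (user-chosen here)
    """
    p_hat = mask.mean()                       # estimate of the single observation probability p
    Z = np.where(mask, Y, 0.0) / p_hat        # zero-fill and rescale so E[Z] matches the full matrix
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    keep = s >= threshold
    return (U[:, keep] * s[keep]) @ Vt[keep, :]

M = np.outer([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])  # rank-1 toy matrix
print(svt_complete(M, np.ones_like(M, dtype=bool), 1e-8))
```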

arXiv:2105.08451 [pdf, other]

Bayesian Levy-Dynamic Spatio-Temporal Process: Towards Big Data Analysis

Authors: Sourabh Bhattacharya

Abstract: In this era of big data, all scientific disciplines are evolving fast to cope with the enormity of the available information. So is statistics, the queen of science. Big data are particularly relevant to spatio-temporal statistics, thanks to much-improved technology in satellite-based remote sensing and Geographical Information Systems. However, none of the existing approaches seem to meet the simultaneous demand of reality emulation and cheap computation. In this article, with the Levy random fields as the starting point, we construct a new Bayesian nonparametric, nonstationary and nonseparable dynamic spatio-temporal model with the additional realistic property that the lagged spatio-temporal correlations converge to zero as the lag tends to infinity. Although our Bayesian model seems to be intricately structured and is variable-dimensional with respect to each time index, we are able to devise a fast and efficient parallel Markov Chain Monte Carlo (MCMC) algorithm for Bayesian inference. Our simulation experiment brings out quite encouraging performance from our Bayesian Levy-dynamic approach. We finally apply our Bayesian Levy-dynamic model and methods to a sea surface temperature dataset consisting of 139,300 data points in space and time. Although not big data in the true sense, this is a large and highly structured dataset by any standard.
Even for this large and complex dataset, our parallel MCMC algorithm, implemented on 80 processors, generated 110,000 MCMC realizations from the Levy-dynamic posterior within a single day, and the resultant Bayesian posterior predictive analysis turned out to be encouraging. Thus, it is not unreasonable to expect that with significantly more computing resources, it is feasible to analyse terabytes of spatio-temporal data with our new model and methods.

Submitted 18 May, 2021; originally announced May 2021.

Comments: Feedback welcome

arXiv:2012.09746 [pdf]

Non-parametric estimation of Expectation and Variance of event count and of incidence rate in a recurrent process -- where intensity of event-occurrence changes with the occurrence of each higher order event

Authors: Sudipta Bhattacharya

Abstract: In this paper, a novel non-parametric method for estimation of expectation and maximum value of the variance function is proposed for recurrent events where intensity of event occurrence changes with the occurrence of each higher order event. These kinds of recurrent events are often observed in clinical trials for cardiovascular events and also in many social experiments involving drug addiction, armed robberies, etc. Simulated data are used to demonstrate the novel approach for estimating the mean and variance of such recurrent events, and the results are compared with those of the Nelson-Aalen estimator.

Submitted 17 December, 2020; originally announced December 2020.

Comments: 21 pages, 2 figures
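The Nelson-Aalen comparator mentioned in this abstract estimates the cumulative hazard as $\hat H(t) = \sum_{t_j \le t} d_j / n_j$, where $d_j$ is the number of events at time $t_j$ and $n_j$ is the number of subjects at risk just before it. The sketch below implements that standard single-event form with right censoring. It is not the paper's proposed estimator, and it does not cover the recurrent-event extension, in which subjects stay at risk after an event.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard at each distinct event time.

    times: observed times; events: 1 if an event occurred, 0 if right-censored.
    Returns a list of (event_time, cumulative_hazard) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)       # everyone is at risk before the first time point
    cum, out, i = 0.0, [], 0
    while i < len(data):
        t, d, n_t = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            d += data[i][1]
            n_t += 1
            i += 1
        if d > 0:
            cum += d / at_risk  # hazard increment d_j / n_j
            out.append((t, cum))
        at_risk -= n_t          # events and censorings both leave the risk set
    return out

print(nelson_aalen([1, 2, 2, 3], [1, 1, 0, 1]))
```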

arXiv:2011.09592 [pdf, other]

Variational Bayes Neural Network: Posterior Consistency, Classification Accuracy and Computational Challenges

Authors: Shrijita Bhattacharya, Zihuan Liu, Tapabrata Maiti

Abstract: Bayesian neural network models (BNN) have resurged in recent years due to the advancement of scalable computations and their utility in solving complex prediction problems in a wide variety of applications. Despite the popularity and usefulness of BNN, the conventional Markov Chain Monte Carlo based implementation suffers from high computational cost, limiting the use of this powerful technique in large scale studies. Variational Bayes inference has become a viable alternative to circumvent some of the computational issues. Although the approach is popular in machine learning, its application in statistics is somewhat limited. This paper develops a variational Bayesian neural network estimation methodology and related statistical theory. The numerical algorithms and their implementation are discussed in detail. The theory for posterior consistency, a desirable property in nonparametric Bayesian statistics, is also developed. This theory provides an assessment of prediction accuracy and guidelines for characterizing the prior distributions and variational family. The loss of using a variational posterior over the true posterior has also been quantified. The development is motivated by an important biomedical engineering application, namely building predictive tools for the transition from mild cognitive impairment to Alzheimer's disease. The predictors are multi-modal and may involve complex interactive relations.

Submitted 18 November, 2020; originally announced November 2020.

arXiv:2011.09070 [pdf]

Assessing contribution of treatment phases through tipping point analyses using rank preserving structural failure time models

Authors: Sudipta Bhattacharya, Jyotirmoy Dey

Abstract: In clinical trials, an experimental treatment is sometimes added on to a standard of care or control therapy in multiple treatment phases (e.g., concomitant and maintenance phases) to improve patient outcomes. When the new regimen provides meaningful benefit over the control therapy in such cases, it proves difficult to separately assess the contribution of each phase to the overall effect observed. This article provides an approach for assessing the importance of a specific treatment phase in such a situation through tipping point analyses of a time-to-event endpoint using rank-preserving-structural-failure-time (RPSFT) modeling. A tipping-point analysis is commonly used in situations where it is suspected that a statistically significant difference between treatment arms could be a result of missing or unobserved data instead of a real treatment effect. Rank-preserving-structural-failure-time modeling is an approach for causal inference that is typically used to adjust for treatment switching in clinical trials with time-to-event endpoints. The methodology proposed in this article is an amalgamation of these two ideas to investigate the contribution of a treatment phase of interest to the effect of a regimen comprising multiple treatment phases. We provide two variants of the method corresponding to two different effects of interest, and two tipping point thresholds depending on the inferential goals. The proposed approaches are motivated and illustrated with data from a recently concluded, real-life phase 3 cancer clinical trial. We then conclude with several considerations and recommendations.

Submitted 17 November, 2020; originally announced November 2020.

Comments: 33 pages, 6 figures and 4 tables

arXiv:2010.13591 [pdf, other]

Function Optimization with Posterior Gaussian Derivative Process

Authors: Sucharita Roy, Sourabh Bhattacharya

Abstract: In this article, we propose and develop a novel Bayesian algorithm for optimization of functions whose first and second partial derivatives are known. The basic premise is the Gaussian process representation of the function which induces a first derivative process that is also Gaussian. The Bayesian posterior solutions of the derivative process set equal to zero, given data consisting of suitable choices of input points in the function domain and their function values, emulate the stationary points of the function, which can be fine-tuned by setting restrictions on the prior in terms of the first and second derivatives of the objective function. These observations motivate us to propose a general and effective algorithm for function optimization that attempts to get closer to the true optima adaptively with in-built iterative stages. We provide a theoretical foundation for this algorithm, proving almost sure convergence to the true optima as the number of iterative stages tends to infinity. The theoretical foundation hinges upon our proofs of almost sure uniform convergence of the posteriors associated with Gaussian and Gaussian derivative processes to the underlying function and its derivatives in appropriate fixed-domain infill asymptotics setups; rates of convergence are also available. We also provide a Bayesian characterization of the number of optima using information inherent in our optimization algorithm. We illustrate our Bayesian optimization algorithm with five different examples involving maxima, minima, saddle points and even inconclusiveness.
Our examples range from simple, one-dimensional problems to challenging 50- and 100-dimensional problems.

Submitted 26 October, 2020; originally announced October 2020.

Comments: Comments welcome
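The premise that a Gaussian process prior on $f$ induces a Gaussian process on $f'$ can be illustrated in one dimension. With a squared-exponential kernel, the posterior mean of $f'(x)$ is $\partial_x k(x, X)\,(K + \sigma_n^2 I)^{-1} y$, and its zero can be located by bisection. This is a minimal sketch under several assumptions: a zero-mean prior, fixed hyperparameters, and a single stationary point in the bracket. It omits the paper's prior restrictions on derivatives, its iterative stages and its characterization of the number of optima, and the function names are hypothetical.

```python
import numpy as np

def se_kernel(a, b, ell=1.0, sigma2=1.0):
    """Squared-exponential kernel matrix k(a_i, b_j)."""
    d = a[:, None] - b[None, :]
    return sigma2 * np.exp(-0.5 * d**2 / ell**2)

def posterior_derivative_mean(x_new, X, y, ell=1.0, sigma2=1.0, noise=1e-6):
    """Posterior mean of f'(x_new) under a zero-mean GP prior with SE kernel."""
    K = se_kernel(X, X, ell, sigma2) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    # d/dx k(x, x_i) = -(x - x_i) / ell^2 * k(x, x_i)
    dk = -(x_new[:, None] - X[None, :]) / ell**2 * se_kernel(x_new, X, ell, sigma2)
    return dk @ alpha

def stationary_point(X, y, lo, hi, tol=1e-8, **kw):
    """Bisection for a zero of the posterior-mean derivative on [lo, hi]."""
    g = lambda x: posterior_derivative_mean(x, X, y, **kw)[0]
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("no sign change on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)

X = np.linspace(-1.0, 3.0, 9)
y = (X - 1.0) ** 2  # toy objective with a minimum at x = 1
print(stationary_point(X, y, 0.0, 2.0))
```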

arXiv:2009.13591 [pdf, other]

doi 10.1007/s42519-021-00189-w

Quantile Regression Neural Networks: A Bayesian Approach

Authors: Sanket R. Jantre, Shrijita Bhattacharya, Tapabrata Maiti

Abstract: This article introduces a Bayesian neural network estimation method for quantile regression assuming an asymmetric Laplace distribution (ALD) for the response variable. It is shown that the posterior distribution for feedforward neural network quantile regression is asymptotically consistent under a misspecified ALD model. This consistency proof embeds the problem in the density estimation domain and uses bounds on the bracketing entropy to derive the posterior consistency over Hellinger neighborhoods. This consistency result is shown in the setting where the number of hidden nodes grows with the sample size. The Bayesian implementation utilizes the normal-exponential mixture representation of the ALD density. The algorithm uses a Markov chain Monte Carlo (MCMC) simulation technique: Gibbs sampling coupled with the Metropolis-Hastings algorithm. We have addressed the issue of complexity associated with the aforementioned MCMC implementation in the context of chain convergence, choice of starting values, and step sizes. We have illustrated the proposed method with simulation studies and real data examples.

Submitted 28 September, 2020; originally announced September 2020.

Journal ref: J Stat Theory Pract 15 (3), 1-34, 2021
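The asymmetric Laplace likelihood ties Bayesian quantile regression to the familiar check (pinball) loss. The ALD density with location $\mu$, scale $\sigma$ and quantile level $\tau$ is $f(y) = \frac{\tau(1-\tau)}{\sigma}\exp\{-\rho_\tau((y-\mu)/\sigma)\}$ with $\rho_\tau(u) = u(\tau - \mathbb{1}\{u<0\})$, so $\mu$ is the $\tau$-quantile. The sketch below checks these facts numerically. It covers only the likelihood, not the neural network or the Gibbs/Metropolis-Hastings sampler described in the abstract.

```python
import math

def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def ald_logpdf(y, mu, sigma, tau):
    """Log density of ALD(mu, sigma, tau); rho_tau is positively homogeneous,
    so rho_tau((y - mu) / sigma) == rho_tau(y - mu) / sigma."""
    return math.log(tau * (1.0 - tau) / sigma) - check_loss(y - mu, tau) / sigma

# Numerical sanity checks with a simple Riemann sum on a wide grid
tau, step = 0.3, 1e-3
grid = [-50.0 + step * i for i in range(100001)]
dens = [math.exp(ald_logpdf(y, 0.0, 1.0, tau)) for y in grid]
total_mass = step * sum(dens)                                         # should be close to 1
mass_below_mu = step * sum(d for y, d in zip(grid, dens) if y < 0.0)  # should be close to tau

# Minimising the summed check loss over candidate constants picks the median for tau = 0.5
data = [1.0, 2.0, 3.0, 4.0, 5.0]
best = min(data, key=lambda c: sum(check_loss(y - c, 0.5) for y in data))
print(total_mass, mass_below_mu, best)
```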

arXiv:2009.06229 [pdf, other]

Bayesian Appraisal of Random Series Convergence with Application to Climate Change

Authors: Sucharita Roy, Sourabh Bhattacharya

Abstract: Roy and Bhattacharya (2020) provided a Bayesian characterization of infinite series, and their most important application, namely, to the Dirichlet series characterizing the (in)famous Riemann Hypothesis, revealed insights that are not in support of the most celebrated conjecture for over 150 years. In contrast with the deterministic series considered by Roy and Bhattacharya (2020), in this article we take up random infinite series for our investigation. Remarkably, our method does not require any simplifying assumption. Although the Bayesian characterization theory for random series is no different from that for the deterministic setup, construction of effective upper bounds for partial sums, required for implementation, turns out to be a challenging undertaking in the random setup. In this article, we construct parametric and nonparametric upper bound forms for the partial sums of random infinite series and demonstrate the generality of the latter in comparison to the former. Simulation studies exhibit high accuracy and efficiency of the nonparametric bound in all the setups that we consider. Finally, exploiting the property that the summands tend to zero in the case of series convergence, we consider application of our nonparametric bound driven Bayesian method to global climate change analysis.
Specifically, analyzing the global average temperature record over the years 1850-2016 and Holocene global average temperature reconstruction data 12,000 years before present, we conclude, in spite of the current global warming situation, that global climate dynamics is subject to temporary variability only, the current global warming being an instance, and that long-term global warming or cooling, either in the past or in the future, is highly unlikely.

Submitted 14 September, 2020; originally announced September 2020.

Comments: Comments welcome

arXiv:2008.11175 [pdf, other]

How Ominous is the Future Global Warming Premonition?

Authors: Debashis Chatterjee, Sourabh Bhattacharya

Abstract: Global warming, the phenomenon of increasing global average temperature in recent decades, is receiving wide attention due to its very significant adverse effects on climate. Whether global warming will continue in the future is a question that is most important to investigate. In this regard, the so-called general circulation models (GCMs) have attempted to project the future climate, and nearly all of them exhibit alarming rates of global temperature rise in the future. Although global warming in the current time frame is undeniable, it is important to assess the validity of the future predictions of the GCMs. In this article, we attempt such a study using our recently-developed Bayesian multiple testing paradigm for model selection in inverse regression problems. The model we assume for the global temperature time series is based on Gaussian process emulation of the black box scenario, realistically treating the dynamic evolution of the time series as unknown. We apply our ideas to datasets available from the Intergovernmental Panel on Climate Change (IPCC) website. The best GCM models selected by our method under different assumptions on future climate change scenarios do not convincingly support the present global warming pattern when only the future predictions are considered known. Using our Gaussian process idea, we also forecast the future temperature time series given the current one. Interestingly, our results do not support drastic future global warming predicted by almost all the GCM models.

Submitted 25 August, 2020; originally announced August 2020.

Comments: Comments welcome

arXiv:2008.05021 [pdf, other]

doi 10.1007/s11222-021-10024-8

A Fast and Calibrated Computer Model Emulator: An Empirical Bayes Approach

Authors: Vojtech Kejzlar, Mookyong Son, Shrijita Bhattacharya, Tapabrata Maiti

Abstract: Mathematical models implemented on a computer have become the driving force behind the acceleration of the cycle of scientific processes. This is because computer models are typically much faster and more economical to run than physical experiments. In this work, we develop an empirical Bayes approach to predictions of physical quantities using a computer model, where we assume that the computer model under consideration needs to be calibrated and is computationally expensive. We propose a Gaussian process emulator and a Gaussian process model for the systematic discrepancy between the computer model and the underlying physical process. This allows for closed-form and easy-to-compute predictions given by a conditional distribution induced by the Gaussian processes. We provide a rigorous theoretical justification of the proposed approach by establishing posterior consistency of the estimated physical process. The computational efficiency of the methods is demonstrated in an extensive simulation study and a real data example. The newly established approach makes enhanced use of computer models from both practical and theoretical standpoints.

Submitted 2 July, 2021; v1 submitted 11 August, 2020; originally announced August 2020.

Journal ref: Stat Comput 31, 49 (2021)
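The closed-form prediction this abstract refers to is Gaussian conditioning. A minimal sketch makes several assumptions: zero-mean priors, fixed hyperparameters, a calibration parameter already plugged in, and a physical process modeled as an emulator GP plus an independent discrepancy GP, so that their covariances simply add. The kernel choices and names below are hypothetical, and the paper's empirical Bayes estimation of hyperparameters is not shown.

```python
import numpy as np

def sq_exp(a, b, ell, var):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ell) ** 2)

def gp_predict(X, y, X_new, kernel, noise=1e-6):
    """Posterior mean and variance of f(X_new) given y, for a zero-mean GP prior."""
    K = kernel(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)                           # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    Ks = kernel(X_new, X)
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(kernel(X_new, X_new)) - np.sum(v**2, axis=0)
    return mean, var

# Hypothetical composite kernel: smooth emulator term + shorter-scale discrepancy term
kernel = lambda a, b: sq_exp(a, b, ell=1.0, var=1.0) + sq_exp(a, b, ell=0.3, var=0.1)

X = np.linspace(0.0, 2.0, 5)
mean, var = gp_predict(X, np.sin(X), np.array([0.25, 100.0]), kernel)
print(mean, var)  # far from the data the prediction reverts to the prior (mean 0, variance 1.1)
```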

arXiv:2008.02897 [pdf, other]

Iterative Compression of End-to-End ASR Model using AutoML

Authors: Abhinav Mehrotra, Łukasz Dudziak, **su Yeo, Young-yoon Lee, Ravichander Vipperla, Mohamed S. Abdelfattah, Sourav Bhattacharya, Samin Ishtiaq, Alberto Gil C. P. Ramos, SangJeong Lee, Daehyun Kim, Nicholas D. Lane

Abstract: Increasing demand for on-device Automatic Speech Recognition (ASR) systems has resulted in renewed interest in developing automatic model compression techniques. Past research has shown that the AutoML-based Low Rank Factorization (LRF) technique, when applied to an end-to-end Encoder-Attention-Decoder style ASR model, can achieve a speedup of up to 3.7x, outperforming laborious manual rank-selection approaches. However, we show that current AutoML-based search techniques only work up to a certain compression level, beyond which they fail to produce compressed models with acceptable word error rates (WER). In this work, we propose an iterative AutoML-based LRF approach that achieves over 5x compression without degrading the WER, thereby advancing the state-of-the-art in ASR compression.

Submitted 6 August, 2020; originally announced August 2020.

Journal ref: INTERSPEECH 2020
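Low rank factorization replaces a weight matrix $W \in \mathbb{R}^{m \times n}$ with a product $AB$, where $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{r \times n}$. This cuts the parameter count from $mn$ to $r(m+n)$. The sketch below uses a truncated SVD with a hand-picked rank. It does not reproduce the AutoML rank search or the iterative schedule proposed in the paper. Note also that parameter-count compression is not the same as the wall-clock speedup or WER quoted in the abstract.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Best rank-r approximation of W (in Frobenius norm) via truncated SVD, as factors A @ B."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # m x r: left singular vectors scaled by singular values
    B = Vt[:rank, :]             # r x n
    return A, B

def compression_ratio(m, n, rank):
    """Parameters of the dense layer divided by parameters of the factorized layer."""
    return (m * n) / (rank * (m + n))

print(compression_ratio(512, 512, 64))  # 4.0
```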

arXiv:2007.07847 [pdf, other]

A Bayesian Multiple Testing Paradigm for Model Selection in Inverse Regression Problems

Authors: Debashis Chatterjee, Sourabh Bhattacharya

Abstract: In this article, we propose a novel Bayesian multiple testing formulation for model and variable selection in inverse setups, judiciously embedding the idea of inverse reference distributions proposed by Bhattacharya (2013) in a mixture framework consisting of the competing models. We develop the theory and methods in the general context encompassing parametric and nonparametric competing models, dependent data, as well as misspecifications. Our investigation shows that asymptotically the multiple testing procedure almost surely selects the best possible inverse model that minimizes the minimum Kullback-Leibler divergence from the true model. We also show that the error rates, namely, versions of the false discovery rate and the false non-discovery rate, converge to zero almost surely as the sample size goes to infinity. Asymptotic α-control of versions of the false discovery rate and its impact on the convergence of false non-discovery rate versions are also investigated. Our simulation experiments involve small sample based selection among inverse Poisson log regression and inverse geometric logit and probit regression, where the regressions are either linear or based on Gaussian processes. Variable selection is also considered. Our multiple testing results turn out to be very encouraging in the sense of selecting the best models in all the non-misspecified and misspecified cases.

Submitted 15 July, 2020; originally announced July 2020.

Comments: Comments welcome

arXiv:2006.15786 [pdf, ps, other]

Statistical Foundation of Variational Bayes Neural Networks

Authors: Shrijita Bhattacharya, Tapabrata Maiti

Abstract: Despite the popularity of Bayesian neural networks in recent years, their use is somewhat limited in complex and big data situations due to the computational cost associated with full posterior evaluations. Variational Bayes (VB) provides a useful alternative to circumvent the computational cost and time complexity associated with the generation of samples from the true posterior using Markov Chain Monte Carlo (MCMC) techniques. The efficacy of the VB methods is well established in machine learning literature. However, its potential broader impact is hindered due to a lack of theoretical validity from a statistical perspective. Moreover, there are few results which revolve around the theoretical properties of VB, especially in non-parametric problems. In this paper, we establish the fundamental result of posterior consistency for the mean-field variational posterior (VP) for a feed-forward artificial neural network model. The paper underlines the conditions needed to guarantee that the VP concentrates around Hellinger neighborhoods of the true density function. Additionally, the role of the scale parameter and its influence on the convergence rates has also been discussed. The paper mainly relies on two results: (1) the rate at which the true posterior grows and (2) the rate at which the KL-distance between the posterior and variational posterior grows. The theory provides a guideline for building prior distributions for Bayesian NN models along with an assessment of accuracy of the corresponding VB implementation.

Submitted 28 June, 2020; originally announced June 2020.

arXiv:2006.07405 [pdf, other]

O(1) Communication for Distributed SGD through Two-Level Gradient Averaging

Authors: Subhadeep Bhattacharya, Weikuan Yu, Fahim Tahmid Chowdhury

Abstract: Large neural network models present a hefty communication challenge to distributed Stochastic Gradient Descent (SGD), with a communication complexity of O(n) per worker for a model of n parameters. Many sparsification and quantization techniques have been proposed to compress the gradients, some reducing the communication complexity to O(k), where k << n. In this paper, we introduce a strategy called two-level gradient averaging (A2SGD) to consolidate all gradients down to merely two local averages per worker before the computation of two global averages for an updated model. A2SGD also retains local errors to maintain the variance for fast convergence. Our theoretical analysis shows that A2SGD converges similarly to the default distributed SGD algorithm. Our evaluation validates the theoretical conclusion and demonstrates that A2SGD significantly reduces the communication traffic per worker, and improves the overall training time of LSTM-PTB by 3.2x and 23.2x, respectively, compared to Top-K and QSGD. To the best of our knowledge, A2SGD is the first to achieve O(1) communication complexity per worker for distributed SGD.

Submitted 15 June, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

arXiv:2006.06020 [pdf, ps, other]

Convergence of Pseudo-Bayes Factors in Forward and Inverse Regression Problems

Authors: Debashis Chatterjee, Sourabh Bhattacharya

Abstract: In the Bayesian literature on model comparison, Bayes factors play the leading role. In the classical statistical literature, model selection criteria are often devised using cross-validation ideas. Amalgamating the ideas of Bayes factor and cross-validation, Geisser and Eddy (1979) created the pseudo-Bayes factor. The usage of cross-validation brings several theoretical advantages, computational simplicity and numerical stability to Bayes factors, as the marginal density of the entire dataset is replaced with products of cross-validation densities of individual data points. However, the popularity of pseudo-Bayes factors is still negligible in comparison with Bayes factors, with respect to both theoretical investigations and practical applications. In this article, we establish almost sure exponential convergence of pseudo-Bayes factors for large samples under a general setup consisting of dependent data and model misspecifications. We particularly focus on general parametric and nonparametric regression setups in both forward and inverse contexts. We illustrate our theoretical results with various examples, providing explicit calculations. We also supplement our asymptotic theory with simulation experiments in small sample situations of Poisson log regression and geometric logit and probit regression, additionally addressing the variable selection problem. We consider both linear and nonparametric regression modeled by Gaussian processes for our purposes.
Our simulation results provide quite interesting insights into the usage of pseudo-Bayes factors in forward and inverse setups.

Submitted 10 June, 2020; originally announced June 2020.

Comments: Comments welcome

arXiv:2005.07468 [pdf, ps, other]

Hierarchical Bayesian state-space modeling of age- and sex-structured wildlife population dynamics

Authors: Sabyasachi Mukhopadhyay, Hans-Peter Piepho, Sourabh Bhattacharya, Holly T. Dublin, Joseph O. Ogutu

Abstract: Biodiversity is declining at alarming rates worldwide, including for large wild mammals. It is therefore imperative to develop effective population conservation and recovery strategies. Population dynamics models can provide insights into processes driving declines of particular populations of a species and their relative importance. We develop an integrated Bayesian state-space population dynamics model for wildlife populations and illustrate it using a topi population inhabiting the Masai Mara Ecosystem in Kenya. The model is general and integrates ground demographic survey data with aerial survey monitoring data. It incorporates population age- and sex-structure and life-history traits and relates birth rates, age-specific survival rates and sex ratio with meteorological covariates, prior population density, environmental seasonality and predation risk. The model runs on a monthly time step, enabling accurate characterization of reproductive seasonality, phenology, synchrony and prolificacy of births and juvenile recruitment. Model performance is evaluated using balanced bootstrap sampling and comparing predictions with aerial population size estimates. The model is implemented using MCMC methods and reproduces several well-known features of the Mara topi population, including striking and persistent population decline, seasonality of births and juvenile recruitment. It can be readily adapted for other wildlife species and extended to incorporate several additional useful features.
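The abstract mentions balanced bootstrap sampling without describing it. As a hedged illustration, the sketch below shows the standard balanced bootstrap scheme: concatenate B copies of the n indices, shuffle once, and cut into B resamples, so every observation appears exactly B times overall. What the paper actually resamples (years, surveys, etc.) is not stated, and `balanced_bootstrap_indices` is a hypothetical name.

```python
import random


def balanced_bootstrap_indices(n, n_boot, seed=None):
    """Return n_boot resamples of range(n), each of length n, such that
    every index occurs exactly n_boot times across all resamples."""
    rng = random.Random(seed)
    pool = list(range(n)) * n_boot  # B copies of every index
    rng.shuffle(pool)               # one global permutation
    return [pool[b * n:(b + 1) * n] for b in range(n_boot)]


resamples = balanced_bootstrap_indices(10, 5, seed=1)
```

Compared with ordinary i.i.d. resampling, this balancing removes the first-order simulation noise in bootstrap estimates of the mean.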

Submitted 19 December, 2021; v1 submitted 15 May, 2020; originally announced May 2020.

arXiv:1912.02595 [pdf, other]

Outlier detection and a tail-adjusted boxplot based on extreme value theory

Authors: Shrijita Bhattacharya, Jan Beirlant

Abstract: Whether an extreme observation is an outlier or not depends strongly on the corresponding tail behaviour of the underlying distribution. We develop an automatic, data-driven method to identify extreme tail behaviour that deviates from the intermediate and central characteristics. This allows for detecting extreme outliers or sets of extreme data that show less spread than the bulk of the data. To this end, we extend a testing method proposed in Bhattacharya et al. (2019) for the specific case of heavy-tailed models to all max-domains of attraction. Consequently, we propose a tail-adjusted boxplot which yields a more accurate representation of possible outliers. Several examples and simulation results illustrate the finite sample behaviour of this approach.

Submitted 5 December, 2019; originally announced December 2019.

arXiv:1911.02623 [pdf, ps, other]

Map Enhanced Route Travel Time Prediction using Deep Neural Networks

Authors: Soumi Das, Rajath Nandan Kalava, Kolli Kiran Kumar, Akhil Kandregula, Kalpam Suhaas, Sourangshu Bhattacharya, Niloy Ganguly

Abstract: Travel time estimation is a fundamental problem in transportation science with an extensive literature. The study of these techniques has intensified due to the availability of many large, publicly available trip datasets. Recently developed deep learning based models have improved generality and performance, and have focused on estimating times for individual sub-trajectories and aggregating them to predict the travel time of the entire trajectory. However, these techniques ignore the road network information. In this work, we propose and study techniques for incorporating road networks along with historical trips' data into travel time prediction. We incorporate both node embeddings and road distance into the existing model. Experiments on large real-world benchmark datasets suggest improved performance, especially when the training data is small. As expected, the proposed method performs better than the baseline when there is a larger difference between road distance and Vincenty distance between start and end points.

Submitted 6 November, 2019; originally announced November 2019.

arXiv:1911.00915 [pdf, other]

Estimating accuracy of the MCMC variance estimator: a central limit theorem for batch means estimators

Authors: Saptarshi Chakraborty, Suman K. Bhattacharya, Kshitij Khare

Abstract: The batch means estimator of the MCMC variance is a simple and effective measure of accuracy for MCMC-based ergodic averages. Under various regularity conditions, the estimator has been shown to be consistent for the true variance. However, the estimator can be unstable in practice as it depends directly on the raw MCMC output. A measure of accuracy of the batch means estimator itself, ideally in the form of a confidence interval, is therefore desirable. The asymptotic variance of the batch means estimator is known; however, without any knowledge of the asymptotic distribution, asymptotic variances are in general insufficient to describe variability. In this article we prove a central limit theorem for the batch means estimator that allows for the construction of asymptotically accurate confidence intervals for the batch means estimator. Additionally, our results provide a Markov chain analogue of the classical CLT for the sample variance parameter for i.i.d. observations. Our result assumes standard regularity conditions similar to the ones assumed in the literature for proving consistency. Simulated and real data examples are included as illustrations and applications of the CLT.
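For readers unfamiliar with the estimator being studied, a minimal sketch of the usual batch means formula follows: split a chain of length n = a·b into a batches of size b, and estimate the asymptotic variance as b/(a-1) times the sum of squared deviations of the batch averages from the overall average. The AR(1) chain is only a hypothetical test case (its asymptotic variance is 1/(1-phi)^2 for unit innovation variance); the CLT for this estimator, which is the paper's contribution, is not implemented here.

```python
import random


def batch_means_variance(chain, batch_size):
    """Batch means estimate of sigma^2 in sqrt(n)(mean - mu) -> N(0, sigma^2)."""
    a = len(chain) // batch_size  # number of full batches; the tail is dropped
    if a < 2:
        raise ValueError("need at least two full batches")
    used = chain[: a * batch_size]
    overall = sum(used) / len(used)
    batch_avgs = [
        sum(used[k * batch_size:(k + 1) * batch_size]) / batch_size
        for k in range(a)
    ]
    return batch_size / (a - 1) * sum((m - overall) ** 2 for m in batch_avgs)


def simulate_ar1(n, phi, seed=None):
    """Hypothetical test chain: x_t = phi * x_{t-1} + N(0, 1) noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


chain = simulate_ar1(200_000, 0.5, seed=42)
estimate = batch_means_variance(chain, 400)  # true value here is 4.0
```

Choosing the batch size near sqrt(n), as above, is a common default; the estimator's variability across runs is exactly what the paper's CLT quantifies.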

Submitted 3 November, 2019; originally announced November 2019.

Comments: 28 pages, 2 figures

MSC Class: 60J22 (Primary); 62F15 (secondary)

arXiv:1810.10495 [pdf, ps, other]

Posterior Convergence of Gaussian and General Stochastic Process Regression Under Possible Misspecifications

Authors: Debashis Chatterjee, Sourabh Bhattacharya

Abstract: In this article, we investigate posterior convergence in nonparametric regression models where the unknown regression function is modeled by some appropriate stochastic process. In this regard, we consider two setups. The first setup is based on Gaussian processes, where the covariates are either random or non-random and the noise may be either normally or double-exponentially distributed. In the second setup, we assume that the underlying regression function is modeled by some reasonably smooth, but unspecified stochastic process satisfying reasonable conditions. The distribution of the noise is also left unspecified, but assumed to be thick-tailed. As in the previous studies regarding the same problems, we do not assume that the truth lies in the postulated parameter space, thus explicitly allowing the possibility of misspecification. We exploit the general results of Shalizi (2009) for our purpose and establish not only posterior consistency, but also the rates at which the posterior probabilities converge, which turn out to be the Kullback-Leibler divergence rate. We also investigate the more familiar posterior convergence rates. Interestingly, we show that the posterior predictive distribution can accurately approximate the best possible predictive distribution in the sense that the Hellinger distance, as well as the total variation distance, between the two distributions can tend to zero, in spite of misspecifications.
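As a small, hedged illustration of the Hellinger distance invoked above, the sketch uses the closed form for two univariate normals, H^2 = 1 - sqrt(2 s1 s2 / (s1^2 + s2^2)) exp(-(m1 - m2)^2 / (4 (s1^2 + s2^2))). The pairing of a "predictive" and a "best" normal below is purely hypothetical and is not the paper's setting.

```python
import math


def hellinger_normal(m1, s1, m2, s2):
    """Hellinger distance between N(m1, s1^2) and N(m2, s2^2), scaled to [0, 1]."""
    v = s1 * s1 + s2 * s2
    bc = math.sqrt(2 * s1 * s2 / v) * math.exp(-((m1 - m2) ** 2) / (4 * v))
    return math.sqrt(max(0.0, 1.0 - bc))  # guard against tiny negative rounding


# Hypothetical predictive N(1/sqrt(n), 1 + 1/n) versus "best" N(0, 1):
for n in (10, 100, 1000):
    print(n, hellinger_normal(1 / math.sqrt(n), math.sqrt(1 + 1 / n), 0.0, 1.0))
```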

Submitted 1 May, 2020; v1 submitted 24 October, 2018; originally announced October 2018.

Comments: An updated version

arXiv:1810.09909 [pdf, other]

Bayes Factor Asymptotics for Variable Selection in the Gaussian Process Framework

Authors: Minerva Mukhopadhyay, Sourabh Bhattacharya

Abstract: Although variable selection is one of the most popular areas of modern statistical research, much of its development has taken place in the classical paradigm compared to the Bayesian counterpart. Somewhat surprisingly, both paradigms have focussed almost completely on linear models, in spite of the vast scope offered by the model liberation movement brought about by modern advancements in studying real, complex phenomena. In this article, we investigate general Bayesian variable selection in models driven by Gaussian processes, which allows us to treat linear, non-linear and nonparametric models, in conjunction with even dependent setups, in the same vein. We consider the Bayes factor route to variable selection, and develop a general asymptotic theory for the Gaussian process framework in the "large p, large n" settings even with p>>n, establishing almost sure exponential convergence of the Bayes factor under appropriately mild conditions. The fixed p setup is included as a special case. To illustrate, we apply our general result to variable selection in linear regression, a Gaussian process model with squared exponential covariance function accommodating the covariates, and a first order autoregressive process with time-varying covariates. We also follow up our theoretical investigations with ample simulation experiments in the above regression contexts and variable selection in a real riboflavin dataset consisting of 71 observations but 4088 covariates.
For implementation of variable selection using Bayes factors, we develop a novel and effective general-purpose transdimensional, transformation based Markov chain Monte Carlo algorithm, which has played a crucial role in our simulated and real data applications.

Submitted 26 May, 2021; v1 submitted 23 October, 2018; originally announced October 2018.

Comments: A very significantly updated version, with extensive treatment of the "large p, large n" paradigm, even when p>>n. Substantial methodological development added with TTMCMC based Bayes factor oriented variable selection, along with ample simulation experiments and a real data analysis in the bona fide "large p, small n" premise

arXiv:1808.07704 [pdf, other]

Data-adaptive trimming of the Hill estimator and detection of outliers in the extremes of heavy-tailed data

Authors: Shrijita Bhattacharya, Michael Kallitsis, Stilian Stoev

Abstract: We introduce a trimmed version of the Hill estimator for the index of a heavy-tailed distribution, which is robust to perturbations in the extreme order statistics. In the ideal Pareto setting, the estimator is essentially finite-sample efficient among all unbiased estimators with a given strict upper break-down point. For general heavy-tailed models, we establish the asymptotic normality of the estimator under second order regular variation conditions and also show it is minimax rate-optimal in the Hall class of distributions. We also develop an automatic, data-driven method for the choice of the trimming parameter which yields a new type of robust estimator that can adapt to the unknown level of contamination in the extremes. This adaptive robustness property makes our estimator particularly appealing and superior to other robust estimators in the setting where the extremes of the data are contaminated. As an important application of the data-driven selection of the trimming parameters, we obtain a methodology for the principled identification of extreme outliers in heavy-tailed data. Indeed, the method has been shown to correctly identify the number of outliers in the previously explored Condroz data set.
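A minimal sketch of a trimmed Hill estimator, written under our own reading of the construction: with order statistics X(1) >= X(2) >= ..., the top k0 log-excesses over X(k+1) are replaced by that of X(k0+1), giving ((k0+1) log(X(k0+1)/X(k+1)) + sum over i = k0+2..k of log(X(i)/X(k+1))) / (k - k0). Setting k0 = 0 recovers the classical Hill estimator. The function name is hypothetical, and the papers' data-driven choice of k0 is not implemented.

```python
import math


def trimmed_hill(data, k, k0=0):
    """Trimmed Hill estimate of the tail index xi from the top k order
    statistics, ignoring the k0 largest values; k0 = 0 gives classical Hill."""
    x = sorted(data, reverse=True)  # x[i] is the (i+1)-th largest value
    if not 0 <= k0 < k < len(x):
        raise ValueError("need 0 <= k0 < k < len(data)")
    if x[k] <= 0:
        raise ValueError("the top k+1 order statistics must be positive")
    log_ref = math.log(x[k])  # log X(k+1)
    first = (k0 + 1) * (math.log(x[k0]) - log_ref)  # stands in for the trimmed terms
    rest = sum(math.log(x[i]) - log_ref for i in range(k0 + 1, k))
    return (first + rest) / (k - k0)


clean = [math.e ** 3, math.e ** 2, math.e, 1.0]
contaminated = [1e30, math.e ** 2, math.e, 1.0]  # corrupted top value
```

Because the k0 largest values enter only through X(k0+1), any corruption confined to them leaves the estimate unchanged, which is the robustness property the abstract describes.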

Submitted 23 August, 2018; originally announced August 2018.

arXiv:1711.03758 [pdf, other]

A Novel Bayesian Multiple Testing Approach to Deregulated miRNA Discovery Harnessing Positional Clustering

Authors: Noirrit Kiran Chandra, Richa Singh, Sourabh Bhattacharya

Abstract: MicroRNAs (miRNAs) are small non-coding RNAs that function as regulators of gene expression. In recent years, there has been a tremendous and growing interest among researchers to investigate the role of miRNAs in normal cellular as well as in disease processes. To investigate the role of miRNAs in oral cancer, we analyse the expression levels of miRNAs to identify miRNAs with statistically significant differential expression in cancer tissues. In this article, we propose a novel Bayesian hierarchical model of miRNA expression data. Compelling evidence has demonstrated that the transcription process of miRNAs in the human genome is a latent process instrumental for the observed expression levels. We take into account positional clustering of the miRNAs in the analysis and model the latent transcription phenomenon nonparametrically by an appropriate Gaussian process. For the testing purpose we employ a novel Bayesian multiple testing method where we mainly focus on utilizing the dependence structure between the hypotheses for better results, while also ensuring optimality in many respects. Indeed, our non-marginal method yielded results in accordance with the underlying scientific knowledge which are missed by the very popular Benjamini-Hochberg method.
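For reference, the Benjamini-Hochberg benchmark mentioned above is the standard step-up procedure: sort the m p-values, find the largest rank r with p(r) <= r·q/m, and reject the r smallest. The sketch below implements only that marginal baseline; it is not the authors' non-marginal Bayesian method, and the function name is our own.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0  # largest 1-based rank r with p_(r) <= r * q / m
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

Note the step-up behaviour: with p-values [0.04, 0.045] at q = 0.05, the smaller one fails its own threshold (0.025) but is still rejected because the larger one passes at rank 2.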

Submitted 11 April, 2018; v1 submitted 10 November, 2017; originally announced November 2017.

Comments: An updated version

arXiv:1711.02068 [pdf, other]

From Multimodal to Unimodal Webpages for Developing Countries

Authors: Vidyapu Sandeep, V Vijaya Saradhi, Samit Bhattacharya

Abstract: Multimodal web elements such as text and images are associated with inherent memory costs to store and transfer over the Internet. With the limited network connectivity in developing countries, webpage rendering gets delayed in the presence of high-memory demanding elements such as images (relative to text). To overcome this limitation, we propose a Canonical Correlation Analysis (CCA) based computational approach to replace a high-cost modality with an equivalent low-cost modality. Our model learns a common subspace for low-cost and high-cost modalities that maximizes the correlation between their visual features. The obtained common subspace is used for determining the low-cost (text) element of a given high-cost (image) element for the replacement. We analyze the cost-saving performance of the proposed approach through an eye-tracking experiment conducted on real-world webpages. Our approach reduces the memory-cost by at least 83.35% by replacing images with text.

Submitted 6 November, 2017; originally announced November 2017.

Comments: Presented at NIPS 2017 Workshop on Machine Learning for the Developing World

arXiv:1709.08073 [pdf, other]

doi 10.1145/3240925.3240937

Cross-modal Recurrent Models for Weight Objective Prediction from Multimodal Time-series Data

Authors: Petar Veličković, Laurynas Karazija, Nicholas D. Lane, Sourav Bhattacharya, Edgar Liberis, Pietro Liò, Angela Chieh, Otmane Bellahsen, Matthieu Vegreville

Abstract: We analyse multimodal time-series data corresponding to weight, sleep and steps measurements. We focus on predicting whether a user will successfully achieve his/her weight objective. For this, we design several deep long short-term memory (LSTM) architectures, including a novel cross-modal LSTM (X-LSTM), and demonstrate their superiority over baseline approaches. The X-LSTM improves parameter efficiency by processing each modality separately and allowing for information flow between them by way of recurrent cross-connections. We present a general hyperparameter optimisation technique for X-LSTMs, which allows us to significantly improve on the LSTM and a prior state-of-the-art cross-modal approach, using a comparable number of parameters. Finally, we visualise the model's predictions, revealing implications about latent variables in this task.

Submitted 29 November, 2017; v1 submitted 23 September, 2017; originally announced September 2017.

Comments: To appear in NIPS ML4H 2017 and NIPS TSW 2017

arXiv:1707.06852 [pdf, other]

A Statistical Perspective on Inverse and Inverse Regression Problems

Authors: Debashis Chatterjee, Sourabh Bhattacharya

Abstract: Inverse problems, where in a broad sense the task is to learn from the noisy response about some unknown function, usually represented as the argument of some known functional form, have received wide attention in the general scientific disciplines. However, in mainstream statistics such an inverse problem paradigm does not seem to be as popular. In this article we provide a brief overview of such problems from a statistical, particularly Bayesian, perspective. We also compare and contrast the above class of problems with the perhaps more statistically familiar inverse regression problems, arguing that this class of problems contains the traditional class of inverse problems. In the course of our review we point out that the statistical literature is very scarce with respect to both the inverse paradigms, and substantial research work is still necessary to develop the fields.

Submitted 21 July, 2017; originally announced July 2017.

Comments: To appear in RASHI

arXiv:1706.03842 [pdf, other]

Approximate Structure Construction Using Large Statistical Swarms

Authors: Subhrajit Bhattacharya

Abstract: In this paper we describe a novel local algorithm for large statistical swarms using "harmonic attractor dynamics", by means of which a swarm can construct harmonics of the environment. This in turn allows the swarm to approximately reconstruct desired structures in the environment. The robots navigate in a discrete environment, completely free of localization, being able to communicate with other robots in their own discrete cell only, and being able to sense or take reliable action within a disk of radius $r$ around themselves. We present the mathematics that underlie such dynamics and present initial results demonstrating the proposed algorithm.

Submitted 12 June, 2017; originally announced June 2017.

Comments: 9 pages, 7 figures

arXiv:1705.03088 [pdf, other]

Trimming the Hill estimator: robustness, optimality and adaptivity

Authors: Shrijita Bhattacharya, Michael Kallitsis, Stilian Stoev

Abstract: We introduce a trimmed version of the Hill estimator for the index of a heavy-tailed distribution, which is robust to perturbations in the extreme order statistics. In the ideal Pareto setting, the estimator is essentially finite-sample efficient among all unbiased estimators with a given strict upper break-down point. For general heavy-tailed models, we establish the asymptotic normality of the estimator under second order conditions and discuss its minimax optimal rate in the Hall class. We introduce the so-called trimmed Hill plot, which can be used to select the number of top order statistics to trim. We also develop an automatic, data-driven procedure for the choice of trimming. This results in a new type of robust estimator that can adapt to the unknown level of contamination in the extremes. As a by-product we also obtain a methodology for identifying extreme outliers in heavy-tailed data. The competitive performance of the trimmed Hill and adaptive trimmed Hill estimators is illustrated with simulations.

Submitted 14 November, 2017; v1 submitted 8 May, 2017; originally announced May 2017.

arXiv:1704.07349 [pdf, other]

A Non-Gaussian, Nonparametric Structure for Gene-Gene and Gene-Environment Interactions in Case-Control Studies Based on Hierarchies of Dirichlet Processes

Authors: Durba Bhattacharya, Sourabh Bhattacharya

Abstract: It is becoming increasingly clear that complex interactions among genes and environmental factors play crucial roles in triggering complex diseases. Thus, understanding such interactions is vital, which is possible only through statistical models that adequately account for such intricate, albeit unknown, dependence structures. Bhattacharya & Bhattacharya (2016b) attempt such modeling, relating finite mixtures composed of Dirichlet processes that represent an unknown number of genetic sub-populations through a hierarchical matrix-normal structure that incorporates gene-gene interactions, and possible mutations, induced by environmental variables. However, the product dependence structure implied by their matrix-normal model seems to be too simple to be appropriate for general complex, realistic situations. In this article, we propose and develop a novel nonparametric Bayesian model for case-control genotype data using hierarchies of Dirichlet processes that offers a more realistic and nonparametric dependence structure between the genes, induced by the environmental variables. In this regard, we propose a novel and highly parallelisable MCMC algorithm that is rendered quite efficient by the combination of modern parallel computing technology, effective Gibbs sampling steps, retrospective sampling and Transformation based Markov Chain Monte Carlo (TMCMC). We use appropriate Bayesian hypothesis testing procedures to detect the roles of genes and environment in case-control studies.
We apply our ideas to 5 biologically realistic case-control genotype datasets simulated under distinct set-ups, and obtain encouraging results in each case. We finally apply our ideas to a real myocardial infarction dataset, and obtain interesting results on gene-gene and gene-environment interaction, while broadly agreeing with the results reported in the literature.
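Since TMCMC appears here (and in several other listings) without explanation, below is a hedged sketch of its additive-transformation form as we understand it: a single positive scalar eps is drawn per iteration and every coordinate moves by plus or minus eps (times a per-coordinate scale), with independent equal-probability signs. Because that proposal is symmetric, the acceptance probability reduces to the target ratio. This is a fixed-dimension toy on a standard normal target, not the authors' parallel Gibbs/TMCMC sampler; all names are hypothetical.

```python
import math
import random


def additive_tmcmc(log_target, x0, n_iter, scales, seed=None):
    """Additive TMCMC sketch: one shared eps > 0 per step, random signs per coordinate."""
    rng = random.Random(seed)
    x = list(x0)
    lp = log_target(x)
    samples, accepted = [], 0
    for _ in range(n_iter):
        eps = abs(rng.gauss(0.0, 1.0))  # a single draw from a density on (0, inf)
        signs = [1.0 if rng.random() < 0.5 else -1.0 for _ in x]
        prop = [xi + b * s * eps for xi, b, s in zip(x, signs, scales)]
        lp_prop = log_target(prop)
        # Equal-probability signs make the move symmetric, so the
        # Metropolis-Hastings ratio is just pi(prop) / pi(x).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_iter


def log_std_normal(x):
    return -0.5 * sum(v * v for v in x)


samples, acc_rate = additive_tmcmc(log_std_normal, [0.0, 0.0], 20000, [1.0, 1.0], seed=0)
```

Updating all coordinates with one scalar keeps the proposal effectively one-dimensional, which is the design choice that motivates TMCMC in high dimensions.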

Submitted 1 May, 2020; v1 submitted 24 April, 2017; originally announced April 2017.

Comments: An updated version

arXiv:1703.04956 [pdf, ps, other]

A Short Note on Almost Sure Convergence of Bayes Factors in the General Set-Up

Authors: Debashis Chatterjee, Trisha Maitra, Sourabh Bhattacharya

Abstract: Although there is a significant literature on the asymptotic theory of Bayes factor, the set-ups considered are usually specialized and often involve independent and identically distributed data. Even in such specialized cases, mostly weak consistency results are available. In this article, for the first time ever, we derive the almost sure convergence theory of Bayes factor in the general set-up that includes even dependent data and misspecified models. Somewhat surprisingly, the key to the proof of such a general theory is a simple application of a result of Shalizi (2009) to a well-known identity satisfied by the Bayes factor.

Submitted 17 April, 2018; v1 submitted 15 March, 2017; originally announced March 2017.

Comments: To appear in The American Statistician

arXiv:1610.08367 [pdf, other]

Nonparametric Dynamic State Space Modeling of Observed Circular Time Series with Circular Latent States: A Bayesian Perspective

Authors: Satyaki Mazumder, Sourabh Bhattacharya

Abstract: Circular time series has received relatively little attention in statistics and modeling complex circular time series using the state space approach is non-existent in the literature. In this article we introduce a flexible Bayesian nonparametric approach to state space modeling of observed circular time series where even the latent states are circular random variables. Crucially, we assume that the forms of both observational and evolutionary functions, both of which are circular in nature, are unknown and time-varying. We model these unknown circular functions by appropriate wrapped Gaussian processes having desirable properties. We develop an effective Markov chain Monte Carlo strategy for implementing our Bayesian model, by judiciously combining Gibbs sampling and Metropolis-Hastings methods. Validation of our ideas with a simulation study and two real bivariate circular time series data sets, where we assume one of the variables to be unobserved, revealed very encouraging performance of our model and methods. We finally analyse a dataset consisting of directions of whale migration, considering the unobserved ocean current direction as the latent circular process of interest. The results that we obtain are encouraging, and the posterior predictive distribution of the observed process correctly predicts the observed whale movement.
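The building block behind "wrapped Gaussian processes" is the wrapped normal distribution, whose density on the circle is the normal density summed over all 2πk shifts. The minimal sketch below evaluates that marginal density with a truncated sum; the truncation level `n_terms` is our own assumption, and neither the wrapped process nor the state-space model is implemented.

```python
import math


def wrapped_normal_pdf(theta, mu, sigma, n_terms=20):
    """Density of a N(mu, sigma^2) variable wrapped onto [0, 2*pi),
    approximated by summing the shifts k = -n_terms..n_terms."""
    total = 0.0
    for k in range(-n_terms, n_terms + 1):
        z = (theta - mu + 2 * math.pi * k) / sigma
        total += math.exp(-0.5 * z * z)
    return total / (sigma * math.sqrt(2 * math.pi))
```

For moderate sigma only a few shifts contribute; 20 on each side is generous unless sigma is many times 2π.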

Submitted 15 March, 2017; v1 submitted 26 October, 2016; originally announced October 2016.

Comments: This significantly updated version will appear in Journal of Statistical Theory and Practice

arXiv:1610.01712 [pdf, other]

A Methodology for Customizing Clinical Tests for Esophageal Cancer based on Patient Preferences

Authors: Asis Roy, Sourangshu Bhattacharya, Kalyan Guin

Abstract: Tests for Esophageal cancer can be expensive, uncomfortable and can have side effects. For many patients, we can predict non-existence of disease with 100% certainty, just using demographics, lifestyle, and medical history information. Our objective is to devise a general methodology for customizing tests using user preferences so that expensive or uncomfortable tests can be avoided. We propose to… ▽ More Tests for Esophageal cancer can be expensive, uncomfortable and can have side effects. For many patients, we can predict non-existence of disease with 100% certainty, just using demographics, lifestyle, and medical history information. Our objective is to devise a general methodology for customizing tests using user preferences so that expensive or uncomfortable tests can be avoided. We propose to use classifiers trained from electronic health records (EHR) for selection of tests. The key idea is to design classifiers with 100% false normal rates, possibly at the cost higher false abnormals. We compare Naive Bayes classification (NB), Random Forests (RF), Support Vector Machines (SVM) and Logistic Regression (LR), and find kernel Logistic regression to be most suitable for the task. We propose an algorithm for finding the best probability threshold for kernel LR, based on test set accuracy. Using the proposed algorithm, we describe schemes for selecting tests, which appear as features in the automatic classification algorithm, using preferences on costs and discomfort of the users. We test our methodology with EHRs collected for more than 3000 patients, as a part of project carried out by a reputed hospital in Mumbai, India. Kernel SVM and kernel LR with a polynomial kernel of degree 3, yields an accuracy of 99.8% and sensitivity 100%, without the MP features, i.e. using only clinical tests. We demonstrate our test selection algorithm using two case studies, one using cost of clinical tests, and other using "discomfort" values for clinical tests. 
We compute the test sets corresponding to the lowest false abnormals for each criterion described above, using exhaustive enumeration of 15 clinical tests. The sets turn out to be different, substantiating our claim that one can customize test sets based on user preferences.

Submitted 5 October, 2016; originally announced October 2016.
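The idea of a classifier that misses no abnormal case, at the cost of extra false abnormals, can be illustrated with a small threshold-selection sketch. This is not the authors' algorithm. It assumes the classifier (e.g. kernel LR) already outputs one probability-like score per patient on held-out data, and that label 1 marks an abnormal case. The function name `zero_false_normal_threshold` is hypothetical.

```python
from typing import Sequence, Tuple


def zero_false_normal_threshold(
    scores: Sequence[float], labels: Sequence[int]
) -> Tuple[float, int]:
    """Pick the highest score threshold that still flags every abnormal case.

    Assumption (illustrative only): a patient is predicted abnormal when
    score >= threshold, and label 1 means abnormal. Any threshold at or
    below the lowest abnormal score gives zero false normals (100%
    sensitivity). Raising the threshold can only remove false abnormals,
    so the lowest abnormal score is the best such threshold.
    Returns (threshold, number_of_false_abnormals).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    abnormal_scores = [s for s, y in zip(scores, labels) if y == 1]
    if not abnormal_scores:
        raise ValueError("need at least one abnormal case to set a threshold")

    # Highest threshold that keeps every abnormal case at or above it.
    threshold = min(abnormal_scores)

    # Normal cases that are still flagged abnormal at this threshold.
    false_abnormals = sum(
        1 for s, y in zip(scores, labels) if y == 0 and s >= threshold
    )
    return threshold, false_abnormals
```

For example, `zero_false_normal_threshold([0.1, 0.4, 0.35, 0.8, 0.2, 0.6], [0, 0, 1, 1, 0, 1])` returns `(0.35, 1)`. Every abnormal case scores at least 0.35, and only the normal case scoring 0.4 is flagged as a false abnormal. Choosing the threshold on held-out data rather than on training scores is a design assumption here, made so the zero-false-normal property is not judged on data the model was fit to.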

arXiv:1602.07280

A Statistical Model for Stroke Outcome Prediction and Treatment Planning

Authors: Abhishek Sengupta, Vaibhav Rajan, Sakyajit Bhattacharya, G R K Sarma

Abstract: Stroke is a major cause of mortality and long-term disability in the world. Predictive outcome models in stroke are valuable for personalized treatment, rehabilitation planning and controlled clinical trials. In this paper we design a new model to predict outcome in the short term, the putative therapeutic window for several treatments. Our regression-based model has a parametric form designed to address many challenges common in medical datasets, such as highly correlated variables and class imbalance. Empirically, our model outperforms the best-known previous models in predicting short-term outcomes and in inferring the most effective treatments that improve outcome.

Submitted 22 February, 2016; originally announced February 2016.

arXiv:1601.03519

Effects of Gene-Environment and Gene-Gene Interactions in Case-Control Studies: A Novel Bayesian Semiparametric Approach

Authors: Durba Bhattacharya, Sourabh Bhattacharya

Abstract: Cognizance of gene-environment interactions may help prevent or delay the onset of complex diseases like cardiovascular disease, cancer, type 2 diabetes, autism or asthma through adjustments to lifestyle. In this regard, we extend the Bayesian semiparametric gene-gene interaction model of Bhattacharya & Bhattacharya (2015) to include the possibility of environmental variables influencing gene-gene interactions, as well as possible mutations caused by the environment. Our model accounts for the unknown number of genetic sub-populations via finite mixtures composed of Dirichlet processes, which are related to each other through a hierarchical matrix normal structure responsible for inducing gene-gene interactions and possible mutations in association with environmental variables. We also extend the Bayesian hypothesis testing procedures of Bhattacharya & Bhattacharya (2015) to detect the roles of genes and their interactions, of the environment, and of the influence of the environment on gene-gene interactions in case-control studies. We develop an effective parallel computing methodology, which combines the power of parallel processing technology with the efficiency of our conditionally independent Gibbs sampling and Transformation based MCMC (TMCMC) methods. Applications of our model and methods to simulation studies with biologically realistic case-control genotype datasets obtained under five distinct set-ups yield encouraging results in each case.
We followed these up by applying our ideas to a real case-control genotype dataset on early onset of myocardial infarction (MI). Besides being in broad agreement with the reported literature on this dataset, the results obtained give some interesting insights into the differential effect of gender on MI.

Submitted 21 July, 2017; v1 submitted 14 January, 2016; originally announced January 2016.

Comments: The latest version. arXiv admin note: text overlap with arXiv:1411.7571

Showing 1–50 of 76 results for author: Bhattacharya, S