Search | arXiv e-print repository

Gaussian Process Neural Additive Models

Authors: Wei Zhang, Brian Barr, John Paisley

Abstract: Deep neural networks have revolutionized many fields, but their black-box nature also occasionally prevents their wider adoption in fields such as healthcare and finance, where interpretable and explainable models are required. The recent development of Neural Additive Models (NAMs) is a significant step in the direction of interpretable deep learning for tabular datasets. In this paper, we propos… ▽ More Deep neural networks have revolutionized many fields, but their black-box nature also occasionally prevents their wider adoption in fields such as healthcare and finance, where interpretable and explainable models are required. The recent development of Neural Additive Models (NAMs) is a significant step in the direction of interpretable deep learning for tabular datasets. In this paper, we propose a new subclass of NAMs that use a single-layer neural network construction of the Gaussian process via random Fourier features, which we call Gaussian Process Neural Additive Models (GP-NAM). GP-NAMs have the advantage of a convex objective function and number of trainable parameters that grows linearly with feature dimensionality. It suffers no loss in performance compared to deeper NAM approaches because GPs are well-suited for learning complex non-parametric univariate functions. We demonstrate the performance of GP-NAM on several tabular datasets, showing that it achieves comparable or better performance in both classification and regression tasks with a large reduction in the number of parameters. △ Less

Submitted 19 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Appears at AAAI 2024

arXiv:2309.09222 [pdf, other]

Data-driven Modeling and Inference for Bayesian Gaussian Process ODEs via Double Normalizing Flows

Authors: Jian Xu, Shian Du, Junmei Yang, Xinghao Ding, John Paisley, Delu Zeng

Abstract: Recently, Gaussian processes have been used to model the vector field of continuous dynamical systems, referred to as GPODEs, which are characterized by a probabilistic ODE equation. Bayesian inference for these models has been extensively studied and applied in tasks such as time series prediction. However, the use of standard GPs with basic kernels like squared exponential kernels has been commo… ▽ More Recently, Gaussian processes have been used to model the vector field of continuous dynamical systems, referred to as GPODEs, which are characterized by a probabilistic ODE equation. Bayesian inference for these models has been extensively studied and applied in tasks such as time series prediction. However, the use of standard GPs with basic kernels like squared exponential kernels has been common in GPODE research, limiting the model's ability to represent complex scenarios. To address this limitation, we introduce normalizing flows to reparameterize the ODE vector field, resulting in a data-driven prior distribution, thereby increasing flexibility and expressive power. We develop a data-driven variational learning algorithm that utilizes analytically tractable probability density functions of normalizing flows, enabling simultaneous learning and inference of unknown continuous dynamics. Additionally, we also apply normalizing flows to the posterior inference of GP ODEs to resolve the issue of strong mean-field assumptions in posterior inference. By applying normalizing flows in both these ways, our model improves accuracy and uncertainty estimates for Bayesian Gaussian Process ODEs. We validate the effectiveness of our approach on simulated dynamical systems and real-world human motion data, including time series prediction and missing data recovery tasks. Experimental results show that our proposed method effectively captures model uncertainty while improving accuracy. △ Less

Submitted 2 January, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

arXiv:2303.08230 [pdf, other]

Bayesian Beta-Bernoulli Process Sparse Coding with Deep Neural Networks

Authors: Arunesh Mittal, Kai Yang, Paul Sajda, John Paisley

Abstract: Several approximate inference methods have been proposed for deep discrete latent variable models. However, non-parametric methods which have previously been successfully employed for classical sparse coding models have largely been unexplored in the context of deep models. We propose a non-parametric iterative algorithm for learning discrete latent representations in such deep models. Additionall… ▽ More Several approximate inference methods have been proposed for deep discrete latent variable models. However, non-parametric methods which have previously been successfully employed for classical sparse coding models have largely been unexplored in the context of deep models. We propose a non-parametric iterative algorithm for learning discrete latent representations in such deep models. Additionally, to learn scale invariant discrete features, we propose local data scaling variables. Lastly, to encourage sparsity in our representations, we propose a Beta-Bernoulli process prior on the latent factors. We evaluate our spare coding model coupled with different likelihood models. We evaluate our method across datasets with varying characteristics and compare our results to current amortized approximate inference methods. △ Less

Submitted 14 March, 2023; originally announced March 2023.

arXiv:2303.04450 [pdf, other]

Nonlinear Kalman Filtering with Reparametrization Gradients

Authors: San Gultekin, Brendan Kitts, Aaron Flores, John Paisley

Abstract: We introduce a novel nonlinear Kalman filter that utilizes reparametrization gradients. The widely used parametric approximation is based on a jointly Gaussian assumption of the state-space model, which is in turn equivalent to minimizing an approximation to the Kullback-Leibler divergence. It is possible to obtain better approximations using the alpha divergence, but the resulting problem is subs… ▽ More We introduce a novel nonlinear Kalman filter that utilizes reparametrization gradients. The widely used parametric approximation is based on a jointly Gaussian assumption of the state-space model, which is in turn equivalent to minimizing an approximation to the Kullback-Leibler divergence. It is possible to obtain better approximations using the alpha divergence, but the resulting problem is substantially more complex. In this paper, we introduce an alternate formulation based on an energy function, which can be optimized instead of the alpha divergence. The optimization can be carried out using reparametrization gradients, a technique that has recently been utilized in a number of deep learning models. △ Less

Submitted 8 March, 2023; originally announced March 2023.

arXiv:2111.00666 [pdf, other]

Self-Verification in Image Denoising

Authors: Huangxing Lin, Yihong Zhuang, Delu Zeng, Yue Huang, Xinghao Ding, John Paisley

Abstract: We devise a new regularization, called self-verification, for image denoising. This regularization is formulated using a deep image prior learned by the network, rather than a traditional predefined prior. Specifically, we treat the output of the network as a ``prior'' that we denoise again after ``re-noising''. The comparison between the again denoised image and its prior can be interpreted as a… ▽ More We devise a new regularization, called self-verification, for image denoising. This regularization is formulated using a deep image prior learned by the network, rather than a traditional predefined prior. Specifically, we treat the output of the network as a ``prior'' that we denoise again after ``re-noising''. The comparison between the again denoised image and its prior can be interpreted as a self-verification of the network's denoising ability. We demonstrate that self-verification encourages the network to capture low-level image statistics needed to restore the image. Based on this self-verification regularization, we further show that the network can learn to denoise even if it has not seen any clean images. This learning strategy is self-supervised, and we refer to it as Self-Verification Image Denoising (SVID). SVID can be seen as a mixture of learning-based methods and traditional model-based denoising methods, in which regularization is adaptively formulated using the output of the network. We show the application of SVID to various denoising tasks using only observed corrupted data. It can achieve the denoising performance close to supervised CNNs. △ Less

Submitted 31 October, 2021; originally announced November 2021.

arXiv:2108.01727 [pdf, other]

Scalable Community Detection in Massive Networks Using Aggregated Relational Data

Authors: Timothy Jones, Owen G. Ward, Yiran Jiang, John Paisley, Tian Zheng

Abstract: The mixed membership stochastic blockmodel (MMSB) is a popular Bayesian network model for community detection. Fitting such large Bayesian network models quickly becomes computationally infeasible when the number of nodes grows into hundreds of thousands and millions. In this paper we propose a novel mini-batch strategy based on aggregated relational data that leverages nodal information to fit MM… ▽ More The mixed membership stochastic blockmodel (MMSB) is a popular Bayesian network model for community detection. Fitting such large Bayesian network models quickly becomes computationally infeasible when the number of nodes grows into hundreds of thousands and millions. In this paper we propose a novel mini-batch strategy based on aggregated relational data that leverages nodal information to fit MMSB to massive networks. We describe a scalable inference method that can utilize nodal information that often accompanies real-world networks. Conditioning on this extra information leads to a model that admits a parallel stochastic variational inference algorithm, utilizing stochastic gradients of bipartite graph formed from aggregated network ties between node subpopulations. We apply our method to a citation network with over two million nodes and 25 million edges, capturing explainable structure in this network. Our method recovers parameters and achieves better convergence on simulated networks generated according to the MMSB. △ Less

Submitted 23 May, 2024; v1 submitted 22 July, 2021; originally announced August 2021.

arXiv:2011.14512 [pdf, other]

Adaptive noise imitation for image denoising

Authors: Huangxing Lin, Yihong Zhuang, Yue Huang, Xinghao Ding, Yizhou Yu, Xiaoqing Liu, John Paisley

Abstract: The effectiveness of existing denoising algorithms typically relies on accurate pre-defined noise statistics or plenty of paired data, which limits their practicality. In this work, we focus on denoising in the more common case where noise statistics and paired data are unavailable. Considering that denoising CNNs require supervision, we develop a new \textbf{adaptive noise imitation (ADANI)} algo… ▽ More The effectiveness of existing denoising algorithms typically relies on accurate pre-defined noise statistics or plenty of paired data, which limits their practicality. In this work, we focus on denoising in the more common case where noise statistics and paired data are unavailable. Considering that denoising CNNs require supervision, we develop a new \textbf{adaptive noise imitation (ADANI)} algorithm that can synthesize noisy data from naturally noisy images. To produce realistic noise, a noise generator takes unpaired noisy/clean images as input, where the noisy image is a guide for noise generation. By imposing explicit constraints on the type, level and gradient of noise, the output noise of ADANI will be similar to the guided noise, while kee** the original clean background of the image. Coupling the noisy data output from ADANI with the corresponding ground-truth, a denoising CNN is then trained in a fully-supervised manner. Experiments show that the noisy data produced by ADANI are visually and statistically similar to real ones so that the denoising CNN in our method is competitive to other networks trained with external paired data. △ Less

Submitted 29 November, 2020; originally announced November 2020.

arXiv:2011.07365 [pdf, other]

Bayesian recurrent state space model for rs-fMRI

Authors: Arunesh Mittal, Scott Linderman, John Paisley, Paul Sajda

Abstract: We propose a hierarchical Bayesian recurrent state space model for modeling switching network connectivity in resting state fMRI data. Our model allows us to uncover shared network patterns across disease conditions. We evaluate our method on the ADNI2 dataset by inferring latent state patterns corresponding to altered neural circuits in individuals with Mild Cognitive Impairment (MCI). In additio… ▽ More We propose a hierarchical Bayesian recurrent state space model for modeling switching network connectivity in resting state fMRI data. Our model allows us to uncover shared network patterns across disease conditions. We evaluate our method on the ADNI2 dataset by inferring latent state patterns corresponding to altered neural circuits in individuals with Mild Cognitive Impairment (MCI). In addition to states shared across healthy and individuals with MCI, we discover latent states that are predominantly observed in individuals with MCI. Our model outperforms current state of the art deep learning method on ADNI2 dataset. △ Less

Submitted 14 November, 2020; originally announced November 2020.

Comments: Machine Learning for Health (ML4H) at NeurIPS 2020 - Extended Abstract

arXiv:2011.04770 [pdf, other]

Deep Bayesian Nonparametric Factor Analysis

Authors: Arunesh Mittal, Paul Sajda, John Paisley

Abstract: We propose a deep generative factor analysis model with beta process prior that can approximate complex non-factorial distributions over the latent codes. We outline a stochastic EM algorithm for scalable inference in a specific instantiation of this model and present some preliminary results. We propose a deep generative factor analysis model with beta process prior that can approximate complex non-factorial distributions over the latent codes. We outline a stochastic EM algorithm for scalable inference in a specific instantiation of this model and present some preliminary results. △ Less

Submitted 9 November, 2020; originally announced November 2020.

arXiv:1912.01158 [pdf, other]

Noise2Blur: Online Noise Extraction and Denoising

Authors: Huangxing Lin, Weihong Zeng, Xinghao Ding, Xueyang Fu, Yue Huang, John Paisley

Abstract: We propose a new framework called Noise2Blur (N2B) for training robust image denoising models without pre-collected paired noisy/clean images. The training of the model requires only some (or even one) noisy images, some random unpaired clean images, and noise-free but blurred labels obtained by predefined filtering of the noisy images. The N2B model consists of two parts: a denoising network and… ▽ More We propose a new framework called Noise2Blur (N2B) for training robust image denoising models without pre-collected paired noisy/clean images. The training of the model requires only some (or even one) noisy images, some random unpaired clean images, and noise-free but blurred labels obtained by predefined filtering of the noisy images. The N2B model consists of two parts: a denoising network and a noise extraction network. First, the noise extraction network learns to output a noise map using the noise information from the denoising network under the guidence of the blurred labels. Then, the noise map is added to a clean image to generate a new "noisy/clean" image pair. Using the new image pair, the denoising network learns to generate clean and high-quality images from noisy observations. These two networks are trained simultaneously and mutually aid each other to learn the map**s of noise to clean/blur. Experiments on several denoising tasks show that the denoising performance of N2B is close to that of other denoising CNNs trained with pre-collected paired data. △ Less

Submitted 14 May, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

arXiv:1912.00537 [pdf, other]

Risk Bounds for Low Cost Bipartite Ranking

Authors: San Gultekin, John Paisley

Abstract: Bipartite ranking is an important supervised learning problem; however, unlike regression or classification, it has a quadratic dependence on the number of samples. To circumvent the prohibitive sample cost, many recent work focus on stochastic gradient-based methods. In this paper we consider an alternative approach, which leverages the structure of the widely-adopted pairwise squared loss, to ob… ▽ More Bipartite ranking is an important supervised learning problem; however, unlike regression or classification, it has a quadratic dependence on the number of samples. To circumvent the prohibitive sample cost, many recent work focus on stochastic gradient-based methods. In this paper we consider an alternative approach, which leverages the structure of the widely-adopted pairwise squared loss, to obtain a stochastic and low cost algorithm that does not require stochastic gradients or learning rates. Using a novel uniform risk bound---based on matrix and vector concentration inequalities---we show that the sample size required for competitive performance against the all-pairs batch algorithm does not have a quadratic dependence. Generalization bounds for both the batch and low cost stochastic algorithms are presented. Experimental results show significant speed gain against the batch algorithm, as well as competitive performance against state-of-the-art bipartite ranking algorithms on real datasets. △ Less

Submitted 1 December, 2019; originally announced December 2019.

arXiv:1912.00144 [pdf, other]

Learning Rate Dropout

Authors: Huangxing Lin, Weihong Zeng, Xinghao Ding, Yue Huang, Chenxi Huang, John Paisley

Abstract: The performance of a deep neural network is highly dependent on its training, and finding better local optimal solutions is the goal of many optimization algorithms. However, existing optimization algorithms show a preference for descent paths that converge slowly and do not seek to avoid bad local optima. In this work, we propose Learning Rate Dropout (LRD), a simple gradient descent technique fo… ▽ More The performance of a deep neural network is highly dependent on its training, and finding better local optimal solutions is the goal of many optimization algorithms. However, existing optimization algorithms show a preference for descent paths that converge slowly and do not seek to avoid bad local optima. In this work, we propose Learning Rate Dropout (LRD), a simple gradient descent technique for training related to coordinate descent. LRD empirically aids the optimizer to actively explore in the parameter space by randomly setting some learning rates to zero; at each iteration, only parameters whose learning rate is not 0 are updated. As the learning rate of different parameters is dropped, the optimizer will sample a new loss descent path for the current update. The uncertainty of the descent path helps the model avoid saddle points and bad local minima. Experiments show that LRD is surprisingly effective in accelerating training while preventing overfitting. △ Less

Submitted 5 December, 2019; v1 submitted 30 November, 2019; originally announced December 2019.

arXiv:1911.04061 [pdf, other]

Accurate Uncertainty Estimation and Decomposition in Ensemble Learning

Authors: Jeremiah Zhe Liu, John Paisley, Marianthi-Anna Kioumourtzoglou, Brent Coull

Abstract: Ensemble learning is a standard approach to building machine learning systems that capture complex phenomena in real-world data. An important aspect of these systems is the complete and valid quantification of model uncertainty. We introduce a Bayesian nonparametric ensemble (BNE) approach that augments an existing ensemble model to account for different sources of model uncertainty. BNE augments… ▽ More Ensemble learning is a standard approach to building machine learning systems that capture complex phenomena in real-world data. An important aspect of these systems is the complete and valid quantification of model uncertainty. We introduce a Bayesian nonparametric ensemble (BNE) approach that augments an existing ensemble model to account for different sources of model uncertainty. BNE augments a model's prediction and distribution functions using Bayesian nonparametric machinery. It has a theoretical guarantee in that it robustly estimates the uncertainty patterns in the data distribution, and can decompose its overall predictive uncertainty into distinct components that are due to different sources of noise and error. We show that our method achieves accurate uncertainty estimates under complex observational noise, and illustrate its real-world utility in terms of uncertainty decomposition and model bias detection for an ensemble in predict air pollution exposures in Eastern Massachusetts, USA. △ Less

Submitted 10 November, 2019; originally announced November 2019.

arXiv:1906.05850 [pdf, other]

Reweighted Expectation Maximization

Authors: Adji B. Dieng, John Paisley

Abstract: Training deep generative models with maximum likelihood remains a challenge. The typical workaround is to use variational inference (VI) and maximize a lower bound to the log marginal likelihood of the data. Variational auto-encoders (VAEs) adopt this approach. They further amortize the cost of inference by using a recognition network to parameterize the variational family. Amortized VI scales app… ▽ More Training deep generative models with maximum likelihood remains a challenge. The typical workaround is to use variational inference (VI) and maximize a lower bound to the log marginal likelihood of the data. Variational auto-encoders (VAEs) adopt this approach. They further amortize the cost of inference by using a recognition network to parameterize the variational family. Amortized VI scales approximate posterior inference in deep generative models to large datasets. However it introduces an amortization gap and leads to approximate posteriors of reduced expressivity due to the problem known as posterior collapse. In this paper, we consider expectation maximization (EM) as a paradigm for fitting deep generative models. Unlike VI, EM directly maximizes the log marginal likelihood of the data. We rediscover the importance weighted auto-encoder (IWAE) as an instance of EM and propose a new EM-based algorithm for fitting deep generative models called reweighted expectation maximization (REM). REM learns better generative models than the IWAE by decoupling the learning dynamics of the generative model and the recognition network using a separate expressive proposal found by moment matching. We compared REM to the VAE and the IWAE on several density estimation benchmarks and found it leads to significantly better performance as measured by log-likelihood. △ Less

Submitted 10 August, 2019; v1 submitted 13 June, 2019; originally announced June 2019.

Comments: Code can be found at https://github.com/adjidieng/REM

arXiv:1905.03826 [pdf, other]

Random Function Priors for Correlation Modeling

Authors: Aonan Zhang, John Paisley

Abstract: The likelihood model of high dimensional data $X_n$ can often be expressed as $p(X_n|Z_n,θ)$, where $θ\mathrel{\mathop:}=(θ_k)_{k\in[K]}$ is a collection of hidden features shared across objects, indexed by $n$, and $Z_n$ is a non-negative factor loading vector with $K$ entries where $Z_{nk}$ indicates the strength of $θ_k$ used to express $X_n$. In this paper, we introduce random function priors… ▽ More The likelihood model of high dimensional data $X_n$ can often be expressed as $p(X_n|Z_n,θ)$, where $θ\mathrel{\mathop:}=(θ_k)_{k\in[K]}$ is a collection of hidden features shared across objects, indexed by $n$, and $Z_n$ is a non-negative factor loading vector with $K$ entries where $Z_{nk}$ indicates the strength of $θ_k$ used to express $X_n$. In this paper, we introduce random function priors for $Z_n$ for modeling correlations among its $K$ dimensions $Z_{n1}$ through $Z_{nK}$, which we call \textit{population random measure embedding} (PRME). Our model can be viewed as a generalized paintbox model~\cite{Broderick13} using random functions, and can be learned efficiently with neural networks via amortized variational inference. We derive our Bayesian nonparametric method by applying a representation theorem on separately exchangeable discrete random measures. △ Less

Submitted 12 May, 2019; v1 submitted 9 May, 2019; originally announced May 2019.

Comments: Accepted by ICML 2019. 13 pages, 6 figures

arXiv:1904.04605 [pdf, other]

doi 10.1109/TIP.2020.3005517

Rain O'er Me: Synthesizing real rain to derain with data distillation

Authors: Huangxing Lin, Yanlong Li, Xinghao Ding, Weihong Zeng, Yue Huang, John Paisley

Abstract: We present a supervised technique for learning to remove rain from images without using synthetic rain software. The method is based on a two-stage data distillation approach: 1) A rainy image is first paired with a coarsely derained version using on a simple filtering technique ("rain-to-clean"). 2) Then a clean image is randomly matched with the rainy soft-labeled pair. Through a shared deep neu… ▽ More We present a supervised technique for learning to remove rain from images without using synthetic rain software. The method is based on a two-stage data distillation approach: 1) A rainy image is first paired with a coarsely derained version using on a simple filtering technique ("rain-to-clean"). 2) Then a clean image is randomly matched with the rainy soft-labeled pair. Through a shared deep neural network, the rain that is removed from the first image is then added to the clean image to generate a second pair ("clean-to-rain"). The neural network simultaneously learns to map both images such that high resolution structure in the clean images can inform the deraining of the rainy images. Demonstrations show that this approach can address those visual characteristics of rain not easily synthesized by software in the usual way. △ Less

Submitted 9 April, 2019; v1 submitted 9 April, 2019; originally announced April 2019.

arXiv:1902.02384 [pdf, other]

Global Explanations of Neural Networks: Map** the Landscape of Predictions

Authors: Mark Ibrahim, Melissa Louie, Ceena Modarres, John Paisley

Abstract: A barrier to the wider adoption of neural networks is their lack of interpretability. While local explanation methods exist for one prediction, most global attributions still reduce neural network decisions to a single set of features. In response, we present an approach for generating global attributions called GAM, which explains the landscape of neural network predictions across subpopulations.… ▽ More A barrier to the wider adoption of neural networks is their lack of interpretability. While local explanation methods exist for one prediction, most global attributions still reduce neural network decisions to a single set of features. In response, we present an approach for generating global attributions called GAM, which explains the landscape of neural network predictions across subpopulations. GAM augments global explanations with the proportion of samples that each attribution best explains and specifies which samples are described by each attribution. Global explanations also have tunable granularity to detect more or fewer subpopulations. We demonstrate that GAM's global explanations 1) yield the known feature importances of simulated data, 2) match feature weights of interpretable statistical models on real data, and 3) are intuitive to practitioners through user studies. With more transparent predictions, GAM can help ensure neural network decisions are generated for the right reasons. △ Less

Submitted 6 February, 2019; originally announced February 2019.

Comments: published at ACM/AAAI AIES

arXiv:1812.09645 [pdf, other]

Mixed Membership Recurrent Neural Networks

Authors: Ghazal Fazelnia, Mark Ibrahim, Ceena Modarres, Kevin Wu, John Paisley

Abstract: Models for sequential data such as the recurrent neural network (RNN) often implicitly model a sequence as having a fixed time interval between observations and do not account for group-level effects when multiple sequences are observed. We propose a model for grouped sequential data based on the RNN that accounts for varying time intervals between observations in a sequence by learning a group-le… ▽ More Models for sequential data such as the recurrent neural network (RNN) often implicitly model a sequence as having a fixed time interval between observations and do not account for group-level effects when multiple sequences are observed. We propose a model for grouped sequential data based on the RNN that accounts for varying time intervals between observations in a sequence by learning a group-level base parameter to which each sequence can revert. Our approach is motivated by the mixed membership framework, and we show how it can be used for dynamic topic modeling in which the distribution on topics (not the topics themselves) are evolving in time. We demonstrate our approach on a dataset of 3.4 million online grocery shop** orders made by 206K customers. △ Less

Submitted 22 December, 2018; originally announced December 2018.

arXiv:1812.03350 [pdf, other]

Adaptive and Calibrated Ensemble Learning with Dependent Tail-free Process

Authors: Jeremiah Zhe Liu, John Paisley, Marianthi-Anna Kioumourtzoglou, Brent A. Coull

Abstract: Ensemble learning is a mainstay in modern data science practice. Conventional ensemble algorithms assigns to base models a set of deterministic, constant model weights that (1) do not fully account for variations in base model accuracy across subgroups, nor (2) provide uncertainty estimates for the ensemble prediction, which could result in mis-calibrated (i.e. precise but biased) predictions that… ▽ More Ensemble learning is a mainstay in modern data science practice. Conventional ensemble algorithms assigns to base models a set of deterministic, constant model weights that (1) do not fully account for variations in base model accuracy across subgroups, nor (2) provide uncertainty estimates for the ensemble prediction, which could result in mis-calibrated (i.e. precise but biased) predictions that could in turn negatively impact the algorithm performance in real-word applications. In this work, we present an adaptive, probabilistic approach to ensemble learning using dependent tail-free process as ensemble weight prior. Given input feature $\mathbf{x}$, our method optimally combines base models based on their predictive accuracy in the feature space $\mathbf{x} \in \mathcal{X}$, and provides interpretable uncertainty estimates both in model selection and in ensemble prediction. To encourage scalable and calibrated inference, we derive a structured variational inference algorithm that jointly minimize KL objective and the model's calibration score (i.e. Continuous Ranked Probability Score (CRPS)). We illustrate the utility of our method on both a synthetic nonlinear function regression task, and on the real-world application of spatio-temporal integration of particle pollution prediction models in New England. △ Less

Submitted 19 December, 2018; v1 submitted 8 December, 2018; originally announced December 2018.

Comments: Work-in-progress manuscript appeared at Bayesian Nonparametrics Workshop, Neural Information Processing Systems 2018

arXiv:1811.08632 [pdf, other]

A Deep Tree-Structured Fusion Model for Single Image Deraining

Authors: Xueyang Fu, Qi Qi, Yue Huang, Xinghao Ding, Feng Wu, John Paisley

Abstract: We propose a simple yet effective deep tree-structured fusion model based on feature aggregation for the deraining problem. We argue that by effectively aggregating features, a relatively simple network can still handle tough image deraining problems well. First, to capture the spatial structure of rain we use dilated convolutions as our basic network block. We then design a tree-structured fusion… ▽ More We propose a simple yet effective deep tree-structured fusion model based on feature aggregation for the deraining problem. We argue that by effectively aggregating features, a relatively simple network can still handle tough image deraining problems well. First, to capture the spatial structure of rain we use dilated convolutions as our basic network block. We then design a tree-structured fusion architecture which is deployed within each block (spatial information) and across all blocks (content information). Our method is based on the assumption that adjacent features contain redundant information. This redundancy obstructs generation of new representations and can be reduced by hierarchically fusing adjacent features. Thus, the proposed model is more compact and can effectively use spatial and content information. Experiments on synthetic and real-world datasets show that our network achieves better deraining results with fewer parameters. △ Less

Submitted 21 November, 2018; originally announced November 2018.

arXiv:1811.06471 [pdf, other]

Towards Explainable Deep Learning for Credit Lending: A Case Study

Authors: Ceena Modarres, Mark Ibrahim, Melissa Louie, John Paisley

Abstract: Deep learning adoption in the financial services industry has been limited due to a lack of model interpretability. However, several techniques have been proposed to explain predictions made by a neural network. We provide an initial investigation into these techniques for the assessment of credit risk with neural networks. Deep learning adoption in the financial services industry has been limited due to a lack of model interpretability. However, several techniques have been proposed to explain predictions made by a neural network. We provide an initial investigation into these techniques for the assessment of credit risk with neural networks. △ Less

Submitted 30 November, 2018; v1 submitted 15 November, 2018; originally announced November 2018.

Comments: Accepted NIPS 2018 FEAP

arXiv:1810.10850 [pdf, other]

An Adversarial Learning Approach to Medical Image Synthesis for Lesion Detection

Authors: Liyan Sun, Jiexiang Wang, Yue Huang, Xinghao Ding, Hayit Greenspan, John Paisley

Abstract: The identification of lesion within medical image data is necessary for diagnosis, treatment and prognosis. Segmentation and classification approaches are mainly based on supervised learning with well-paired image-level or voxel-level labels. However, labeling the lesion in medical images is laborious requiring highly specialized knowledge. We propose a medical image synthesis model named abnormal… ▽ More The identification of lesion within medical image data is necessary for diagnosis, treatment and prognosis. Segmentation and classification approaches are mainly based on supervised learning with well-paired image-level or voxel-level labels. However, labeling the lesion in medical images is laborious requiring highly specialized knowledge. We propose a medical image synthesis model named abnormal-to-normal translation generative adversarial network (ANT-GAN) to generate a normal-looking medical image based on its abnormal-looking counterpart without the need for paired training data. Unlike typical GANs, whose aim is to generate realistic samples with variations, our more restrictive model aims at producing a normal-looking image corresponding to one containing lesions, and thus requires a special design. Being able to provide a "normal" counterpart to a medical image can provide useful side information for medical imaging tasks like lesion segmentation or classification validated by our experiments. In the other aspect, the ANT-GAN model is also capable of producing highly realistic lesion-containing image corresponding to the healthy one, which shows the potential in data augmentation verified in our experiments. △ Less

Submitted 8 April, 2019; v1 submitted 25 October, 2018; originally announced October 2018.

Comments: 10 pages, 13 figures

arXiv:1810.04719 [pdf, other]

Fully Supervised Speaker Diarization

Authors: Aonan Zhang, Quan Wang, Zhenyao Zhu, John Paisley, Chong Wang

Abstract: In this paper, we propose a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN). Given extracted speaker-discriminative embeddings (a.k.a. d-vectors) from input utterances, each individual speaker is modeled by a parameter-sharing RNN, while the RNN states for different speakers interleave in the time domain. This RNN is naturally in… ▽ More In this paper, we propose a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN). Given extracted speaker-discriminative embeddings (a.k.a. d-vectors) from input utterances, each individual speaker is modeled by a parameter-sharing RNN, while the RNN states for different speakers interleave in the time domain. This RNN is naturally integrated with a distance-dependent Chinese restaurant process (ddCRP) to accommodate an unknown number of speakers. Our system is fully supervised and is able to learn from examples where time-stamped speaker labels are annotated. We achieved a 7.6% diarization error rate on NIST SRE 2000 CALLHOME, which is better than the state-of-the-art method using spectral clustering. Moreover, our method decodes in an online fashion while most state-of-the-art systems rely on offline clustering. △ Less

Submitted 19 February, 2019; v1 submitted 10 October, 2018; originally announced October 2018.

Comments: Accepted by ICASSP 2019

arXiv:1805.11221 [pdf, other]

MBA: Mini-Batch AUC Optimization

Authors: San Gultekin, Avishek Saha, Adwait Ratnaparkhi, John Paisley

Abstract: Area under the receiver operating characteristics curve (AUC) is an important metric for a wide range of signal processing and machine learning problems, and scalable methods for optimizing AUC have recently been proposed. However, handling very large datasets remains an open challenge for this problem. This paper proposes a novel approach to AUC maximization, based on sampling mini-batches of pos… ▽ More Area under the receiver operating characteristics curve (AUC) is an important metric for a wide range of signal processing and machine learning problems, and scalable methods for optimizing AUC have recently been proposed. However, handling very large datasets remains an open challenge for this problem. This paper proposes a novel approach to AUC maximization, based on sampling mini-batches of positive/negative instance pairs and computing U-statistics to approximate a global risk minimization problem. The resulting algorithm is simple, fast, and learning-rate free. We show that the number of samples required for good performance is independent of the number of pairs available, which is a quadratic function of the positive and negative instances. Extensive experiments show the practical utility of the proposed method. △ Less

Submitted 31 May, 2018; v1 submitted 28 May, 2018; originally announced May 2018.

arXiv:1805.06173 [pdf, other]

Lightweight Pyramid Networks for Image Deraining

Authors: Xueyang Fu, Borong Liang, Yue Huang, Xinghao Ding, John Paisley

Abstract: Existing deep convolutional neural networks have found major success in image deraining, but at the expense of an enormous number of parameters. This limits their potential application, for example in mobile devices. In this paper, we propose a lightweight pyramid of networks (LPNet) for single image deraining. Instead of designing a complex network structures, we use domain-specific knowledge to… ▽ More Existing deep convolutional neural networks have found major success in image deraining, but at the expense of an enormous number of parameters. This limits their potential application, for example in mobile devices. In this paper, we propose a lightweight pyramid of networks (LPNet) for single image deraining. Instead of designing a complex network structures, we use domain-specific knowledge to simplify the learning process. Specifically, we find that by introducing the mature Gaussian-Laplacian image pyramid decomposition technology to the neural network, the learning problem at each pyramid level is greatly simplified and can be handled by a relatively shallow network with few parameters. We adopt recursive and residual network structures to build the proposed LPNet, which has less than 8K parameters while still achieving state-of-the-art performance on rain removal. We also discuss the potential value of LPNet for other low- and high-level vision tasks. △ Less

Submitted 16 May, 2018; originally announced May 2018.

Comments: submitted to IEEE Transactions on Neural Networks and Learning Systems

arXiv:1805.05638 [pdf, other]

Ro-SOS: Metric Expression Network (MEnet) for Robust Salient Object Segmentation

Authors: Delu Zeng, Yixuan He, Li Liu, Zhihong Chen, Jiabin Huang, Jie Chen, John Paisley

Abstract: Although deep CNNs have brought significant improvement to image saliency detection, most CNN based models are sensitive to distortion such as compression and noise. In this paper, we propose an end-to-end generic salient object segmentation model called Metric Expression Network (MEnet) to deal with saliency detection with the tolerance of distortion. Within MEnet, a new topological metric space… ▽ More Although deep CNNs have brought significant improvement to image saliency detection, most CNN based models are sensitive to distortion such as compression and noise. In this paper, we propose an end-to-end generic salient object segmentation model called Metric Expression Network (MEnet) to deal with saliency detection with the tolerance of distortion. Within MEnet, a new topological metric space is constructed, whose implicit metric is determined by the deep network. As a result, we manage to group all the pixels in the observed image semantically within this latent space into two regions: a salient region and a non-salient region. With this architecture, all feature extractions are carried out at the pixel level, enabling fine granularity of output boundaries of the salient objects. What's more, we try to give a general analysis for the noise robustness of the network in the sense of Lipschitz and Jacobian literature. Experiments demonstrate that robust salient maps facilitating object segmentation can be generated by the proposed metric. Tests on several public benchmarks show that MEnet has achieved desirable performance. Furthermore, by direct computation and measuring the robustness, the proposed method outperforms previous CNN-based methods on distorted inputs. △ Less

Submitted 21 January, 2020; v1 submitted 15 May, 2018; originally announced May 2018.

Comments: This version: 11 pages (12 with reference), 12 figures, 5 table; Version 1: 7 pages,7 figures, 4 tables; The paper for version 1 has been accepted by International Joint Conference on Artificial Intelligence (IJCAI),2018

arXiv:1805.02165 [pdf, other]

Joint CS-MRI Reconstruction and Segmentation with a Unified Deep Network

Authors: Liyan Sun, Zhiwen Fan, Yue Huang, Xinghao Ding, John Paisley

Abstract: The need for fast acquisition and automatic analysis of MRI data is growing in the age of big data. Although compressed sensing magnetic resonance imaging (CS-MRI) has been studied to accelerate MRI by reducing k-space measurements, in current CS-MRI techniques MRI applications such as segmentation are overlooked when doing image reconstruction. In this paper, we test the utility of CS-MRI methods… ▽ More The need for fast acquisition and automatic analysis of MRI data is growing in the age of big data. Although compressed sensing magnetic resonance imaging (CS-MRI) has been studied to accelerate MRI by reducing k-space measurements, in current CS-MRI techniques MRI applications such as segmentation are overlooked when doing image reconstruction. In this paper, we test the utility of CS-MRI methods in automatic segmentation models and propose a unified deep neural network architecture called SegNetMRI which we apply to the combined CS-MRI reconstruction and segmentation problem. SegNetMRI is built upon a MRI reconstruction network with multiple cascaded blocks each containing an encoder-decoder unit and a data fidelity unit, and MRI segmentation networks having the same encoder-decoder structure. The two subnetworks are pre-trained and fine-tuned with shared reconstruction encoders. The outputs are merged into the final segmentation. Our experiments show that SegNetMRI can improve both the reconstruction and segmentation performance when using compressive measurements. △ Less

Submitted 6 May, 2018; originally announced May 2018.

Comments: 8 pages, 6 figures, 3 tables

arXiv:1804.03596 [pdf, other]

doi 10.1109/TIP.2019.2925288

A Deep Information Sharing Network for Multi-contrast Compressed Sensing MRI Reconstruction

Authors: Liyan Sun, Zhiwen Fan, Yue Huang, Xinghao Ding, John Paisley

Abstract: In multi-contrast magnetic resonance imaging (MRI), compressed sensing theory can accelerate imaging by sampling fewer measurements within each contrast. The conventional optimization-based models suffer several limitations: strict assumption of shared sparse support, time-consuming optimization and "shallow" models with difficulties in encoding the rich patterns hiding in massive MRI data. In thi… ▽ More In multi-contrast magnetic resonance imaging (MRI), compressed sensing theory can accelerate imaging by sampling fewer measurements within each contrast. The conventional optimization-based models suffer several limitations: strict assumption of shared sparse support, time-consuming optimization and "shallow" models with difficulties in encoding the rich patterns hiding in massive MRI data. In this paper, we propose the first deep learning model for multi-contrast MRI reconstruction. We achieve information sharing through feature sharing units, which significantly reduces the number of parameters. The feature sharing unit is combined with a data fidelity unit to comprise an inference block. These inference blocks are cascaded with dense connections, which allows for information transmission across different depths of the network efficiently. Our extensive experiments on various multi-contrast MRI datasets show that proposed model outperforms both state-of-the-art single-contrast and multi-contrast MRI methods in accuracy and efficiency. We show the improved reconstruction quality can bring great benefits for the later medical image analysis stage. Furthermore, the robustness of the proposed model to the non-registration environment shows its potential in real MRI applications. △ Less

Submitted 10 April, 2018; originally announced April 2018.

Comments: 13 pages, 16 figures, 3 tables

arXiv:1804.01210 [pdf, other]

A Segmentation-aware Deep Fusion Network for Compressed Sensing MRI

Authors: Zhiwen Fan, Liyan Sun, Xinghao Ding, Yue Huang, Congbo Cai, John Paisley

Abstract: Compressed sensing MRI is a classic inverse problem in the field of computational imaging, accelerating the MR imaging by measuring less k-space data. The deep neural network models provide the stronger representation ability and faster reconstruction compared with "shallow" optimization-based methods. However, in the existing deep-based CS-MRI models, the high-level semantic supervision informati… ▽ More Compressed sensing MRI is a classic inverse problem in the field of computational imaging, accelerating the MR imaging by measuring less k-space data. The deep neural network models provide the stronger representation ability and faster reconstruction compared with "shallow" optimization-based methods. However, in the existing deep-based CS-MRI models, the high-level semantic supervision information from massive segmentation-labels in MRI dataset is overlooked. In this paper, we proposed a segmentation-aware deep fusion network called SADFN for compressed sensing MRI. The multilayer feature aggregation (MLFA) method is introduced here to fuse all the features from different layers in the segmentation network. Then, the aggregated feature maps containing semantic information are provided to each layer in the reconstruction network with a feature fusion strategy. This guarantees the reconstruction network is aware of the different regions in the image it reconstructs, simplifying the function map**. We prove the utility of the cross-layer and cross-task information fusion strategy by comparative study. Extensive experiments on brain segmentation benchmark MRBrainS validated that the proposed SADFN model achieves state-of-the-art accuracy in compressed sensing MRI. This paper provides a novel approach to guide the low-level visual task using the information from mid- or high-level task. △ Less

Submitted 3 April, 2018; originally announced April 2018.

arXiv:1803.09909 [pdf, other]

A Divide-and-Conquer Approach to Compressed Sensing MRI

Authors: Liyan Sun, Zhiwen Fan, Xinghao Ding, Congbo Cai, Yue Huang, John Paisley

Abstract: Compressed sensing (CS) theory assures us that we can accurately reconstruct magnetic resonance images using fewer k-space measurements than the Nyquist sampling rate requires. In traditional CS-MRI inversion methods, the fact that the energy within the Fourier measurement domain is distributed non-uniformly is often neglected during reconstruction. As a result, more densely sampled low-frequency… ▽ More Compressed sensing (CS) theory assures us that we can accurately reconstruct magnetic resonance images using fewer k-space measurements than the Nyquist sampling rate requires. In traditional CS-MRI inversion methods, the fact that the energy within the Fourier measurement domain is distributed non-uniformly is often neglected during reconstruction. As a result, more densely sampled low-frequency information tends to dominate penalization schemes for reconstructing MRI at the expense of high-frequency details. In this paper, we propose a new framework for CS-MRI inversion in which we decompose the observed k-space data into "subspaces" via sets of filters in a lossless way, and reconstruct the images in these various spaces individually using off-the-shelf algorithms. We then fuse the results to obtain the final reconstruction. In this way we are able to focus reconstruction on frequency information within the entire k-space more equally, preserving both high and low frequency details. We demonstrate that the proposed framework is competitive with state-of-the-art methods in CS-MRI in terms of quantitative performance, and often improves an algorithm's results qualitatively compared with it's direct application to k-space. △ Less

Submitted 27 March, 2018; originally announced March 2018.

Comments: 37 pages, 20 figures, 2 tables

arXiv:1803.08763 [pdf, other]

A Deep Error Correction Network for Compressed Sensing MRI

Authors: Liyan Sun, Zhiwen Fan, Yue Huang, Xinghao Ding, John Paisley

Abstract: Compressed sensing for magnetic resonance imaging (CS-MRI) exploits image sparsity properties to reconstruct MRI from very few Fourier k-space measurements. The goal is to minimize any structural errors in the reconstruction that could have a negative impact on its diagnostic quality. To this end, we propose a deep error correction network (DECN) for CS-MRI. The DECN model consists of three parts,… ▽ More Compressed sensing for magnetic resonance imaging (CS-MRI) exploits image sparsity properties to reconstruct MRI from very few Fourier k-space measurements. The goal is to minimize any structural errors in the reconstruction that could have a negative impact on its diagnostic quality. To this end, we propose a deep error correction network (DECN) for CS-MRI. The DECN model consists of three parts, which we refer to as modules: a guide, or template, module, an error correction module, and a data fidelity module. Existing CS-MRI algorithms can serve as the template module for guiding the reconstruction. Using this template as a guide, the error correction module learns a convolutional neural network (CNN) to map the k-space data in a way that adjusts for the reconstruction error of the template image. Our experimental results show the proposed DECN CS-MRI reconstruction framework can considerably improve upon existing inversion algorithms by supplementing with an error-correcting CNN. △ Less

Submitted 23 March, 2018; originally announced March 2018.

Comments: 7 pages, 7 figures, 2 tables

arXiv:1712.08734 [pdf, other]

Online Forecasting Matrix Factorization

Authors: San Gultekin, John Paisley

Abstract: In this paper the problem of forecasting high dimensional time series is considered. Such time series can be modeled as matrices where each column denotes a measurement. In addition, when missing values are present, low rank matrix factorization approaches are suitable for predicting future values. This paper formally defines and analyzes the forecasting problem in the online setting, i.e. where t… ▽ More In this paper the problem of forecasting high dimensional time series is considered. Such time series can be modeled as matrices where each column denotes a measurement. In addition, when missing values are present, low rank matrix factorization approaches are suitable for predicting future values. This paper formally defines and analyzes the forecasting problem in the online setting, i.e. where the data arrives as a stream and only a single pass is allowed. We present and analyze novel matrix factorization techniques which can learn low-dimensional embeddings effectively in an online manner. Based on these embeddings a recursive minimum mean square error estimator is derived, which learns an autoregressive model on them. Experiments with two real datasets with tens of millions of measurements show the benefits of the proposed approach. △ Less

Submitted 23 December, 2017; originally announced December 2017.

arXiv:1707.00260 [pdf, ps, other]

Location Dependent Dirichlet Processes

Authors: Shiliang Sun, John Paisley, Qiuyang Liu

Abstract: Dirichlet processes (DP) are widely applied in Bayesian nonparametric modeling. However, in their basic form they do not directly integrate dependency information among data arising from space and time. In this paper, we propose location dependent Dirichlet processes (LDDP) which incorporate nonparametric Gaussian processes in the DP modeling framework to model such dependencies. We develop the LD… ▽ More Dirichlet processes (DP) are widely applied in Bayesian nonparametric modeling. However, in their basic form they do not directly integrate dependency information among data arising from space and time. In this paper, we propose location dependent Dirichlet processes (LDDP) which incorporate nonparametric Gaussian processes in the DP modeling framework to model such dependencies. We develop the LDDP in the context of mixture modeling, and develop a mean field variational inference algorithm for this mixture model. The effectiveness of the proposed modeling framework is shown on an image segmentation task. △ Less

Submitted 2 July, 2017; originally announced July 2017.

arXiv:1705.00727 [pdf, other]

doi 10.1109/TIP.2018.2799324

Hyperspectral Image Classification with Markov Random Fields and a Convolutional Neural Network

Authors: Xiangyong Cao, Feng Zhou, Lin Xu, Deyu Meng, Zongben Xu, John Paisley

Abstract: This paper presents a new supervised classification algorithm for remotely sensed hyperspectral image (HSI) which integrates spectral and spatial information in a unified Bayesian framework. First, we formulate the HSI classification problem from a Bayesian perspective. Then, we adopt a convolutional neural network (CNN) to learn the posterior class distributions using a patch-wise training strate… ▽ More This paper presents a new supervised classification algorithm for remotely sensed hyperspectral image (HSI) which integrates spectral and spatial information in a unified Bayesian framework. First, we formulate the HSI classification problem from a Bayesian perspective. Then, we adopt a convolutional neural network (CNN) to learn the posterior class distributions using a patch-wise training strategy to better use the spatial information. Next, spatial information is further considered by placing a spatial smoothness prior on the labels. Finally, we iteratively update the CNN parameters using stochastic gradient decent (SGD) and update the class labels of all pixel vectors using an alpha-expansion min-cut-based algorithm. Compared with other state-of-the-art methods, the proposed classification method achieves better performance on one synthetic dataset and two benchmark HSI datasets in a number of experimental settings. △ Less

Submitted 12 November, 2017; v1 submitted 1 May, 2017; originally announced May 2017.

arXiv:1611.01702 [pdf, other]

TopicRNN: A Recurrent Neural Network with Long-Range Semantic Dependency

Authors: Adji B. Dieng, Chong Wang, Jianfeng Gao, John Paisley

Abstract: In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based language model designed to directly capture the global semantic meaning relating words in a document via latent topics. Because of their sequential nature, RNNs are good at capturing the local structure of a word sequence - both semantic and syntactic - but might face difficulty remembering long-range dependencies. Intuitiv… ▽ More In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based language model designed to directly capture the global semantic meaning relating words in a document via latent topics. Because of their sequential nature, RNNs are good at capturing the local structure of a word sequence - both semantic and syntactic - but might face difficulty remembering long-range dependencies. Intuitively, these long-range dependencies are of semantic nature. In contrast, latent topic models are able to capture the global underlying semantic structure of a document but do not account for word ordering. The proposed TopicRNN model integrates the merits of RNNs and latent topic models: it captures local (syntactic) dependencies using an RNN and global (semantic) dependencies using latent topics. Unlike previous work on contextual RNN language modeling, our model is learned end-to-end. Empirical results on word prediction show that TopicRNN outperforms existing contextual RNN baselines. In addition, TopicRNN can be used as an unsupervised feature extractor for documents. We do this for sentiment analysis on the IMDB movie review dataset and report an error rate of $6.28\%$. This is comparable to the state-of-the-art $5.91\%$ resulting from a semi-supervised approach. Finally, TopicRNN also yields sensible topics, making it a useful alternative to document models such as latent Dirichlet allocation. △ Less

Submitted 26 February, 2017; v1 submitted 5 November, 2016; originally announced November 2016.

Comments: International Conference on Learning Representations

arXiv:1611.00328 [pdf, other]

Variational Inference via $χ$-Upper Bound Minimization

Authors: Adji B. Dieng, Dustin Tran, Rajesh Ranganath, John Paisley, David M. Blei

Abstract: Variational inference (VI) is widely used as an efficient alternative to Markov chain Monte Carlo. It posits a family of approximating distributions $q$ and finds the closest member to the exact posterior $p$. Closeness is usually measured via a divergence $D(q || p)$ from $q$ to $p$. While successful, this approach also has problems. Notably, it typically leads to underestimation of the posterior… ▽ More Variational inference (VI) is widely used as an efficient alternative to Markov chain Monte Carlo. It posits a family of approximating distributions $q$ and finds the closest member to the exact posterior $p$. Closeness is usually measured via a divergence $D(q || p)$ from $q$ to $p$. While successful, this approach also has problems. Notably, it typically leads to underestimation of the posterior variance. In this paper we propose CHIVI, a black-box variational inference algorithm that minimizes $D_χ(p || q)$, the $χ$-divergence from $p$ to $q$. CHIVI minimizes an upper bound of the model evidence, which we term the $χ$ upper bound (CUBO). Minimizing the CUBO leads to improved posterior uncertainty, and it can also be used with the classical VI lower bound (ELBO) to provide a sandwich estimate of the model evidence. We study CHIVI on three models: probit regression, Gaussian process classification, and a Cox process model of basketball plays. When compared to expectation propagation and classical VI, CHIVI produces better error rates and more accurate estimates of posterior variance. △ Less

Submitted 12 November, 2017; v1 submitted 1 November, 2016; originally announced November 2016.

Comments: Neural Information Processing Systems, 2017

arXiv:1609.02087 [pdf, other]

doi 10.1109/TIP.2017.2691802

Clearing the Skies: A deep network architecture for single-image rain removal

Authors: Xueyang Fu, Jiabin Huang, Xinghao Ding, Yinghao Liao, John Paisley

Abstract: We introduce a deep network architecture called DerainNet for removing rain streaks from an image. Based on the deep convolutional neural network (CNN), we directly learn the map** relationship between rainy and clean image detail layers from data. Because we do not possess the ground truth corresponding to real-world rainy images, we synthesize images with rain for training. In contrast to othe… ▽ More We introduce a deep network architecture called DerainNet for removing rain streaks from an image. Based on the deep convolutional neural network (CNN), we directly learn the map** relationship between rainy and clean image detail layers from data. Because we do not possess the ground truth corresponding to real-world rainy images, we synthesize images with rain for training. In contrast to other common strategies that increase depth or breadth of the network, we use image processing domain knowledge to modify the objective function and improve deraining with a modestly-sized CNN. Specifically, we train our DerainNet on the detail (high-pass) layer rather than in the image domain. Though DerainNet is trained on synthetic data, we find that the learned network translates very effectively to real-world images for testing. Moreover, we augment the CNN framework with image enhancement to improve the visual results. Compared with state-of-the-art single image de-raining methods, our method has improved rain removal and much faster computation time after network training. △ Less

Submitted 6 February, 2017; v1 submitted 7 September, 2016; originally announced September 2016.

arXiv:1506.03493 [pdf, other]

Bayesian Poisson Tensor Factorization for Inferring Multilateral Relations from Sparse Dyadic Event Counts

Authors: Aaron Schein, John Paisley, David M. Blei, Hanna Wallach

Abstract: We present a Bayesian tensor factorization model for inferring latent group structures from dynamic pairwise interaction patterns. For decades, political scientists have collected and analyzed records of the form "country $i$ took action $a$ toward country $j$ at time $t$"---known as dyadic events---in order to form and test theories of international relations. We represent these event data as a t… ▽ More We present a Bayesian tensor factorization model for inferring latent group structures from dynamic pairwise interaction patterns. For decades, political scientists have collected and analyzed records of the form "country $i$ took action $a$ toward country $j$ at time $t$"---known as dyadic events---in order to form and test theories of international relations. We represent these event data as a tensor of counts and develop Bayesian Poisson tensor factorization to infer a low-dimensional, interpretable representation of their salient patterns. We demonstrate that our model's predictive performance is better than that of standard non-negative tensor factorization methods. We also provide a comparison of our variational updates to their maximum likelihood counterparts. In doing so, we identify a better way to form point estimates of the latent factors than that typically used in Bayesian Poisson matrix factorization. Finally, we showcase our model as an exploratory analysis tool for political scientists. We show that the inferred latent factor matrices capture interpretable multilateral relations that both conform to and inform our knowledge of international affairs. △ Less

Submitted 10 June, 2015; originally announced June 2015.

Comments: To appear in Proceedings of the 21st ACM SIGKDD Conference of Knowledge Discovery and Data Mining (KDD 2015)

arXiv:1501.05624 [pdf, other]

A Collaborative Kalman Filter for Time-Evolving Dyadic Processes

Authors: San Gultekin, John Paisley

Abstract: We present the collaborative Kalman filter (CKF), a dynamic model for collaborative filtering and related factorization models. Using the matrix factorization approach to collaborative filtering, the CKF accounts for time evolution by modeling each low-dimensional latent embedding as a multidimensional Brownian motion. Each observation is a random variable whose distribution is parameterized by th… ▽ More We present the collaborative Kalman filter (CKF), a dynamic model for collaborative filtering and related factorization models. Using the matrix factorization approach to collaborative filtering, the CKF accounts for time evolution by modeling each low-dimensional latent embedding as a multidimensional Brownian motion. Each observation is a random variable whose distribution is parameterized by the dot product of the relevant Brownian motions at that moment in time. This is naturally interpreted as a Kalman filter with multiple interacting state space vectors. We also present a method for learning a dynamically evolving drift parameter for each location by modeling it as a geometric Brownian motion. We handle posterior intractability via a mean-field variational approximation, which also preserves tractability for downstream calculations in a manner similar to the Kalman filter. We evaluate the model on several large datasets, providing quantitative evaluation on the 10 million Movielens and 100 million Netflix datasets and qualitative evaluation on a set of 39 million stock returns divided across roughly 6,500 companies from the years 1962-2014. △ Less

Submitted 22 January, 2015; originally announced January 2015.

Comments: Appeared at 2014 IEEE International Conference on Data Mining (ICDM)

arXiv:1302.2712 [pdf, other]

doi 10.1109/TIP.2014.2360122

Bayesian Nonparametric Dictionary Learning for Compressed Sensing MRI

Authors: Yue Huang, John Paisley, Qin Lin, Xinghao Ding, Xueyang Fu, ** Zhang

Abstract: We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRI) from highly undersampled k-space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and… ▽ More We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRI) from highly undersampled k-space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and the patch-specific sparsity pattern are inferred from the data, in addition to other dictionary learning variables. Dictionary learning is performed directly on the compressed image, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model, and show how the denoising property of dictionary learning removes dependence on regularization parameters in the noisy setting. We derive a stochastic optimization algorithm based on Markov Chain Monte Carlo (MCMC) for the Bayesian model, and use the alternating direction method of multipliers (ADMM) for efficiently performing total variation minimization. We present empirical results on several MRI, which show that the proposed regularization framework can improve reconstruction accuracy over other methods. △ Less

Submitted 26 July, 2014; v1 submitted 12 February, 2013; originally announced February 2013.

arXiv:1210.6738 [pdf, other]

doi 10.1109/TPAMI.2014.2318728

Nested Hierarchical Dirichlet Processes

Authors: John Paisley, Chong Wang, David M. Blei, Michael I. Jordan

Abstract: We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling. The nHDP is a generalization of the nested Chinese restaurant process (nCRP) that allows each word to follow its own path to a topic node according to a document-specific distribution on a shared tree. This alleviates the rigid, single-path formulation of the nCRP, allowing a document to more easily express… ▽ More We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling. The nHDP is a generalization of the nested Chinese restaurant process (nCRP) that allows each word to follow its own path to a topic node according to a document-specific distribution on a shared tree. This alleviates the rigid, single-path formulation of the nCRP, allowing a document to more easily express thematic borrowings as a random effect. We derive a stochastic variational inference algorithm for the model, in addition to a greedy subtree selection method for each document, which allows for efficient inference using massive collections of text documents. We demonstrate our algorithm on 1.8 million documents from The New York Times and 3.3 million documents from Wikipedia. △ Less

Submitted 2 May, 2014; v1 submitted 25 October, 2012; originally announced October 2012.

Comments: To appear in IEEE Transactions on Pattern Analysis and Machine Intelligence, Special Issue on Bayesian Nonparametrics

arXiv:1206.7051 [pdf, ps, other]

Stochastic Variational Inference

Authors: Matt Hoffman, David M. Blei, Chong Wang, John Paisley

Abstract: We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of docu… ▽ More We develop stochastic variational inference, a scalable algorithm for approximating posterior distributions. We develop this technique for a large class of probabilistic models and we demonstrate it with two probabilistic topic models, latent Dirichlet allocation and the hierarchical Dirichlet process topic model. Using stochastic variational inference, we analyze several large collections of documents: 300K articles from Nature, 1.8M articles from The New York Times, and 3.8M articles from Wikipedia. Stochastic inference can easily handle data sets of this size and outperforms traditional variational inference, which can only handle a smaller subset. (We also show that the Bayesian nonparametric topic model outperforms its parametric counterpart.) Stochastic variational inference lets us apply complex Bayesian models to massive data sets. △ Less

Submitted 22 April, 2013; v1 submitted 29 June, 2012; originally announced June 2012.

arXiv:1206.6430 [pdf]

Variational Bayesian Inference with Stochastic Search

Authors: John Paisley, David Blei, Michael Jordan

Abstract: Mean-field variational inference is a method for approximate Bayesian posterior inference. It approximates a full posterior distribution with a factorized set of distributions by maximizing a lower bound on the marginal likelihood. This requires the ability to integrate a sum of terms in the log joint likelihood using this factorized distribution. Often not all integrals are in closed form, which… ▽ More Mean-field variational inference is a method for approximate Bayesian posterior inference. It approximates a full posterior distribution with a factorized set of distributions by maximizing a lower bound on the marginal likelihood. This requires the ability to integrate a sum of terms in the log joint likelihood using this factorized distribution. Often not all integrals are in closed form, which is typically handled by using a lower bound. We present an alternative algorithm based on stochastic optimization that allows for direct optimization of the variational lower bound. This method uses control variates to reduce the variance of the stochastic search gradient, in which existing lower bounds can play an important role. We demonstrate the approach on two non-conjugate models: logistic regression and an approximation to the HDP. △ Less

Submitted 27 June, 2012; originally announced June 2012.

Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

Showing 1–43 of 43 results for author: Paisley, J