-
Hybrid$^2$ Neural ODE Causal Modeling and an Application to Glycemic Response
Authors:
Bob Junyi Zou,
Matthew E. Levine,
Dessi P. Zaharieva,
Ramesh Johari,
Emily B. Fox
Abstract:
Hybrid models composing mechanistic ODE-based dynamics with flexible and expressive neural network components have grown rapidly in popularity, especially in scientific domains where such ODE-based modeling offers important interpretability and validated causal grounding (e.g., for counterfactual reasoning). The incorporation of mechanistic models also provides inductive bias in standard blackbox…
▽ More
Hybrid models composing mechanistic ODE-based dynamics with flexible and expressive neural network components have grown rapidly in popularity, especially in scientific domains where such ODE-based modeling offers important interpretability and validated causal grounding (e.g., for counterfactual reasoning). The incorporation of mechanistic models also provides inductive bias in standard blackbox modeling approaches, critical when learning from small datasets or partially observed, complex systems. Unfortunately, as the hybrid models become more flexible, the causal grounding provided by the mechanistic model can quickly be lost. We address this problem by leveraging another common source of domain knowledge: \emph{ranking} of treatment effects for a set of interventions, even if the precise treatment effect is unknown. We encode this information in a \emph{causal loss} that we combine with the standard predictive loss to arrive at a \emph{hybrid loss} that biases our learning towards causally valid hybrid models. We demonstrate our ability to achieve a win-win, state-of-the-art predictive performance \emph{and} causal validity, in the challenging task of modeling glucose dynamics post-exercise in individuals with type 1 diabetes.
△ Less
Submitted 11 June, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Interpretable Mechanistic Representations for Meal-level Glycemic Control in the Wild
Authors:
Ke Alexander Wang,
Emily B. Fox
Abstract:
Diabetes encompasses a complex landscape of glycemic control that varies widely among individuals. However, current methods do not faithfully capture this variability at the meal level. On the one hand, expert-crafted features lack the flexibility of data-driven methods; on the other hand, learned representations tend to be uninterpretable which hampers clinical adoption. In this paper, we propose…
▽ More
Diabetes encompasses a complex landscape of glycemic control that varies widely among individuals. However, current methods do not faithfully capture this variability at the meal level. On the one hand, expert-crafted features lack the flexibility of data-driven methods; on the other hand, learned representations tend to be uninterpretable which hampers clinical adoption. In this paper, we propose a hybrid variational autoencoder to learn interpretable representations of CGM and meal data. Our method grounds the latent space to the inputs of a mechanistic differential equation, producing embeddings that reflect physiological quantities, such as insulin sensitivity, glucose effectiveness, and basal glucose levels. Moreover, we introduce a novel method to infer the glucose appearance rate, making the mechanistic model robust to unreliable meal logs. On a dataset of CGM and self-reported meals from individuals with type-2 diabetes and pre-diabetes, our unsupervised representation discovers a separation between individuals proportional to their disease severity. Our embeddings produce clusters that are up to 4x better than naive, expert, black-box, and pure mechanistic features. Our method provides a nuanced, yet interpretable, embedding space to compare glycemic control within and across individuals, directly learnable from in-the-wild data.
△ Less
Submitted 6 December, 2023;
originally announced December 2023.
-
Sequence Modeling with Multiresolution Convolutional Memory
Authors:
Jiaxin Shi,
Ke Alexander Wang,
Emily B. Fox
Abstract:
Efficiently capturing the long-range patterns in sequential data sources salient to a given task -- such as classification and generative modeling -- poses a fundamental challenge. Popular approaches in the space tradeoff between the memory burden of brute-force enumeration and comparison, as in transformers, the computational burden of complicated sequential dependencies, as in recurrent neural n…
▽ More
Efficiently capturing the long-range patterns in sequential data sources salient to a given task -- such as classification and generative modeling -- poses a fundamental challenge. Popular approaches in the space tradeoff between the memory burden of brute-force enumeration and comparison, as in transformers, the computational burden of complicated sequential dependencies, as in recurrent neural networks, or the parameter burden of convolutional networks with many or large filters. We instead take inspiration from wavelet-based multiresolution analysis to define a new building block for sequence modeling, which we call a MultiresLayer. The key component of our model is the multiresolution convolution, capturing multiscale trends in the input sequence. Our MultiresConv can be implemented with shared filters across a dilated causal convolution tree. Thus it garners the computational advantages of convolutional networks and the principled theoretical motivation of wavelet decompositions. Our MultiresLayer is straightforward to implement, requires significantly fewer parameters, and maintains at most a $\mathcal{O}(N\log N)$ memory footprint for a length $N$ sequence. Yet, by stacking such layers, our model yields state-of-the-art performance on a number of sequence classification and autoregressive density estimation tasks using CIFAR-10, ListOps, and PTB-XL datasets.
△ Less
Submitted 1 November, 2023; v1 submitted 2 May, 2023;
originally announced May 2023.
-
Granger Causality: A Review and Recent Advances
Authors:
Ali Shojaie,
Emily B. Fox
Abstract:
Introduced more than a half century ago, Granger causality has become a popular tool for analyzing time series data in many application domains, from economics and finance to genomics and neuroscience. Despite this popularity, the validity of this notion for inferring causal relationships among time series has remained the topic of continuous debate. Moreover, while the original definition was gen…
▽ More
Introduced more than a half century ago, Granger causality has become a popular tool for analyzing time series data in many application domains, from economics and finance to genomics and neuroscience. Despite this popularity, the validity of this notion for inferring causal relationships among time series has remained the topic of continuous debate. Moreover, while the original definition was general, limitations in computational tools have primarily limited the applications of Granger causality to simple bivariate vector auto-regressive processes or pairwise relationships among a set of variables. Starting with a review of early developments and debates, this paper discusses recent advances that address various shortcomings of the earlier approaches, from models for high-dimensional time series to more recent developments that account for nonlinear and non-Gaussian observations and allow for sub-sampled and mixed frequency time series.
△ Less
Submitted 6 May, 2021; v1 submitted 5 May, 2021;
originally announced May 2021.
-
Model-based metrics: Sample-efficient estimates of predictive model subpopulation performance
Authors:
Andrew C. Miller,
Leon A. Gatys,
Joseph Futoma,
Emily B. Fox
Abstract:
Machine learning models $-$ now commonly developed to screen, diagnose, or predict health conditions $-$ are evaluated with a variety of performance metrics. An important first step in assessing the practical utility of a model is to evaluate its average performance over an entire population of interest. In many settings, it is also critical that the model makes good predictions within predefined…
▽ More
Machine learning models $-$ now commonly developed to screen, diagnose, or predict health conditions $-$ are evaluated with a variety of performance metrics. An important first step in assessing the practical utility of a model is to evaluate its average performance over an entire population of interest. In many settings, it is also critical that the model makes good predictions within predefined subpopulations. For instance, showing that a model is fair or equitable requires evaluating the model's performance in different demographic subgroups. However, subpopulation performance metrics are typically computed using only data from that subgroup, resulting in higher variance estimates for smaller groups. We devise a procedure to measure subpopulation performance that can be more sample-efficient than the typical subsample estimates. We propose using an evaluation model $-$ a model that describes the conditional distribution of the predictive model score $-$ to form model-based metric (MBM) estimates. Our procedure incorporates model checking and validation, and we propose a computationally efficient approximation of the traditional nonparametric bootstrap to form confidence intervals. We evaluate MBMs on two main tasks: a semi-synthetic setting where ground truth metrics are available and a real-world hospital readmission prediction task. We find that MBMs consistently produce more accurate and lower variance estimates of model performance for small subpopulations.
△ Less
Submitted 25 April, 2021;
originally announced April 2021.
-
Breiman's two cultures: You don't have to choose sides
Authors:
Andrew C. Miller,
Nicholas J. Foti,
Emily B. Fox
Abstract:
Breiman's classic paper casts data analysis as a choice between two cultures: data modelers and algorithmic modelers. Stated broadly, data modelers use simple, interpretable models with well-understood theoretical properties to analyze data. Algorithmic modelers prioritize predictive accuracy and use more flexible function approximations to analyze data. This dichotomy overlooks a third set of mod…
▽ More
Breiman's classic paper casts data analysis as a choice between two cultures: data modelers and algorithmic modelers. Stated broadly, data modelers use simple, interpretable models with well-understood theoretical properties to analyze data. Algorithmic modelers prioritize predictive accuracy and use more flexible function approximations to analyze data. This dichotomy overlooks a third set of models $-$ mechanistic models derived from scientific theories (e.g., ODE/SDE simulators). Mechanistic models encode application-specific scientific knowledge about the data. And while these categories represent extreme points in model space, modern computational and algorithmic tools enable us to interpolate between these points, producing flexible, interpretable, and scientifically-informed hybrids that can enjoy accurate and robust predictions, and resolve issues with data analysis that Breiman describes, such as the Rashomon effect and Occam's dilemma. Challenges still remain in finding an appropriate point in model space, with many choices on how to compose model components and the degree to which each component informs inferences.
△ Less
Submitted 25 April, 2021;
originally announced April 2021.
-
Representing and Denoising Wearable ECG Recordings
Authors:
Jeffrey Chan,
Andrew C. Miller,
Emily B. Fox
Abstract:
Modern wearable devices are embedded with a range of noninvasive biomarker sensors that hold promise for improving detection and treatment of disease. One such sensor is the single-lead electrocardiogram (ECG) which measures electrical signals in the heart. The benefits of the sheer volume of ECG measurements with rich longitudinal structure made possible by wearables come at the price of potentia…
▽ More
Modern wearable devices are embedded with a range of noninvasive biomarker sensors that hold promise for improving detection and treatment of disease. One such sensor is the single-lead electrocardiogram (ECG) which measures electrical signals in the heart. The benefits of the sheer volume of ECG measurements with rich longitudinal structure made possible by wearables come at the price of potentially noisier measurements compared to clinical ECGs, e.g., due to movement. In this work, we develop a statistical model to simulate a structured noise process in ECGs derived from a wearable sensor, design a beat-to-beat representation that is conducive for analyzing variation, and devise a factor analysis-based method to denoise the ECG. We study synthetic data generated using a realistic ECG simulator and a structured noise model. At varying levels of signal-to-noise, we quantitatively measure an upper bound on performance and compare estimates from linear and non-linear models. Finally, we apply our method to a set of ECGs collected by wearables in a mobile health study.
△ Less
Submitted 30 November, 2020;
originally announced December 2020.
-
Modeling patterns of smartphone usage and their relationship to cognitive health
Authors:
Jonas Rauber,
Emily B. Fox,
Leon A. Gatys
Abstract:
The ubiquity of smartphone usage in many people's lives make it a rich source of information about a person's mental and cognitive state. In this work we analyze 12 weeks of phone usage data from 113 older adults, 31 with diagnosed cognitive impairment and 82 without. We develop structured models of users' smartphone interactions to reveal differences in phone usage patterns between people with an…
▽ More
The ubiquity of smartphone usage in many people's lives make it a rich source of information about a person's mental and cognitive state. In this work we analyze 12 weeks of phone usage data from 113 older adults, 31 with diagnosed cognitive impairment and 82 without. We develop structured models of users' smartphone interactions to reveal differences in phone usage patterns between people with and without cognitive impairment. In particular, we focus on inferring specific types of phone usage sessions that are predictive of cognitive impairment. Our model achieves an AUROC of 0.79 when discriminating between healthy and symptomatic subjects, and its interpretability enables novel insights into which aspects of phone usage strongly relate with cognitive health in our dataset.
△ Less
Submitted 13 November, 2019;
originally announced November 2019.
-
Adaptively Truncating Backpropagation Through Time to Control Gradient Bias
Authors:
Christopher Aicher,
Nicholas J. Foti,
Emily B. Fox
Abstract:
Truncated backpropagation through time (TBPTT) is a popular method for learning in recurrent neural networks (RNNs) that saves computation and memory at the cost of bias by truncating backpropagation after a fixed number of lags. In practice, choosing the optimal truncation length is difficult: TBPTT will not converge if the truncation length is too small, or will converge slowly if it is too larg…
▽ More
Truncated backpropagation through time (TBPTT) is a popular method for learning in recurrent neural networks (RNNs) that saves computation and memory at the cost of bias by truncating backpropagation after a fixed number of lags. In practice, choosing the optimal truncation length is difficult: TBPTT will not converge if the truncation length is too small, or will converge slowly if it is too large. We propose an adaptive TBPTT scheme that converts the problem from choosing a temporal lag to one of choosing a tolerable amount of gradient bias. For many realistic RNNs, the TBPTT gradients decay geometrically in expectation for large lags; under this condition, we can control the bias by varying the truncation length adaptively. For RNNs with smooth activation functions, we prove that this bias controls the convergence rate of SGD with biased gradients for our non-convex loss. Using this theory, we develop a practical method for adaptively estimating the truncation length during training. We evaluate our adaptive TBPTT method on synthetic data and language modeling tasks and find that our adaptive TBPTT ameliorates the computational pitfalls of fixed TBPTT.
△ Less
Submitted 1 July, 2019; v1 submitted 17 May, 2019;
originally announced May 2019.
-
Stochastic Gradient MCMC for Nonlinear State Space Models
Authors:
Christopher Aicher,
Srshti Putcha,
Christopher Nemeth,
Paul Fearnhead,
Emily B. Fox
Abstract:
State space models (SSMs) provide a flexible framework for modeling complex time series via a latent stochastic process. Inference for nonlinear, non-Gaussian SSMs is often tackled with particle methods that do not scale well to long time series. The challenge is two-fold: not only do computations scale linearly with time, as in the linear case, but particle filters additionally suffer from increa…
▽ More
State space models (SSMs) provide a flexible framework for modeling complex time series via a latent stochastic process. Inference for nonlinear, non-Gaussian SSMs is often tackled with particle methods that do not scale well to long time series. The challenge is two-fold: not only do computations scale linearly with time, as in the linear case, but particle filters additionally suffer from increasing particle degeneracy with longer series. Stochastic gradient MCMC methods have been developed to scale Bayesian inference for finite-state hidden Markov models and linear SSMs using buffered stochastic gradient estimates to account for temporal dependencies. We extend these stochastic gradient estimators to nonlinear SSMs using particle methods. We present error bounds that account for both buffering error and particle error in the case of nonlinear SSMs that are log-concave in the latent process. We evaluate our proposed particle buffered stochastic gradient using stochastic gradient MCMC for inference on both long sequential synthetic and minute-resolution financial returns data, demonstrating the importance of this class of methods.
△ Less
Submitted 16 July, 2023; v1 submitted 29 January, 2019;
originally announced January 2019.
-
Stochastic Gradient MCMC for State Space Models
Authors:
Christopher Aicher,
Yi-An Ma,
Nicholas J. Foti,
Emily B. Fox
Abstract:
State space models (SSMs) are a flexible approach to modeling complex time series. However, inference in SSMs is often computationally prohibitive for long time series. Stochastic gradient MCMC (SGMCMC) is a popular method for scalable Bayesian inference for large independent data. Unfortunately when applied to dependent data, such as in SSMs, SGMCMC's stochastic gradient estimates are biased as t…
▽ More
State space models (SSMs) are a flexible approach to modeling complex time series. However, inference in SSMs is often computationally prohibitive for long time series. Stochastic gradient MCMC (SGMCMC) is a popular method for scalable Bayesian inference for large independent data. Unfortunately when applied to dependent data, such as in SSMs, SGMCMC's stochastic gradient estimates are biased as they break crucial temporal dependencies. To alleviate this, we propose stochastic gradient estimators that control this bias by performing additional computation in a `buffer' to reduce breaking dependencies. Furthermore, we derive error bounds for this bias and show a geometric decay under mild conditions. Using these estimators, we develop novel SGMCMC samplers for discrete, continuous and mixed-type SSMs with analytic message passing. Our experiments on real and synthetic data demonstrate the effectiveness of our SGMCMC algorithms compared to batch MCMC, allowing us to scale inference to long time series with millions of time points.
△ Less
Submitted 9 July, 2019; v1 submitted 22 October, 2018;
originally announced October 2018.
-
Approximate Collapsed Gibbs Clustering with Expectation Propagation
Authors:
Christopher Aicher,
Emily B. Fox
Abstract:
We develop a framework for approximating collapsed Gibbs sampling in generative latent variable cluster models. Collapsed Gibbs is a popular MCMC method, which integrates out variables in the posterior to improve mixing. Unfortunately for many complex models, integrating out these variables is either analytically or computationally intractable. We efficiently approximate the necessary collapsed Gi…
▽ More
We develop a framework for approximating collapsed Gibbs sampling in generative latent variable cluster models. Collapsed Gibbs is a popular MCMC method, which integrates out variables in the posterior to improve mixing. Unfortunately for many complex models, integrating out these variables is either analytically or computationally intractable. We efficiently approximate the necessary collapsed Gibbs integrals by borrowing ideas from expectation propagation. We present two case studies where exact collapsed Gibbs sampling is intractable: mixtures of Student-t's and time series clustering. Our experiments on real and synthetic data show that our approximate sampler enables a runtime-accuracy tradeoff in sampling these types of models, providing results with competitive accuracy much more rapidly than the naive Gibbs samplers one would otherwise rely on in these scenarios.
△ Less
Submitted 19 July, 2018;
originally announced July 2018.
-
Disentangled VAE Representations for Multi-Aspect and Missing Data
Authors:
Samuel K. Ainsworth,
Nicholas J. Foti,
Emily B. Fox
Abstract:
Many problems in machine learning and related application areas are fundamentally variants of conditional modeling and sampling across multi-aspect data, either multi-view, multi-modal, or simply multi-group. For example, sampling from the distribution of English sentences conditioned on a given French sentence or sampling audio waveforms conditioned on a given piece of text. Central to many of th…
▽ More
Many problems in machine learning and related application areas are fundamentally variants of conditional modeling and sampling across multi-aspect data, either multi-view, multi-modal, or simply multi-group. For example, sampling from the distribution of English sentences conditioned on a given French sentence or sampling audio waveforms conditioned on a given piece of text. Central to many of these problems is the issue of missing data: we can observe many English, French, or German sentences individually but only occasionally do we have data for a sentence pair. Motivated by these applications and inspired by recent progress in variational autoencoders for grouped data, we develop factVAE, a deep generative model capable of handling multi-aspect data, robust to missing observations, and with a prior that encourages disentanglement between the groups and the latent dimensions. The effectiveness of factVAE is demonstrated on a variety of rich real-world datasets, including motion capture poses and pictures of faces captured from varying poses and perspectives.
△ Less
Submitted 23 June, 2018;
originally announced June 2018.
-
Large-Scale Stochastic Sampling from the Probability Simplex
Authors:
Jack Baker,
Paul Fearnhead,
Emily B Fox,
Christopher Nemeth
Abstract:
Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space the time-discretization error can dominate when we are near the boundary of the space. We demons…
▽ More
Stochastic gradient Markov chain Monte Carlo (SGMCMC) has become a popular method for scalable Bayesian inference. These methods are based on sampling a discrete-time approximation to a continuous time process, such as the Langevin diffusion. When applied to distributions defined on a constrained space the time-discretization error can dominate when we are near the boundary of the space. We demonstrate that because of this, current SGMCMC methods for the simplex struggle with sparse simplex spaces; when many of the components are close to zero. Unfortunately, many popular large-scale Bayesian models, such as network or topic models, require inference on sparse simplex spaces. To avoid the biases caused by this discretization error, we propose the stochastic Cox-Ingersoll-Ross process (SCIR), which removes all discretization error and we prove that samples from the SCIR process are asymptotically unbiased. We discuss how this idea can be extended to target other constrained spaces. Use of the SCIR process within a SGMCMC algorithm is shown to give substantially better performance for a topic model and a Dirichlet process mixture model than existing SGMCMC approaches.
△ Less
Submitted 26 October, 2018; v1 submitted 19 June, 2018;
originally announced June 2018.
-
An Efficient ADMM Algorithm for Structural Break Detection in Multivariate Time Series
Authors:
Alex Tank,
Emily B. Fox,
Ali Shojaie
Abstract:
We present an efficient alternating direction method of multipliers (ADMM) algorithm for segmenting a multivariate non-stationary time series with structural breaks into stationary regions. We draw from recent work where the series is assumed to follow a vector autoregressive model within segments and a convex estimation procedure may be formulated using group fused lasso penalties. Our ADMM appro…
▽ More
We present an efficient alternating direction method of multipliers (ADMM) algorithm for segmenting a multivariate non-stationary time series with structural breaks into stationary regions. We draw from recent work where the series is assumed to follow a vector autoregressive model within segments and a convex estimation procedure may be formulated using group fused lasso penalties. Our ADMM approach first splits the convex problem into a global quadratic program and a simple group lasso proximal update. We show that the global problem may be parallelized over rows of the time dependent transition matrices and furthermore that each subproblem may be rewritten in a form identical to the log-likelihood of a Gaussian state space model. Consequently, we develop a Kalman smoothing algorithm to solve the global update in time linear in the length of the series.
△ Less
Submitted 25 June, 2018; v1 submitted 22 November, 2017;
originally announced November 2017.
-
An Interpretable and Sparse Neural Network Model for Nonlinear Granger Causality Discovery
Authors:
Alex Tank,
Ian Cover,
Nicholas J. Foti,
Ali Shojaie,
Emily B. Fox
Abstract:
While most classical approaches to Granger causality detection repose upon linear time series assumptions, many interactions in neuroscience and economics applications are nonlinear. We develop an approach to nonlinear Granger causality detection using multilayer perceptrons where the input to the network is the past time lags of all series and the output is the future value of a single series. A…
▽ More
While most classical approaches to Granger causality detection repose upon linear time series assumptions, many interactions in neuroscience and economics applications are nonlinear. We develop an approach to nonlinear Granger causality detection using multilayer perceptrons where the input to the network is the past time lags of all series and the output is the future value of a single series. A sufficient condition for Granger non-causality in this setting is that all of the outgoing weights of the input data, the past lags of a series, to the first hidden layer are zero. For estimation, we utilize a group lasso penalty to shrink groups of input weights to zero. We also propose a hierarchical penalty for simultaneous Granger causality and lag estimation. We validate our approach on simulated data from both a sparse linear autoregressive model and the sparse and nonlinear Lorenz-96 model.
△ Less
Submitted 25 June, 2018; v1 submitted 22 November, 2017;
originally announced November 2017.
-
sgmcmc: An R Package for Stochastic Gradient Markov Chain Monte Carlo
Authors:
Jack Baker,
Paul Fearnhead,
Emily B. Fox,
Christopher Nemeth
Abstract:
This paper introduces the R package sgmcmc; which can be used for Bayesian inference on problems with large datasets using stochastic gradient Markov chain Monte Carlo (SGMCMC). Traditional Markov chain Monte Carlo (MCMC) methods, such as Metropolis-Hastings, are known to run prohibitively slowly as the dataset size increases. SGMCMC solves this issue by only using a subset of data at each iterati…
▽ More
This paper introduces the R package sgmcmc; which can be used for Bayesian inference on problems with large datasets using stochastic gradient Markov chain Monte Carlo (SGMCMC). Traditional Markov chain Monte Carlo (MCMC) methods, such as Metropolis-Hastings, are known to run prohibitively slowly as the dataset size increases. SGMCMC solves this issue by only using a subset of data at each iteration. SGMCMC requires calculating gradients of the log likelihood and log priors, which can be time consuming and error prone to perform by hand. The sgmcmc package calculates these gradients itself using automatic differentiation, making the implementation of these methods much easier. To do this, the package uses the software library TensorFlow, which has a variety of statistical distributions and mathematical operations as standard, meaning a wide class of models can be built using this framework. SGMCMC has become widely adopted in the machine learning literature, but less so in the statistics community. We believe this may be partly due to lack of software; this package aims to bridge this gap.
△ Less
Submitted 13 April, 2018; v1 submitted 2 October, 2017;
originally announced October 2017.
-
Dynamics of homelessness in urban America
Authors:
Chris Glynn,
Emily B. Fox
Abstract:
The relationship between housing costs and homelessness has important implications for the way that city and county governments respond to increasing homeless populations. Though many analyses in the public policy literature have examined inter-community variation in homelessness rates to identify causal mechanisms of homelessness (Byrne et al., 2013; Lee et al., 2003; Fargo et al., 2013), few stu…
▽ More
The relationship between housing costs and homelessness has important implications for the way that city and county governments respond to increasing homeless populations. Though many analyses in the public policy literature have examined inter-community variation in homelessness rates to identify causal mechanisms of homelessness (Byrne et al., 2013; Lee et al., 2003; Fargo et al., 2013), few studies have examined time-varying homeless counts within the same community (McCandless et al., 2016). To examine trends in homeless population counts in the 25 largest U.S. metropolitan areas, we develop a dynamic Bayesian hierarchical model for time-varying homeless count data. Particular care is given to modeling uncertainty in the homeless count generating and measurement processes, and a critical distinction is made between the counted number of homeless and the true size of the homeless population. For each metro under study, we investigate the relationship between increases in the Zillow Rent Index and increases in the homeless population. Sensitivity of inference to potential improvements in the accuracy of point-in-time counts is explored, and evidence is presented that the inferred increase in the rate of homelessness from 2011-2016 depends on prior beliefs about the accuracy of homeless counts. A main finding of the study is that the relationship between homelessness and rental costs is strongest in New York, Los Angeles, Washington, D.C., and Seattle.
△ Less
Submitted 28 July, 2017;
originally announced July 2017.
-
Control Variates for Stochastic Gradient MCMC
Authors:
Jack Baker,
Paul Fearnhead,
Emily B. Fox,
Christopher Nemeth
Abstract:
It is well known that Markov chain Monte Carlo (MCMC) methods scale poorly with dataset size. A popular class of methods for solving this issue is stochastic gradient MCMC. These methods use a noisy estimate of the gradient of the log posterior, which reduces the per iteration computational cost of the algorithm. Despite this, there are a number of results suggesting that stochastic gradient Lange…
▽ More
It is well known that Markov chain Monte Carlo (MCMC) methods scale poorly with dataset size. A popular class of methods for solving this issue is stochastic gradient MCMC. These methods use a noisy estimate of the gradient of the log posterior, which reduces the per iteration computational cost of the algorithm. Despite this, there are a number of results suggesting that stochastic gradient Langevin dynamics (SGLD), probably the most popular of these methods, still has computational cost proportional to the dataset size. We suggest an alternative log posterior gradient estimate for stochastic gradient MCMC, which uses control variates to reduce the variance. We analyse SGLD using this gradient estimate, and show that, under log-concavity assumptions on the target distribution, the computational cost required for a given level of accuracy is independent of the dataset size. Next we show that a different control variate technique, known as zero variance control variates can be applied to SGMCMC algorithms for free. This post-processing step improves the inference of the algorithm by reducing the variance of the MCMC output. Zero variance control variates rely on the gradient of the log posterior; we explore how the variance reduction is affected by replacing this with the noisy gradient estimate calculated by SGMCMC.
△ Less
Submitted 14 December, 2017; v1 submitted 16 June, 2017;
originally announced June 2017.
-
Stochastic Gradient MCMC Methods for Hidden Markov Models
Authors:
Yi-An Ma,
Nicholas J. Foti,
Emily B. Fox
Abstract:
Stochastic gradient MCMC (SG-MCMC) algorithms have proven useful in scaling Bayesian inference to large datasets under an assumption of i.i.d data. We instead develop an SG-MCMC algorithm to learn the parameters of hidden Markov models (HMMs) for time-dependent data. There are two challenges to applying SG-MCMC in this setting: The latent discrete states, and needing to break dependencies when con…
▽ More
Stochastic gradient MCMC (SG-MCMC) algorithms have proven useful in scaling Bayesian inference to large datasets under an assumption of i.i.d data. We instead develop an SG-MCMC algorithm to learn the parameters of hidden Markov models (HMMs) for time-dependent data. There are two challenges to applying SG-MCMC in this setting: The latent discrete states, and needing to break dependencies when considering minibatches. We consider a marginal likelihood representation of the HMM and propose an algorithm that harnesses the inherent memory decay of the process. We demonstrate the effectiveness of our algorithm on synthetic experiments and an ion channel recording data, with runtimes significantly outperforming batch MCMC.
△ Less
Submitted 14 June, 2017;
originally announced June 2017.
-
Granger Causality Networks for Categorical Time Series
Authors:
Alex Tank,
Emily B. Fox,
Ali Shojaie
Abstract:
We present a new framework for learning Granger causality networks for multivariate categorical time series, based on the mixture transition distribution (MTD) model. Traditionally, MTD is plagued by a nonconvex objective, non-identifiability, and presence of many local optima. To circumvent these problems, we recast inference in the MTD as a convex problem. The new formulation facilitates the app…
▽ More
We present a new framework for learning Granger causality networks for multivariate categorical time series, based on the mixture transition distribution (MTD) model. Traditionally, MTD is plagued by a nonconvex objective, non-identifiability, and presence of many local optima. To circumvent these problems, we recast inference in the MTD as a convex problem. The new formulation facilitates the application of MTD to high-dimensional multivariate time series. As a baseline, we also formulate a multi-output logistic autoregressive model (mLTD), which while a straightforward extension of autoregressive Bernoulli generalized linear models, has not been previously applied to the analysis of multivariate categorial time series. We develop novel identifiability conditions of the MTD model and compare them to those for mLTD. We further devise novel and efficient optimization algorithm for the MTD based on the new convex formulation, and compare the MTD and mLTD in both simulated and real data experiments. Our approach simultaneously provides a comparison of methods for network inference in categorical time series and opens the door to modern, regularized inference with the MTD model.
△ Less
Submitted 8 June, 2017;
originally announced June 2017.
-
Identifiability and Estimation of Structural Vector Autoregressive Models for Subsampled and Mixed Frequency Time Series
Authors:
Alex Tank,
Emily B. Fox,
Ali Shojaie
Abstract:
Causal inference in multivariate time series is challenging due to the fact that the sampling rate may not be as fast as the timescale of the causal interactions. In this context, we can view our observed series as a subsampled version of the desired series. Furthermore, due to technological and other limitations, series may be observed at different sampling rates, representing a mixed frequency s…
▽ More
Causal inference in multivariate time series is challenging due to the fact that the sampling rate may not be as fast as the timescale of the causal interactions. In this context, we can view our observed series as a subsampled version of the desired series. Furthermore, due to technological and other limitations, series may be observed at different sampling rates, representing a mixed frequency setting. To determine instantaneous and lagged effects between time series at the true causal scale, we take a model-based approach based on structural vector autoregressive (SVAR) models. In this context, we present a unifying framework for parameter identifiability and estimation under both subsampling and mixed frequencies when the noise, or shocks, are non-Gaussian. Importantly, by studying the SVAR case, we are able to both provide identifiability and estimation methods for the causal structure of both lagged and instantaneous effects at the desired time scale. We further derive an exact EM algorithm for inference in both subsampled and mixed frequency settings. We validate our approach in simulated scenarios and on two real world data sets.
△ Less
Submitted 8 April, 2017;
originally announced April 2017.
-
Irreversible Samplers from Jump and Continuous Markov Processes
Authors:
Yi-An Ma,
Emily B. Fox,
Tianqi Chen,
Lei Wu
Abstract:
In this paper, we propose irreversible versions of the Metropolis Hastings (MH) and Metropolis adjusted Langevin algorithm (MALA) with a main focus on the latter. For the former, we show how one can simply switch between different proposal and acceptance distributions upon rejection to obtain an irreversible jump sampler (I-Jump). The resulting algorithm has a simple implementation akin to MH, but…
▽ More
In this paper, we propose irreversible versions of the Metropolis Hastings (MH) and Metropolis adjusted Langevin algorithm (MALA) with a main focus on the latter. For the former, we show how one can simply switch between different proposal and acceptance distributions upon rejection to obtain an irreversible jump sampler (I-Jump). The resulting algorithm has a simple implementation akin to MH, but with the demonstrated benefits of irreversibility. We then show how the previously proposed MALA method can also be extended to exploit irreversible stochastic dynamics as proposal distributions in the I-Jump sampler. Our experiments explore how irreversibility can increase the efficiency of the samplers in different situations.
△ Less
Submitted 12 March, 2018; v1 submitted 21 August, 2016;
originally announced August 2016.
-
A Complete Recipe for Stochastic Gradient MCMC
Authors:
Yi-An Ma,
Tianqi Chen,
Emily B. Fox
Abstract:
Many recent Markov chain Monte Carlo (MCMC) samplers leverage continuous dynamics to define a transition kernel that efficiently explores a target distribution. In tandem, a focus has been on devising scalable variants that subsample the data and use stochastic gradients in place of full-data gradients in the dynamic simulations. However, such stochastic gradient MCMC samplers have lagged behind t…
▽ More
Many recent Markov chain Monte Carlo (MCMC) samplers leverage continuous dynamics to define a transition kernel that efficiently explores a target distribution. In tandem, a focus has been on devising scalable variants that subsample the data and use stochastic gradients in place of full-data gradients in the dynamic simulations. However, such stochastic gradient MCMC samplers have lagged behind their full-data counterparts in terms of the complexity of dynamics considered since proving convergence in the presence of the stochastic gradient noise is non-trivial. Even with simple dynamics, significant physical intuition is often required to modify the dynamical system to account for the stochastic gradient noise. In this paper, we provide a general recipe for constructing MCMC samplers--including stochastic gradient versions--based on continuous Markov processes specified via two matrices. We constructively prove that the framework is complete. That is, any continuous Markov process that provides samples from the target distribution can be written in our framework. We show how previous continuous-dynamic samplers can be trivially "reinvented" in our framework, avoiding the complicated sampler-specific proofs. We likewise use our recipe to straightforwardly propose a new state-adaptive sampler: stochastic gradient Riemann Hamiltonian Monte Carlo (SGRHMC). Our experiments on simulated data and a streaming Wikipedia analysis demonstrate that the proposed SGRHMC sampler inherits the benefits of Riemann HMC, with the scalability of stochastic gradient methods.
△ Less
Submitted 31 October, 2015; v1 submitted 15 June, 2015;
originally announced June 2015.
-
Achieving a Hyperlocal Housing Price Index: Overcoming Data Sparsity by Bayesian Dynamical Modeling of Multiple Data Streams
Authors:
You Ren,
Emily B. Fox,
Andrew Bruce
Abstract:
Understanding how housing values evolve over time is important to policy makers, consumers and real estate professionals. Existing methods for constructing housing indices are computed at a coarse spatial granularity, such as metropolitan regions, which can mask or distort price dynamics apparent in local markets, such as neighborhoods and census tracts. A challenge in moving to estimates at, for…
▽ More
Understanding how housing values evolve over time is important to policy makers, consumers and real estate professionals. Existing methods for constructing housing indices are computed at a coarse spatial granularity, such as metropolitan regions, which can mask or distort price dynamics apparent in local markets, such as neighborhoods and census tracts. A challenge in moving to estimates at, for example, the census tract level is the sparsity of spatiotemporally localized house sales observations. Our work aims at addressing this challenge by leveraging observations from multiple census tracts discovered to have correlated valuation dynamics. Our proposed Bayesian nonparametric approach builds on the framework of latent factor models to enable a flexible, data-driven method for inferring the clustering of correlated census tracts. We explore methods for scalability and parallelizability of computations, yielding a housing valuation index at the level of census tract rather than zip code, and on a monthly basis rather than quarterly. Our analysis is provided on a large Seattle metropolitan housing dataset.
△ Less
Submitted 5 May, 2015;
originally announced May 2015.
-
Streaming Variational Inference for Bayesian Nonparametric Mixture Models
Authors:
Alex Tank,
Nicholas J. Foti,
Emily B. Fox
Abstract:
In theory, Bayesian nonparametric (BNP) models are well suited to streaming data scenarios due to their ability to adapt model complexity with the observed data. Unfortunately, such benefits have not been fully realized in practice; existing inference algorithms are either not applicable to streaming applications or not extensible to BNP models. For the special case of Dirichlet processes, streami…
▽ More
In theory, Bayesian nonparametric (BNP) models are well suited to streaming data scenarios due to their ability to adapt model complexity with the observed data. Unfortunately, such benefits have not been fully realized in practice; existing inference algorithms are either not applicable to streaming applications or not extensible to BNP models. For the special case of Dirichlet processes, streaming inference has been considered. However, there is growing interest in more flexible BNP models building on the class of normalized random measures (NRMs). We work within this general framework and present a streaming variational inference algorithm for NRM mixture models. Our algorithm is based on assumed density filtering (ADF), leading straightforwardly to expectation propagation (EP) for large-scale batch inference as well. We demonstrate the efficacy of the algorithm on clustering documents in large, streaming text corpora.
△ Less
Submitted 21 April, 2015; v1 submitted 1 December, 2014;
originally announced December 2014.
-
Stochastic Variational Inference for Hidden Markov Models
Authors:
Nicholas J. Foti,
Jason Xu,
Dillon Laird,
Emily B. Fox
Abstract:
Variational inference algorithms have proven successful for Bayesian analysis in large data settings, with recent advances using stochastic variational inference (SVI). However, such methods have largely been studied in independent or exchangeable data settings. We develop an SVI algorithm to learn the parameters of hidden Markov models (HMMs) in a time-dependent data setting. The challenge in app…
▽ More
Variational inference algorithms have proven successful for Bayesian analysis in large data settings, with recent advances using stochastic variational inference (SVI). However, such methods have largely been studied in independent or exchangeable data settings. We develop an SVI algorithm to learn the parameters of hidden Markov models (HMMs) in a time-dependent data setting. The challenge in applying stochastic optimization in this setting arises from dependencies in the chain, which must be broken to consider minibatches of observations. We propose an algorithm that harnesses the memory decay of the chain to adaptively bound errors arising from edge effects. We demonstrate the effectiveness of our algorithm on synthetic experiments and a large genomics dataset where a batch algorithm is computationally infeasible.
△ Less
Submitted 6 November, 2014;
originally announced November 2014.
-
Modeling the Complex Dynamics and Changing Correlations of Epileptic Events
Authors:
Drausin F. Wulsin,
Emily B. Fox,
Brian Litt
Abstract:
Patients with epilepsy can manifest short, sub-clinical epileptic "bursts" in addition to full-blown clinical seizures. We believe the relationship between these two classes of events---something not previously studied quantitatively---could yield important insights into the nature and intrinsic dynamics of seizures. A goal of our work is to parse these complex epileptic events into distinct dynam…
▽ More
Patients with epilepsy can manifest short, sub-clinical epileptic "bursts" in addition to full-blown clinical seizures. We believe the relationship between these two classes of events---something not previously studied quantitatively---could yield important insights into the nature and intrinsic dynamics of seizures. A goal of our work is to parse these complex epileptic events into distinct dynamic regimes. A challenge posed by the intracranial EEG (iEEG) data we study is the fact that the number and placement of electrodes can vary between patients. We develop a Bayesian nonparametric Markov switching process that allows for (i) shared dynamic regimes between a variable number of channels, (ii) asynchronous regime-switching, and (iii) an unknown dictionary of dynamic regimes. We encode a sparse and changing set of dependencies between the channels using a Markov-switching Gaussian graphical model for the innovations process driving the channel dynamics and demonstrate the importance of this model in parsing and out-of-sample predictions of iEEG data. We show that our model produces intuitive state assignments that can help automate clinical analysis of seizures and enable the comparison of sub-clinical bursts and full clinical seizures.
△ Less
Submitted 13 July, 2014; v1 submitted 27 February, 2014;
originally announced February 2014.
-
Learning the Parameters of Determinantal Point Process Kernels
Authors:
Raja Hafiz Affandi,
Emily B. Fox,
Ryan P. Adams,
Ben Taskar
Abstract:
Determinantal point processes (DPPs) are well-suited for modeling repulsion and have proven useful in many applications where diversity is desired. While DPPs have many appealing properties, such as efficient sampling, learning the parameters of a DPP is still considered a difficult problem due to the non-convex nature of the likelihood function. In this paper, we propose using Bayesian methods to…
▽ More
Determinantal point processes (DPPs) are well-suited for modeling repulsion and have proven useful in many applications where diversity is desired. While DPPs have many appealing properties, such as efficient sampling, learning the parameters of a DPP is still considered a difficult problem due to the non-convex nature of the likelihood function. In this paper, we propose using Bayesian methods to learn the DPP kernel parameters. These methods are applicable in large-scale and continuous DPP settings even when the exact form of the eigendecomposition is unknown. We demonstrate the utility of our DPP learning methods in studying the progression of diabetic neuropathy based on spatial distribution of nerve fibers, and in studying human perception of diversity in images.
△ Less
Submitted 19 February, 2014;
originally announced February 2014.
-
Stochastic Gradient Hamiltonian Monte Carlo
Authors:
Tianqi Chen,
Emily B. Fox,
Carlos Guestrin
Abstract:
Hamiltonian Monte Carlo (HMC) sampling methods provide a mechanism for defining distant proposals with high acceptance probabilities in a Metropolis-Hastings framework, enabling more efficient exploration of the state space than standard random-walk proposals. The popularity of such methods has grown significantly in recent years. However, a limitation of HMC methods is the required gradient compu…
▽ More
Hamiltonian Monte Carlo (HMC) sampling methods provide a mechanism for defining distant proposals with high acceptance probabilities in a Metropolis-Hastings framework, enabling more efficient exploration of the state space than standard random-walk proposals. The popularity of such methods has grown significantly in recent years. However, a limitation of HMC methods is the required gradient computation for simulation of the Hamiltonian dynamical system-such computation is infeasible in problems involving a large sample size or streaming data. Instead, we must rely on a noisy gradient estimate computed from a subset of the data. In this paper, we explore the properties of such a stochastic gradient HMC approach. Surprisingly, the natural implementation of the stochastic approximation can be arbitrarily bad. To address this problem we introduce a variant that uses second-order Langevin dynamics with a friction term that counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution. Results on simulated data validate our theory. We also provide an application of our methods to a classification task using neural networks and to online Bayesian matrix factorization.
△ Less
Submitted 12 May, 2014; v1 submitted 17 February, 2014;
originally announced February 2014.
-
Sparse graphs using exchangeable random measures
Authors:
François Caron,
Emily B. Fox
Abstract:
Statistical network modeling has focused on representing the graph as a discrete structure, namely the adjacency matrix, and considering the exchangeability of this array. In such cases, the Aldous-Hoover representation theorem (Aldous, 1981;Hoover, 1979} applies and informs us that the graph is necessarily either dense or empty. In this paper, we instead consider representing the graph as a mea…
▽ More
Statistical network modeling has focused on representing the graph as a discrete structure, namely the adjacency matrix, and considering the exchangeability of this array. In such cases, the Aldous-Hoover representation theorem (Aldous, 1981;Hoover, 1979} applies and informs us that the graph is necessarily either dense or empty. In this paper, we instead consider representing the graph as a measure on $\mathbb{R}_+^2$. For the associated definition of exchangeability in this continuous space, we rely on the Kallenberg representation theorem (Kallenberg, 2005). We show that for certain choices of such exchangeable random measures underlying our graph construction, our network process is sparse with power-law degree distribution. In particular, we build on the framework of completely random measures (CRMs) and use the theory associated with such processes to derive important network properties, such as an urn representation for our analysis and network simulation. Our theoretical results are explored empirically and compared to common network models. We then present a Hamiltonian Monte Carlo algorithm for efficient exploration of the posterior distribution and demonstrate that we are able to recover graphs ranging from dense to sparse--and perform associated tests--based on our flexible CRM-based formulation. We explore network properties in a range of real datasets, including Facebook social circles, a political blogosphere, protein networks, citation networks, and world wide web networks, including networks with hundreds of thousands of nodes and millions of edges.
△ Less
Submitted 27 March, 2015; v1 submitted 6 January, 2014;
originally announced January 2014.
-
Approximate Inference in Continuous Determinantal Point Processes
Authors:
Raja Hafiz Affandi,
Emily B. Fox,
Ben Taskar
Abstract:
Determinantal point processes (DPPs) are random point processes well-suited for modeling repulsion. In machine learning, the focus of DPP-based models has been on diverse subset selection from a discrete and finite base set. This discrete setting admits an efficient sampling algorithm based on the eigendecomposition of the defining kernel matrix. Recently, there has been growing interest in using…
▽ More
Determinantal point processes (DPPs) are random point processes well-suited for modeling repulsion. In machine learning, the focus of DPP-based models has been on diverse subset selection from a discrete and finite base set. This discrete setting admits an efficient sampling algorithm based on the eigendecomposition of the defining kernel matrix. Recently, there has been growing interest in using DPPs defined on continuous spaces. While the discrete-DPP sampler extends formally to the continuous case, computationally, the steps required are not tractable in general. In this paper, we present two efficient DPP sampling schemes that apply to a wide range of kernel functions: one based on low rank approximations via Nystrom and random Fourier feature techniques and another based on Gibbs sampling. We demonstrate the utility of continuous DPPs in repulsive mixture modeling and synthesizing human poses spanning activity spaces.
△ Less
Submitted 12 November, 2013;
originally announced November 2013.
-
Mixed Membership Models for Time Series
Authors:
Emily B. Fox,
Michael I. Jordan
Abstract:
In this article we discuss some of the consequences of the mixed membership perspective on time series analysis. In its most abstract form, a mixed membership model aims to associate an individual entity with some set of attributes based on a collection of observed data. Although much of the literature on mixed membership models considers the setting in which exchangeable collections of data are a…
▽ More
In this article we discuss some of the consequences of the mixed membership perspective on time series analysis. In its most abstract form, a mixed membership model aims to associate an individual entity with some set of attributes based on a collection of observed data. Although much of the literature on mixed membership models considers the setting in which exchangeable collections of data are associated with each member of a set of entities, it is equally natural to consider problems in which an entire time series is viewed as an entity and the goal is to characterize the time series in terms of a set of underlying dynamic attributes or "dynamic regimes". Indeed, this perspective is already present in the classical hidden Markov model, where the dynamic regimes are referred to as "states", and the collection of states realized in a sample path of the underlying process can be viewed as a mixed membership characterization of the observed time series. Our goal here is to review some of the richer modeling possibilities for time series that are provided by recent developments in the mixed membership framework.
△ Less
Submitted 13 September, 2013;
originally announced September 2013.
-
Joint modeling of multiple time series via the beta process with application to motion capture segmentation
Authors:
Emily B. Fox,
Michael C. Hughes,
Erik B. Sudderth,
Michael I. Jordan
Abstract:
We propose a Bayesian nonparametric approach to the problem of jointly modeling multiple related time series. Our model discovers a latent set of dynamical behaviors shared among the sequences, and segments each time series into regions defined by a subset of these behaviors. Using a beta process prior, the size of the behavior set and the sharing pattern are both inferred from data. We develop Ma…
▽ More
We propose a Bayesian nonparametric approach to the problem of jointly modeling multiple related time series. Our model discovers a latent set of dynamical behaviors shared among the sequences, and segments each time series into regions defined by a subset of these behaviors. Using a beta process prior, the size of the behavior set and the sharing pattern are both inferred from data. We develop Markov chain Monte Carlo (MCMC) methods based on the Indian buffet process representation of the predictive distribution of the beta process. Our MCMC inference algorithm efficiently adds and removes behaviors via novel split-merge moves as well as data-driven birth and death proposals, avoiding the need to consider a truncated model. We demonstrate promising results on unsupervised segmentation of human motion capture data.
△ Less
Submitted 13 November, 2014; v1 submitted 21 August, 2013;
originally announced August 2013.
-
A Bayesian approach for predicting the popularity of tweets
Authors:
Tauhid Zaman,
Emily B. Fox,
Eric T. Bradlow
Abstract:
We predict the popularity of short messages called tweets created in the micro-blogging site known as Twitter. We measure the popularity of a tweet by the time-series path of its retweets, which is when people forward the tweet to others. We develop a probabilistic model for the evolution of the retweets using a Bayesian approach, and form predictions using only observations on the retweet times a…
▽ More
We predict the popularity of short messages called tweets created in the micro-blogging site known as Twitter. We measure the popularity of a tweet by the time-series path of its retweets, which is when people forward the tweet to others. We develop a probabilistic model for the evolution of the retweets using a Bayesian approach, and form predictions using only observations on the retweet times and the local network or "graph" structure of the retweeters. We obtain good step ahead forecasts and predictions of the final total number of retweets even when only a small fraction (i.e., less than one tenth) of the retweet path is observed. This translates to good predictions within a few minutes of a tweet being posted, and has potential implications for understanding the spread of broader ideas, memes, or trends in social networks.
△ Less
Submitted 24 November, 2014; v1 submitted 24 April, 2013;
originally announced April 2013.
-
Spatio-Temporal Low Count Processes with Application to Violent Crime Events
Authors:
Sivan Aldor-Noiman,
Lawrence D. Brown,
Emily B. Fox,
Robert A. Stine
Abstract:
There is significant interest in being able to predict where crimes will happen, for example to aid in the efficient tasking of police and other protective measures. We aim to model both the temporal and spatial dependencies often exhibited by violent crimes in order to make such predictions. The temporal variation of crimes typically follows patterns familiar in time series analysis, but the spat…
▽ More
There is significant interest in being able to predict where crimes will happen, for example to aid in the efficient tasking of police and other protective measures. We aim to model both the temporal and spatial dependencies often exhibited by violent crimes in order to make such predictions. The temporal variation of crimes typically follows patterns familiar in time series analysis, but the spatial patterns are irregular and do not vary smoothly across the area. Instead we find that spatially disjoint regions exhibit correlated crime patterns. It is this indeterminate inter-region correlation structure along with the low-count, discrete nature of counts of serious crimes that motivates our proposed forecasting tool. In particular, we propose to model the crime counts in each region using an integer-valued first order autoregressive process. We take a Bayesian nonparametric approach to flexibly discover a clustering of these region-specific time series. We then describe how to account for covariates within this framework. Both approaches adjust for seasonality. We demonstrate our approach through an analysis of weekly reported violent crimes in Washington, D.C. between 2001-2008. Our forecasts outperform standard methods while additionally providing useful tools such as prediction intervals.
△ Less
Submitted 20 April, 2013;
originally announced April 2013.
-
Markov Determinantal Point Processes
Authors:
Raja Hafiz Affandi,
Alex Kulesza,
Emily B. Fox
Abstract:
A determinantal point process (DPP) is a random process useful for modeling the combinatorial problem of subset selection. In particular, DPPs encourage a random subset Y to contain a diverse set of items selected from a base set Y. For example, we might use a DPP to display a set of news headlines that are relevant to a user's interests while covering a variety of topics. Suppose, however, that w…
▽ More
A determinantal point process (DPP) is a random process useful for modeling the combinatorial problem of subset selection. In particular, DPPs encourage a random subset Y to contain a diverse set of items selected from a base set Y. For example, we might use a DPP to display a set of news headlines that are relevant to a user's interests while covering a variety of topics. Suppose, however, that we are asked to sequentially select multiple diverse sets of items, for example, displaying new headlines day-by-day. We might want these sets to be diverse not just individually but also through time, offering headlines today that are unlike the ones shown yesterday. In this paper, we construct a Markov DPP (M-DPP) that models a sequence of random sets {Yt}. The proposed M-DPP defines a stationary process that maintains DPP margins. Crucially, the induced union process Zt = Yt u Yt-1 is also marginally DPP-distributed. Jointly, these properties imply that the sequence of random sets are encouraged to be diverse both at a given time step as well as across time steps. We describe an exact, efficient sampling procedure, and a method for incrementally learning a quality measure over items in the base set Y based on external preferences. We apply the M-DPP to the task of sequentially displaying diverse and relevant news articles to a user with topic preferences.
△ Less
Submitted 16 October, 2012;
originally announced October 2012.
-
Multiresolution Gaussian Processes
Authors:
Emily B. Fox,
David B. Dunson
Abstract:
We propose a multiresolution Gaussian process to capture long-range, non-Markovian dependencies while allowing for abrupt changes. The multiresolution GP hierarchically couples a collection of smooth GPs, each defined over an element of a random nested partition. Long-range dependencies are captured by the top-level GP while the partition points define the abrupt changes. Due to the inherent conju…
▽ More
We propose a multiresolution Gaussian process to capture long-range, non-Markovian dependencies while allowing for abrupt changes. The multiresolution GP hierarchically couples a collection of smooth GPs, each defined over an element of a random nested partition. Long-range dependencies are captured by the top-level GP while the partition points define the abrupt changes. Due to the inherent conjugacy of the GPs, one can analytically marginalize the GPs and compute the conditional likelihood of the observations given the partition tree. This property allows for efficient inference of the partition itself, for which we employ graph-theoretic techniques. We apply the multiresolution GP to the analysis of Magnetoencephalography (MEG) recordings of brain activity.
△ Less
Submitted 4 September, 2012;
originally announced September 2012.
-
Concept Modeling with Superwords
Authors:
Khalid El-Arini,
Emily B. Fox,
Carlos Guestrin
Abstract:
In information retrieval, a fundamental goal is to transform a document into concepts that are representative of its content. The term "representative" is in itself challenging to define, and various tasks require different granularities of concepts. In this paper, we aim to model concepts that are sparse over the vocabulary, and that flexibly adapt their content based on other relevant semantic i…
▽ More
In information retrieval, a fundamental goal is to transform a document into concepts that are representative of its content. The term "representative" is in itself challenging to define, and various tasks require different granularities of concepts. In this paper, we aim to model concepts that are sparse over the vocabulary, and that flexibly adapt their content based on other relevant semantic information such as textual structure or associated image features. We explore a Bayesian nonparametric model based on nested beta processes that allows for inferring an unknown number of strictly sparse concepts. The resulting model provides an inherently different representation of concepts than a standard LDA (or HDP) based topic model, and allows for direct incorporation of semantic features. We demonstrate the utility of this representation on multilingual blog data and the Congressional Record.
△ Less
Submitted 11 April, 2012;
originally announced April 2012.
-
Joint Modeling of Multiple Related Time Series via the Beta Process
Authors:
Emily B. Fox,
Erik B. Sudderth,
Michael I. Jordan,
Alan S. Willsky
Abstract:
We propose a Bayesian nonparametric approach to the problem of jointly modeling multiple related time series. Our approach is based on the discovery of a set of latent, shared dynamical behaviors. Using a beta process prior, the size of the set and the sharing pattern are both inferred from data. We develop efficient Markov chain Monte Carlo methods based on the Indian buffet process representatio…
▽ More
We propose a Bayesian nonparametric approach to the problem of jointly modeling multiple related time series. Our approach is based on the discovery of a set of latent, shared dynamical behaviors. Using a beta process prior, the size of the set and the sharing pattern are both inferred from data. We develop efficient Markov chain Monte Carlo methods based on the Indian buffet process representation of the predictive distribution of the beta process, without relying on a truncated model. In particular, our approach uses the sum-product algorithm to efficiently compute Metropolis-Hastings acceptance probabilities, and explores new dynamical behaviors via birth and death proposals. We examine the benefits of our proposed feature-based model on several synthetic datasets, and also demonstrate promising results on unsupervised segmentation of visual motion capture data.
△ Less
Submitted 17 November, 2011;
originally announced November 2011.
-
Autoregressive Models for Variance Matrices: Stationary Inverse Wishart Processes
Authors:
Emily B. Fox,
Mike West
Abstract:
We introduce and explore a new class of stationary time series models for variance matrices based on a constructive definition exploiting inverse Wishart distribution theory. The main class of models explored is a novel class of stationary, first-order autoregressive (AR) processes on the cone of positive semi-definite matrices. Aspects of the theory and structure of these new models for multivari…
▽ More
We introduce and explore a new class of stationary time series models for variance matrices based on a constructive definition exploiting inverse Wishart distribution theory. The main class of models explored is a novel class of stationary, first-order autoregressive (AR) processes on the cone of positive semi-definite matrices. Aspects of the theory and structure of these new models for multivariate "volatility" processes are described in detail and exemplified. We then develop approaches to model fitting via Bayesian simulation-based computations, creating a custom filtering method that relies on an efficient innovations sampler. An example is then provided in analysis of a multivariate electroencephalogram (EEG) time series in neurological studies. We conclude by discussing potential further developments of higher-order AR models and a number of connections with prior approaches.
△ Less
Submitted 26 July, 2011;
originally announced July 2011.
-
Bayesian Nonparametric Inference of Switching Linear Dynamical Systems
Authors:
Emily B. Fox,
Erik B. Sudderth,
Michael I. Jordan,
Alan S. Willsky
Abstract:
Many complex dynamical phenomena can be effectively modeled by a system that switches among a set of conditionally linear dynamical modes. We consider two such models: the switching linear dynamical system (SLDS) and the switching vector autoregressive (VAR) process. Our Bayesian nonparametric approach utilizes a hierarchical Dirichlet process prior to learn an unknown number of persistent, smoot…
▽ More
Many complex dynamical phenomena can be effectively modeled by a system that switches among a set of conditionally linear dynamical modes. We consider two such models: the switching linear dynamical system (SLDS) and the switching vector autoregressive (VAR) process. Our Bayesian nonparametric approach utilizes a hierarchical Dirichlet process prior to learn an unknown number of persistent, smooth dynamical modes. We additionally employ automatic relevance determination to infer a sparse set of dynamic dependencies allowing us to learn SLDS with varying state dimension or switching VAR processes with varying autoregressive order. We develop a sampling algorithm that combines a truncated approximation to the Dirichlet process with efficient joint sampling of the mode and state sequences. The utility and flexibility of our model are demonstrated on synthetic data, sequences of dancing honey bees, the IBOVESPA stock index, and a maneuvering target tracking application.
△ Less
Submitted 19 March, 2010;
originally announced March 2010.
-
A sticky HDP-HMM with application to speaker diarization
Authors:
Emily B. Fox,
Erik B. Sudderth,
Michael I. Jordan,
Alan S. Willsky
Abstract:
We consider the problem of speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speake…
▽ More
We consider the problem of speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006) 1566--1581]. Although the basic HDP-HMM tends to over-segment the audio data---creating redundant states and rapidly switching among them---we describe an augmented HDP-HMM that provides effective control over the switching rate. We also show that this augmentation makes it possible to treat emission distributions nonparametrically. To scale the resulting architecture to realistic diarization problems, we develop a sampling algorithm that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence, greatly improving mixing rates. Working with a benchmark NIST data set, we show that our Bayesian nonparametric architecture yields state-of-the-art speaker diarization results.
△ Less
Submitted 16 August, 2011; v1 submitted 15 May, 2009;
originally announced May 2009.