-
Lizard: A Large-Scale Dataset for Colonic Nuclear Instance Segmentation and Classification
Authors:
Simon Graham,
Mostafa Jahanifar,
Ayesha Azam,
Mohammed Nimir,
Yee-Wah Tsang,
Katherine Dodd,
Emily Hero,
Harvir Sahota,
Atisha Tank,
Ksenija Benes,
Noorul Wahab,
Fayyaz Minhas,
Shan E Ahmed Raza,
Hesham El Daly,
Kishore Gopalakrishnan,
David Snead,
Nasir Rajpoot
Abstract:
The development of deep segmentation models for computational pathology (CPath) can help foster the investigation of interpretable morphological biomarkers. Yet, there is a major bottleneck in the success of such approaches because supervised deep learning models require an abundance of accurately labelled data. This issue is exacerbated in the field of CPath because the generation of detailed annotations usually demands the input of a pathologist to be able to distinguish between different tissue constructs and nuclei. Manually labelling nuclei may not be a feasible approach for collecting large-scale annotated datasets, especially when a single image region can contain thousands of different cells. However, solely relying on automatic generation of annotations will limit the accuracy and reliability of ground truth. Therefore, to help overcome the above challenges, we propose a multi-stage annotation pipeline to enable the collection of large-scale datasets for histology image analysis, with pathologist-in-the-loop refinement steps. Using this pipeline, we generate the largest known nuclear instance segmentation and classification dataset, containing nearly half a million labelled nuclei in H&E stained colon tissue. We have released the dataset and encourage the research community to utilise it to drive forward the development of downstream cell-based models in CPath.
Submitted 29 November, 2021; v1 submitted 25 August, 2021;
originally announced August 2021.
-
Neural Granger Causality
Authors:
Alex Tank,
Ian Covert,
Nicholas Foti,
Ali Shojaie,
Emily Fox
Abstract:
While most classical approaches to Granger causality detection assume linear dynamics, many interactions in real-world applications, like neuroscience and genomics, are inherently nonlinear. In these cases, using linear models may lead to inconsistent estimation of Granger causal interactions. We propose a class of nonlinear methods by applying structured multilayer perceptrons (MLPs) or recurrent neural networks (RNNs) combined with sparsity-inducing penalties on the weights. By encouraging specific sets of weights to be zero--in particular, through the use of convex group-lasso penalties--we can extract the Granger causal structure. To further contrast with traditional approaches, our framework naturally enables us to efficiently capture long-range dependencies between series either via our RNNs or through an automatic lag selection in the MLP. We show that our neural Granger causality methods outperform state-of-the-art nonlinear Granger causality methods on the DREAM3 challenge data. This data consists of nonlinear gene expression and regulation time courses with only a limited number of time points. The successes we show in this challenging dataset provide a powerful example of how deep learning can be useful in cases that go beyond prediction on large datasets. We likewise illustrate our methods in detecting nonlinear interactions in a human motion capture dataset.
Submitted 13 March, 2021; v1 submitted 16 February, 2018;
originally announced February 2018.
-
An Efficient ADMM Algorithm for Structural Break Detection in Multivariate Time Series
Authors:
Alex Tank,
Emily B. Fox,
Ali Shojaie
Abstract:
We present an efficient alternating direction method of multipliers (ADMM) algorithm for segmenting a multivariate non-stationary time series with structural breaks into stationary regions. We draw from recent work where the series is assumed to follow a vector autoregressive model within segments and a convex estimation procedure may be formulated using group fused lasso penalties. Our ADMM approach first splits the convex problem into a global quadratic program and a simple group lasso proximal update. We show that the global problem may be parallelized over rows of the time dependent transition matrices and furthermore that each subproblem may be rewritten in a form identical to the log-likelihood of a Gaussian state space model. Consequently, we develop a Kalman smoothing algorithm to solve the global update in time linear in the length of the series.
Submitted 25 June, 2018; v1 submitted 22 November, 2017;
originally announced November 2017.
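The "simple group lasso proximal update" mentioned in the abstract above is commonly implemented as block soft-thresholding. The following is a minimal sketch of that generic operator, not the authors' code; the function name `group_soft_threshold` and the example values are assumptions made for illustration.

```python
import math


def group_soft_threshold(group, threshold):
    """Proximal operator of threshold * ||group||_2 (block soft-thresholding).

    Hypothetical helper for illustration: the whole group is set to zero
    when its Euclidean norm is at most `threshold`; otherwise every entry
    is shrunk toward zero by the same factor.
    """
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= threshold:
        return [0.0] * len(group)
    scale = 1.0 - threshold / norm
    return [scale * v for v in group]


# Toy values, chosen only to show both branches.
shrunk = group_soft_threshold([3.0, 4.0], 1.0)  # norm is 5, so the group is shrunk
zeroed = group_soft_threshold([0.3, 0.4], 1.0)  # norm is 0.5, so the group is zeroed
```

Because the operator acts on each group independently, an update of this kind can be applied to all groups in parallel.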
-
An Interpretable and Sparse Neural Network Model for Nonlinear Granger Causality Discovery
Authors:
Alex Tank,
Ian Covert,
Nicholas J. Foti,
Ali Shojaie,
Emily B. Fox
Abstract:
While most classical approaches to Granger causality detection repose upon linear time series assumptions, many interactions in neuroscience and economics applications are nonlinear. We develop an approach to nonlinear Granger causality detection using multilayer perceptrons where the input to the network is the past time lags of all series and the output is the future value of a single series. A sufficient condition for Granger non-causality in this setting is that all of the outgoing weights of the input data, the past lags of a series, to the first hidden layer are zero. For estimation, we utilize a group lasso penalty to shrink groups of input weights to zero. We also propose a hierarchical penalty for simultaneous Granger causality and lag estimation. We validate our approach on simulated data from both a sparse linear autoregressive model and the sparse and nonlinear Lorenz-96 model.
Submitted 25 June, 2018; v1 submitted 22 November, 2017;
originally announced November 2017.
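The sufficient condition stated in the abstract above (all first-layer weights leaving the past lags of a series are zero) can be checked with a few lines of code. This is a rough sketch under assumptions: the nested-list layout `first_layer[h][j][k]`, the function name, and the tolerance `tol` are hypothetical and are not taken from the paper.

```python
import math


def granger_causal_inputs(first_layer, tol=1e-8):
    """Flag which input series have nonzero first-layer weight groups.

    Assumed layout (hypothetical): first_layer[h][j][k] is the weight from
    lag k of input series j to hidden unit h. Series j is flagged False
    (consistent with Granger non-causality) when the Euclidean norm of all
    of its outgoing weights is at most `tol`.
    """
    n_series = len(first_layer[0])
    flags = []
    for j in range(n_series):
        squared = sum(w * w for unit in first_layer for w in unit[j])
        flags.append(math.sqrt(squared) > tol)
    return flags


# Toy weights: 2 hidden units, 3 input series, 2 lags each.
weights = [
    [[0.5, -0.2], [0.0, 0.0], [0.1, 0.0]],
    [[0.3, 0.0], [0.0, 0.0], [0.0, 0.4]],
]
flags = granger_causal_inputs(weights)  # the second series has an all-zero group
```

In practice the weights would come from a network trained with a group lasso penalty, which is what drives entire groups to exactly zero.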
-
A Unified Framework for Long Range and Cold Start Forecasting of Seasonal Profiles in Time Series
Authors:
Christopher Xie,
Alex Tank,
Alec Greaves-Tunnell,
Emily Fox
Abstract:
Providing long-range forecasts is a fundamental challenge in time series modeling, which is only compounded by the challenge of having to form such forecasts when a time series has never previously been observed. The latter challenge is the time series version of the cold-start problem seen in recommender systems which, to our knowledge, has not been addressed in previous work. A similar problem occurs when a long range forecast is required after only observing a small number of time points --- a warm start forecast. With these aims in mind, we focus on forecasting seasonal profiles---or baseline demand---for periods on the order of a year in three cases: the long range case with multiple previously observed seasonal profiles, the cold start case with no previous observed seasonal profiles, and the warm start case with only a single partially observed profile. Classical time series approaches that perform iterated step-ahead forecasts based on previous observations struggle to provide accurate long range predictions; in settings with little to no observed data, such approaches are simply not applicable. Instead, we present a straightforward framework which combines ideas from high-dimensional regression and matrix factorization on a carefully constructed data matrix. Key to our formulation and resulting performance is leveraging (1) repeated patterns over fixed periods of time and across series, and (2) metadata associated with the individual series; without this additional data, the cold-start/warm-start problems are nearly impossible to solve. We demonstrate that our framework can accurately forecast an array of seasonal profiles on multiple large scale datasets.
Submitted 26 August, 2018; v1 submitted 23 October, 2017;
originally announced October 2017.
-
Granger Causality Networks for Categorical Time Series
Authors:
Alex Tank,
Emily B. Fox,
Ali Shojaie
Abstract:
We present a new framework for learning Granger causality networks for multivariate categorical time series, based on the mixture transition distribution (MTD) model. Traditionally, MTD is plagued by a nonconvex objective, non-identifiability, and the presence of many local optima. To circumvent these problems, we recast inference in the MTD as a convex problem. The new formulation facilitates the application of MTD to high-dimensional multivariate time series. As a baseline, we also formulate a multi-output logistic autoregressive model (mLTD), which, while a straightforward extension of autoregressive Bernoulli generalized linear models, has not been previously applied to the analysis of multivariate categorical time series. We develop novel identifiability conditions of the MTD model and compare them to those for mLTD. We further devise a novel and efficient optimization algorithm for the MTD based on the new convex formulation, and compare the MTD and mLTD in both simulated and real data experiments. Our approach simultaneously provides a comparison of methods for network inference in categorical time series and opens the door to modern, regularized inference with the MTD model.
Submitted 8 June, 2017;
originally announced June 2017.
-
Identifiability and Estimation of Structural Vector Autoregressive Models for Subsampled and Mixed Frequency Time Series
Authors:
Alex Tank,
Emily B. Fox,
Ali Shojaie
Abstract:
Causal inference in multivariate time series is challenging because the sampling rate may not be as fast as the timescale of the causal interactions. In this context, we can view our observed series as a subsampled version of the desired series. Furthermore, due to technological and other limitations, series may be observed at different sampling rates, representing a mixed frequency setting. To determine instantaneous and lagged effects between time series at the true causal scale, we take a model-based approach based on structural vector autoregressive (SVAR) models. In this context, we present a unifying framework for parameter identifiability and estimation under both subsampling and mixed frequencies when the noise, or shocks, are non-Gaussian. Importantly, by studying the SVAR case, we are able to provide both identifiability and estimation methods for the causal structure of lagged and instantaneous effects at the desired time scale. We further derive an exact EM algorithm for inference in both subsampled and mixed frequency settings. We validate our approach in simulated scenarios and on two real world data sets.
Submitted 8 April, 2017;
originally announced April 2017.
-
Bayesian Structure Learning for Stationary Time Series
Authors:
Alex Tank,
Nicholas Foti,
Emily Fox
Abstract:
While much work has explored probabilistic graphical models for independent data, less attention has been paid to time series. The goal in this setting is to determine conditional independence relations between entire time series, which for stationary series, are encoded by zeros in the inverse spectral density matrix. We take a Bayesian approach to structure learning, placing priors on (i) the graph structure and (ii) spectral matrices given the graph. We leverage a Whittle likelihood approximation and define a conjugate prior---the hyper complex inverse Wishart---on the complex-valued and graph-constrained spectral matrices. Due to conjugacy, we can analytically marginalize the spectral matrices and obtain a closed-form marginal likelihood of the time series given a graph. Importantly, our analytic marginal likelihood allows us to avoid inference of the complex spectral matrices themselves and places us back into the framework of standard (Bayesian) structure learning. In particular, combining this marginal likelihood with our graph prior leads to efficient inference of the time series graph itself, which we base on a stochastic search procedure, though any standard approach can be straightforwardly modified to our time series case. We demonstrate our methods on analyzing stock data and neuroimaging data of brain activity during various auditory tasks.
Submitted 3 July, 2015; v1 submitted 12 May, 2015;
originally announced May 2015.
-
Streaming Variational Inference for Bayesian Nonparametric Mixture Models
Authors:
Alex Tank,
Nicholas J. Foti,
Emily B. Fox
Abstract:
In theory, Bayesian nonparametric (BNP) models are well suited to streaming data scenarios due to their ability to adapt model complexity with the observed data. Unfortunately, such benefits have not been fully realized in practice; existing inference algorithms are either not applicable to streaming applications or not extensible to BNP models. For the special case of Dirichlet processes, streaming inference has been considered. However, there is growing interest in more flexible BNP models building on the class of normalized random measures (NRMs). We work within this general framework and present a streaming variational inference algorithm for NRM mixture models. Our algorithm is based on assumed density filtering (ADF), leading straightforwardly to expectation propagation (EP) for large-scale batch inference as well. We demonstrate the efficacy of the algorithm on clustering documents in large, streaming text corpora.
Submitted 21 April, 2015; v1 submitted 1 December, 2014;
originally announced December 2014.
-
Biased perception leads to biased action: Validating a Bayesian model of interception
Authors:
Alexander Tank,
Alan A. Stocker
Abstract:
We tested whether and how biases in visual perception might influence motor actions. To do so, we designed an interception task in which subjects had to indicate the time when a moving object, whose trajectory was occluded, would reach a target area. Subjects made their judgments based on a brief display of the object's initial motion at a given starting point. Based on the known illusion that low contrast stimuli appear to move slower than high contrast ones, we predict that if perception directly influences motor actions, subjects would show delayed interception times for low contrast objects. In order to provide a more quantitative prediction, we developed a Bayesian model for the complete sensory-motor interception task. Using fit parameters for the prior and likelihood on visual speed from a previous study, we were able to predict not only the expected interception times but also the precise characteristics of response variability. Psychophysical experiments confirm the model's predictions. Individual differences in subjects' timing responses can be accounted for by individual differences in the perceptual priors on visual speed. Taken together, our behavioral and model results show that biases in perception percolate downstream and cause action biases that are fully predictable.
Submitted 7 March, 2014;
originally announced March 2014.
-
On tail trend detection: modeling relative risk
Authors:
Laurens de Haan,
Albert Klein Tank,
Cláudia Neves
Abstract:
The climate change dispute is about changes over time of environmental characteristics (such as rainfall). Some people say that a possible change is not so much in the mean but rather in the extreme phenomena (that is, the average rainfall may not change much but heavy storms may become more or less frequent). The paper studies changes over time in the probability that some high threshold is exceeded. The model is such that the threshold does not need to be specified; the results hold for any high threshold. For simplicity, a certain linear trend is studied depending on one real parameter. Estimation and testing procedures (is there a trend?) are developed. Simulation results are presented. The method is applied to trends in heavy rainfall at 18 gauging stations across Germany and The Netherlands. A tentative conclusion is that the trend seems to depend on whether or not a station is close to the sea.
Submitted 3 April, 2013; v1 submitted 21 June, 2011;
originally announced June 2011.
-
On the El-Nino Teleconnection to Spring Precipitation in Europe
Authors:
Geert Jan van Oldenborgh,
Gerrit Burgers,
Albert Klein Tank
Abstract:
In a statistical analysis of more than a century of data we find a strong connection between strong warm El Nino winter events and high spring precipitation in a band from Southern England eastwards into Asia. This relationship is an extension of the connection mentioned by Kiladis and Diaz (1989), and much stronger than the winter season teleconnection that has been the subject of other studies. Linear correlation coefficients between DJF NINO3 indices and MAM precipitation are higher than r=0.3 for individual stations, and as high as r=0.49 for an index of precipitation anomalies around 50N from 5W to 35E. The lagged correlation suggests that south-east Asian surface temperature anomalies may act as intermediate variables.
Submitted 21 December, 1998;
originally announced December 1998.
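The linear correlation coefficient r quoted in the abstract above is the standard Pearson coefficient between a winter index and the following spring's precipitation. The sketch below shows only how such a coefficient is computed; the function name and the toy series are assumptions for illustration and are not data from the paper.

```python
import math


def pearson_r(x, y):
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Synthetic toy series (not real observations): each winter value is paired
# with the value for the following spring, which is what makes it a lagged
# correlation.
winter_index = [1.0, 2.0, 3.0, 4.0]
spring_precipitation = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(winter_index, spring_precipitation)
```

The toy series are exactly proportional, so r is close to 1 here; values such as the r=0.49 reported in the abstract indicate a much looser linear relationship.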