-
Euclid preparation. LensMC, weak lensing cosmic shear measurement with forward modelling and Markov Chain Monte Carlo sampling
Authors:
Euclid Collaboration,
G. Congedo,
L. Miller,
A. N. Taylor,
N. Cross,
C. A. J. Duncan,
T. Kitching,
N. Martinet,
S. Matthew,
T. Schrabback,
M. Tewes,
N. Welikala,
N. Aghanim,
A. Amara,
S. Andreon,
N. Auricchio,
M. Baldi,
S. Bardelli,
R. Bender,
C. Bodendorf,
D. Bonino,
E. Branchini,
M. Brescia,
J. Brinchmann,
S. Camera
, et al. (217 additional authors not shown)
Abstract:
LensMC is a weak lensing shear measurement method developed for Euclid and Stage-IV surveys. It is based on forward modelling to deal with convolution by a point spread function with comparable size to many galaxies; sampling the posterior distribution of galaxy parameters via Markov Chain Monte Carlo; and marginalisation over nuisance parameters for each of the 1.5 billion galaxies observed by Eu…
▽ More
LensMC is a weak lensing shear measurement method developed for Euclid and Stage-IV surveys. It is based on forward modelling to deal with convolution by a point spread function with comparable size to many galaxies; sampling the posterior distribution of galaxy parameters via Markov Chain Monte Carlo; and marginalisation over nuisance parameters for each of the 1.5 billion galaxies observed by Euclid. The scientific performance is quantified through high-fidelity images based on the Euclid Flagship simulations and emulation of the Euclid VIS images; realistic clustering with a mean surface number density of 250 arcmin$^{-2}$ ($I_{\rm E}<29.5$) for galaxies, and 6 arcmin$^{-2}$ ($I_{\rm E}<26$) for stars; and a diffraction-limited chromatic point spread function with a full width at half maximum of $0.^{\!\prime\prime}2$ and spatial variation across the field of view. Objects are measured with a density of 90 arcmin$^{-2}$ ($I_{\rm E}<26.5$) in 4500 deg$^2$. The total shear bias is broken down into measurement (our main focus here) and selection effects (which will be addressed elsewhere). We find: measurement multiplicative and additive biases of $m_1=(-3.6\pm0.2)\times10^{-3}$, $m_2=(-4.3\pm0.2)\times10^{-3}$, $c_1=(-1.78\pm0.03)\times10^{-4}$, $c_2=(0.09\pm0.03)\times10^{-4}$; a large detection bias with a multiplicative component of $1.2\times10^{-2}$ and an additive component of $-3\times10^{-4}$; and a measurement PSF leakage of $α_1=(-9\pm3)\times10^{-4}$ and $α_2=(2\pm3)\times10^{-4}$. When model bias is suppressed, the obtained measurement biases are close to Euclid requirement and largely dominated by undetected faint galaxies ($-5\times10^{-3}$). Although significant, model bias will be straightforward to calibrate given the weak sensitivity.
△ Less
Submitted 1 May, 2024;
originally announced May 2024.
-
Designing a Data Science simulation with MERITS: A Primer
Authors:
Corrine F Elliott,
James Duncan,
Tiffany M Tang,
Merle Behr,
Karl Kumbier,
Bin Yu
Abstract:
Simulations play a crucial role in the modern scientific process. Yet despite (or due to) their ubiquity, the Data Science community shares neither a comprehensive definition for a "high-quality" study nor a consolidated guide to designing one. Inspired by the Predictability-Computability-Stability (PCS) framework for 'veridical' Data Science, we propose six MERITS that a Data Science simulation s…
▽ More
Simulations play a crucial role in the modern scientific process. Yet despite (or due to) their ubiquity, the Data Science community shares neither a comprehensive definition for a "high-quality" study nor a consolidated guide to designing one. Inspired by the Predictability-Computability-Stability (PCS) framework for 'veridical' Data Science, we propose six MERITS that a Data Science simulation should satisfy. Modularity and Efficiency support the Computability of a study, encouraging clean and flexible implementation. Realism and Stability address the conceptualization of the research problem: How well does a study Predict reality, such that its conclusions generalize to new data/contexts? Finally, Intuitiveness and Transparency encourage good communication and trustworthiness of study design and results. Drawing an analogy between simulation and cooking, we moreover offer (a) a conceptual framework for thinking about the anatomy of a simulation 'recipe'; (b) a baker's dozen in guidelines to aid the Data Science practitioner in designing one; and (c) a case study deconstructing a simulation through the lens of our framework to demonstrate its practical utility. By contributing this "PCS primer" for high-quality Data Science simulation, we seek to distill and enrich the best practices of simulation across disciplines into a cohesive recipe for trustworthy, veridical Data Science.
△ Less
Submitted 13 March, 2024;
originally announced March 2024.
-
A Mixing Time Lower Bound for a Simplified Version of BART
Authors:
Omer Ronen,
Theo Saarinen,
Yan Shuo Tan,
James Duncan,
Bin Yu
Abstract:
Bayesian Additive Regression Trees (BART) is a popular Bayesian non-parametric regression algorithm. The posterior is a distribution over sums of decision trees, and predictions are made by averaging approximate samples from the posterior.
The combination of strong predictive performance and the ability to provide uncertainty measures has led BART to be commonly used in the social sciences, bios…
▽ More
Bayesian Additive Regression Trees (BART) is a popular Bayesian non-parametric regression algorithm. The posterior is a distribution over sums of decision trees, and predictions are made by averaging approximate samples from the posterior.
The combination of strong predictive performance and the ability to provide uncertainty measures has led BART to be commonly used in the social sciences, biostatistics, and causal inference.
BART uses Markov Chain Monte Carlo (MCMC) to obtain approximate posterior samples over a parameterized space of sums of trees, but it has often been observed that the chains are slow to mix.
In this paper, we provide the first lower bound on the mixing time for a simplified version of BART in which we reduce the sum to a single tree and use a subset of the possible moves for the MCMC proposal distribution. Our lower bound for the mixing time grows exponentially with the number of data points.
Inspired by this new connection between the mixing time and the number of data points, we perform rigorous simulations on BART. We show qualitatively that BART's mixing time increases with the number of data points.
The slow mixing time of the simplified BART suggests a large variation between different runs of the simplified BART algorithm and a similar large variation is known for BART in the literature. This large variation could result in a lack of stability in the models, predictions, and posterior intervals obtained from the BART MCMC samples.
Our lower bound and simulations suggest increasing the number of chains with the number of data points.
△ Less
Submitted 17 October, 2022;
originally announced October 2022.
-
Group Probability-Weighted Tree Sums for Interpretable Modeling of Heterogeneous Data
Authors:
Keyan Nasseri,
Chandan Singh,
James Duncan,
Aaron Kornblith,
Bin Yu
Abstract:
Machine learning in high-stakes domains, such as healthcare, faces two critical challenges: (1) generalizing to diverse data distributions given limited training data while (2) maintaining interpretability. To address these challenges, we propose an instance-weighted tree-sum method that effectively pools data across diverse groups to output a concise, rule-based model. Given distinct groups of in…
▽ More
Machine learning in high-stakes domains, such as healthcare, faces two critical challenges: (1) generalizing to diverse data distributions given limited training data while (2) maintaining interpretability. To address these challenges, we propose an instance-weighted tree-sum method that effectively pools data across diverse groups to output a concise, rule-based model. Given distinct groups of instances in a dataset (e.g., medical patients grouped by age or treatment site), our method first estimates group membership probabilities for each instance. Then, it uses these estimates as instance weights in FIGS (Tan et al. 2022), to grow a set of decision trees whose values sum to the final prediction. We call this new method Group Probability-Weighted Tree Sums (G-FIGS). G-FIGS achieves state-of-the-art prediction performance on important clinical datasets; e.g., holding the level of sensitivity fixed at 92%, G-FIGS increases specificity for identifying cervical spine injury by up to 10% over CART and up to 3% over FIGS alone, with larger gains at higher sensitivity levels. By kee** the total number of rules below 16 in FIGS, the final models remain interpretable, and we find that their rules match medical domain expertise. All code, data, and models are released on Github.
△ Less
Submitted 30 May, 2022;
originally announced May 2022.
-
Fast Interpretable Greedy-Tree Sums
Authors:
Yan Shuo Tan,
Chandan Singh,
Keyan Nasseri,
Abhineet Agarwal,
James Duncan,
Omer Ronen,
Matthew Epland,
Aaron Kornblith,
Bin Yu
Abstract:
Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in high-stakes domains such as medicine. In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure. To overcome this bias, we propose Fast Interpretable Greedy-Tree Sums (FI…
▽ More
Modern machine learning has achieved impressive prediction performance, but often sacrifices interpretability, a critical consideration in high-stakes domains such as medicine. In such settings, practitioners often use highly interpretable decision tree models, but these suffer from inductive bias against additive structure. To overcome this bias, we propose Fast Interpretable Greedy-Tree Sums (FIGS), which generalizes the CART algorithm to simultaneously grow a flexible number of trees in summation. By combining logical rules with addition, FIGS is able to adapt to additive structure while remaining highly interpretable. Extensive experiments on real-world datasets show that FIGS achieves state-of-the-art prediction performance. To demonstrate the usefulness of FIGS in high-stakes domains, we adapt FIGS to learn clinical decision instruments (CDIs), which are tools for guiding clinical decision-making. Specifically, we introduce a variant of FIGS known as G-FIGS that accounts for the heterogeneity in medical data. G-FIGS derives CDIs that reflect domain knowledge and enjoy improved specificity (by up to 20% over CART) without sacrificing sensitivity or interpretability. To provide further insight into FIGS, we prove that FIGS learns components of additive models, a property we refer to as disentanglement. Further, we show (under oracle conditions) that unconstrained tree-sum models leverage disentanglement to generalize more efficiently than single decision tree models when fitted to additive regression functions. Finally, to avoid overfitting with an unconstrained number of splits, we develop Bagging-FIGS, an ensemble version of FIGS that borrows the variance reduction techniques of random forests. Bagging-FIGS enjoys competitive performance with random forests and XGBoost on real-world datasets.
△ Less
Submitted 8 July, 2023; v1 submitted 27 January, 2022;
originally announced January 2022.
-
Euclid: Covariance of weak lensing pseudo-$C_\ell$ estimates. Calculation, comparison to simulations, and dependence on survey geometry
Authors:
R. E. Upham,
M. L. Brown,
L. Whittaker,
A. Amara,
N. Auricchio,
D. Bonino,
E. Branchini,
M. Brescia,
J. Brinchmann,
V. Capobianco,
C. Carbone,
J. Carretero,
M. Castellano,
S. Cavuoti,
A. Cimatti,
R. Cledassou,
G. Congedo,
L. Conversi,
Y. Copin,
L. Corcione,
M. Cropper,
A. Da Silva,
H. Degaudenzi,
M. Douspis,
F. Dubath
, et al. (80 additional authors not shown)
Abstract:
An accurate covariance matrix is essential for obtaining reliable cosmological results when using a Gaussian likelihood. In this paper we study the covariance of pseudo-$C_\ell$ estimates of tomographic cosmic shear power spectra. Using two existing publicly available codes in combination, we calculate the full covariance matrix, including mode-coupling contributions arising from both partial sky…
▽ More
An accurate covariance matrix is essential for obtaining reliable cosmological results when using a Gaussian likelihood. In this paper we study the covariance of pseudo-$C_\ell$ estimates of tomographic cosmic shear power spectra. Using two existing publicly available codes in combination, we calculate the full covariance matrix, including mode-coupling contributions arising from both partial sky coverage and non-linear structure growth. For three different sky masks, we compare the theoretical covariance matrix to that estimated from publicly available N-body weak lensing simulations, finding good agreement. We find that as a more extreme sky cut is applied, a corresponding increase in both Gaussian off-diagonal covariance and non-Gaussian super-sample covariance is observed in both theory and simulations, in accordance with expectations. Studying the different contributions to the covariance in detail, we find that the Gaussian covariance dominates along the main diagonal and the closest off-diagonals, but further away from the main diagonal the super-sample covariance is dominant. Forming mock constraints in parameters describing matter clustering and dark energy, we find that neglecting non-Gaussian contributions to the covariance can lead to underestimating the true size of confidence regions by up to 70 per cent. The dominant non-Gaussian covariance component is the super-sample covariance, but neglecting the smaller connected non-Gaussian covariance can still lead to the underestimation of uncertainties by 10--20 per cent. A real cosmological analysis will require marginalisation over many nuisance parameters, which will decrease the relative importance of all cosmological contributions to the covariance, so these values should be taken as upper limits on the importance of each component.
△ Less
Submitted 17 February, 2022; v1 submitted 14 December, 2021;
originally announced December 2021.
-
Estimating Reproducible Functional Networks Associated with Task Dynamics using Unsupervised LSTMs
Authors:
Nicha C. Dvornek,
Pamela Ventola,
James S. Duncan
Abstract:
We propose a method for estimating more reproducible functional networks that are more strongly associated with dynamic task activity by using recurrent neural networks with long short term memory (LSTMs). The LSTM model is trained in an unsupervised manner to learn to generate the functional magnetic resonance imaging (fMRI) time-series data in regions of interest. The learned functional networks…
▽ More
We propose a method for estimating more reproducible functional networks that are more strongly associated with dynamic task activity by using recurrent neural networks with long short term memory (LSTMs). The LSTM model is trained in an unsupervised manner to learn to generate the functional magnetic resonance imaging (fMRI) time-series data in regions of interest. The learned functional networks can then be used for further analysis, e.g., correlation analysis to determine functional networks that are strongly associated with an fMRI task paradigm. We test our approach and compare to other methods for decomposing functional networks from fMRI activity on 2 related but separate datasets that employ a biological motion perception task. We demonstrate that the functional networks learned by the LSTM model are more strongly associated with the task activity and dynamics compared to other approaches. Furthermore, the patterns of network association are more closely replicated across subjects within the same dataset as well as across datasets. More reproducible functional networks are essential for better characterizing the neural correlates of a target task.
△ Less
Submitted 6 May, 2021;
originally announced May 2021.
-
Quantifying changes in the British cattle movement network
Authors:
Andrew J Duncan,
Aaron Reeves,
George J Gunn,
Roger W Humphry
Abstract:
The Cattle Tracing System database is an online recording system for cattle births, deaths and between--herd movements in the United Kingdom. Although it has been thoroughly examined, the most recently reported movement analysis is from 2009. This article uses the database to construct weighted directed monthly movement networks for two distinct periods of time, 2004--2006 and 2015--2017, to quant…
▽ More
The Cattle Tracing System database is an online recording system for cattle births, deaths and between--herd movements in the United Kingdom. Although it has been thoroughly examined, the most recently reported movement analysis is from 2009. This article uses the database to construct weighted directed monthly movement networks for two distinct periods of time, 2004--2006 and 2015--2017, to quantify by how much the underlying structure of the network has changed. Substantial changes in network structure may influence policy--makers directly or may influence models built upon the network data, and these in turn could impact policy--makers and their assessment of risk. Four general network measures are used (total number of nodes with movements, movements, births and deaths), in conjunction with network metrics to describe each monthly network. Two updates of the database were examined to determine by how much the movement data stored for a particular time period had been cleansed between updates. Statistical models show that there is a statistically significant effect of the time period (2004--2006 vs 2015--2017) in the values of all network measures and six of nine network metrics. Changes in the sizes of both the Giant and Weakly Strongly Connected components predict reductions in the upper and lower bounds of the maximum epidemic size. Examination of the updates of the database show that there are differences in records between updates and therefore evidence of historical data changing between updates. Accurate modelling of disease spread through a network requires representative descriptions of the network. The authors recommend that where possible the most recent available data always be used for network modelling and that methods of network prediction be examined to mitigate for the time required for data to become available.
△ Less
Submitted 4 March, 2021;
originally announced April 2021.
-
Demographic-Guided Attention in Recurrent Neural Networks for Modeling Neuropathophysiological Heterogeneity
Authors:
Nicha C. Dvornek,
Xiaoxiao Li,
Juntang Zhuang,
Pamela Ventola,
James S. Duncan
Abstract:
Heterogeneous presentation of a neurological disorder suggests potential differences in the underlying pathophysiological changes that occur in the brain. We propose to model heterogeneous patterns of functional network differences using a demographic-guided attention (DGA) mechanism for recurrent neural network models for prediction from functional magnetic resonance imaging (fMRI) time-series da…
▽ More
Heterogeneous presentation of a neurological disorder suggests potential differences in the underlying pathophysiological changes that occur in the brain. We propose to model heterogeneous patterns of functional network differences using a demographic-guided attention (DGA) mechanism for recurrent neural network models for prediction from functional magnetic resonance imaging (fMRI) time-series data. The context computed from the DGA head is used to help focus on the appropriate functional networks based on individual demographic information. We demonstrate improved classification on 3 subsets of the ABIDE I dataset used in published studies that have previously produced state-of-the-art results, evaluating performance under a leave-one-site-out cross-validation framework for better generalizeability to new data. Finally, we provide examples of interpreting functional network differences based on individual demographic variables.
△ Less
Submitted 15 April, 2021;
originally announced April 2021.
-
AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
Authors:
Juntang Zhuang,
Tommy Tang,
Yifan Ding,
Sekhar Tatikonda,
Nicha Dvornek,
Xenophon Papademetris,
James S. Duncan
Abstract:
Most popular optimizers for deep learning can be broadly categorized as adaptive methods (e.g. Adam) and accelerated schemes (e.g. stochastic gradient descent (SGD) with momentum). For many models such as convolutional neural networks (CNNs), adaptive methods typically converge faster but generalize worse compared to SGD; for complex settings such as generative adversarial networks (GANs), adaptiv…
▽ More
Most popular optimizers for deep learning can be broadly categorized as adaptive methods (e.g. Adam) and accelerated schemes (e.g. stochastic gradient descent (SGD) with momentum). For many models such as convolutional neural networks (CNNs), adaptive methods typically converge faster but generalize worse compared to SGD; for complex settings such as generative adversarial networks (GANs), adaptive methods are typically the default because of their stability.We propose AdaBelief to simultaneously achieve three goals: fast convergence as in adaptive methods, good generalization as in SGD, and training stability. The intuition for AdaBelief is to adapt the stepsize according to the "belief" in the current gradient direction. Viewing the exponential moving average (EMA) of the noisy gradient as the prediction of the gradient at the next time step, if the observed gradient greatly deviates from the prediction, we distrust the current observation and take a small step; if the observed gradient is close to the prediction, we trust it and take a large step. We validate AdaBelief in extensive experiments, showing that it outperforms other methods with fast convergence and high accuracy on image classification and language modeling. Specifically, on ImageNet, AdaBelief achieves comparable accuracy to SGD. Furthermore, in the training of a GAN on Cifar10, AdaBelief demonstrates high stability and improves the quality of generated samples compared to a well-tuned Adam optimizer. Code is available at https://github.com/juntang-zhuang/Adabelief-Optimizer
△ Less
Submitted 20 December, 2020; v1 submitted 14 October, 2020;
originally announced October 2020.
-
Pooling Regularized Graph Neural Network for fMRI Biomarker Analysis
Authors:
Xiaoxiao Li,
Yuan Zhou,
Nicha C. Dvornek,
Muhan Zhang,
Juntang Zhuang,
Pamela Ventola,
James S Duncan
Abstract:
Understanding how certain brain regions relate to a specific neurological disorder has been an important area of neuroimaging research. A promising approach to identify the salient regions is using Graph Neural Networks (GNNs), which can be used to analyze graph structured data, e.g. brain networks constructed by functional magnetic resonance imaging (fMRI). We propose an interpretable GNN framewo…
▽ More
Understanding how certain brain regions relate to a specific neurological disorder has been an important area of neuroimaging research. A promising approach to identify the salient regions is using Graph Neural Networks (GNNs), which can be used to analyze graph structured data, e.g. brain networks constructed by functional magnetic resonance imaging (fMRI). We propose an interpretable GNN framework with a novel salient region selection mechanism to determine neurological brain biomarkers associated with disorders. Specifically, we design novel regularized pooling layers that highlight salient regions of interests (ROIs) so that we can infer which ROIs are important to identify a certain disease based on the node pooling scores calculated by the pooling layers. Our proposed framework, Pooling Regularized-GNN (PR-GNN), encourages reasonable ROI-selection and provides flexibility to preserve either individual- or group-level patterns. We apply the PR-GNN framework on a Biopoint Autism Spectral Disorder (ASD) fMRI dataset. We investigate different choices of the hyperparameters and show that PR-GNN outperforms baseline methods in terms of classification accuracy. The salient ROI detection results show high correspondence with the previous neuroimaging-derived biomarkers for ASD.
△ Less
Submitted 29 July, 2020;
originally announced July 2020.
-
Adaptive Checkpoint Adjoint Method for Gradient Estimation in Neural ODE
Authors:
Juntang Zhuang,
Nicha Dvornek,
Xiaoxiao Li,
Sekhar Tatikonda,
Xenophon Papademetris,
James Duncan
Abstract:
Neural ordinary differential equations (NODEs) have recently attracted increasing attention; however, their empirical performance on benchmark tasks (e.g. image classification) are significantly inferior to discrete-layer models. We demonstrate an explanation for their poorer performance is the inaccuracy of existing gradient estimation methods: the adjoint method has numerical errors in reverse-m…
▽ More
Neural ordinary differential equations (NODEs) have recently attracted increasing attention; however, their empirical performance on benchmark tasks (e.g. image classification) are significantly inferior to discrete-layer models. We demonstrate an explanation for their poorer performance is the inaccuracy of existing gradient estimation methods: the adjoint method has numerical errors in reverse-mode integration; the naive method directly back-propagates through ODE solvers, but suffers from a redundantly deep computation graph when searching for the optimal stepsize. We propose the Adaptive Checkpoint Adjoint (ACA) method: in automatic differentiation, ACA applies a trajectory checkpoint strategy which records the forward-mode trajectory as the reverse-mode trajectory to guarantee accuracy; ACA deletes redundant components for shallow computation graphs; and ACA supports adaptive solvers. On image classification tasks, compared with the adjoint and naive method, ACA achieves half the error rate in half the training time; NODE trained with ACA outperforms ResNet in both accuracy and test-retest reliability. On time-series modeling, ACA outperforms competing methods. Finally, in an example of the three-body problem, we show NODE with ACA can incorporate physical knowledge to achieve better accuracy. We provide the PyTorch implementation of ACA: \url{https://github.com/juntang-zhuang/torch-ACA}.
△ Less
Submitted 3 June, 2020;
originally announced June 2020.
-
Curating a COVID-19 data repository and forecasting county-level death counts in the United States
Authors:
Nick Altieri,
Rebecca L. Barter,
James Duncan,
Raaz Dwivedi,
Karl Kumbier,
Xiao Li,
Robert Netzorg,
Briton Park,
Chandan Singh,
Yan Shuo Tan,
Tiffany Tang,
Yu Wang,
Chao Zhang,
Bin Yu
Abstract:
As the COVID-19 outbreak evolves, accurate forecasting continues to play an extremely important role in informing policy decisions. In this paper, we present our continuous curation of a large data repository containing COVID-19 information from a range of sources. We use this data to develop predictions and corresponding prediction intervals for the short-term trajectory of COVID-19 cumulative de…
▽ More
As the COVID-19 outbreak evolves, accurate forecasting continues to play an extremely important role in informing policy decisions. In this paper, we present our continuous curation of a large data repository containing COVID-19 information from a range of sources. We use this data to develop predictions and corresponding prediction intervals for the short-term trajectory of COVID-19 cumulative death counts at the county-level in the United States up to two weeks ahead. Using data from January 22 to June 20, 2020, we develop and combine multiple forecasts using ensembling techniques, resulting in an ensemble we refer to as Combined Linear and Exponential Predictors (CLEP). Our individual predictors include county-specific exponential and linear predictors, a shared exponential predictor that pools data together across counties, an expanded shared exponential predictor that uses data from neighboring counties, and a demographics-based shared exponential predictor. We use prediction errors from the past five days to assess the uncertainty of our death predictions, resulting in generally-applicable prediction intervals, Maximum (absolute) Error Prediction Intervals (MEPI). MEPI achieves a coverage rate of more than 94% when averaged across counties for predicting cumulative recorded death counts two weeks in the future. Our forecasts are currently being used by the non-profit organization, Response4Life, to determine the medical supply need for individual hospitals and have directly contributed to the distribution of medical supplies across the country. We hope that our forecasts and data repository at https://covidseverity.com can help guide necessary county-specific decision-making and help counties prepare for their continued fight against COVID-19.
△ Less
Submitted 9 August, 2020; v1 submitted 16 May, 2020;
originally announced May 2020.
-
Jointly Discriminative and Generative Recurrent Neural Networks for Learning from fMRI
Authors:
Nicha C. Dvornek,
Xiaoxiao Li,
Juntang Zhuang,
James S. Duncan
Abstract:
Recurrent neural networks (RNNs) were designed for dealing with time-series data and have recently been used for creating predictive models from functional magnetic resonance imaging (fMRI) data. However, gathering large fMRI datasets for learning is a difficult task. Furthermore, network interpretability is unclear. To address these issues, we utilize multitask learning and design a novel RNN-bas…
▽ More
Recurrent neural networks (RNNs) were designed for dealing with time-series data and have recently been used for creating predictive models from functional magnetic resonance imaging (fMRI) data. However, gathering large fMRI datasets for learning is a difficult task. Furthermore, network interpretability is unclear. To address these issues, we utilize multitask learning and design a novel RNN-based model that learns to discriminate between classes while simultaneously learning to generate the fMRI time-series data. Employing the long short-term memory (LSTM) structure, we develop a discriminative model based on the hidden state and a generative model based on the cell state. The addition of the generative model constrains the network to learn functional communities represented by the LSTM nodes that are both consistent with the data generation as well as useful for the classification task. We apply our approach to the classification of subjects with autism vs. healthy controls using several datasets from the Autism Brain Imaging Data Exchange. Experiments show that our jointly discriminative and generative model improves classification learning while also producing robust and meaningful functional communities for better model understanding.
△ Less
Submitted 15 October, 2019;
originally announced October 2019.
-
Decision Explanation and Feature Importance for Invertible Networks
Authors:
Juntang Zhuang,
Nicha C. Dvornek,
Xiaoxiao Li,
Junlin Yang,
James S. Duncan
Abstract:
Deep neural networks are vulnerable to adversarial attacks and hard to interpret because of their black-box nature. The recently proposed invertible network is able to accurately reconstruct the inputs to a layer from its outputs, thus has the potential to unravel the black-box model. An invertible network classifier can be viewed as a two-stage model: (1) invertible transformation from input spac…
▽ More
Deep neural networks are vulnerable to adversarial attacks and hard to interpret because of their black-box nature. The recently proposed invertible network is able to accurately reconstruct the inputs to a layer from its outputs, thus has the potential to unravel the black-box model. An invertible network classifier can be viewed as a two-stage model: (1) invertible transformation from input space to the feature space; (2) a linear classifier in the feature space. We can determine the decision boundary of a linear classifier in the feature space; since the transform is invertible, we can invert the decision boundary from the feature space to the input space. Furthermore, we propose to determine the projection of a data point onto the decision boundary, and define explanation as the difference between data and its projection. Finally, we propose to locally approximate a neural network with its first-order Taylor expansion, and define feature importance using a local linear model. We provide the implementation of our method: \url{https://github.com/juntang-zhuang/explain_invertible}.
△ Less
Submitted 14 October, 2019; v1 submitted 29 September, 2019;
originally announced October 2019.
-
Graph Embedding Using Infomax for ASD Classification and Brain Functional Difference Detection
Authors:
Xiaoxiao Li,
Nicha C. Dvornek,
Juntang Zhuang,
Pamela Ventola,
James Duncan
Abstract:
Significant progress has been made using fMRI to characterize the brain changes that occur in ASD, a complex neuro-developmental disorder. However, due to the high dimensionality and low signal-to-noise ratio of fMRI, embedding informative and robust brain regional fMRI representations for both graph-level classification and region-level functional difference detection tasks between ASD and health…
▽ More
Significant progress has been made using fMRI to characterize the brain changes that occur in ASD, a complex neuro-developmental disorder. However, due to the high dimensionality and low signal-to-noise ratio of fMRI, embedding informative and robust brain regional fMRI representations for both graph-level classification and region-level functional difference detection tasks between ASD and healthy control (HC) groups is difficult. Here, we model the whole brain fMRI as a graph, which preserves geometrical and temporal information and use a Graph Neural Network (GNN) to learn from the graph-structured fMRI data. We investigate the potential of including mutual information (MI) loss (Infomax), which is an unsupervised term encouraging large MI of each nodal representation and its corresponding graph-level summarized representation to learn a better graph embedding. Specifically, this work developed a pipeline including a GNN encoder, a classifier and a discriminator, which forces the encoded nodal representations to both benefit classification and reveal the common nodal patterns in a graph. We simultaneously optimize graph-level classification loss and Infomax. We demonstrated that Infomax graph embedding improves classification performance as a regularization term. Furthermore, we found separable nodal representations of ASD and HC groups in prefrontal cortex, cingulate cortex, visual regions, and other social, emotional and execution related brain regions. In contrast with GNN with classification loss only, the proposed pipeline can facilitate training more robust ASD classification models. Moreover, the separable nodal representations can detect the functional differences between the two groups and contribute to revealing new ASD biomarkers.
△ Less
Submitted 13 August, 2019; v1 submitted 9 August, 2019;
originally announced August 2019.
-
Graph Neural Network for Interpreting Task-fMRI Biomarkers
Authors:
Xiaoxiao Li,
Nicha C. Dvornek,
Yuan Zhou,
Juntang Zhuang,
Pamela Ventola,
James S. Duncan
Abstract:
Finding the biomarkers associated with ASD is helpful for understanding the underlying roots of the disorder and can lead to earlier diagnosis and more targeted treatment. A promising approach to identify biomarkers is using Graph Neural Networks (GNNs), which can be used to analyze graph structured data, i.e. brain networks constructed by fMRI. One way to interpret important features is through l…
▽ More
Finding the biomarkers associated with ASD is helpful for understanding the underlying roots of the disorder and can lead to earlier diagnosis and more targeted treatment. A promising approach to identify biomarkers is using Graph Neural Networks (GNNs), which can be used to analyze graph structured data, i.e. brain networks constructed by fMRI. One way to interpret important features is through looking at how the classification probability changes if the features are occluded or replaced. The major limitation of this approach is that replacing values may change the distribution of the data and lead to serious errors. Therefore, we develop a 2-stage pipeline to eliminate the need to replace features for reliable biomarker interpretation. Specifically, we propose an inductive GNN to embed the graphs containing different properties of task-fMRI for identifying ASD and then discover the brain regions/sub-graphs used as evidence for the GNN classifier. We first show GNN can achieve high accuracy in identifying ASD. Next, we calculate the feature importance scores using GNN and compare the interpretation ability with Random Forest. Finally, we run with different atlases and parameters, proving the robustness of the proposed method. The detected biomarkers reveal their association with social behaviors. We also show the potential of discovering new informative biomarkers. Our pipeline can be generalized to other graph feature importance interpretation problems.
△ Less
Submitted 11 July, 2019; v1 submitted 2 July, 2019;
originally announced July 2019.
-
Efficient Interpretation of Deep Learning Models Using Graph Structure and Cooperative Game Theory: Application to ASD Biomarker Discovery
Authors:
Xiaoxiao Li,
Nicha C. Dvornek,
Yuan Zhou,
Juntang Zhuang,
Pamela Ventola,
James S. Duncan
Abstract:
Discovering imaging biomarkers for autism spectrum disorder (ASD) is critical to help explain ASD and predict or monitor treatment outcomes. Toward this end, deep learning classifiers have recently been used for identifying ASD from functional magnetic resonance imaging (fMRI) with higher accuracy than traditional learning strategies. However, a key challenge with deep learning models is understan…
▽ More
Discovering imaging biomarkers for autism spectrum disorder (ASD) is critical to help explain ASD and predict or monitor treatment outcomes. Toward this end, deep learning classifiers have recently been used for identifying ASD from functional magnetic resonance imaging (fMRI) with higher accuracy than traditional learning strategies. However, a key challenge with deep learning models is understanding just what image features the network is using, which can in turn be used to define the biomarkers. Current methods extract biomarkers, i.e., important features, by looking at how the prediction changes if "ignoring" one feature at a time. In this work, we go beyond looking at only individual features by using Shapley value explanation (SVE) from cooperative game theory. Cooperative game theory is advantageous here because it directly considers the interaction between features and can be applied to any machine learning method, making it a novel, more accurate way of determining instance-wise biomarker importance from deep learning models. A barrier to using SVE is its computational complexity: $2^N$ given $N$ features. We explicitly reduce the complexity of SVE computation by two approaches based on the underlying graph structure of the input data: 1) only consider the centralized coalition of each feature; 2) a hierarchical pipeline which first clusters features into small communities, then applies SVE in each community. Monte Carlo approximation can be used for large permutation sets. We first validate our methods on the MNIST dataset and compare to human perception. Next, to insure plausibility of our biomarker results, we train a Random Forest (RF) to classify ASD/control subjects from fMRI and compare SVE results to standard RF-based feature importance. Finally, we show initial results on ranked fMRI biomarkers using SVE on a deep learning classifier for the ASD/control dataset.
△ Less
Submitted 13 March, 2019; v1 submitted 14 December, 2018;
originally announced December 2018.
-
Prediction of severity and treatment outcome for ASD from fMRI
Authors:
Juntang Zhuang,
Nicha C. Dvornek,
Xiaoxiao Li,
Pamela Ventola,
James S. Duncan
Abstract:
Autism spectrum disorder (ASD) is a complex neurodevelopmental syndrome. Early diagnosis and precise treatment are essential for ASD patients. Although researchers have built many analytical models, there has been limited progress in accurate predictive models for early diagnosis. In this project, we aim to build an accurate model to predict treatment outcome and ASD severity from early stage func…
▽ More
Autism spectrum disorder (ASD) is a complex neurodevelopmental syndrome. Early diagnosis and precise treatment are essential for ASD patients. Although researchers have built many analytical models, there has been limited progress in accurate predictive models for early diagnosis. In this project, we aim to build an accurate model to predict treatment outcome and ASD severity from early stage functional magnetic resonance imaging (fMRI) scans. The difficulty in building large databases of patients who have received specific treatments and the high dimensionality of medical image analysis problems are challenges in this work. We propose a generic and accurate two-level approach for high-dimensional regression problems in medical image analysis. First, we perform region-level feature selection using a predefined brain parcellation. Based on the assumption that voxels within one region in the brain have similar values, for each region we use the bootstrapped mean of voxels within it as a feature. In this way, the dimension of data is reduced from number of voxels to number of regions. Then we detect predictive regions by various feature selection methods. Second, we extract voxels within selected regions, and perform voxel-level feature selection. To use this model in both linear and non-linear cases with limited training examples, we apply two-level elastic net regression and random forest (RF) models respectively. To validate accuracy and robustness of this approach, we perform experiments on both task-fMRI and resting state fMRI datasets. Furthermore, we visualize the influence of each region, and show that the results match well with other findings.
△ Less
Submitted 28 October, 2018;
originally announced October 2018.
-
Prediction of treatment outcome for autism from structure of the brain based on sure independence screening
Authors:
Juntang Zhuang,
Nicha C. Dvornek,
Qingyu Zhao,
Xiaoxiao Li,
Pamela Ventola,
James S. Duncan
Abstract:
Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder, and behavioral treatment interventions have shown promise for young children with ASD. However, there is limited progress in understanding the effect of each type of treatment. In this project, we aim to detect structural changes in the brain after treatment and select structural features associated with treatment outcomes. T…
▽ More
Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder, and behavioral treatment interventions have shown promise for young children with ASD. However, there is limited progress in understanding the effect of each type of treatment. In this project, we aim to detect structural changes in the brain after treatment and select structural features associated with treatment outcomes. The difficulty in building large databases of patients who have received specific treatments and the high dimensionality of medical image analysis problems are the challenges in this work. To select predictive features and build accurate models, we use the sure independence screening (SIS) method. SIS is a theoretically and empirically validated method for ultra-high dimensional general linear models, and it achieves both predictive accuracy and correct feature selection by iterative feature selection. Compared with step-wise feature selection methods, SIS removes multiple features in each iteration and is computationally efficient. Compared with other linear models such as elastic-net regression, support vector regression (SVR) and partial least squares regression (PSLR), SIS achieves higher accuracy. We validated the superior performance of SIS in various experiments: First, we extract brain structural features from FreeSurfer, including cortical thickness, surface area, mean curvature and cortical volume. Next, we predict different measures of treatment outcomes based on structural features. We show that SIS achieves the highest correlation between prediction and measurements in all tasks. Furthermore, we report regions selected by SIS as biomarkers for ASD.
△ Less
Submitted 25 February, 2019; v1 submitted 17 October, 2018;
originally announced October 2018.
-
Prediction of Autism Treatment Response from Baseline fMRI using Random Forests and Tree Bagging
Authors:
Nicha C. Dvornek,
Daniel Yang,
Archana Venkataraman,
Pamela Ventola,
Lawrence H. Staib,
Kevin A. Pelphrey,
James S. Duncan
Abstract:
Treating children with autism spectrum disorders (ASD) with behavioral interventions, such as Pivotal Response Treatment (PRT), has shown promise in recent studies. However, deciding which therapy is best for a given patient is largely by trial and error, and choosing an ineffective intervention results in loss of valuable treatment time. We propose predicting patient response to PRT from baseline…
▽ More
Treating children with autism spectrum disorders (ASD) with behavioral interventions, such as Pivotal Response Treatment (PRT), has shown promise in recent studies. However, deciding which therapy is best for a given patient is largely by trial and error, and choosing an ineffective intervention results in loss of valuable treatment time. We propose predicting patient response to PRT from baseline task-based fMRI by the novel application of a random forest and tree bagging strategy. Our proposed learning pipeline uses random forest regression to determine candidate brain voxels that may be informative in predicting treatment response. The candidate voxels are then tested stepwise for inclusion in a bagged tree ensemble. After the predictive model is constructed, bias correction is performed to further increase prediction accuracy. Using data from 19 ASD children who underwent a 16 week trial of PRT and a leave-one-out cross-validation framework, the presented learning pipeline was tested against several standard methods and variations of the pipeline and resulted in the highest prediction accuracy.
△ Less
Submitted 24 May, 2018;
originally announced May 2018.