Search | arXiv e-print repository

Variable Importance in High-Dimensional Settings Requires Grouping

Authors: Ahmad Chamma, Bertrand Thirion, Denis A. Engemann

Abstract: Explaining the decision process of machine learning algorithms is nowadays crucial for both model's performance enhancement and human comprehension. This can be achieved by assessing the variable importance of single variables, even for high-capacity non-linear methods, e.g. Deep Neural Networks (DNNs). While only removal-based approaches, such as Permutation Importance (PI), can bring statistical validity, they return misleading results when variables are correlated. Conditional Permutation Importance (CPI) bypasses PI's limitations in such cases. However, in high-dimensional settings, where high correlations between the variables cancel their conditional importance, the use of CPI as well as other methods leads to unreliable results, besides prohibitive computation costs. Grouping variables statistically via clustering or some prior knowledge gains some power back and leads to better interpretations. In this work, we introduce BCPI (Block-Based Conditional Permutation Importance), a new generic framework for variable importance computation with statistical guarantees handling both single and group cases. Furthermore, as handling groups with high cardinality (such as a set of observations of a given modality) is both time-consuming and resource-intensive, we also introduce a new stacking approach extending the DNN architecture with sub-linear layers adapted to the group structure. We show that the ensuing approach extended with stacking controls the type-I error even with highly-correlated groups and shows top accuracy across benchmarks. Furthermore, we perform a real-world data analysis in a large-scale medical dataset where we aim to show the consistency between our results and the literature for a biomarker prediction.

Submitted 17 December, 2023; originally announced December 2023.

arXiv:2310.10373 [pdf, other]

False Discovery Proportion control for aggregated Knockoffs

Authors: Alexandre Blain, Bertrand Thirion, Olivier Grisel, Pierre Neuvial

Abstract: Controlled variable selection is an important analytical step in various scientific fields, such as brain imaging or genomics. In these high-dimensional data settings, considering too many variables leads to poor models and high costs, hence the need for statistical guarantees on false positives. Knockoffs are a popular statistical tool for conditional variable selection in high dimension. However, they control for the expected proportion of false discoveries (FDR) and not their actual proportion (FDP). We present a new method, KOPI, that controls the proportion of false discoveries for Knockoff-based inference. The proposed method also relies on a new type of aggregation to address the undesirable randomness associated with classical Knockoff inference. We demonstrate FDP control and substantial power gains over existing Knockoff-based methods in various simulation settings and achieve good sensitivity/specificity tradeoffs on brain imaging and genomic data.

Submitted 16 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023
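
Note: the FDR/FDP distinction drawn in the abstract above can be written out with the standard textbook definitions. The symbols $\hat{S}$ (the selected set), $\mathcal{H}_0$ (the set of null variables), $q$ and $\alpha$ are notation introduced here for illustration and are not taken from the abstract.

```latex
\mathrm{FDP}(\hat{S}) = \frac{|\hat{S} \cap \mathcal{H}_0|}{\max(|\hat{S}|, 1)},
\qquad
\mathrm{FDR} = \mathbb{E}\big[\mathrm{FDP}(\hat{S})\big],
\qquad
\text{FDP control (usual form):}\quad
\mathbb{P}\big(\mathrm{FDP}(\hat{S}) \le q\big) \ge 1 - \alpha .
```

Under these definitions, FDR control only bounds the average of the FDP over repeated experiments, whereas FDP control bounds the proportion realized in the experiment at hand with high probability.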

arXiv:2309.07593 [pdf, other]

Statistically Valid Variable Importance Assessment through Conditional Permutations

Authors: Ahmad Chamma, Denis A. Engemann, Bertrand Thirion

Abstract: Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference approach, particularly when statistical guarantees are sought to justify variable inclusion. It is often implemented with variable permutation schemes. On the flip side, these approaches risk misidentifying unimportant variables as important in the presence of correlations among covariates. Here we develop a systematic approach for studying Conditional Permutation Importance (CPI) that is model agnostic and computationally lean, as well as reusable benchmarks of state-of-the-art variable importance estimators. We show theoretically and empirically that $\textit{CPI}$ overcomes the limitations of standard permutation importance by providing accurate type-I error control. When used with a deep neural network, $\textit{CPI}$ consistently showed top accuracy across benchmarks. An experiment on real-world data analysis in a large-scale medical dataset showed that $\textit{CPI}$ provides a more parsimonious selection of statistically significant variables. Our results suggest that $\textit{CPI}$ can be readily used as drop-in replacement for permutation-based methods.

Submitted 25 October, 2023; v1 submitted 14 September, 2023; originally announced September 2023.
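
Note: the contrast between plain and conditional permutation described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single conditioning variable and a least-squares line as a stand-in for the conditional expectation, and the names `pearson` and `conditional_permutation` are hypothetical helpers introduced here.

```python
import random
from statistics import fmean


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def conditional_permutation(xj, xk, rng):
    """Resample xj while keeping its linear dependence on xk.

    Assumption: a least-squares fit xj ~ a + b * xk stands in for the
    conditional expectation E[xj | xk]; only the residuals are shuffled.
    """
    mj, mk = fmean(xj), fmean(xk)
    b = sum((k - mk) * (j - mj) for k, j in zip(xk, xj)) / sum(
        (k - mk) ** 2 for k in xk
    )
    a = mj - b * mk
    fitted = [a + b * k for k in xk]
    residuals = [j - f for j, f in zip(xj, fitted)]
    rng.shuffle(residuals)  # break only the part of xj not explained by xk
    return [f + r for f, r in zip(fitted, residuals)]


rng = random.Random(0)
n = 2000
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
# x2 is built to have a population correlation of 0.8 with x1.
x2 = [0.8 * u + 0.6 * rng.gauss(0.0, 1.0) for u in x1]

# Plain permutation: shuffle the whole column.
x2_plain = x2[:]
rng.shuffle(x2_plain)

# Conditional permutation: shuffle only the residuals of x2 given x1.
x2_cond = conditional_permutation(x2, x1, rng)

corr_original = pearson(x1, x2)
corr_plain = pearson(x1, x2_plain)
corr_cond = pearson(x1, x2_cond)
print(f"original={corr_original:.3f} plain={corr_plain:.3f} conditional={corr_cond:.3f}")
```

In this toy setting the plain shuffle destroys the correlation between the two covariates, while the conditional shuffle approximately preserves it, so a model evaluated on the conditionally permuted column is not pushed off the joint distribution it was trained on.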

arXiv:2309.05768 [pdf]

The Past, Present, and Future of the Brain Imaging Data Structure (BIDS)

Authors: Russell A. Poldrack, Christopher J. Markiewicz, Stefan Appelhoff, Yoni K. Ashar, Tibor Auer, Sylvain Baillet, Shashank Bansal, Leandro Beltrachini, Christian G. Benar, Giacomo Bertazzoli, Suyash Bhogawar, Ross W. Blair, Marta Bortoletto, Mathieu Boudreau, Teon L. Brooks, Vince D. Calhoun, Filippo Maria Castelli, Patricia Clement, Alexander L Cohen, Julien Cohen-Adad, Sasha D'Ambrosio, Gilles de Hollander, María de la iglesia-Vayá, Alejandro de la Vega, Arnaud Delorme, et al. (89 additional authors not shown)

Abstract: The Brain Imaging Data Structure (BIDS) is a community-driven standard for the organization of data and metadata from a growing range of neuroscience modalities. This paper is meant as a history of how the standard has developed and grown over time. We outline the principles behind the project, the mechanisms by which it has been extended, and some of the challenges being addressed as it evolves. We also discuss the lessons learned through the project, with the aim of enabling researchers in other domains to learn from the success of BIDS.

Submitted 8 January, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

arXiv:2305.13863 [pdf, other]

Probing Brain Context-Sensitivity with Masked-Attention Generation

Authors: Alexandre Pasquiou, Yair Lakretz, Bertrand Thirion, Christophe Pallier

Abstract: Two fundamental questions in neurolinguistics concern the brain regions that integrate information beyond the lexical level, and the size of their window of integration. To address these questions we introduce a new approach named masked-attention generation. It uses GPT-2 transformers to generate word embeddings that capture a fixed amount of contextual information. We then tested whether these embeddings could predict fMRI brain activity in humans listening to naturalistic text. The results showed that most of the cortex within the language network is sensitive to contextual information, and that the right hemisphere is more sensitive to longer contexts than the left. Masked-attention generation supports previous analyses of context-sensitivity in the brain, and complements them by quantifying the window size of context integration per voxel.

Submitted 23 May, 2023; originally announced May 2023.

Comments: 2 pages, 2 figures, CCN 2023

Journal ref: CCN 2023

arXiv:2302.14389 [pdf, other]

Information-Restricted Neural Language Models Reveal Different Brain Regions' Sensitivity to Semantics, Syntax and Context

Authors: Alexandre Pasquiou, Yair Lakretz, Bertrand Thirion, Christophe Pallier

Abstract: A fundamental question in neurolinguistics concerns the brain regions involved in syntactic and semantic processing during speech comprehension, both at the lexical (word processing) and supra-lexical levels (sentence and discourse processing). To what extent are these regions separated or intertwined? To address this question, we trained a lexical language model, Glove, and a supra-lexical language model, GPT-2, on a text corpus from which we selectively removed either syntactic or semantic information. We then assessed to what extent these information-restricted models were able to predict the time-courses of fMRI signal of humans listening to naturalistic text. We also manipulated the size of contextual information provided to GPT-2 in order to determine the windows of integration of brain regions involved in supra-lexical processing. Our analyses show that, while most brain regions involved in language are sensitive to both syntactic and semantic variables, the relative magnitudes of these effects vary a lot across these regions. Furthermore, we found an asymmetry between the left and right hemispheres, with semantic and syntactic processing being more dissociated in the left hemisphere than in the right, and the left and right hemispheres showing respectively greater sensitivity to short and long contexts. The use of information-restricted NLP models thus sheds new light on the spatial organization of syntactic processing, semantic processing and compositionality.

Submitted 28 February, 2023; originally announced February 2023.

Comments: 19 pages, 8 figures, 10 pages of Appendix, 5 appendix figures

arXiv:2208.13724 [pdf, other]

FDP control in multivariate linear models using the bootstrap

Authors: Samuel Davenport, Bertrand Thirion, Pierre Neuvial

Abstract: In this article we develop a method for performing post hoc inference of the False Discovery Proportion (FDP) over multiple contrasts of interest in the multivariate linear model. To do so we use the bootstrap to simulate from the distribution of the null contrasts. We combine the bootstrap with the post hoc inference bounds of Blanchard (2020) and prove that doing so provides simultaneous asymptotic control of the FDP over all subsets of hypotheses. This requires us to demonstrate consistency of the multivariate bootstrap in the linear model, which we do via the Lindeberg Central Limit Theorem, providing a simpler proof of this result than that of Eck (2018). We demonstrate, via simulations, that our approach provides simultaneous control of the FDP over all subsets and is typically more powerful than existing, state of the art, parametric methods. We illustrate our approach on functional Magnetic Resonance Imaging data from the Human Connectome project and on a transcriptomic dataset of chronic obstructive pulmonary disease.

Submitted 20 September, 2022; v1 submitted 29 August, 2022; originally announced August 2022.

arXiv:2207.03380 [pdf, other]

Neural Language Models are not Born Equal to Fit Brain Data, but Training Helps

Authors: Alexandre Pasquiou, Yair Lakretz, John Hale, Bertrand Thirion, Christophe Pallier

Abstract: Neural Language Models (NLMs) have made tremendous advances during the last years, achieving impressive performance on various linguistic tasks. Capitalizing on this, studies in neuroscience have started to use NLMs to study neural activity in the human brain during language processing. However, many questions remain unanswered regarding which factors determine the ability of a neural language model to capture brain activity (aka its 'brain score'). Here, we make first steps in this direction and examine the impact of test loss, training corpus and model architecture (comparing GloVe, LSTM, GPT-2 and BERT), on the prediction of functional Magnetic Resonance Imaging timecourses of participants listening to an audiobook. We find that (1) untrained versions of each model already explain a significant amount of signal in the brain by capturing similarity in brain responses across identical words, with the untrained LSTM outperforming the transformer-based models, being less impacted by the effect of context; (2) that training NLP models improves brain scores in the same brain regions irrespective of the model's architecture; (3) that Perplexity (test loss) is not a good predictor of brain score; (4) that training data have a strong influence on the outcome and, notably, that off-the-shelf models may lack statistical power to detect brain activations. Overall, we outline the impact of model-training choices, and suggest good practices for future studies aiming at explaining the human language system using neural language models.

Submitted 7 July, 2022; originally announced July 2022.

Journal ref: ICML 2022 - 39th International Conference on Machine Learning, Jul 2022, Baltimore, United States. pp.18

arXiv:2206.09398 [pdf, other]

Aligning individual brains with Fused Unbalanced Gromov-Wasserstein

Authors: Alexis Thual, Huy Tran, Tatiana Zemskova, Nicolas Courty, Rémi Flamary, Stanislas Dehaene, Bertrand Thirion

Abstract: Individual brains vary in both anatomy and functional organization, even within a given species. Inter-individual variability is a major impediment when trying to draw generalizable conclusions from neuroimaging data collected on groups of subjects. Current co-registration procedures rely on limited data, and thus lead to very coarse inter-subject alignments. In this work, we present a novel method for inter-subject alignment based on Optimal Transport, denoted as Fused Unbalanced Gromov Wasserstein (FUGW). The method aligns cortical surfaces based on the similarity of their functional signatures in response to a variety of stimulation settings, while penalizing large deformations of individual topographic organization. We demonstrate that FUGW is well-suited for whole-brain landmark-free alignment. The unbalanced feature allows to deal with the fact that functional areas vary in size across subjects. Our results show that FUGW alignment significantly increases between-subject correlation of activity for independent functional data, and leads to more precise mapping at the group level.

Submitted 22 August, 2023; v1 submitted 19 June, 2022; originally announced June 2022.

Journal ref: Advances in Neural Information Processing Systems, 35 (2022) 21792-21804

arXiv:2205.14613 [pdf, other]

A Conditional Randomization Test for Sparse Logistic Regression in High-Dimension

Authors: Binh T. Nguyen, Bertrand Thirion, Sylvain Arlot

Abstract: Identifying the relevant variables for a classification model with correct confidence levels is a central but difficult task in high-dimension. Despite the core role of sparse logistic regression in statistics and machine learning, it still lacks a good solution for accurate inference in the regime where the number of features $p$ is as large as or larger than the number of samples $n$. Here, we tackle this problem by improving the Conditional Randomization Test (CRT). The original CRT algorithm shows promise as a way to output p-values while making few assumptions on the distribution of the test statistics. As it comes with a prohibitive computational cost even in mildly high-dimensional problems, faster solutions based on distillation have been proposed. Yet, they rely on unrealistic hypotheses and result in low-power solutions. To improve this, we propose \emph{CRT-logit}, an algorithm that combines a variable-distillation step and a decorrelation step that takes into account the geometry of $\ell_1$-penalized logistic regression problem. We provide a theoretical analysis of this procedure, and demonstrate its effectiveness on simulations, along with experiments on large-scale brain-imaging and genomics datasets.

Submitted 29 May, 2022; originally announced May 2022.

arXiv:2204.10572 [pdf, other]

doi 10.1016/j.neuroimage.2022.119492

Notip: Non-parametric True Discovery Proportion control for brain imaging

Authors: Alexandre Blain, Bertrand Thirion, Pierre Neuvial

Abstract: Cluster-level inference procedures are widely used for brain mapping. These methods compare the size of clusters obtained by thresholding brain maps to an upper bound under the global null hypothesis, computed using Random Field Theory or permutations. However, the guarantees obtained by this type of inference - i.e. at least one voxel is truly activated in the cluster - are not informative with regards to the strength of the signal therein. There is thus a need for methods to assess the amount of signal within clusters; yet such methods have to take into account that clusters are defined based on the data, which creates circularity in the inference scheme. This has motivated the use of post hoc estimates that allow statistically valid estimation of the proportion of activated voxels in clusters. In the context of fMRI data, the All-Resolutions Inference framework introduced in [25] provides post hoc estimates of the proportion of activated voxels. However, this method relies on parametric threshold families, which results in conservative inference. In this paper, we leverage randomization methods to adapt to data characteristics and obtain tighter false discovery control. We obtain Notip, for Non-parametric True Discovery Proportion control: a powerful, non-parametric method that yields statistically valid guarantees on the proportion of activated voxels in data-derived clusters. Numerical experiments demonstrate substantial gains in number of detections compared with state-of-the-art methods on 36 fMRI datasets. The conditions under which the proposed method brings benefits are also discussed.

Submitted 21 July, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: NeuroImage (2022)

Journal ref: NeuroImage (2022), 119492

arXiv:2110.13502 [pdf, other]

Shared Independent Component Analysis for Multi-Subject Neuroimaging

Authors: Hugo Richard, Pierre Ablin, Bertrand Thirion, Alexandre Gramfort, Aapo Hyvärinen

Abstract: We consider shared response modeling, a multi-view learning problem where one wants to identify common components from multiple datasets or views. We introduce Shared Independent Component Analysis (ShICA) that models each view as a linear transform of shared independent components contaminated by additive Gaussian noise. We show that this model is identifiable if the components are either non-Gaussian or have enough diversity in noise variances. We then show that in some cases multi-set canonical correlation analysis can recover the correct unmixing matrices, but that even a small amount of sampling noise makes Multiset CCA fail. To solve this problem, we propose to use joint diagonalization after Multiset CCA, leading to a new approach called ShICA-J. We show via simulations that ShICA-J leads to improved results while being very fast to fit. While ShICA-J is based on second-order statistics, we further propose to leverage non-Gaussianity of the components using a maximum-likelihood method, ShICA-ML, that is both more accurate and more costly. Further, ShICA comes with a principled method for shared components estimation. Finally, we provide empirical evidence on fMRI and MEG datasets that ShICA yields more accurate estimation of the components than alternatives.

Submitted 26 October, 2021; originally announced October 2021.

Comments: Accepted at NeurIPS 2021

arXiv:2110.06135 [pdf, other]

Label scarcity in biomedicine: Data-rich latent factor discovery enhances phenotype prediction

Authors: Marc-Andre Schulz, Bertrand Thirion, Alexandre Gramfort, Gaël Varoquaux, Danilo Bzdok

Abstract: High-quality data accumulation is now becoming ubiquitous in the health domain. There is increasing opportunity to exploit rich data from normal subjects to improve supervised estimators in specific diseases with notorious data scarcity. We demonstrate that low-dimensional embedding spaces can be derived from the UK Biobank population dataset and used to enhance data-scarce prediction of health indicators, lifestyle and demographic characteristics. Phenotype predictions facilitated by Variational Autoencoder manifolds typically scaled better with increasing unlabeled data than dimensionality reduction by PCA or Isomap. Performance gains from semi-supervision approaches will probably become an important ingredient for various medical data science applications.

Submitted 12 October, 2021; originally announced October 2021.

Comments: Accepted at NIPS 2017 Workshop on Machine Learning for Health

arXiv:2107.06104 [pdf, other]

Functional Magnetic Resonance Imaging data augmentation through conditional ICA

Authors: Badr Tajini, Hugo Richard, Bertrand Thirion

Abstract: Advances in computational cognitive neuroimaging research are related to the availability of large amounts of labeled brain imaging data, but such data are scarce and expensive to generate. While powerful data generation mechanisms, such as Generative Adversarial Networks (GANs), have been designed in the last decade for computer vision, such improvements have not yet carried over to brain imaging. A likely reason is that GANs training is ill-suited to the noisy, high-dimensional and small-sample data available in functional neuroimaging. In this paper, we introduce Conditional Independent Components Analysis (Conditional ICA): a fast functional Magnetic Resonance Imaging (fMRI) data augmentation technique, that leverages abundant resting-state data to create images by sampling from an ICA decomposition. We then propose a mechanism to condition the generator on classes observed with few samples. We first show that the generative mechanism is successful at synthesizing data indistinguishable from observations, and that it yields gains in classification accuracy in brain decoding problems. In particular it outperforms GANs while being much easier to optimize and interpret. Lastly, Conditional ICA enhances classification accuracy in eight datasets without further parameters tuning.

Submitted 14 July, 2021; v1 submitted 11 July, 2021; originally announced July 2021.

Comments: 14 pages, 5 figures, 7 tables

arXiv:2106.02590 [pdf, other]

Spatially relaxed inference on high-dimensional linear models

Authors: Jérôme-Alexis Chevalier, Tuan-Binh Nguyen, Bertrand Thirion, Joseph Salmon

Abstract: We consider the inference problem for high-dimensional linear models, when covariates have an underlying spatial organization reflected in their correlation. A typical example of such a setting is high-resolution imaging, in which neighboring pixels are usually very similar. Accurate point and confidence intervals estimation is not possible in this context with many more covariates than samples, furthermore with high correlation between covariates. This calls for a reformulation of the statistical inference problem, that takes into account the underlying spatial structure: if covariates are locally correlated, it is acceptable to detect them up to a given spatial uncertainty. We thus propose to rely on the $δ$-FWER, that is the probability of making a false discovery at a distance greater than $δ$ from any true positive. With this target measure in mind, we study the properties of ensembled clustered inference algorithms which combine three techniques: spatially constrained clustering, statistical inference, and ensembling to aggregate several clustered inference solutions. We show that ensembled clustered inference algorithms control the $δ$-FWER under standard assumptions for $δ$ equal to the largest cluster diameter. We complement the theoretical analysis with empirical results, demonstrating accurate $δ$-FWER control and decent power achieved by such inference algorithms.

Submitted 4 June, 2021; originally announced June 2021.
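
Note: the verbal definition of the $δ$-FWER in the abstract above can be transcribed into a formula as a reading aid. This is a sketch based only on the abstract's wording. The symbols $\hat{S}$ (the set of selected covariates), $S^{*}$ (the true support) and $d$ (a spatial distance between covariates) are notation assumed here, and the paper's exact formulation may differ.

```latex
\delta\text{-FWER}
  = \mathbb{P}\Big( \exists\, j \in \hat{S} \;:\; d(j, S^{*}) > \delta \Big),
\qquad
d(j, S^{*}) = \min_{k \in S^{*}} d(j, k).
```

With $\delta = 0$ this expression reduces to the usual family-wise error rate, since any selected covariate outside $S^{*}$ then counts as a false discovery.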

arXiv:2102.10964 [pdf, other]

Adaptive Multi-View ICA: Estimation of noise levels for optimal inference

Authors: Hugo Richard, Pierre Ablin, Aapo Hyvärinen, Alexandre Gramfort, Bertrand Thirion

Abstract: We consider a multi-view learning problem known as group independent component analysis (group ICA), where the goal is to recover shared independent sources from many views. The statistical modeling of this problem requires to take noise into account. When the model includes additive noise on the observations, the likelihood is intractable. By contrast, we propose Adaptive multiView ICA (AVICA), a noisy ICA model where each view is a linear mixture of shared independent sources with additive noise on the sources. In this setting, the likelihood has a tractable expression, which enables either direct optimization of the log-likelihood using a quasi-Newton method, or generalized EM. Importantly, we consider that the noise levels are also parameters that are learned from the data. This enables sources estimation with a closed-form Minimum Mean Squared Error (MMSE) estimator which weights each view according to its relative noise level. On synthetic data, AVICA yields better sources estimates than other group ICA methods thanks to its explicit MMSE estimator. On real magnetoencephalography (MEG) data, we provide evidence that the decomposition is less sensitive to sampling noise and that the noise variance estimates are biologically plausible. Lastly, on functional magnetic resonance imaging (fMRI) data, AVICA exhibits best performance in transferring information across views.

Submitted 22 February, 2021; originally announced February 2021.

arXiv:2009.14310 [pdf, other]

Statistical control for spatio-temporal MEG/EEG source imaging with desparsified multi-task Lasso

Authors: Jérôme-Alexis Chevalier, Alexandre Gramfort, Joseph Salmon, Bertrand Thirion

Abstract: Detecting where and when brain regions activate in a cognitive task or in a given clinical condition is the promise of non-invasive techniques like magnetoencephalography (MEG) or electroencephalography (EEG). This problem, referred to as source localization, or source imaging, poses however a high-dimensional statistical inference challenge. While sparsity promoting regularizations have been proposed to address the regression problem, it remains unclear how to ensure statistical control of false detections. Moreover, M/EEG source imaging requires to work with spatio-temporal data and autocorrelated noise. To deal with this, we adapt the desparsified Lasso estimator -- an estimator tailored for high dimensional linear model that asymptotically follows a Gaussian distribution under sparsity and moderate feature correlation assumptions -- to temporal data corrupted with autocorrelated noise. We call it the desparsified multi-task Lasso (d-MTLasso). We combine d-MTLasso with spatially constrained clustering to reduce data dimension and with ensembling to mitigate the arbitrary choice of clustering; the resulting estimator is called ensemble of clustered desparsified multi-task Lasso (ecd-MTLasso). With respect to the current procedures, the two advantages of ecd-MTLasso are that i) it offers statistical guarantees and ii) it allows to trade spatial specificity for sensitivity, leading to a powerful adaptive method. Extensive simulations on realistic head geometries, as well as empirical results on various MEG datasets, demonstrate the high recovery performance of ecd-MTLasso and its primary practical benefit: offer a statistically principled way to threshold MEG/EEG source maps.

Submitted 25 November, 2020; v1 submitted 29 September, 2020; originally announced September 2020.

Comments: 21 pages

arXiv:2006.06635 [pdf, other]

Modeling Shared Responses in Neuroimaging Studies through MultiView ICA

Authors: Hugo Richard, Luigi Gresele, Aapo Hyvärinen, Bertrand Thirion, Alexandre Gramfort, Pierre Ablin

Abstract: Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization. However, the aggregation of data coming from multiple subjects is challenging, since it requires accounting for large variability in anatomy, functional topography and stimulus response across individuals. Data modeling is especially hard for ecologically relevant conditions such as movie watching, where the experimental setup does not imply well-defined cognitive operations. We propose a novel MultiView Independent Component Analysis (ICA) model for group studies, where data from each subject are modeled as a linear combination of shared independent sources plus noise. Contrary to most group-ICA procedures, the likelihood of the model is available in closed form. We develop an alternate quasi-Newton method for maximizing the likelihood, which is robust and converges quickly. We demonstrate the usefulness of our approach first on fMRI data, where our model demonstrates improved sensitivity in identifying common sources among subjects. Moreover, the sources recovered by our model exhibit lower between-session variability than other methods. On magnetoencephalography (MEG) data, our method yields more accurate source localization on phantom data. Applied to 200 subjects from the Cam-CAN dataset, it reveals a clear sequence of evoked activity in sensor and source space. The code is freely available at https://github.com/hugorichard/multiviewica.

Submitted 24 December, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

Comments: Accepted to NeurIPS 2020

arXiv:2003.05405 [pdf, other]

Fine-grain atlases of functional modes for fMRI analysis

Authors: Kamalaker Dadi, Gaël Varoquaux, Antonia Machlouzarides-Shalit, Krzysztof J. Gorgolewski, Demian Wassermann, Bertrand Thirion, Arthur Mensch

Abstract: Population imaging markedly increased the size of functional-imaging datasets, shedding new light on the neural basis of inter-individual differences. Analyzing these large data entails new scalability challenges, computational and statistical. For this reason, brain images are typically summarized in a few signals, for instance reducing voxel-level measures with brain atlases or functional modes. A good choice of the corresponding brain networks is important, as most data analyses start from these reduced signals. We contribute finely-resolved atlases of functional modes, comprising from 64 to 1024 networks. These dictionaries of functional modes (DiFuMo) are trained on millions of fMRI functional brain volumes of total size 2.4TB, spanned over 27 studies and many research groups. We demonstrate the benefits of extracting reduced signals on our fine-grain atlases for many classic functional data analysis pipelines: stimuli decoding from 12,334 brain responses, standard GLM analysis of fMRI across sessions and individuals, extraction of resting-state functional-connectomes biomarkers for 2,500 individuals, data compression and meta-analysis over more than 15,000 statistical maps. In each of these analysis scenarios, we compare the performance of our functional atlases with that of other popular references, and to a simple voxel-level analysis. Results highlight the importance of using high-dimensional "soft" functional atlases, to represent and analyse brain activity while capturing its functional gradients. Analyses on high-dimensional modes achieve statistical performance similar to that at the voxel level, but with much reduced computational cost and higher interpretability. In addition to making them available, we provide meaningful names for these modes, based on their anatomical location. This will facilitate the reporting of results.

Submitted 5 March, 2020; originally announced March 2020.

arXiv:2002.09269 [pdf, other]

Aggregation of Multiple Knockoffs

Authors: Tuan-Binh Nguyen, Jérôme-Alexis Chevalier, Bertrand Thirion, Sylvain Arlot

Abstract: We develop an extension of the Knockoff Inference procedure, introduced by Barber and Candes (2015). This new method, called Aggregation of Multiple Knockoffs (AKO), addresses the instability inherent to the random nature of Knockoff-based inference. Specifically, AKO improves both the stability and power compared with the original Knockoff algorithm while still maintaining guarantees for False Discovery Rate control. We provide a new inference procedure, prove its core properties, and demonstrate its benefits in a set of experiments on synthetic and real datasets.

Submitted 25 June, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

Comments: Accepted to ICML 2020 (Thirty-seventh International Conference on Machine Learning). This version includes both the main text of the conference paper and supplementary materials (as appendices). 35 pages, 7 figures

arXiv:2002.09261 [pdf, other]

NeuroQuery: comprehensive meta-analysis of human brain mapping

Authors: Jérôme Dockès, Russell Poldrack, Romain Primet, Hande Gözükan, Tal Yarkoni, Fabian Suchanek, Bertrand Thirion, Gaël Varoquaux

Abstract: Reaching a global view of brain organization requires assembling evidence on widely different mental processes and mechanisms. The variety of human neuroscience concepts and terminology poses a fundamental challenge to relating brain imaging results across the scientific literature. Existing meta-analysis methods perform statistical tests on sets of publications associated with a particular concept. Thus, large-scale meta-analyses only tackle single terms that occur frequently. We propose a new paradigm, focusing on prediction rather than inference. Our multivariate model predicts the spatial distribution of neurological observations, given text describing an experiment, cognitive process, or disease. This approach handles text of arbitrary length and terms that are too rare for standard meta-analysis. We capture the relationships and neural correlates of 7 547 neuroscience terms across 13 459 neuroimaging publications. The resulting meta-analytic tool, neuroquery.org, can ground hypothesis generation and data-analysis priors on a comprehensive view of published findings on the brain.

Submitted 21 February, 2020; originally announced February 2020.

arXiv:1910.01914 [pdf, other]

Multi-subject MEG/EEG source imaging with sparse multi-task regression

Authors: Hicham Janati, Thomas Bazeille, Bertrand Thirion, Marco Cuturi, Alexandre Gramfort

Abstract: Magnetoencephalography and electroencephalography (M/EEG) are non-invasive modalities that measure the weak electromagnetic fields generated by neural activity. Estimating the location and magnitude of the current sources that generated these electromagnetic fields is a challenging ill-posed regression problem known as source imaging. When considering a group study, a common approach consists in carrying out the regression tasks independently for each subject. An alternative is to jointly localize sources for all subjects taken together, while enforcing some similarity between them. By pooling all measurements in a single multi-task regression, one makes the problem better posed, offering the ability to identify more sources and with greater precision. The Minimum Wasserstein Estimates (MWE) promotes focal activations that do not perfectly overlap for all subjects, thanks to a regularizer based on Optimal Transport (OT) metrics. MWE promotes spatial proximity on the cortical mantle while coping with the varying noise levels across subjects. On realistic simulations, MWE decreases the localization error by up to 4 mm per source compared to individual solutions. Experiments on the Cam-CAN dataset show a considerable improvement in spatial specificity in population imaging. Our analysis of a multimodal dataset shows how multi-subject source localization closes the gap between MEG and fMRI for brain mapping.

Submitted 14 October, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

Comments: version 2. arXiv admin note: text overlap with arXiv:1902.04812

arXiv:1909.12537 [pdf, other]

Fast shared response model for fMRI data

Authors: Hugo Richard, Lucas Martin, Ana Luísa Pinho, Jonathan Pillow, Bertrand Thirion

Abstract: The shared response model provides a simple but effective framework to analyse fMRI data of subjects exposed to naturalistic stimuli. However, when the number of subjects or runs is large, fitting the model requires a large amount of memory and computational power, which limits its use in practice. In this work, we introduce the FastSRM algorithm that relies on an intermediate atlas-based representation. It provides considerable speed-up in time and memory usage, hence it allows easy and fast large-scale analysis of naturalistic-stimulus fMRI data. Using four different datasets, we show that our method matches the performance of the original SRM algorithm while being about 5x faster and 20x to 40x more memory efficient. Based on this contribution, we use FastSRM to predict age from movie watching data on the CamCAN sample. Besides delivering accurate predictions (mean absolute error of 7.5 years), FastSRM extracts topographic patterns that are predictive of age, demonstrating that brain activity during free perception reflects age.

Submitted 3 December, 2019; v1 submitted 27 September, 2019; originally announced September 2019.

arXiv:1903.04955 [pdf, other]

ECKO: Ensemble of Clustered Knockoffs for multivariate inference on fMRI data

Authors: Tuan-Binh Nguyen, Jérôme-Alexis Chevalier, Bertrand Thirion

Abstract: Continuous improvement in medical imaging techniques allows the acquisition of higher-resolution images. When these are used in a predictive setting, a greater number of explanatory variables are potentially related to the dependent variable (the response). Meanwhile, the number of acquisitions per experiment remains limited. In such a high-dimension/small-sample-size setting, it is desirable to find the explanatory variables that are truly related to the response while controlling the rate of false discoveries. To achieve this goal, novel multivariate inference procedures, such as knockoff inference, have been proposed recently. However, they require the feature covariance to be well-defined, which is impossible in high-dimensional settings. In this paper, we propose a new algorithm, called Ensemble of Clustered Knockoffs, that allows selecting explanatory variables while controlling the false discovery rate (FDR), up to a prescribed spatial tolerance. The core idea is that knockoff-based inference can be applied on groups (clusters) of voxels, which drastically reduces the problem's dimension; an ensembling step then removes the dependence on a fixed clustering and stabilizes the results. We benchmark this algorithm and other FDR-controlling methods on brain imaging datasets and observe empirical gains in sensitivity, while the false discovery rate is controlled at the nominal level.

Submitted 12 March, 2019; originally announced March 2019.

Comments: Accepted to 26th International Conference on Information Processing in Medical Imaging (IPMI)

arXiv:1902.04812 [pdf, other]

Group level MEG/EEG source imaging via optimal transport: minimum Wasserstein estimates

Authors: Hicham Janati, Thomas Bazeille, Bertrand Thirion, Marco Cuturi, Alexandre Gramfort

Abstract: Magnetoencephalography (MEG) and electroencephalography (EEG) are non-invasive modalities that measure the weak electromagnetic fields generated by neural activity. Inferring the location of the current sources that generated these magnetic fields is an ill-posed inverse problem known as source imaging. When considering a group study, a baseline approach consists in carrying out the estimation of these sources independently for each subject. The ill-posedness of each problem is typically addressed using sparsity-promoting regularizations. A straightforward way to define a common pattern for these sources is then to average them. A more advanced alternative relies on a joint localization of sources for all subjects taken together, by enforcing some similarity across all estimated sources. An important advantage of this approach is that it consists in a single estimation in which all measurements are pooled together, making the inverse problem better posed. Such a joint estimation poses, however, a few challenges, notably the selection of a valid regularizer that can quantify such spatial similarities. We propose in this work a new procedure that can do so while taking into account the geometrical structure of the cortex. We call this procedure Minimum Wasserstein Estimates (MWE). The benefits of this model are twofold. First, joint inference allows pooling together the data of different brain geometries, accumulating more spatial information. Second, MWE are defined through Optimal Transport (OT) metrics which provide a tool to model spatial proximity between cortical sources of different subjects, hence not enforcing identical source location in the group. These benefits allow MWE to be more accurate than standard MEG source localization techniques. To support these claims, we perform source localization on realistic MEG simulations based on forward operators derived from MRI scans. On a visual task dataset, we demonstrate how MWE infer neural patterns similar to functional Magnetic Resonance Imaging (fMRI) maps.

Submitted 13 February, 2019; originally announced February 2019.

arXiv:1809.06304 [pdf, other]

Approximate message-passing for convex optimization with non-separable penalties

Authors: Andre Manoel, Florent Krzakala, Gaël Varoquaux, Bertrand Thirion, Lenka Zdeborová

Abstract: We introduce an iterative optimization scheme for convex objectives consisting of a linear loss and a non-separable penalty, based on the expectation-consistent approximation and the vector approximate message-passing (VAMP) algorithm. Specifically, the penalties we approach are convex on a linear transformation of the variable to be determined, a notable example being total variation (TV). We describe the connection between message-passing algorithms -- typically used for approximate inference -- and proximal methods for optimization, and show that our scheme is, like VAMP, similar in nature to the Peaceman-Rachford splitting, with the important difference that stepsizes are set adaptively. Finally, we benchmark the performance of our VAMP-like iteration in problems where TV penalties are useful, namely classification in task fMRI and reconstruction in tomography, and show faster convergence than that of state-of-the-art approaches such as FISTA and ADMM in most settings.

Submitted 17 September, 2018; originally announced September 2018.

Comments: 18 pages, 6 figures

arXiv:1809.06035 [pdf, other]

Extracting representations of cognition across neuroimaging studies improves brain decoding

Authors: Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux

Abstract: Cognitive brain imaging is accumulating datasets about the neural substrate of many different mental processes. Yet, most studies are based on few subjects and have low statistical power. Analyzing data across studies could bring more statistical power; yet the current brain-imaging analytic framework cannot be used at scale as it requires casting all cognitive tasks in a unified theoretical framework. We introduce a new methodology to analyze brain responses across tasks without a joint model of the psychological processes. The method boosts statistical power in small studies with specific cognitive focus by analyzing them jointly with large studies that probe less focal mental processes. Our approach improves decoding performance for 80% of 35 widely-different functional-imaging studies. It finds commonalities across tasks in a data-driven way, via common brain representations that predict mental processes. These are brain networks tuned to psychological manipulations. They outline interpretable and plausible brain structures. The extracted networks have been made available; they can be readily reused in new neuro-imaging studies. We provide a multi-study decoding tool to adapt to new data.

Submitted 19 May, 2021; v1 submitted 17 September, 2018; originally announced September 2018.

Journal ref: PLoS Computational Biology, Public Library of Science, 2021

arXiv:1809.02440 [pdf, other]

Optimizing deep video representation to match brain activity

Authors: Hugo Richard, Ana Pinho, Bertrand Thirion, Guillaume Charpiat

Abstract: The comparison of observed brain activity with the statistics generated by artificial intelligence systems is useful to probe brain functional organization under ecological conditions. Here we study fMRI activity in ten subjects watching color natural movies and compute deep representations of these movies with an architecture that relies on optical flow and image content. The association of activity in visual areas with the different layers of the deep architecture displays complexity-related contrasts across visual areas and reveals a striking foveal/peripheral dichotomy.

Submitted 7 September, 2018; originally announced September 2018.

Journal ref: 2018 Conference on Cognitive Computational Neuroscience, Sep 2018, Philadelphia, United States

arXiv:1807.11718 [pdf, other]

Feature Grouping as a Stochastic Regularizer for High-Dimensional Structured Data

Authors: Sergul Aydore, Bertrand Thirion, Gael Varoquaux

Abstract: In many applications where collecting data is expensive, for example neuroscience or medical imaging, the sample size is typically small compared to the feature dimension. It is challenging in this setting to train expressive, non-linear models without overfitting. These datasets call for intelligent regularization that exploits known structure, such as correlations between the features arising from the measurement device. However, existing structured regularizers need specially crafted solvers, which are difficult to apply to complex models. We propose a new regularizer specifically designed to leverage structure in the data in a way that can be applied efficiently to complex models. Our approach relies on feature grouping, using a fast clustering algorithm inside a stochastic gradient descent loop: given a family of feature groupings that capture feature covariations, we randomly select these groups at each iteration. We show that this approach amounts to enforcing a denoising regularizer on the solution. The method is easy to implement in many model architectures, such as fully connected neural networks, and has a linear computational cost. We apply this regularizer to a real-world fMRI dataset and the Olivetti Faces dataset. Experiments on both datasets demonstrate that the proposed approach produces models that generalize better than those trained with conventional regularizers, and also improves convergence speed.

Submitted 22 April, 2019; v1 submitted 31 July, 2018; originally announced July 2018.

Comments: 12 pages, 14 figures

Journal ref: ICML2019

arXiv:1806.05829 [pdf, other]

Statistical Inference with Ensemble of Clustered Desparsified Lasso

Authors: Jérôme-Alexis Chevalier, Joseph Salmon, Bertrand Thirion

Abstract: Medical imaging involves high-dimensional data, yet their acquisition is obtained for limited samples. Multivariate predictive models have become popular in the last decades to fit some external variables from imaging data, and standard algorithms yield point estimates of the model parameters. It is however challenging to attribute confidence to these parameter estimates, which makes solutions hardly trustworthy. In this paper we present a new algorithm that assesses the statistical significance of parameters and that can scale even when the number of predictors $p \ge 10^5$ is much higher than the number of samples $n \le 10^3$, by leveraging structure among features. Our algorithm combines three main ingredients: a powerful inference procedure for linear models --the so-called Desparsified Lasso-- feature clustering and an ensembling step. We first establish that Desparsified Lasso alone cannot handle $n \ll p$ regimes; then we demonstrate that the combination of clustering and ensembling provides an accurate solution, whose specificity is controlled. We also demonstrate stability improvements on two neuroimaging datasets.

Submitted 15 June, 2018; originally announced June 2018.

arXiv:1806.01139 [pdf, other]

Text to brain: predicting the spatial distribution of neuroimaging observations from text reports

Authors: Jérôme Dockès, Demian Wassermann, Russell Poldrack, Fabian Suchanek, Bertrand Thirion, Gaël Varoquaux

Abstract: Despite the digital nature of magnetic resonance imaging, the resulting observations are most frequently reported and stored in text documents. There is a trove of information untapped in medical health records, case reports, and medical publications. In this paper, we propose to mine brain medical publications to learn the spatial distribution associated with anatomical terms. The problem is formulated in terms of minimization of a risk on distributions which leads to a least-deviation cost function. An efficient algorithm in the dual then learns the mapping from documents to brain structures. Empirical results using coordinates extracted from the brain-imaging literature show that i) models must adapt to semantic variation in the terms used to describe a given anatomical structure, ii) voxel-wise parameterization leads to higher likelihood of locations reported in unseen documents, iii) least-deviation cost outperforms least-square. As a proof of concept for our method, we use our model of spatial distributions to predict the distribution of specific neurological conditions from text-only reports.

Submitted 28 June, 2018; v1 submitted 4 June, 2018; originally announced June 2018.

Journal ref: MICCAI 2018 - 21st International Conference on Medical Image Computing and Computer Assisted Intervention, Sep 2018, Granada, Spain. pp.1-18, 2018

arXiv:1710.11438 [pdf, other]

Learning Neural Representations of Human Cognition across Many fMRI Studies

Authors: Arthur Mensch, Julien Mairal, Danilo Bzdok, Bertrand Thirion, Gaël Varoquaux

Abstract: Cognitive neuroscience is enjoying a rapid increase in extensive public brain-imaging datasets. It opens the door to large-scale statistical models. Finding a unified perspective for all available data calls for scalable and automated solutions to an old challenge: how to aggregate heterogeneous information on brain function into a universal cognitive system that relates mental operations/cognitive processes/psychological tasks to brain networks? We cast this challenge in a machine-learning approach to predict conditions from statistical brain maps across different studies. For this, we leverage multi-task learning and multi-scale dimension reduction to learn low-dimensional representations of brain images that carry cognitive information and can be robustly associated with psychological stimuli. Our multi-dataset classification model achieves the best prediction performance on several large reference datasets, compared to models without cognitive-aware low-dimension representations; it brings a substantial performance boost to the analysis of small datasets, and can be introspected to identify universal template cognitive concepts.

Submitted 10 November, 2017; v1 submitted 31 October, 2017; originally announced October 2017.

Comments: Advances in Neural Information Processing Systems, Dec 2017, Long Beach, United States. 2017

Journal ref: Advances in Neural Information Processing Systems, 2017

arXiv:1701.05363 [pdf, other]

doi 10.1109/TSP.2017.2752697

Stochastic Subsampling for Factorizing Huge Matrices

Authors: Arthur Mensch, Julien Mairal, Bertrand Thirion, Gael Varoquaux

Abstract: We present a matrix-factorization algorithm that scales to input matrices with a huge number of both rows and columns. Learned factors may be sparse or dense and/or non-negative, which makes our algorithm suitable for dictionary learning, sparse component analysis, and non-negative matrix factorization. Our algorithm streams matrix columns while subsampling them to iteratively learn the matrix factors. At each iteration, the row dimension of a new sample is reduced by subsampling, resulting in lower time complexity compared to a simple streaming algorithm. Our method comes with convergence guarantees to reach a stationary point of the matrix-factorization problem. We demonstrate its efficiency on massive functional Magnetic Resonance Imaging data (2 TB), and on patches extracted from hyperspectral images (103 GB). For both problems, which involve different penalties on rows and columns, we obtain significant speed-ups compared to state-of-the-art algorithms.

Submitted 30 October, 2017; v1 submitted 19 January, 2017; originally announced January 2017.

Comments: IEEE Transactions on Signal Processing, Institute of Electrical and Electronics Engineers, to appear

Journal ref: IEEE Transactions on Signal Processing, 2018, 66 (1), pp 113-128

arXiv:1611.10041 [pdf, other]

Subsampled online matrix factorization with convergence guarantees

Authors: Arthur Mensch, Julien Mairal, Gaël Varoquaux, Bertrand Thirion

Abstract: We present a matrix factorization algorithm that scales to input matrices that are large in both dimensions (i.e., that contain more than 1 TB of data). The algorithm streams the matrix columns while subsampling them, resulting in low complexity per iteration and a reasonable memory footprint. In contrast to previous online matrix factorization methods, our approach relies on low-dimensional statistics from past iterates to control the extra variance introduced by subsampling. We present a convergence analysis that guarantees that we reach a stationary point of the problem. Large speed-ups can be obtained compared to previous online algorithms that do not perform subsampling, thanks to the feature redundancy that often exists in high-dimensional settings.

Submitted 30 November, 2016; originally announced November 2016.

Journal ref: 9th NIPS Workshop on Optimization for Machine Learning, Dec 2016, Barcelona, Spain

arXiv:1611.06066 [pdf, other]

doi 10.1016/j.neuroimage.2016.10.045

Deriving reproducible biomarkers from multi-site resting-state data: An Autism-based example

Authors: Alexandre Abraham, Michael Milham, Adriana Di Martino, R. Cameron Craddock, Dimitris Samaras, Bertrand Thirion, Gaël Varoquaux

Abstract: Resting-state functional Magnetic Resonance Imaging (R-fMRI) holds the promise to reveal functional biomarkers of neuropsychiatric disorders. However, extracting such biomarkers is challenging for complex multi-faceted neuropathologies, such as autism spectrum disorders. Large multi-site datasets increase sample sizes to compensate for this complexity, at the cost of uncontrolled heterogeneity. This heterogeneity raises new challenges, akin to those faced in realistic diagnostic applications. Here, we demonstrate the feasibility of inter-site classification of neuropsychiatric status, with an application to the Autism Brain Imaging Data Exchange (ABIDE) database, a large (N=871) multi-site autism dataset. For this purpose, we investigate pipelines that extract the most predictive biomarkers from the data. These R-fMRI pipelines build participant-specific connectomes from functionally-defined brain areas. Connectomes are then compared across participants to learn patterns of connectivity that differentiate typical controls from individuals with autism. We predict this neuropsychiatric status for participants from the same acquisition sites or from different, unseen ones. Good choices of methods for the various steps of the pipeline lead to 67% prediction accuracy on the full ABIDE data, which is significantly better than previously reported results. We perform extensive validation on multiple subsets of the data defined by different inclusion criteria. This enables detailed analysis of the factors contributing to successful connectome-based prediction.
First, prediction accuracy improves as we include more subjects, up to the maximum number of subjects available. Second, the definition of functional brain areas is of paramount importance for biomarker discovery: brain areas extracted from large R-fMRI datasets outperform reference atlases in the classification tasks.

Submitted 18 November, 2016; originally announced November 2016.

Comments: in NeuroImage, Elsevier, 2016

arXiv:1609.04608 [pdf, other]

doi 10.1109/TPAMI.2018.2815524

Recursive nearest agglomeration (ReNA): fast clustering for approximation of structured signals

Authors: Andrés Hoyos-Idrobo, Gaël Varoquaux, Jonas Kahn, Bertrand Thirion

Abstract: In this work, we revisit fast dimension reduction approaches, as with random projections and random sampling. Our goal is to summarize the data to decrease computational costs and memory footprint of subsequent analysis. Such dimension reduction can be very efficient when the signals of interest have a strong structure, such as with images. We focus on this setting and investigate feature clustering schemes for data reductions that capture this structure. An impediment to fast dimension reduction is that good clustering comes with large algorithmic costs. We address it by contributing a linear-time agglomerative clustering scheme, Recursive Nearest Agglomeration (ReNA). Unlike existing fast agglomerative schemes, it avoids the creation of giant clusters. We empirically validate that it approximates the data as well as traditional variance-minimizing clustering schemes that have a quadratic complexity. In addition, we analyze signal approximation with feature clustering and show that it can remove noise, improving subsequent analysis steps. As a consequence, data reduction by clustering features with ReNA yields very fast and accurate models, making it possible to process large datasets on a budget. Our theoretical analysis is backed by extensive experiments on publicly-available data that illustrate the computation efficiency and the denoising properties of the resulting dimension reduction scheme.

Submitted 19 March, 2018; v1 submitted 15 September, 2016; originally announced September 2016.

Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers, In press

arXiv:1606.06439 [pdf, other]

Social-sparsity brain decoders: faster spatial sparsity

Authors: Gaël Varoquaux, Matthieu Kowalski, Bertrand Thirion

Abstract: Spatially-sparse predictors are good models for brain decoding: they give accurate predictions and their weight maps are interpretable as they focus on a small number of regions. However, the state of the art, based on total variation or graph-net, is computationally costly. Here we introduce sparsity in the local neighborhood of each voxel with social-sparsity, a structured shrinkage operator. We find that, on brain imaging classification problems, social-sparsity performs almost as well as total-variation models and better than graph-net, for a fraction of the computational cost. It also very clearly outlines predictive regions. We give details of the model and the algorithm.

Submitted 21 June, 2016; originally announced June 2016.

Comments: in Pattern Recognition in NeuroImaging, Jun 2016, Trento, Italy. 2016

arXiv:1606.05201 [pdf, other]

doi 10.1016/j.neuroimage.2016.10.038

Assessing and tuning brain decoders: cross-validation, caveats, and guidelines

Authors: Gaël Varoquaux, Pradeep Reddy Raamana, Denis Engemann, Andrés Hoyos-Idrobo, Yannick Schwartz, Bertrand Thirion

Abstract: Decoding, i.e., prediction from brain images or signals, calls for empirical evaluation of its predictive power. Such evaluation is achieved via cross-validation, a method also used to tune decoders' hyper-parameters. This paper is a review on cross-validation procedures for decoding in neuroimaging. It includes a didactic overview of the relevant theoretical considerations. Practical aspects are highlighted with an extensive empirical study of the common decoders in within- and across-subject predictions, on multiple datasets (anatomical and functional MRI and MEG) and simulations. Theory and experiments outline that the popular "leave-one-out" strategy leads to unstable and biased estimates, and a repeated random splits method should be preferred. Experiments outline the large error bars of cross-validation in neuroimaging settings: typical confidence intervals of 10%. Nested cross-validation can tune decoders' parameters while avoiding circularity bias. However, we find that it can be more favorable to use sane defaults, in particular for non-sparse decoders.

Submitted 7 November, 2016; v1 submitted 16 June, 2016; originally announced June 2016.

Comments: NeuroImage, Elsevier, 2016

arXiv:1605.00937 [pdf, other]

Dictionary Learning for Massive Matrix Factorization

Authors: Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux

Abstract: Sparse matrix factorization is a popular tool to obtain interpretable data decompositions, which are also effective to perform data completion or denoising. Its applicability to large datasets has been addressed with online and randomized methods that reduce the complexity in one of the matrix dimensions, but not in both of them. In this paper, we tackle very large matrices in both dimensions. We propose a new factorization method that scales gracefully to terabyte-scale datasets that could not be processed by previous algorithms in a reasonable amount of time. We demonstrate the efficiency of our approach on massive functional Magnetic Resonance Imaging (fMRI) data, and on matrix completion problems for recommender systems, where we obtain significant speed-ups compared to state-of-the-art coordinate descent methods.

Submitted 26 May, 2016; v1 submitted 3 May, 2016; originally announced May 2016.

Journal ref: Proceedings of the International Conference on Machine Learning, 2016, pp 1737-1746

arXiv:1602.02701 [pdf, other]

doi 10.1109/ISBI.2016.7493501

Compressed Online Dictionary Learning for Fast fMRI Decomposition

Authors: Arthur Mensch, Gaël Varoquaux, Bertrand Thirion

Abstract: We present a method for fast resting-state fMRI spatial decompositions of very large datasets, based on the reduction of the temporal dimension before applying dictionary learning on concatenated individual records from groups of subjects. Introducing a measure of correspondence between spatial decompositions of rest fMRI, we demonstrate that time-reduced dictionary learning produces results as reliable as non-reduced decompositions. We also show that this reduction significantly improves computational scalability.

Submitted 8 February, 2016; originally announced February 2016.

Journal ref: IEEE International Symposium on Biomedical Imaging, 2016

arXiv:1512.06999 [pdf, ps, other]

FAASTA: A fast solver for total-variation regularization of ill-conditioned problems with application to brain imaging

Authors: Gaël Varoquaux, Michael Eickenberg, Elvis Dohmatob, Bertrand Thirion

Abstract: The total variation (TV) penalty, like many other analysis-sparsity problems, does not lead to separable factors or a proximal operator with a closed-form expression, such as soft thresholding for the $\ell_1$ penalty. As a result, in a variational formulation of an inverse problem or statistical learning estimation, it leads to challenging non-smooth optimization problems that are often solved with elaborate single-step first-order methods. When the data-fit term arises from empirical measurements, as in brain imaging, it is often very ill-conditioned and without simple structure. In this situation, in proximal splitting methods, the computation cost of the gradient step can easily dominate each iteration. Thus it is beneficial to minimize the number of gradient steps. We present fAASTA, a variant of FISTA, that relies on an internal solver for the TV proximal operator, and refines its tolerance to balance computational cost of the gradient and the proximal steps. We give benchmarks and illustrations on "brain decoding": recovering brain maps from noisy measurements to predict observed behavior. The algorithm as well as the empirical study of convergence speed are valuable for any non-exact proximal operator, in particular analysis-sparsity problems.

Submitted 22 December, 2015; originally announced December 2015.

Journal ref: Colloque GRETSI, Sep 2015, Lyon, France. Gretsi, 2015, http://www.gretsi.fr/colloque2015/myGretsi/programme.php

arXiv:1511.04898 [pdf, other]

Fast clustering for scalable statistical analysis on structured images

Authors: Bertrand Thirion, Andrés Hoyos-Idrobo, Jonas Kahn, Gael Varoquaux

Abstract: The use of brain images as markers for diseases or behavioral differences is challenged by the small effect sizes and the ensuing lack of power, an issue that has incited researchers to rely more systematically on large cohorts. Coupled with resolution increases, this leads to very large datasets. A striking example in the case of brain imaging is that of the Human Connectome Project: 20 Terabytes of data and growing. The resulting data deluge poses severe challenges regarding the tractability of some processing steps (discriminant analysis, multivariate models) due to the memory demands posed by these data. In this work, we revisit dimension reduction approaches, such as random projections, with the aim of replacing costly function evaluations by cheaper ones while decreasing the memory requirements. Specifically, we investigate the use of alternate schemes, based on fast clustering, that are well suited for signals exhibiting a strong spatial structure, such as anatomical and functional brain images. Our contribution is twofold: i) we propose a linear-time clustering scheme that bypasses the percolation issues inherent in these algorithms and thus provides compressions nearly as good as traditional quadratic-complexity variance-minimizing clustering schemes, ii) we show that cluster-based compression can have the virtuous effect of removing high-frequency noise, actually improving subsequent estimation steps.
As a consequence, the proposed approach yields very accurate models on several large-scale problems yet with impressive gains in computational efficiency, making it possible to analyze large datasets.

Submitted 16 November, 2015; originally announced November 2015.

Comments: ICML Workshop on Statistics, Machine Learning and Neuroscience (Stamlins 2015), Jul 2015, Lille, France

arXiv:1412.3925 [pdf, other]

Region segmentation for sparse decompositions: better brain parcellations from rest fMRI

Authors: Alexandre Abraham, Elvis Dohmatob, Bertrand Thirion, Dimitris Samaras, Gael Varoquaux

Abstract: Functional Magnetic Resonance Images acquired during resting-state provide information about the functional organization of the brain through measuring correlations between brain areas. Independent components analysis is the reference approach to estimate spatial components from weakly structured data such as brain signal time courses; each of these components may be referred to as a brain network and the whole set of components can be conceptualized as a brain functional atlas. Recently, new methods using a sparsity prior have emerged to deal with low signal-to-noise ratio data. However, even when using sophisticated priors, the results may not be very sparse and most often do not separate the spatial components into brain regions. This work presents post-processing techniques that automatically sparsify brain maps and separate regions properly using geometric operations, and compares these techniques according to faithfulness to data and stability metrics. In particular, among threshold-based approaches, hysteresis thresholding and random walker segmentation, the latter significantly improves the stability of both dense and sparse models.

Submitted 12 December, 2014; originally announced December 2014.

Journal ref: Sparsity Techniques in Medical Imaging, Sep 2014, Boston, United States. pp.8

arXiv:1412.3919 [pdf, other]

Machine Learning for Neuroimaging with Scikit-Learn

Authors: Alexandre Abraham, Fabian Pedregosa, Michael Eickenberg, Philippe Gervais, Andreas Muller, Jean Kossaifi, Alexandre Gramfort, Bertrand Thirion, Gaël Varoquaux

Abstract: Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g. multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g. resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.

Submitted 12 December, 2014; originally announced December 2014.

Comments: Frontiers in neuroscience, Frontiers Research Foundation, 2013, pp.15

arXiv:1402.7015 [pdf, other]

doi 10.1016/j.neuroimage.2014.09.060

Data-driven HRF estimation for encoding and decoding models

Authors: Fabian Pedregosa, Michael Eickenberg, Philippe Ciuciu, Bertrand Thirion, Alexandre Gramfort

Abstract: Despite the common usage of a canonical, data-independent, hemodynamic response function (HRF), it is known that the shape of the HRF varies across brain regions and subjects. This suggests that a data-driven estimation of this function could lead to more statistical power when modeling BOLD fMRI data. However, unconstrained estimation of the HRF can yield highly unstable results when the number of free parameters is large. We develop a method for the joint estimation of activation and HRF using a rank constraint causing the estimated HRF to be equal across events/conditions, yet permitting it to be different across voxels. Model estimation leads to an optimization problem that we propose to solve with an efficient quasi-Newton method exploiting fast gradient computations. This model, called GLM with Rank-1 constraint (R1-GLM), can be extended to the setting of GLM with separate designs which has been shown to improve decoding accuracy in brain activity decoding experiments. We compare 10 different HRF modeling methods in terms of encoding and decoding score in two different datasets. Our results show that the R1-GLM model significantly outperforms competing methods in both encoding and decoding settings, positioning it as an attractive method both from the points of view of accuracy and computational efficiency.

Submitted 7 November, 2014; v1 submitted 27 February, 2014; originally announced February 2014.

Comments: appears in NeuroImage (2015)

arXiv:1311.3859 [pdf, other]

Mapping cognitive ontologies to and from the brain

Authors: Yannick Schwartz, Bertrand Thirion, Gaël Varoquaux

Abstract: Imaging neuroscience links brain activation maps to behavior and cognition via correlational studies. Due to the nature of the individual experiments, based on eliciting neural response from a small number of stimuli, this link is incomplete, and unidirectional from the causal point of view. To come to conclusions on the function implied by the activation of brain regions, it is necessary to combine a wide exploration of the various brain functions and some inversion of the statistical inference. Here we introduce a methodology for accumulating knowledge towards a bidirectional link between observed brain activity and the corresponding function. We rely on a large corpus of imaging studies and a predictive engine. Technically, the challenges are to find commonality between the studies without denaturing the richness of the corpus. The key elements that we contribute are labeling the tasks performed with a cognitive ontology, and modeling the long tail of rare paradigms in the corpus. To our knowledge, our approach is the first demonstration of predicting the cognitive content of completely new brain images. To that end, we propose a method that predicts the experimental paradigms across different studies.

Submitted 20 November, 2013; v1 submitted 15 November, 2013; originally announced November 2013.

Comments: NIPS (Neural Information Processing Systems), United States (2013)

arXiv:1310.1257 [pdf, other]

Second order scattering descriptors predict fMRI activity due to visual textures

Authors: Michael Eickenberg, Fabian Pedregosa, Senoussi Mehdi, Alexandre Gramfort, Bertrand Thirion

Abstract: Second layer scattering descriptors are known to provide good classification performance on natural quasi-stationary processes such as visual textures due to their sensitivity to higher order moments and continuity with respect to small deformations. In a functional Magnetic Resonance Imaging (fMRI) experiment we present visual textures to subjects and evaluate the predictive power of these descriptors with respect to the predictive power of simple contour energy - the first scattering layer. We are able to conclude not only that invariant second layer scattering coefficients better encode voxel activity, but also that well predicted voxels need not necessarily lie in known retinotopic regions.

Submitted 10 August, 2013; originally announced October 2013.

Comments: 3rd International Workshop on Pattern Recognition in NeuroImaging (2013)

arXiv:1305.2788 [pdf, other]

HRF estimation improves sensitivity of fMRI encoding and decoding models

Authors: Fabian Pedregosa, Michael Eickenberg, Bertrand Thirion, Alexandre Gramfort

Abstract: Extracting activation patterns from functional Magnetic Resonance Images (fMRI) datasets remains challenging in rapid-event designs due to the inherent delay of the blood oxygen level-dependent (BOLD) signal. The general linear model (GLM) makes it possible to estimate the activation from a design matrix and a fixed hemodynamic response function (HRF). However, the HRF is known to vary substantially between subjects and brain regions. In this paper, we propose a model for jointly estimating the hemodynamic response function (HRF) and the activation patterns via a low-rank representation of task effects. This model is based on the linearity assumption behind the GLM and can be computed using standard gradient-based solvers. We use the activation patterns computed by our model as input data for encoding and decoding studies and report performance improvement in both settings.

Submitted 13 May, 2013; originally announced May 2013.

Comments: 3rd International Workshop on Pattern Recognition in NeuroImaging (2013)

arXiv:1209.5375 [pdf, other]

Improving accuracy and power with transfer learning using a meta-analytic database

Authors: Yannick Schwartz, Gaël Varoquaux, Christophe Pallier, Philippe Pinel, Jean-Baptiste Poline, Bertrand Thirion

Abstract: Typical cohorts in brain imaging studies are not large enough for systematic testing of all the information contained in the images. To build testable working hypotheses, investigators thus rely on analysis of previous work, sometimes formalized in a so-called meta-analysis. In brain imaging, this approach underlies the specification of regions of interest (ROIs) that are usually selected on the basis of the coordinates of previously detected effects. In this paper, we propose to use a database of images, rather than coordinates, and frame the problem as transfer learning: learning a discriminant model on a reference task to apply it to a different but related new task. To facilitate statistical analysis of small cohorts, we use a sparse discriminant model that selects predictive voxels on the reference task and thus provides a principled procedure to define ROIs. The benefits of our approach are twofold. First, it uses the reference database for prediction, i.e. to provide potential biomarkers in a clinical setting. Second, it increases statistical power on the new task. We demonstrate on a set of 18 pairs of functional MRI experimental conditions that our approach gives good prediction. In addition, on a specific transfer situation involving different scanners at different locations, we show that voxel selection based on transfer learning leads to higher detection power on small cohorts.

Submitted 28 September, 2012; v1 submitted 24 September, 2012; originally announced September 2012.

Comments: MICCAI, Nice, France (2012)

arXiv:1209.1450 [pdf, other]

On spatial selectivity and prediction across conditions with fMRI

Authors: Yannick Schwartz, Gaël Varoquaux, Bertrand Thirion

Abstract: Researchers in functional neuroimaging mostly use activation coordinates to formulate their hypotheses. Instead, we propose to use the full statistical images to define regions of interest (ROIs). This paper presents two machine learning approaches, transfer learning and selection transfer, that are compared upon their ability to identify the common patterns between brain activation maps related to two functional tasks. We provide some preliminary quantification of these similarities, and show that selection transfer makes it possible to set a spatial scale yielding ROIs that are more specific to the context of interest than with transfer learning. In particular, selection transfer outlines well known regions such as the Visual Word Form Area when discriminating between different visual tasks.

Submitted 7 September, 2012; originally announced September 2012.

Comments: PRNI 2012: 2nd International Workshop on Pattern Recognition in NeuroImaging, London, United Kingdom (2012)

Showing 1–50 of 63 results for author: Thirion, B