-
Predicting Parkinson's disease trajectory using clinical and functional MRI features: a reproduction and replication study
Authors:
Elodie Germani,
Nikhil Baghwat,
Mathieu Dugré,
Rémi Gau,
Albert Montillo,
Kevin Nguyen,
Andrzej Sokolowski,
Madeleine Sharp,
Jean-Baptiste Poline,
Tristan Glatard
Abstract:
Parkinson's disease (PD) is a common neurodegenerative disorder with a poorly understood physiopathology and no established biomarkers for the diagnosis of early stages and for prediction of disease progression. Several neuroimaging biomarkers have been studied recently, but these are susceptible to several sources of variability. In this context, an evaluation of the robustness of such biomarkers is essential. This study is part of a larger project investigating the replicability of potential neuroimaging biomarkers of PD. Here, we attempt to reproduce (same data, same method) and replicate (different data or method) the models described in Nguyen et al., 2021 to predict an individual's current PD state and progression using demographic, clinical and neuroimaging features (fALFF and ReHo extracted from resting-state fMRI). We use the Parkinson's Progression Markers Initiative dataset (PPMI, ppmi-info.org), as in Nguyen et al., 2021, and aim to reproduce the original cohort, imaging features and machine learning models as closely as possible using the information available in the paper and the code. We also investigated methodological variations in cohort selection, feature extraction pipelines and sets of input features. The success of the reproduction was assessed using different criteria. Notably, we obtained significantly better than chance performance using the analysis pipeline closest to that in the original study (R2 > 0), which is consistent with its findings. The challenges encountered while reproducing and replicating the original work are likely explained by the complexity of neuroimaging studies, in particular in clinical settings. We provide recommendations to further facilitate the reproducibility of such studies in the future.
Submitted 24 May, 2024; v1 submitted 20 February, 2024;
originally announced March 2024.
-
The Past, Present, and Future of the Brain Imaging Data Structure (BIDS)
Authors:
Russell A. Poldrack,
Christopher J. Markiewicz,
Stefan Appelhoff,
Yoni K. Ashar,
Tibor Auer,
Sylvain Baillet,
Shashank Bansal,
Leandro Beltrachini,
Christian G. Benar,
Giacomo Bertazzoli,
Suyash Bhogawar,
Ross W. Blair,
Marta Bortoletto,
Mathieu Boudreau,
Teon L. Brooks,
Vince D. Calhoun,
Filippo Maria Castelli,
Patricia Clement,
Alexander L Cohen,
Julien Cohen-Adad,
Sasha D'Ambrosio,
Gilles de Hollander,
María de la Iglesia-Vayá,
Alejandro de la Vega,
Arnaud Delorme
, et al. (89 additional authors not shown)
Abstract:
The Brain Imaging Data Structure (BIDS) is a community-driven standard for the organization of data and metadata from a growing range of neuroscience modalities. This paper is meant as a history of how the standard has developed and grown over time. We outline the principles behind the project, the mechanisms by which it has been extended, and some of the challenges being addressed as it evolves. We also discuss the lessons learned through the project, with the aim of enabling researchers in other domains to learn from the success of BIDS.
Submitted 8 January, 2024; v1 submitted 11 September, 2023;
originally announced September 2023.
-
Benchmarking missing-values approaches for predictive models on health databases
Authors:
Alexandre Perez-Lebel,
Gaël Varoquaux,
Marine Le Morvan,
Julie Josse,
Jean-Baptiste Poline
Abstract:
BACKGROUND: As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values: incomplete observations. These large databases are well suited to train machine-learning models, for instance for forecasting or to extract biomarkers in biomedical settings. Such predictive approaches can use discriminative -- rather than generative -- modeling, and thus open the door to new missing-values strategies. Yet existing empirical evaluations of strategies to handle missing values have focused on inferential statistics. RESULTS: Here we conduct a systematic benchmark of missing-values strategies in predictive models with a focus on large health databases: four electronic health record datasets, a population brain imaging one, a health survey and two intensive care ones. Using gradient-boosted trees, we compare native support for missing values with simple and state-of-the-art imputation prior to learning. We investigate prediction accuracy and computational time. For prediction after imputation, we find that adding an indicator to express which values have been imputed is important, suggesting that the data are missing not at random. Elaborate missing values imputation can improve prediction compared to simple strategies but requires longer computational time on large data. Learning trees that model missing values -- with missing incorporated attribute -- leads to robust, fast, and well-performing predictive modeling. CONCLUSIONS: Native support for missing values in supervised machine learning predicts better than state-of-the-art imputation with much less computational cost. When using imputation, it is important to add indicator columns expressing which values have been imputed.
Submitted 17 February, 2022;
originally announced February 2022.
-
Recommendations for repositories and scientific gateways from a neuroscience perspective
Authors:
Malin Sandström,
Mathew Abrams,
Jan Bjaalie,
Mona Hicks,
David Kennedy,
Arvind Kumar,
JB Poline,
Prasun Roy,
Paul Tiesinga,
Thomas Wachtler,
Wojtek Goscinski
Abstract:
Digital services such as repositories and science gateways have become key resources for the neuroscience community, but users often have a hard time orienting themselves in the service landscape to find the best fit for their particular needs. INCF (International Neuroinformatics Coordinating Facility) has developed a set of recommendations and associated criteria for choosing or setting up and running a repository or scientific gateway, intended for the neuroscience community, with a FAIR neuroscience perspective. These recommendations have neurosciences as their primary use case but are often general. Considering the perspectives of researchers and providers of repositories as well as scientific gateways, the recommendations harmonize and complement existing work on criteria for repositories and best practices. The recommendations cover a range of important areas including accessibility, licensing, community responsibility and technical and financial sustainability of a service.
Submitted 3 January, 2022;
originally announced January 2022.
-
Preventing dataset shift from breaking machine-learning biomarkers
Authors:
Jérôme Dockès,
Gaël Varoquaux,
Jean-Baptiste Poline
Abstract:
Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g. because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning extracted biomarkers, as well as detection and correction strategies.
Submitted 21 July, 2021;
originally announced July 2021.
-
An algorithm-based multiple detection influence measure for high dimensional regression using expectile
Authors:
Amadou Barry,
Nikhil Bhagwat,
Bratislav Misic,
Jean-Baptiste Poline,
Celia M. T. Greenwood
Abstract:
The identification of influential observations is an important part of data analysis that can prevent erroneous conclusions drawn from biased estimators. However, in high dimensional data, this identification is challenging. Classical and recently-developed methods often perform poorly when there are multiple influential observations in the same dataset. In particular, current methods can fail when there is masking (several influential observations with similar characteristics) or swamping (influential observations near the boundary of the space spanned by well-behaved observations). Therefore, we propose an algorithm-based, multi-step, multiple detection procedure to identify influential observations that addresses current limitations. Our three-step algorithm to identify and capture undesirable variability in the data, asymMIP, is based on two complementary statistics, inspired by asymmetric correlations, and built on expectiles. Simulations demonstrate higher detection power than competing methods. Use of the resulting asymptotic distribution leads to detection of influential observations without the need for computationally demanding procedures such as the bootstrap. The application of our method to the Autism Brain Imaging Data Exchange neuroimaging dataset resulted in a more balanced and accurate prediction of brain maturity based on cortical thickness. See our GitHub for a free R package that implements our algorithm: asymMIP (github.com/AmBarry/hidetify).
Submitted 25 May, 2021;
originally announced May 2021.
-
Teaching computational reproducibility for neuroimaging
Authors:
K. Jarrod Millman,
Matthew Brett,
Ross Barnowski,
Jean-Baptiste Poline
Abstract:
We describe a project-based introduction to reproducible and collaborative neuroimaging analysis. Traditional teaching on neuroimaging usually consists of a series of lectures that emphasize the big picture rather than the foundations on which the techniques are based. The lectures are often paired with practical workshops in which students run imaging analyses using the graphical interface of specific neuroimaging software packages. Our experience suggests that this combination leaves the student with a superficial understanding of the underlying ideas, and an informal, inefficient, and inaccurate approach to analysis. To address these problems, we based our course around a substantial open-ended group project. This allowed us to teach: (a) computational tools to ensure computationally reproducible work, such as the Unix command line, structured code, version control, automated testing, and code review and (b) a clear understanding of the statistical techniques used for a basic analysis of a single run in an MRI scanner. The emphasis we put on the group project showed the importance of standard computational tools for accuracy, efficiency, and collaboration. The projects were broadly successful in engaging students in working reproducibly on real scientific questions. We propose that a course on this model should be the foundation for future programs in neuroimaging. We believe it will also serve as a model for teaching efficient and reproducible research in other fields of computational science.
Submitted 15 June, 2018;
originally announced June 2018.
-
Grand Challenges for Global Brain Sciences
Authors:
Joshua T. Vogelstein,
Katrin Amunts,
Andreas Andreou,
Dora Angelaki,
Giorgio Ascoli,
Cori Bargmann,
Randal Burns,
Corrado Cali,
Frances Chance,
Miyoung Chun,
George Church,
Hollis Cline,
Todd Coleman,
Stephanie de La Rochefoucauld,
Winfried Denk,
Ana Belen Elgoyhen,
Ralph Etienne Cummings,
Alan Evans,
Kenneth Harris,
Michael Hausser,
Sean Hill,
Samuel Inverso,
Chad Jackson,
Viren Jain,
Rob Kass
, et al. (37 additional authors not shown)
Abstract:
The next grand challenges for society and science are in the brain sciences. A collection of 60+ scientists from around the world, together with 10+ observers from national, private, and foundation organizations, spent two days together discussing the top challenges that we could solve as a global community in the next decade. We eventually settled on three challenges, spanning anatomy, physiology, and medicine. Addressing all three challenges requires novel computational infrastructure. The group proposed the advent of The International Brain Station (TIBS), to address these challenges, and launch brain sciences to the next level of understanding.
Submitted 27 October, 2016; v1 submitted 23 August, 2016;
originally announced August 2016.
-
PyXNAT: XNAT in Python
Authors:
Yannick Schwartz,
Alexis Barbot,
Benjamin Thyreau,
Vincent Frouin,
Gaël Varoquaux,
Aditya Siram,
Daniel Marcus,
Jean-Baptiste Poline
Abstract:
As neuroimaging databases grow in size and complexity, the time researchers spend investigating and managing the data increases at the expense of data analysis. As a result, investigators rely more and more heavily on scripting using high-level languages to automate data management and processing tasks. For this, structured and programmatic access to the data store is necessary. Web services are a first step toward this goal. However, they lack functionality and ease of use because they provide only low-level interfaces to databases. We introduce here PyXNAT, a Python module that interacts with The Extensible Neuroimaging Archive Toolkit (XNAT) through native Python calls across multiple operating systems. The choice of Python enables PyXNAT to expose the XNAT Web Services and unify their features with a higher-level and more expressive language. PyXNAT provides XNAT users direct access to all the scientific packages in Python. Finally, PyXNAT aims to be efficient and easy to use, both as a backend library to build XNAT clients and as an alternative frontend from the command line.
Submitted 29 January, 2013;
originally announced January 2013.
-
Improving accuracy and power with transfer learning using a meta-analytic database
Authors:
Yannick Schwartz,
Gaël Varoquaux,
Christophe Pallier,
Philippe Pinel,
Jean-Baptiste Poline,
Bertrand Thirion
Abstract:
Typical cohorts in brain imaging studies are not large enough for systematic testing of all the information contained in the images. To build testable working hypotheses, investigators thus rely on analysis of previous work, sometimes formalized in a so-called meta-analysis. In brain imaging, this approach underlies the specification of regions of interest (ROIs) that are usually selected on the basis of the coordinates of previously detected effects. In this paper, we propose to use a database of images, rather than coordinates, and frame the problem as transfer learning: learning a discriminant model on a reference task to apply it to a different but related new task. To facilitate statistical analysis of small cohorts, we use a sparse discriminant model that selects predictive voxels on the reference task and thus provides a principled procedure to define ROIs. The benefits of our approach are twofold. First it uses the reference database for prediction, i.e. to provide potential biomarkers in a clinical setting. Second it increases statistical power on the new task. We demonstrate on a set of 18 pairs of functional MRI experimental conditions that our approach gives good prediction. In addition, on a specific transfer situation involving different scanners at different locations, we show that voxel selection based on transfer learning leads to higher detection power on small cohorts.
Submitted 28 September, 2012; v1 submitted 24 September, 2012;
originally announced September 2012.
-
Markov models for fMRI correlation structure: is brain functional connectivity small world, or decomposable into networks?
Authors:
Gaël Varoquaux,
Alexandre Gramfort,
Jean Baptiste Poline,
Bertrand Thirion
Abstract:
Correlations in the signal observed via functional Magnetic Resonance Imaging (fMRI) are expected to reveal the interactions in the underlying neural populations through hemodynamic response. In particular, they highlight distributed sets of mutually correlated regions that correspond to brain networks related to different cognitive functions. Yet graph-theoretical studies of neural connections give a different picture: that of a highly integrated system with small-world properties: local clustering but with short pathways across the complete structure. We examine the conditional independence properties of the fMRI signal, i.e. its Markov structure, to find realistic assumptions on the connectivity structure that are required to explain the observed functional connectivity. In particular we seek a decomposition of the Markov structure into segregated functional networks using decomposable graphs: a set of strongly-connected and partially overlapping cliques. We introduce a new method to efficiently extract such cliques on a large, strongly-connected graph. We compare methods learning different graph structures from functional connectivity by testing the goodness of fit of the model they learn on new data. We find that summarizing the structure as strongly-connected networks can give a good description only for very large and overlapping networks. These results highlight that Markov models are good tools to identify the structure of brain connectivity from fMRI signals, but for this purpose they must reflect the small-world properties of the underlying neural systems.
Submitted 3 February, 2012;
originally announced February 2012.
-
Brain covariance selection: better individual functional connectivity models using population prior
Authors:
Gaël Varoquaux,
Alexandre Gramfort,
Jean Baptiste Poline,
Bertrand Thirion
Abstract:
Spontaneous brain activity, as observed in functional neuroimaging, has been shown to display reproducible structure that expresses brain architecture and carries markers of brain pathologies. An important view of modern neuroscience is that such large-scale structure of coherent activity reflects modularity properties of brain connectivity graphs. However, to date, there has been no demonstration that the limited and noisy data available in spontaneous activity observations could be used to learn full-brain probabilistic models that generalize to new data. Learning such models entails two main challenges: i) modeling full brain connectivity is a difficult estimation problem that faces the curse of dimensionality and ii) variability between subjects, coupled with the variability of functional signals between experimental runs, makes the use of multiple datasets challenging. We describe subject-level brain functional connectivity structure as a multivariate Gaussian process and introduce a new strategy to estimate it from group data, by imposing a common structure on the graphical model in the population. We show that individual models learned from functional Magnetic Resonance Imaging (fMRI) data using this population prior generalize better to unseen data than models based on alternative regularization schemes. To our knowledge, this is the first report of a cross-validated model of spontaneous brain activity. Finally, we use the estimated graphical model to explore the large-scale characteristics of functional architecture and show for the first time that known cognitive networks appear as the integrated communities of the functional connectivity graph.
Submitted 12 November, 2010; v1 submitted 30 August, 2010;
originally announced August 2010.
-
ICA-based sparse feature recovery from fMRI datasets
Authors:
Gaël Varoquaux,
Merlin Keller,
Jean Baptiste Poline,
Philippe Ciuciu,
Bertrand Thirion
Abstract:
Spatial Independent Components Analysis (ICA) is increasingly used in the context of functional Magnetic Resonance Imaging (fMRI) to study cognition and brain pathologies. Salient features present in some of the extracted Independent Components (ICs) can be interpreted as brain networks, but the segmentation of the corresponding regions from ICs is still ill-controlled. Here we propose a new ICA-based procedure for extraction of sparse features from fMRI datasets. Specifically, we introduce a new thresholding procedure that controls the deviation from isotropy in the ICA mixing model. Unlike current heuristics, our procedure guarantees an exact, possibly conservative, level of specificity in feature detection. We evaluate the sensitivity and specificity of the method on synthetic and fMRI data and show that it outperforms state-of-the-art approaches.
Submitted 11 June, 2010;
originally announced June 2010.
-
A group model for stable multi-subject ICA on fMRI datasets
Authors:
G. Varoquaux,
S. Sadaghiani,
P. Pinel,
A. Kleinschmidt,
J. B. Poline,
B. Thirion
Abstract:
Spatial Independent Component Analysis (ICA) is an increasingly used data-driven method to analyze functional Magnetic Resonance Imaging (fMRI) data. To date, it has been used to extract sets of mutually correlated brain regions without prior information on the time course of these regions. Some of these sets of regions, interpreted as functional networks, have recently been used to provide markers of brain diseases and open the road to paradigm-free population comparisons. Such group studies raise the question of modeling subject variability within ICA: how can the patterns representative of a group be modeled and estimated via ICA for reliable inter-group comparisons? In this paper, we propose a hierarchical model for patterns in multi-subject fMRI datasets, akin to mixed-effect group models used in linear-model-based analysis. We introduce an estimation procedure, CanICA (Canonical ICA), based on i) probabilistic dimension reduction of the individual data, ii) canonical correlation analysis to identify a data subspace common to the group, and iii) ICA-based pattern extraction. In addition, we introduce a procedure based on cross-validation to quantify the stability of ICA patterns at the level of the group. We compare our method with state-of-the-art multi-subject fMRI ICA methods and show that the features extracted using our procedure are more reproducible at the group level on two datasets of 12 healthy controls: a resting-state and a functional localizer study.
Submitted 11 June, 2010;
originally announced June 2010.
-
CanICA: Model-based extraction of reproducible group-level ICA patterns from fMRI time series
Authors:
Gaël Varoquaux,
Sepideh Sadaghiani,
Jean Baptiste Poline,
Bertrand Thirion
Abstract:
Spatial Independent Component Analysis (ICA) is an increasingly used data-driven method to analyze functional Magnetic Resonance Imaging (fMRI) data. To date, it has been used to extract meaningful patterns without prior information. However, ICA is not robust to mild data variation and remains a parameter-sensitive algorithm. The validity of the extracted patterns is hard to establish, as well as the significance of differences between patterns extracted from different groups of subjects. We start from a generative model of the fMRI group data to introduce a probabilistic ICA pattern-extraction algorithm, called CanICA (Canonical ICA). Thanks to an explicit noise model and canonical correlation analysis, our method is auto-calibrated and identifies the group-reproducible data subspace before performing ICA. We compare our method to state-of-the-art multi-subject fMRI ICA methods and show that the features extracted are more reproducible.
Submitted 24 November, 2009;
originally announced November 2009.