-
Building multiscale models with PhysiBoSS, an agent-based modeling tool
Authors:
Marco Ruscone,
Andrea Checcoli,
Randy Heiland,
Emmanuel Barillot,
Paul Macklin,
Laurence Calzone,
Vincent Noël
Abstract:
Multiscale models provide a unique tool for studying complex processes involving events that occur at different scales across space and time. In the context of biological systems, such models can simulate mechanisms happening at the intracellular level, such as signaling, and at the extracellular level, where cells communicate and coordinate with other cells. They aim to understand the impact of genetic or environmental deregulation observed in complex diseases, describe the interplay between a pathological tissue and the immune system, and suggest strategies to revert the diseased phenotypes. The construction of these multiscale models remains a very complex task, including the choice of the components to consider, the level of detail of the processes to simulate, and the fitting of the parameters to the data. One additional difficulty is the expert knowledge needed to program these models in languages such as C++ or Python, which may discourage the participation of non-experts. Simplifying this process through structured description formalisms, coupled with a graphical interface, is crucial in making modeling more accessible to the broader scientific community, as well as streamlining the process for advanced users. This article introduces three examples of multiscale models which rely on the framework PhysiBoSS, an add-on of PhysiCell that adds intracellular descriptions, in the form of continuous-time Boolean models, to the agent-based approach. The article demonstrates how to easily construct such models, relying on PhysiCell Studio, the PhysiCell Graphical User Interface. A step-by-step tutorial is provided as Supplementary Material and all models are available at: https://physiboss.github.io/tutorial/.
Submitted 26 June, 2024;
originally announced June 2024.
-
Trajectories, bifurcations and pseudotime in large clinical datasets: applications to myocardial infarction and diabetes data
Authors:
Sergey E. Golovenkin,
Jonathan Bac,
Alexander Chervov,
Evgeny M. Mirkes,
Yuliya V. Orlova,
Emmanuel Barillot,
Alexander N. Gorban,
Andrei Zinovyev
Abstract:
Large observational clinical datasets are becoming increasingly available for mining associations between various disease traits and administered therapy. These datasets can be considered as representations of the landscape of all possible disease conditions, in which a concrete pathology develops through a number of stereotypical routes, characterized by 'points of no return' and 'final states' (such as lethal or recovery states). Extracting this information directly from the data remains challenging, especially in the case of synchronic observations (with a short-term follow-up). Here we suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values, through modeling the geometrical data structure as a bouquet of bifurcating clinical trajectories. The methodology is based on the application of elastic principal graphs, which can simultaneously address the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations. The methodology allows positioning a patient on a particular clinical trajectory (pathological scenario) and characterizing the degree of progression along it with a qualitative estimate of the uncertainty of the prognosis. Overall, our pseudotime quantification-based approach makes it possible to apply the methods developed for dynamical disease phenotyping and illness trajectory analysis (diachronic data analysis) to synchronic observational data. We developed ClinTrajan, a tool for clinical trajectory analysis implemented in the Python programming language. We test the methodology on two large publicly available datasets: myocardial infarction complications and readmission of diabetic patients.
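The pseudotime idea can be illustrated with a minimal sketch. It assumes that a principal graph has already been fitted (node coordinates and edges are given) and that a root node, for example one populated by admission-time patients, has been chosen. As a simplification, each observation is projected onto its nearest graph node rather than onto the nearest point of an edge. Its pseudotime is then the shortest-path length along the graph from the root. The function names and the toy graph are hypothetical and are not the ClinTrajan API.

```python
import heapq
import math


def nearest_node(point, nodes):
    """Index of the graph node closest (Euclidean) to a data point."""
    return min(range(len(nodes)), key=lambda i: math.dist(point, nodes[i]))


def geodesic_from_root(nodes, edges, root):
    """Dijkstra shortest-path lengths along graph edges, starting at `root`."""
    adjacency = {i: [] for i in range(len(nodes))}
    for a, b in edges:
        length = math.dist(nodes[a], nodes[b])
        adjacency[a].append((b, length))
        adjacency[b].append((a, length))
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, length in adjacency[u]:
            candidate = d + length
            if candidate < dist.get(v, math.inf):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist  # assumes a connected graph (e.g. a principal tree)


def pseudotime(points, nodes, edges, root):
    """Pseudotime of each point = geodesic distance from the root to its nearest node."""
    dist = geodesic_from_root(nodes, edges, root)
    return [dist[nearest_node(p, nodes)] for p in points]


# Hypothetical Y-shaped graph: a trunk that bifurcates into two branches.
nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (2.0, -1.0)]
edges = [(0, 1), (1, 2), (1, 3)]
patients = [(0.1, 0.0), (1.9, 0.9), (2.1, -1.2)]
print(pseudotime(patients, nodes, edges, root=0))
```

Points on different branches can share the same pseudotime. The branch (trajectory) a point belongs to is read off separately from its nearest node.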
Submitted 5 October, 2020; v1 submitted 7 July, 2020;
originally announced July 2020.
-
Robust And Scalable Learning Of Complex Dataset Topologies Via Elpigraph
Authors:
Luca Albergante,
Evgeny M. Mirkes,
Huidong Chen,
Alexis Martin,
Louis Faure,
Emmanuel Barillot,
Luca Pinello,
Alexander N. Gorban,
Andrei Zinovyev
Abstract:
Large datasets represented by multidimensional data point clouds often possess non-trivial distributions with branching trajectories and excluded regions, with the recent single-cell transcriptomic studies of developing embryos being notable examples. Reducing the complexity and producing compact and interpretable representations of such data remains a challenging task. Most of the existing computational methods are based on exploring local data point neighbourhood relations, a step that can perform poorly in the case of multidimensional and noisy data. Here we present ElPiGraph, a scalable and robust method for approximating datasets with complex structures, which does not require computing the complete data distance matrix or the data point neighbourhood graph. This method is able to withstand high levels of noise and is capable of approximating complex topologies via principal graph ensembles that can be combined into a consensus principal graph. ElPiGraph deals efficiently with large and complex datasets in various fields, from biology, where it can be used to infer gene dynamics from single-cell RNA-Seq, to astronomy, where it can be used to explore complex structures in the distribution of galaxies.
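Elastic principal graphs balance how well the graph nodes approximate the data against how much the graph is stretched and bent. Below is a minimal sketch of such an elastic energy, assuming the usual three-term form. The first term is the mean squared distance of points to their nearest node. The second is a stretching penalty on edge lengths weighted by `lam`. The third is a bending penalty on "stars" (a node and its neighbours) weighted by `mu`. The sketch does not reproduce ElPiGraph's robust trimming radius, any normalisation of the coefficients, or the graph-grammar search used to grow the graph. The function names are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def elastic_energy(data, nodes, edges, lam, mu):
    """Energy of a principal graph embedded among data points (sketch).

    approximation: mean squared distance from each point to its nearest node
    stretching:    lam * sum of squared edge lengths
    bending:       mu * sum over stars of ||centre - mean(neighbours)||^2
    """
    approximation = sum(min(squared_distance(x, v) for v in nodes) for x in data) / len(data)
    stretching = lam * sum(squared_distance(nodes[a], nodes[b]) for a, b in edges)

    neighbours = {i: [] for i in range(len(nodes))}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    bending = 0.0
    for centre, leaves in neighbours.items():
        if len(leaves) < 2:
            continue  # a star needs a centre with at least two neighbours
        dim = len(nodes[centre])
        mean = [sum(nodes[j][d] for j in leaves) / len(leaves) for d in range(dim)]
        bending += squared_distance(nodes[centre], mean)

    return approximation + stretching + mu * bending


straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
bent = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
chain = [(0, 1), (1, 2)]
print(elastic_energy(straight, straight, chain, lam=0.1, mu=1.0))  # no bending
print(elastic_energy(bent, bent, chain, lam=0.1, mu=1.0))  # bending term contributes
```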
Submitted 20 June, 2018; v1 submitted 20 April, 2018;
originally announced April 2018.
-
Identification of microRNA clusters cooperatively acting on Epithelial to Mesenchymal Transition in Triple Negative Breast Cancer
Authors:
Laura Cantini,
Gloria Bertoli,
Claudia Cava,
Thierry Dubois,
Andrei Zinovyev,
Michele Caselle,
Isabella Castiglioni,
Emmanuel Barillot,
Loredana Martignetti
Abstract:
MicroRNAs play important roles in many biological processes. Their aberrant expression can have an oncogenic or tumor suppressor function, directly participating in carcinogenesis, malignant transformation, invasiveness and metastasis. Indeed, miRNA profiles can not only distinguish between normal and cancerous tissue but also successfully classify different subtypes of a particular cancer. Here, we focus on a particular class of transcripts encoding polycistronic miRNA genes that yield multiple miRNA components. We describe clustered MiRNA Master Regulator Analysis (ClustMMRA), a fully redesigned release of the MMRA computational pipeline (MiRNA Master Regulator Analysis), developed to search for clustered miRNAs potentially driving cancer molecular subtyping. Genomically clustered miRNAs are frequently co-expressed to target different components of pro-tumorigenic signalling pathways. By applying ClustMMRA to breast cancer patient data, we identified key miRNA clusters driving the phenotype of different tumor subgroups. The pipeline was applied to two independent breast cancer datasets, providing statistically concordant results between the two analyses. We validated miR-199/miR-214 in cell lines as a novel cluster of miRNAs promoting the triple negative subtype phenotype through its control of proliferation and EMT.
Submitted 5 April, 2018;
originally announced April 2018.
-
Predicting genetic interactions from Boolean models of biological networks
Authors:
Laurence Calzone,
Emmanuel Barillot,
Andrei Zinovyev
Abstract:
Genetic interaction can be defined as a deviation of the phenotypic quantitative effect of a double gene mutation from the effect predicted from single mutations using a simple (e.g., multiplicative or linear additive) statistical model. Experimentally characterized genetic interaction networks in model organisms provide important insights into relationships between different biological functions. We describe a computational methodology that makes it possible to systematically and quantitatively characterize a Boolean mathematical model of a biological network in terms of genetic interactions between all loss-of-function and gain-of-function mutations with respect to all model phenotypes or outputs. We use the probabilistic framework defined in the MaBoSS software, based on continuous time Markov chains and stochastic simulations. In addition, we suggest several computational tools for studying the distribution of double mutants in the space of model phenotype probabilities. We demonstrate this methodology on three published models, for each of which we derive the genetic interaction networks and analyze their properties. We classify the obtained interactions according to their class of epistasis and their dependence on the chosen initial conditions and phenotype. The use of this methodology for validating mathematical models from experimental data and designing new experiments is discussed.
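The definition in the first sentence can be illustrated with a short sketch. It scores a genetic interaction as the deviation of a double-mutant phenotype from the value expected under a multiplicative or an additive model. Phenotype values are assumed to be numbers such as the probability of reaching a model output, for example as estimated by stochastic simulation. The function name and the sign convention are illustrative assumptions, not the scoring used in the paper or in MaBoSS.

```python
def genetic_interaction(wt, single_a, single_b, double_ab, model="multiplicative"):
    """Deviation of a double-mutant phenotype from the value expected from single mutants.

    `wt` is the wild-type phenotype value; a result near 0 means no interaction
    under the chosen null model.
    """
    if model == "multiplicative":
        # Expected double-mutant ratio = product of single-mutant ratios to wild type.
        expected = (single_a / wt) * (single_b / wt)
        return double_ab / wt - expected
    if model == "additive":
        # Expected double-mutant change = sum of single-mutant changes from wild type.
        expected = (single_a - wt) + (single_b - wt)
        return (double_ab - wt) - expected
    raise ValueError(f"unknown model: {model}")


# Hypothetical phenotype probabilities: each single mutant alone lowers the output.
print(genetic_interaction(1.0, 0.5, 0.4, 0.2))   # matches the multiplicative expectation
print(genetic_interaction(1.0, 0.5, 0.4, 0.05))  # double mutant worse than expected
```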
Submitted 23 April, 2015;
originally announced April 2015.
-
DeDaL: Cytoscape 3.0 app for producing and morphing data-driven and structure-driven network layouts
Authors:
Urszula Czerwinska,
Laurence Calzone,
Emmanuel Barillot,
Andrei Zinovyev
Abstract:
Visualization and analysis of molecular profiling data together with biological networks can provide new mechanistic insights into biological functions. Currently, high-throughput data are usually visualized on top of predefined network layouts, which are not always adapted to a given data analysis task. We developed a Cytoscape app that constructs biological network layouts based on data from molecular profiles imported as node attribute values. DeDaL is a Cytoscape 3.0 app which uses linear and non-linear dimension reduction algorithms to produce data-driven network layouts based on multidimensional data (typically gene expression). DeDaL implements several data pre-processing and layout post-processing steps, such as continuous morphing between two arbitrary network layouts and aligning one network layout with respect to another by rotating and mirroring. Combining these possibilities facilitates creating insightful network layouts representing both structural network features and the correlation patterns in multivariate data. DeDaL is the first method that allows constructing biological network layouts from high-throughput data. DeDaL is freely available for download, together with a step-by-step tutorial, at http://bioinfo-out.curie.fr/projects/dedal/.
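The two layout post-processing operations mentioned above can be sketched in plain 2D geometry. The first aligns one layout to another by rotation and mirroring; the second morphs between two layouts. This minimal sketch assumes that both layouts place the same nodes in the same order. Alignment uses a closed-form least-squares rotation (2D orthogonal Procrustes), tried both with and without a mirror flip, and morphing is linear interpolation. It is not DeDaL's own code, and the function names are hypothetical.

```python
import math


def centre(layout):
    """Translate a 2D layout so that its centroid is at the origin."""
    cx = sum(x for x, _ in layout) / len(layout)
    cy = sum(y for _, y in layout) / len(layout)
    return [(x - cx, y - cy) for x, y in layout]


def rotate(layout, theta):
    c, s = math.cos(theta), math.sin(theta)
    return [(x * c - y * s, x * s + y * c) for x, y in layout]


def align(moving, target):
    """Rotate `moving` onto `target` in the least-squares sense, mirroring it if that fits better.

    Both layouts are centred first; the result is expressed in the centred frame.
    """
    m, t = centre(moving), centre(target)
    best_error, best_layout = math.inf, None
    for mirrored in (False, True):
        candidate = [(-x, y) for x, y in m] if mirrored else m
        # Closed-form optimal rotation angle: maximise sum of t . (R p).
        num = sum(x * ty - y * tx for (x, y), (tx, ty) in zip(candidate, t))
        den = sum(x * tx + y * ty for (x, y), (tx, ty) in zip(candidate, t))
        rotated = rotate(candidate, math.atan2(num, den))
        error = sum((x - tx) ** 2 + (y - ty) ** 2 for (x, y), (tx, ty) in zip(rotated, t))
        if error < best_error:
            best_error, best_layout = error, rotated
    return best_layout


def morph(layout_a, layout_b, s):
    """Linear interpolation: s = 0 gives layout_a, s = 1 gives layout_b."""
    return [((1 - s) * ax + s * bx, (1 - s) * ay + s * by)
            for (ax, ay), (bx, by) in zip(layout_a, layout_b)]
```

In use, one would typically align a data-driven layout to a structure-driven one first, so that morphing between them does not include a spurious global rotation.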
Submitted 2 February, 2015; v1 submitted 24 January, 2015;
originally announced January 2015.
-
NaviCell: a web-based environment for navigation, curation and maintenance of large molecular interaction maps
Authors:
Inna Kuperstein,
David PA Cohen,
Stuart Pook,
Laurence Calzone,
Emmanuel Barillot,
Andrei Zinovyev
Abstract:
Molecular biology knowledge can be systematically represented in a computer-readable form as a comprehensive map of molecular interactions. There exist a number of maps of molecular interactions containing detailed descriptions of various cell mechanisms. It is difficult to explore these large maps, to comment on their content and to maintain them. Though several tools address these problems individually, the scientific community still lacks an environment that combines these three capabilities. NaviCell is a web-based environment for exploiting large maps of molecular interactions created in CellDesigner, allowing their easy exploration, curation and maintenance. NaviCell combines three features: (1) efficient map browsing based on the Google Maps engine; (2) semantic zooming for viewing different levels of detail or abstraction of the map; and (3) an integrated web-based blog for collecting community feedback. NaviCell can be easily used by experts in the field of molecular biology for studying molecular entities of their interest in the context of signaling pathways and cross-talk between pathways within a global signaling network. NaviCell allows both exploration of detailed molecular mechanisms represented on the map and a more abstract view of the map, up to a top-level modular representation. NaviCell facilitates curation, maintenance and updating of the comprehensive maps of molecular interactions in an interactive fashion thanks to an embedded blogging system. NaviCell provides an easy way to explore large-scale maps of molecular interactions, thanks to the Google Maps and WordPress interfaces, already familiar to many users. Semantic zooming, used for navigating geographical maps, is adopted for molecular maps in NaviCell, making any level of visualization meaningful to the user. In addition, NaviCell provides a framework for community-based map curation.
Submitted 31 January, 2013;
originally announced January 2013.
-
Cell death and life in cancer: mathematical modeling of cell fate decisions
Authors:
Andrei Zinovyev,
Simon Fourquet,
Laurent Tournier,
Laurence Calzone,
Emmanuel Barillot
Abstract:
Tumor development is characterized by a compromised balance between cell life and death decision mechanisms, which are tightly regulated in normal cells. Understanding this process provides insights for developing new treatments for fighting cancer. We present a study of a mathematical model describing cellular choice between survival and two alternative cell death modalities: apoptosis and necrosis. The model is implemented in a discrete modeling formalism and allows predicting the probabilities of a particular cellular phenotype in response to engagement of cell death receptors. Using an original parameter sensitivity analysis developed for discrete dynamic systems, we determine the critical parameters affecting the cellular fate decision and discuss how they are exploited by existing cancer therapies.
Submitted 10 January, 2013;
originally announced January 2013.
-
Continuous time Boolean modeling for biological signaling: application of Gillespie algorithm
Authors:
Gautier Stoll,
Eric Viara,
Emmanuel Barillot,
Laurence Calzone
Abstract:
This article presents an algorithm that allows modeling of biological networks in a qualitative framework with continuous time. Mathematical modeling is used as a systems biology tool to answer biological questions, and more precisely, to validate a network that describes biological observations and to predict the effect of perturbations.
We propose a modeling approach that is intrinsically continuous in time. The algorithm presented here fills the gap between qualitative and quantitative modeling. It is based on a continuous time Markov process applied to a Boolean state space. In order to describe the temporal evolution, we explicitly specify the transition rates for each node. For that purpose, we built a language that can be seen as a generalization of Boolean equations. The values of transition rates have a natural interpretation: each rate is the inverse of the mean time for the transition to occur. Mathematically, this approach can be translated into a set of ordinary differential equations on probability distributions; therefore, it can be seen as an approach in between quantitative and qualitative modeling.
We developed C++ software, MaBoSS, that is able to simulate such a system by applying Kinetic Monte Carlo (the Gillespie algorithm) in the Boolean state space. This software, parallelized and optimized, computes the temporal evolution of probability distributions and can also estimate stationary distributions. Applications of Boolean Kinetic Monte Carlo are demonstrated on two qualitative models: a toy model and a published p53/Mdm2 model. Our approach allows describing kinetic phenomena that were difficult to handle in the original models. In particular, transient effects are represented by time-dependent probability distributions, interpretable in terms of cell populations.
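The Boolean Kinetic Monte Carlo scheme can be sketched in a few lines of Python. This is a minimal illustration, not MaBoSS. Rates are encoded here as hypothetical Python functions of the current Boolean state rather than in MaBoSS's own model language, and the toy model is invented. At each step, every node that can flip contributes its current up or down rate. The waiting time is drawn from an exponential distribution with the total rate, and the node to flip is chosen in proportion to its rate. Averaging many trajectories estimates the probability of a state at a given time.

```python
import math
import random


def simulate(initial, rates, t_max, rng):
    """One Boolean kinetic Monte Carlo trajectory; returns the state at time t_max.

    `rates` maps each node to (up_rate, down_rate): two functions of the full
    state giving the rate of the node's 0 -> 1 and 1 -> 0 transitions.
    """
    state = dict(initial)
    t = 0.0
    while True:
        # Rate at which each node would flip from its current value.
        flips = []
        for node, (up, down) in rates.items():
            r = down(state) if state[node] else up(state)
            if r > 0:
                flips.append((node, r))
        total = sum(r for _, r in flips)
        if total == 0:
            return state  # fixed point: no transition can occur
        t += rng.expovariate(total)  # exponential waiting time with rate `total`
        if t > t_max:
            return state
        # Pick the node to flip with probability r / total.
        u = rng.random() * total
        chosen = flips[-1][0]  # fallback guards against float round-off
        for node, r in flips:
            u -= r
            if u < 0:
                chosen = node
                break
        state[chosen] = not state[chosen]


def probability(node, initial, rates, t_max, n_runs=10000, seed=0):
    """Fraction of trajectories in which `node` is active at t_max."""
    rng = random.Random(seed)
    return sum(simulate(initial, rates, t_max, rng)[node] for _ in range(n_runs)) / n_runs


# Hypothetical two-node toy model: X switches on at rate 1 (mean time 1);
# Y switches on at rate 2 only while X is active; neither switches off.
rates = {
    "X": (lambda s: 1.0, lambda s: 0.0),
    "Y": (lambda s: 2.0 if s["X"] else 0.0, lambda s: 0.0),
}
p_x = probability("X", {"X": False, "Y": False}, rates, t_max=1.0, n_runs=20000)
print(p_x)
```

For node X alone the exact answer is known: with rate 1, its probability of being active at t = 1 is 1 - exp(-1), about 0.632. The estimate `p_x` should approach this value as `n_runs` grows.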
Submitted 29 May, 2012; v1 submitted 4 April, 2012;
originally announced April 2012.
-
Dynamical modeling of microRNA action on the protein translation process
Authors:
Andrei Zinovyev,
Nadya Morozova,
Nora Nonne,
Emmanuel Barillot,
Annick Harel-Bellan,
Alexander N. Gorban
Abstract:
Protein translation is a multistep process which can be represented as a cascade of biochemical reactions (initiation, ribosome assembly, elongation, etc.), the rate of which can be regulated by small non-coding microRNAs through multiple mechanisms. It remains unclear which mechanisms of microRNA action are dominant; moreover, many experimental reports deliver controversial messages on which concrete mechanism is actually observed in the experiment. Parker and Nissan (RNA, 2008) demonstrated that it is impossible to distinguish alternative biological hypotheses using steady state data on the rate of protein synthesis. For their analysis they used two simple kinetic models of protein translation. By contrast, we show that dynamical data allow some of the mechanisms of microRNA action to be discriminated. We demonstrate this using the same models as in (Parker and Nissan, RNA, 2008) for the sake of comparison, but the methods developed (asymptotology of biochemical networks) can be used for other models. As one of the results of our analysis, we formulate the hypothesis that the effect of microRNA action is measurable and observable only if it affects the dominant system (a generalization of the limiting step notion for complex networks) of the protein translation machinery. The dominant system can vary in different experimental conditions, which can partially explain the existing controversy in some of the experimental data.
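The limiting-step notion that the dominant system generalizes can be illustrated with a deliberately simple toy. This toy is an assumption of this note, not one of the Parker and Nissan models or the asymptotic analysis used in the paper. It is a strictly sequential process with first-order steps, whose output rate is the inverse of the summed mean step times. Slowing the limiting step changes the output substantially, while slowing a fast step barely changes it. The step names and rate values are hypothetical.

```python
def synthesis_rate(step_rates):
    """Steady output rate of a strictly sequential process (hypothetical toy).

    Each first-order step i takes on average 1 / k_i time units, so one full
    cycle takes sum(1 / k_i) and the output rate is the inverse of that sum.
    """
    return 1.0 / sum(1.0 / k for k in step_rates)


def relative_effect(step_rates, step, factor):
    """Fractional drop in output when the rate of `step` is multiplied by `factor`."""
    perturbed = list(step_rates)
    perturbed[step] *= factor
    return 1.0 - synthesis_rate(perturbed) / synthesis_rate(step_rates)


# Hypothetical rates: slow (limiting) initiation, fast elongation and termination.
rates = [0.1, 10.0, 10.0]
print(relative_effect(rates, step=0, factor=0.5))  # inhibit the limiting step
print(relative_effect(rates, step=1, factor=0.5))  # inhibit a fast step
```

With these numbers, halving the limiting step reduces output by roughly half. Halving a fast step reduces it by about one percent.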
Submitted 9 November, 2009;
originally announced November 2009.
-
Classification of arrayCGH data using a fused SVM
Authors:
Franck Rapaport,
Emmanuel Barillot,
Jean-Philippe Vert
Abstract:
Motivation: Array-based comparative genomic hybridization (arrayCGH) has recently become a popular tool to identify DNA copy number variations along the genome. These profiles are starting to be used as markers to improve prognosis or diagnosis of cancer, which implies that methods for automated supervised classification of arrayCGH data are needed. Like gene expression profiles, arrayCGH profiles are characterized by a large number of variables usually measured on a limited number of samples. However, arrayCGH profiles have a particular structure of correlations between variables, due to the spatial organization of BACs along the genome. This suggests that classical classification methods, often based on the selection of a small number of discriminative features, may not be the most accurate methods and may not produce easily interpretable prediction rules.
Results: We propose a new method for supervised classification of arrayCGH data. The method is a variant of support vector machine (SVM) that incorporates the biological specificities of DNA copy number variations along the genome as prior knowledge. The resulting classifier is a sparse linear classifier based on a limited number of regions automatically selected on the chromosomes, leading to easy interpretation and identification of discriminative regions of the genome. We test this method on three classification problems for bladder and uveal cancer, involving both diagnosis and prognosis. We demonstrate that the introduction of the new prior on the classifier leads not only to more accurate predictions, but also to the identification of known and new regions of interest in the genome.
Availability: All data and algorithms are publicly available.
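The classifier described in the Results can be made concrete by writing out a fused-lasso-type objective. This minimal sketch uses an assumed formulation rather than the paper's exact one. A linear classifier's weights (one per probe, ordered along the genome) are scored by three terms: a hinge loss; an L1 penalty that favours few non-zero weights; and a penalty on differences between neighbouring weights on the same chromosome, which favours piecewise-constant weight profiles, i.e. whole regions. Minimising this objective is a convex problem, and because every term is piecewise linear it can be cast as a linear program. The solver is not shown, and the function names are hypothetical.

```python
def decision(w, b, x):
    """Linear score; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def fused_svm_objective(w, b, X, y, lam, mu, chromosome):
    """Hinge loss + lasso penalty + fused penalty on neighbouring probes.

    X: profiles, one list of copy-number values per sample, probes in genome order
    y: labels in {-1, +1}
    chromosome: chromosome of each probe; probes on different chromosomes are not fused
    """
    hinge = sum(max(0.0, 1.0 - yi * decision(w, b, xi)) for xi, yi in zip(X, y))
    lasso = lam * sum(abs(wi) for wi in w)  # favours few non-zero weights
    fused = mu * sum(
        abs(w[i + 1] - w[i])
        for i in range(len(w) - 1)
        if chromosome[i] == chromosome[i + 1]  # favours piecewise-constant weights
    )
    return hinge + lasso + fused
```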
Submitted 18 January, 2008;
originally announced January 2008.
-
Spectral analysis of gene expression profiles using gene networks
Authors:
Franck Rapaport,
Andrei Zinovyev,
Marie Dutreix,
Emmanuel Barillot,
Jean-Philippe Vert
Abstract:
Microarrays have become extremely useful for analysing genetic phenomena, but establishing a relation between microarray analysis results (typically a list of genes) and their biological significance is often difficult. Currently, the standard approach is to map the results a posteriori onto gene networks to elucidate the functions perturbed at the level of pathways. However, integrating a priori knowledge of the gene networks could help in the statistical analysis of gene expression data and in their biological interpretation. Here we propose a method to integrate a priori the knowledge of a gene network into the analysis of gene expression data. The approach is based on the spectral decomposition of gene expression profiles with respect to the eigenfunctions of the graph, resulting in an attenuation of the high-frequency components of the expression profiles with respect to the topology of the graph. We show how to derive unsupervised and supervised classification algorithms of expression profiles, resulting in classifiers with biological relevance. We applied the method to the analysis of a set of expression profiles from irradiated and non-irradiated yeast strains. It performed at least as well as the usual classification but provided much more biologically relevant results and allowed a direct biological interpretation.
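Attenuating the high-frequency components of a profile with respect to a graph can be sketched with the graph Laplacian L. Applying exp(-beta L) multiplies each Laplacian eigencomponent with eigenvalue lambda by exp(-beta lambda). Smooth components (small lambda) are therefore kept, and rapidly varying ones are damped. The exponential filter and the parameter `beta` are assumptions of this sketch, not necessarily the attenuation function used in the paper. To stay dependency-free, the filter is applied through a truncated Taylor series instead of an explicit eigendecomposition. For large networks, one would rather diagonalise L (for example with `numpy.linalg.eigh`).

```python
def laplacian(n_genes, edges):
    """Combinatorial Laplacian L = D - A of an undirected gene network."""
    L = [[0.0] * n_genes for _ in range(n_genes)]
    for a, b in edges:
        L[a][a] += 1.0
        L[b][b] += 1.0
        L[a][b] -= 1.0
        L[b][a] -= 1.0
    return L


def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def smooth_profile(profile, edges, beta=1.0, n_terms=40):
    """Return exp(-beta * L) @ profile as sum_k (-beta)^k L^k profile / k!.

    The truncated series is accurate when beta times the largest Laplacian
    eigenvalue is modest (small networks or small beta).
    """
    L = laplacian(len(profile), edges)
    result = list(profile)
    term = list(profile)
    for k in range(1, n_terms):
        # term_k = (-beta / k) * L @ term_{k-1}
        term = [(-beta / k) * x for x in mat_vec(L, term)]
        result = [r + t for r, t in zip(result, term)]
    return result


# Hypothetical 4-gene pathway (a chain) and a profile that alternates in
# sign along it, i.e. a high-frequency pattern that the filter damps.
edges = [(0, 1), (1, 2), (2, 3)]
print(smooth_profile([1.0, -1.0, 1.0, -1.0], edges, beta=1.0))
```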
Submitted 26 March, 2006;
originally announced March 2006.