-
Deep-learning-powered data analysis in plankton ecology
Authors:
Harshith Bachimanchi,
Matthew I. M. Pinder,
Chloé Robert,
Pierre De Wit,
Jonathan Havenhand,
Alexandra Kinnby,
Daniel Midtvedt,
Erik Selander,
Giovanni Volpe
Abstract:
The implementation of deep learning algorithms has brought new perspectives to plankton ecology. Emerging as an alternative approach to established methods, deep learning offers objective schemes to investigate plankton organisms in diverse environments. We provide an overview of deep-learning-based methods including detection and classification of phyto- and zooplankton images, foraging and swimming behaviour analysis, and finally ecological modelling. Deep learning has the potential to speed up the analysis and reduce the human experimental bias, thus enabling data acquisition at relevant temporal and spatial scales with improved reproducibility. We also discuss shortcomings and show how deep learning architectures have evolved to mitigate imprecise readouts. Finally, we suggest opportunities where deep learning is particularly likely to catalyze plankton research. The examples are accompanied by detailed tutorials and code samples that allow readers to apply the methods described in this review to their own data.
Submitted 15 September, 2023;
originally announced September 2023.
-
Cancer Gene Profiling through Unsupervised Discovery
Authors:
Enzo Battistella,
Maria Vakalopoulou,
Roger Sun,
Théo Estienne,
Marvin Lerousseau,
Sergey Nikolaev,
Emilie Alvarez Andres,
Alexandre Carré,
Stéphane Niyoteka,
Charlotte Robert,
Nikos Paragios,
Eric Deutsch
Abstract:
Precision medicine is a paradigm shift in healthcare relying heavily on genomics data. However, the complexity of biological interactions, the large number of genes, and the lack of comparisons on the analysis of data remain a tremendous bottleneck regarding clinical adoption. In this paper, we introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers. Our method is based on the LP-Stability algorithm, a high-dimensional center-based unsupervised clustering algorithm that offers modularity as concerns metric functions and scalability, while being able to automatically determine the best number of clusters. Our evaluation includes both mathematical and biological criteria. The recovered signature is applied to a variety of biological tasks, including screening of biological pathways and functions, and characterization relevance on tumor types and subtypes. Quantitative comparisons among different distance metrics, commonly used clustering methods and a referential gene signature used in the literature confirm the state-of-the-art performance of our approach. In particular, our signature, which is based on 27 genes, reports at least 30 times better mathematical significance (average Dunn's Index) and 25% better biological significance (average Enrichment in Protein-Protein Interaction) than those produced by other referential clustering methods. Finally, our signature reports promising results on distinguishing immune inflammatory and immune desert tumors, while reporting a high balanced accuracy of 92% on tumor type classification and an averaged balanced accuracy of 68% on tumor subtype classification, which represent, respectively, 7% and 9% higher performance compared to the referential signature.
Submitted 11 February, 2021;
originally announced February 2021.
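The abstract above reports clustering quality as an average Dunn's Index. As a reminder of what that quantity measures, here is a minimal sketch of the classic definition (smallest distance between points in different clusters, divided by the largest distance between points in the same cluster). The Euclidean distance, the function name `dunn_index`, and the toy clusters are illustrative assumptions; the paper may use a different variant or metric.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Classic Dunn index for a list of clusters.

    `clusters` is a list of lists of coordinate tuples (at least two clusters).
    Euclidean distance is an assumption made for this sketch.
    """
    # Smallest distance between two points that belong to different clusters.
    min_between = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest distance between two points of the same cluster (its diameter).
    max_within = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    if max_within == 0.0:
        raise ValueError("every cluster is a single point; the index is undefined")
    return min_between / max_within


# Hypothetical toy data: two tight, well-separated clusters.
toy_clusters = [[(0.0, 0.0), (0.0, 1.0)], [(3.0, 0.0), (3.0, 1.0)]]
print(dunn_index(toy_clusters))  # 3.0
```

Higher values indicate compact, well-separated clusters, which is why a larger average Dunn's Index is read as better mathematical significance.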
-
Reliable ABC model choice via random forests
Authors:
Pierre Pudlo,
Jean-Michel Marin,
Arnaud Estoup,
Jean-Marie Cornuet,
Mathieu Gautier,
Christian P. Robert
Abstract:
Approximate Bayesian computation (ABC) methods provide an elaborate approach to Bayesian inference on complex models, including model choice. Both theoretical arguments and simulation experiments indicate, however, that model posterior probabilities may be poorly evaluated by standard ABC techniques. We propose a novel approach based on a machine learning tool named random forests to conduct selection among the highly complex models covered by ABC algorithms. We thus modify the way Bayesian model selection is both understood and operated, in that we rephrase the inferential goal as a classification problem, first predicting the model that best fits the data with random forests and postponing the approximation of the posterior probability of the predicted MAP for a second stage also relying on random forests. Compared with earlier implementations of ABC model choice, the ABC random forest approach offers several potential improvements: (i) it often has a larger discriminative power among the competing models, (ii) it is more robust against the number and choice of statistics summarizing the data, (iii) the computing effort is drastically reduced (with a gain in computation efficiency of at least fifty), and (iv) it includes an approximation of the posterior probability of the selected model. The call to random forests will undoubtedly extend the range of size of datasets and complexity of models that ABC can handle. We illustrate the power of this novel methodology by analyzing controlled experiments as well as genuine population genetics datasets. The proposed methodologies are implemented in the R package abcrf available on the CRAN.
Submitted 2 September, 2015; v1 submitted 24 June, 2014;
originally announced June 2014.
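The abstract above rephrases ABC model choice as a classification problem on a table of simulated summary statistics. The sketch below illustrates that framing only; it is not the authors' method and does not use the abcrf package. A k-nearest-neighbour vote stands in for the random forest so that the example needs nothing beyond the Python standard library, and the two toy models (exponential versus uniform with the same mean), the summary statistics, and all function names are assumptions made for illustration.

```python
import random
import statistics


def summaries(sample):
    """Summary statistics: sample mean and coefficient of variation."""
    m = statistics.fmean(sample)
    return (m, statistics.stdev(sample) / m)


def simulate(model, mean, n, rng):
    """Model 0: exponential with the given mean. Model 1: uniform on (0, 2 * mean)."""
    if model == 0:
        return [rng.expovariate(1.0 / mean) for _ in range(n)]
    return [rng.uniform(0.0, 2.0 * mean) for _ in range(n)]


def reference_table(n_sims, n, rng):
    """Draw (model index, summaries) pairs from the prior predictive."""
    table = []
    for _ in range(n_sims):
        model = rng.randrange(2)        # uniform prior over the two models
        mean = rng.uniform(0.5, 2.0)    # prior on the shared mean parameter
        table.append((model, summaries(simulate(model, mean, n, rng))))
    return table


def knn_model_choice(table, observed, k):
    """Classify the observed summaries by a vote among the k nearest simulations."""
    def sq_dist(s):
        return (s[0] - observed[0]) ** 2 + (s[1] - observed[1]) ** 2

    nearest = sorted(table, key=lambda row: sq_dist(row[1]))[:k]
    freq1 = sum(model for model, _ in nearest) / k  # share of votes for model 1
    return (1 if freq1 > 0.5 else 0), freq1


rng = random.Random(0)
table = reference_table(2000, 100, rng)
observed = summaries(simulate(1, 1.0, 100, rng))  # pseudo-observed data from model 1
predicted, freq1 = knn_model_choice(table, observed, k=50)
print(predicted, freq1)
```

The vote frequency `freq1` corresponds to the standard ABC estimate of the model posterior probability, which the abstract says may be poorly evaluated. In the approach the abstract describes, a random forest trained on the reference table replaces the vote for predicting the model, and a second forest approximates the posterior probability of the selected model.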
-
Bayesian computation via empirical likelihood
Authors:
K. L. Mengersen,
P. Pudlo,
C. P. Robert
Abstract:
Approximate Bayesian computation (ABC) has become an essential tool for the analysis of complex stochastic models when the likelihood function is numerically unavailable. However, the well-established statistical method of empirical likelihood provides another route to such settings that bypasses simulations from the model and the choices of the ABC parameters (summary statistics, distance, tolerance), while being convergent in the number of observations. Furthermore, bypassing model simulations may lead to significant time savings in complex models, for instance those found in population genetics. The BCel algorithm we develop in this paper also provides an evaluation of its own performance through an associated effective sample size. The method is illustrated using several examples, including estimation of standard distributions, time series, and population genetics models.
Submitted 5 December, 2012; v1 submitted 25 May, 2012;
originally announced May 2012.
-
Relevant statistics for Bayesian model choice
Authors:
J. -M. Marin,
N. Pillai,
C. P. Robert,
J. Rousseau
Abstract:
The choice of the summary statistics used in Bayesian inference and in particular in ABC algorithms has bearings on the validation of the resulting inference. Those statistics are nonetheless customarily used in ABC algorithms without consistency checks. We derive necessary and sufficient conditions on summary statistics for the corresponding Bayes factor to be convergent, namely to asymptotically select the true model. Those conditions, which amount to the expectations of the summary statistics to asymptotically differ under both models, are quite natural and can be exploited in ABC settings to infer whether or not a choice of summary statistics is appropriate, via a Monte Carlo validation.
Submitted 22 August, 2013; v1 submitted 21 October, 2011;
originally announced October 2011.
-
Lack of confidence in ABC model choice
Authors:
Christian P. Robert,
Jean-Marie Cornuet,
Jean-Michel Marin,
Natesh Pillai
Abstract:
Approximate Bayesian computation (ABC) has become an essential tool for the analysis of complex stochastic models. Earlier, Grelaud et al. (2009) advocated the use of ABC for Bayesian model choice in the specific case of Gibbs random fields, relying on an inter-model sufficiency property to show that the approximation was legitimate. Having implemented ABC-based model choice in a wide range of phylogenetic models in the DIY-ABC software (Cornuet et al., 2008), we now present theoretical background as to why a generic use of ABC for model choice is ungrounded, since it depends on an unknown amount of information loss induced by the use of insufficient summary statistics. The approximation error of the posterior probabilities of the models under comparison may thus be unrelated to the computational effort spent in running an ABC algorithm. We then conclude that additional empirical verifications of the performance of the ABC procedure, such as those available in DIYABC, are necessary to conduct model choice.
Submitted 20 June, 2011; v1 submitted 22 February, 2011;
originally announced February 2011.
-
Why approximate Bayesian computational (ABC) methods cannot handle model choice problems
Authors:
Christian Robert,
Jean-Michel Marin,
Natesh S. Pillai
Abstract:
Approximate Bayesian computation (ABC) methods, also known as likelihood-free methods, have become a favourite tool for the analysis of complex stochastic models, primarily in population genetics but also in financial analyses. We advocated in Grelaud et al. (2009) the use of ABC for Bayesian model choice in the specific case of Gibbs random fields (GRF), relying on a sufficiency property mainly enjoyed by GRFs to show that the approach was legitimate. Despite having previously suggested the use of ABC for model choice in a wider range of models in the DIY ABC software (Cornuet et al., 2008), we present theoretical evidence that the general use of ABC for model choice is fraught with danger in the sense that no amount of computation, however large, can guarantee a proper approximation of the posterior probabilities of the models under comparison.
Submitted 27 January, 2011; v1 submitted 26 January, 2011;
originally announced January 2011.
-
About incoherent inference
Authors:
Christian P. Robert
Abstract:
In Templeton (2010), the Approximate Bayesian Computation (ABC) algorithm (see, e.g., Pritchard et al., 1999, Beaumont et al., 2002, Marjoram et al., 2003, Ratmann et al., 2009) is criticised on mathematical and logical grounds: "the [Bayesian] inference is mathematically incorrect and formally illogical". Since those criticisms turn out to bear on Bayesian foundations rather than on the computational methodology they are primarily directed at, we endeavour to point out in this note the statistical errors and inconsistencies in Templeton (2010), referring to Beaumont et al. (2010) for a reply that is broader in scope since it also covers the phylogenetic aspects of nested clade versus a model-based approach.
Submitted 19 June, 2010;
originally announced June 2010.
-
Evidence and Evolution: A Review
Authors:
Christian P. Robert
Abstract:
"Evidence and Evolution: the Logic behind the Science" was published in 2008 by Elliott Sober. It examines the philosophical foundations of the statistical arguments used to evaluate hypotheses in evolutionary biology, based on simple examples and likelihood ratios. The difficulty with reading the book from a statistician's perspective is the reluctance of the author to engage in model building and even less in parameter estimation. The first chapter nonetheless constitutes a splendid coverage of the most common statistical approaches to testing and model comparison, even though the advocacy of the Akaike information criterion against Bayesian alternatives is rather forceful. The book also covers an examination of the "intelligent design" arguments against the Darwinian evolution theory, predictably if unnecessarily resorting to Popperian arguments to correctly argue that the creationist perspective fails to predict anything. The following chapters cover the more relevant issues of assessing selection versus drift and of testing for the presence of a common ancestor. While remaining a philosophy treatise, Evidence and Evolution is written in a way that is accessible to laymen, if rather unusual from a statistician's viewpoint, and the insight about testing issues gained from Evidence and Evolution makes it a worthwhile read.
Submitted 28 April, 2010;
originally announced April 2010.
-
Inferring population history with DIYABC: a user-friendly approach to Approximate Bayesian Computation
Authors:
Jean-Marie Cornuet,
Filipe Santos,
Mark A. Beaumont,
Christian P. Robert,
Jean-Michel Marin,
David J. Balding,
Thomas Guillemaud,
Arnaud Estoup
Abstract:
Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract this information (at least partially) but they require sophisticated statistical techniques that have been made available to the biologist community (through computer programs) only for simple and standard situations typically involving a small number of samples. We propose here a computer program (DIYABC) for inference based on Approximate Bayesian Computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and stepwise population size changes. DIYABC can be used to compare competing scenarios, estimate parameters for one or more scenarios, and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data). This article describes key methods used in the program and provides its main features. The analysis of one simulated and one real data set, both with complex evolutionary scenarios, illustrates the main possibilities of DIYABC.
Submitted 8 September, 2008; v1 submitted 28 April, 2008;
originally announced April 2008.