-
Comprehensive evaluation of differential expression analysis methods for RNA-seq data
Authors:
Franck Rapaport,
Raya Khanin,
Yupu Liang,
Azra Krek,
Paul Zumbo,
Christopher E. Mason,
Nicholas D. Socci,
Doron Betel
Abstract:
High-throughput sequencing of RNA transcripts (RNA-seq) has become the method of choice for detection of differential expression (DE). Concurrent with the growing popularity of this technology, there has been a significant research effort devoted to understanding the statistical properties of these data and to developing analysis methods. We report on a comprehensive evaluation of the commonly used DE methods using the SEQC benchmark data set. We evaluate a number of key features including: assessment of normalization, accuracy of DE detection, modeling of genes expressed in only one condition, and the impact of sequencing depth and number of replications on identifying DE genes. We find significant differences among the methods with no single method consistently outperforming the others. Furthermore, the performance of an array-based approach is comparable to that of methods customized for RNA-seq data. Perhaps most importantly, our results demonstrate that increasing the number of replicate samples provides significantly more detection power than increasing sequencing depth.
Submitted 22 January, 2013; v1 submitted 22 January, 2013;
originally announced January 2013.
-
Classification of arrayCGH data using a fused SVM
Authors:
Franck Rapaport,
Emmanuel Barillot,
Jean-Philippe Vert
Abstract:
Motivation: Array-based comparative genomic hybridization (arrayCGH) has recently become a popular tool to identify DNA copy number variations along the genome. These profiles are starting to be used as markers to improve prognosis or diagnosis of cancer, which implies that methods for automated supervised classification of arrayCGH data are needed. Like gene expression profiles, arrayCGH profiles are characterized by a large number of variables usually measured on a limited number of samples. However, arrayCGH profiles have a particular structure of correlations between variables, due to the spatial organization of BACs along the genome. This suggests that classical classification methods, often based on the selection of a small number of discriminative features, may not be the most accurate methods and may not produce easily interpretable prediction rules.
Results: We propose a new method for supervised classification of arrayCGH data. The method is a variant of support vector machine (SVM) that incorporates the biological specificities of DNA copy number variations along the genome as prior knowledge. The resulting classifier is a sparse linear classifier based on a limited number of regions automatically selected on the chromosomes, leading to easy interpretation and identification of discriminative regions of the genome. We test this method on three classification problems for bladder and uveal cancer, involving both diagnosis and prognosis. We demonstrate that the introduction of the new prior on the classifier leads not only to more accurate predictions, but also to the identification of known and new regions of interest in the genome.
Availability: All data and algorithms are publicly available.
Submitted 18 January, 2008;
originally announced January 2008.
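The abstract above describes a sparse linear classifier built on a limited number of contiguous genomic regions. As a rough illustration only, the sketch below evaluates one plausible objective of that kind: a hinge loss plus an L1 penalty (sparsity) and a penalty on differences between weights of adjacent probes (piecewise-constant weights). The penalized form, the function name `fused_svm_objective`, and the parameters `lam` and `mu` are assumptions made for this example and are not taken from the paper; chromosome boundaries are ignored for simplicity.

```python
import numpy as np

def fused_svm_objective(w, X, y, lam=1.0, mu=1.0):
    """Hinge loss plus L1 and fusion penalties (illustrative form, not the paper's exact one)."""
    margins = y * (X @ w)                          # signed margins, labels in {-1, +1}
    hinge = np.maximum(0.0, 1.0 - margins).mean()  # average hinge loss
    sparsity = np.abs(w).sum()                     # L1 term: few non-zero weights
    fusion = np.abs(np.diff(w)).sum()              # penalizes jumps between adjacent probes
    return hinge + lam * sparsity + mu * fusion

# Toy data: 2 samples, 4 probes ordered along the genome (hypothetical values).
X = np.array([[0.0, 1.0, 1.0, 0.0],
              [0.0, -1.0, -1.0, 0.0]])
y = np.array([1.0, -1.0])

w_region = np.array([0.0, 1.0, 1.0, 0.0])     # one contiguous region
w_scattered = np.array([1.0, 0.0, 1.0, 0.0])  # same L1 norm, non-contiguous

obj_region = fused_svm_objective(w_region, X, y, lam=0.5, mu=0.5)
obj_scattered = fused_svm_objective(w_scattered, X, y, lam=0.5, mu=0.5)
```

Both weight vectors classify the toy samples with zero hinge loss and have the same L1 norm, so the only difference in the objective comes from the fusion term, which favors the contiguous region.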
-
Spectral analysis of gene expression profiles using gene networks
Authors:
Franck Rapaport,
Andrei Zinovyev,
Marie Dutreix,
Emmanuel Barillot,
Jean-Philippe Vert
Abstract:
Microarrays have become extremely useful for analysing genetic phenomena, but establishing a relation between microarray analysis results (typically a list of genes) and their biological significance is often difficult. Currently, the standard approach is to map a posteriori the results onto gene networks to elucidate the functions perturbed at the level of pathways. However, integrating a priori knowledge of the gene networks could help in the statistical analysis of gene expression data and in their biological interpretation. Here we propose a method to integrate a priori the knowledge of a gene network in the analysis of gene expression data. The approach is based on the spectral decomposition of gene expression profiles with respect to the eigenfunctions of the graph, resulting in an attenuation of the high-frequency components of the expression profiles with respect to the topology of the graph. We show how to derive unsupervised and supervised classification algorithms of expression profiles, resulting in classifiers with biological relevance. We applied the method to the analysis of a set of expression profiles from irradiated and non-irradiated yeast strains. It performed at least as well as the usual classification but provided much more biologically relevant results and allowed a direct biological interpretation.
Submitted 26 March, 2006;
originally announced March 2006.
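The spectral decomposition described in the abstract above can be illustrated with a minimal sketch: an expression profile is projected onto the eigenvectors of the gene network's graph Laplacian, the components associated with large eigenvalues (high frequencies on the graph) are attenuated, and the profile is reconstructed. The exponential filter `exp(-beta * eigenvalue)`, the unnormalized Laplacian, and the names `graph_laplacian`, `smooth_on_graph`, and `beta` are assumptions chosen for this example, not details confirmed by the abstract.

```python
import numpy as np

def graph_laplacian(adjacency):
    """Unnormalized Laplacian L = D - A of an undirected graph."""
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency

def smooth_on_graph(profile, adjacency, beta=1.0):
    """Attenuate high-frequency components of a profile w.r.t. the graph topology."""
    laplacian = graph_laplacian(adjacency)
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    coefficients = eigenvectors.T @ profile          # graph Fourier transform
    attenuated = np.exp(-beta * eigenvalues) * coefficients
    return eigenvectors @ attenuated                 # back to gene space

# Toy network: 4 genes connected in a path (hypothetical example).
adjacency = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0]], dtype=float)
profile = np.array([1.0, -1.0, 1.0, -1.0])  # alternates sharply between neighbors

smoothed = smooth_on_graph(profile, adjacency, beta=1.0)
```

With `beta=0.0` the profile is returned unchanged; larger values of `beta` pull the expression values of neighboring genes closer together while leaving the network-wide mean untouched.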