
Showing 1–29 of 29 results for author: Vert, J

Searching in archive q-bio.
  1. arXiv:1802.09381

    q-bio.QM cs.CV q-bio.GN stat.ML

    DropLasso: A robust variant of Lasso for single cell RNA-seq data

    Authors: Beyrem Khalfaoui, Jean-Philippe Vert

    Abstract: Single-cell RNA sequencing (scRNA-seq) is a fast-growing approach to measure the genome-wide transcriptome of many individual cells in parallel, but results in noisy data with many dropout events. Existing methods to learn molecular signatures from bulk transcriptomic data may therefore not be well suited to scRNA-seq data for automatically classifying individual cells into predefined classes. W…

    Submitted 26 February, 2018; originally announced February 2018.

  2. arXiv:1802.05980

    q-bio.QM cs.LG stat.ML

    WHInter: A Working set algorithm for High-dimensional sparse second order Interaction models

    Authors: Marine Le Morvan, Jean-Philippe Vert

    Abstract: Learning sparse linear models with two-way interactions is desirable in many application domains such as genomics. l1-regularised linear models are popular to estimate sparse models, yet standard implementations fail to address specifically the quadratic explosion of candidate two-way interactions in high dimensions, and typically do not scale to genetic data with hundreds of thousands of features…

    Submitted 16 February, 2018; originally announced February 2018.

  3. arXiv:1706.00244

    stat.ML cs.LG q-bio.QM

    Supervised Quantile Normalisation

    Authors: Marine Le Morvan, Jean-Philippe Vert

    Abstract: Quantile normalisation is a popular normalisation method for data subject to unwanted variations such as images, speech, or genomic data. It applies a monotonic transformation to the feature values of each sample to ensure that after normalisation, they follow the same target distribution for each sample. Choosing a "good" target distribution remains however largely empirical and heuristic, and is…

    Submitted 1 June, 2017; originally announced June 2017.

  4. arXiv:1506.07251

    stat.ML cs.LG q-bio.QM

    Benchmark of structured machine learning methods for microbial identification from mass-spectrometry data

    Authors: Kévin Vervier, Pierre Mahé, Jean-Baptiste Veyrieras, Jean-Philippe Vert

    Abstract: Microbial identification is a central issue in microbiology, in particular in the fields of infectious diseases diagnosis and industrial quality control. The concept of species is tightly linked to the concept of biological and clinical classification where the proximity between species is generally measured in terms of evolutionary distances and/or clinical phenotypes. Surprisingly, the informati…

    Submitted 24 June, 2015; originally announced June 2015.

  5. arXiv:1505.06915

    q-bio.QM cs.CE cs.LG q-bio.GN stat.ML

    Large-scale Machine Learning for Metagenomics Sequence Classification

    Authors: Kévin Vervier, Pierre Mahé, Maud Tournoud, Jean-Baptiste Veyrieras, Jean-Philippe Vert

    Abstract: Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Due to the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasona…

    Submitted 26 May, 2015; originally announced May 2015.

  6. arXiv:1205.1181

    stat.ML q-bio.QM

    TIGRESS: Trustful Inference of Gene REgulation using Stability Selection

    Authors: Anne-Claire Haury, Fantine Mordelet, Paola Vera-Licona, Jean-Philippe Vert

    Abstract: Inferring the structure of gene regulatory networks (GRN) from gene expression data has many applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is however a notoriously difficult problem, for which the many existing methods reach limited accuracy. In this paper, we formulate GRN inference as a sparse regression problem and investi…

    Submitted 6 May, 2012; originally announced May 2012.

  7. arXiv:1108.2588

    q-bio.MN physics.data-an

    Analysis of the impact degree distribution in metabolic networks using branching process approximation

    Authors: Kazuhiro Takemoto, Takeyuki Tamura, Yang Cong, Wai-Ki Ching, Jean-Philippe Vert, Tatsuya Akutsu

    Abstract: Theoretical frameworks to estimate the tolerance of metabolic networks to various failures are important to evaluate the robustness of biological complex systems in systems biology. In this paper, we focus on a measure for robustness in metabolic networks, namely, the impact degree, and propose an approximation method to predict the probability distribution of impact degrees from metabolic network…

    Submitted 12 August, 2011; originally announced August 2011.

    Comments: 17 pages, 4 figures, 4 tables

    Journal ref: Physica A 391, 379 (2012)

  8. arXiv:1106.4199

    q-bio.QM stat.ML

    The group fused Lasso for multiple change-point detection

    Authors: Kevin Bleakley, Jean-Philippe Vert

    Abstract: We present the group fused Lasso for detection of multiple change-points shared by a set of co-occurring one-dimensional signals. Change-points are detected by approximating the original signals with a constraint on the multidimensional total variation, leading to piecewise-constant approximations. Fast algorithms are proposed to solve the resulting optimization problems, either exactly or approxi…

    Submitted 21 June, 2011; originally announced June 2011.

  9. arXiv:1106.0134

    q-bio.QM stat.ML

    ProDiGe: PRioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples

    Authors: Fantine Mordelet, Jean-Philippe Vert

    Abstract: Elucidating the genetic basis of human diseases is a central goal of genetics and molecular biology. While traditional linkage analysis and modern high-throughput techniques often provide long lists of tens or hundreds of disease gene candidates, the identification of disease genes among the candidates remains time-consuming and expensive. Efficient computational methods are therefore needed to pr…

    Submitted 1 June, 2011; originally announced June 2011.

  10. arXiv:1101.5008

    q-bio.QM stat.AP stat.ML

    The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures

    Authors: Anne-Claire Haury, Pierre Gestraud, Jean-Philippe Vert

    Abstract: Motivation: Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. Methods: We compare 32 feature selection methods on 4 public gene…

    Submitted 23 June, 2011; v1 submitted 26 January, 2011; originally announced January 2011.

    Journal ref: PLoS ONE (2011) 6(12): e28210

  11. arXiv:1001.3109

    stat.ML q-bio.GN q-bio.QM stat.AP

    Increasing stability and interpretability of gene expression signatures

    Authors: Anne-Claire Haury, Laurent Jacob, Jean-Philippe Vert

    Abstract: Motivation: Molecular signatures for diagnosis or prognosis estimated from large-scale gene expression data often lack robustness and stability, rendering their biological interpretation challenging. Increasing the signature's interpretability and stability across perturbations of a given dataset and, if possible, across datasets, is urgently needed to ease the discovery of important biological…

    Submitted 18 January, 2010; originally announced January 2010.

  12. arXiv:0910.1167

    q-bio.QM

    Joint segmentation of many aCGH profiles using fast group LARS

    Authors: Kevin Bleakley, Jean-Philippe Vert

    Abstract: Array-Based Comparative Genomic Hybridization (aCGH) is a method used to search for genomic regions with copy number variations. For a given aCGH profile, one challenge is to accurately segment it into regions of constant copy number. Subjects sharing the same disease status, for example a type of cancer, often have aCGH profiles with similar copy number variations, due to duplications and dele…

    Submitted 7 October, 2009; originally announced October 2009.

  13. arXiv:0907.1531

    stat.ML q-bio.BM q-bio.QM

    A new protein binding pocket similarity measure based on comparison of 3D atom clouds: application to ligand prediction

    Authors: Brice Hoffmann, Mikhail Zaslavskiy, Jean-Philippe Vert, Véronique Stoven

    Abstract: Motivation: Prediction of ligands for proteins of known 3D structure is important to understand structure-function relationship, predict molecular function, or design new drugs. Results: We explore a new approach for ligand prediction in which binding pockets are represented by atom clouds. Each target pocket is compared to an ensemble of pockets of known ligands. Pockets are aligned in 3D space…

    Submitted 9 July, 2009; originally announced July 2009.

  14. arXiv:0905.1106

    math.OC math.CO q-bio.MN q-bio.QM

    Global alignment of protein-protein interaction networks by graph matching methods

    Authors: Mikhail Zaslavskiy, Francis Bach, Jean-Philippe Vert

    Abstract: Aligning protein-protein interaction (PPI) networks of different species has drawn considerable interest recently. This problem is important to investigate evolutionarily conserved pathways or protein complexes across species, and to help in the identification of functional orthologs through the detection of conserved interactions. It is however a difficult combinatorial problem, for which only…

    Submitted 7 May, 2009; originally announced May 2009.

    Comments: Preprint version

  15. arXiv:0806.0215

    q-bio.QM

    Reconstruction of biological networks by supervised machine learning approaches

    Authors: Jean-Philippe Vert

    Abstract: We review a recent trend in computational systems biology which aims at using pattern recognition algorithms to infer the structure of large-scale biological networks from heterogeneous genomic data. We present several strategies that have been proposed and that lead to different pattern recognition problems and algorithms. The strength of these approaches is illustrated on the reconstruction of…

    Submitted 22 September, 2008; v1 submitted 2 June, 2008; originally announced June 2008.

  16. SIRENE: Supervised Inference of Regulatory Networks

    Authors: Fantine Mordelet, Jean-Philippe Vert

    Abstract: Living cells are the product of gene expression programs that involve the regulated transcription of thousands of genes. The elucidation of transcriptional regulatory networks is thus needed to understand the cell's working mechanism, and can for example be useful for the discovery of novel therapeutic targets. Although several methods have been proposed to infer gene regulatory networks from ge…

    Submitted 27 February, 2008; originally announced February 2008.

    Journal ref: Bioinformatics 24, 16 (2008) i76-82

  17. arXiv:0801.4301

    q-bio.QM

    Virtual screening of GPCRs: an in silico chemogenomics approach

    Authors: Laurent Jacob, Brice Hoffmann, Véronique Stoven, Jean-Philippe Vert

    Abstract: The G-protein coupled receptor (GPCR) superfamily is currently the largest class of therapeutic targets. In silico prediction of interactions between GPCRs and small molecules is therefore a crucial step in the drug discovery process, which remains a daunting task due to the difficulty of characterizing the 3D structure of most GPCRs, and to the limited number of known ligands for some me…

    Submitted 28 January, 2008; originally announced January 2008.

  18. arXiv:0801.3007

    q-bio.GN

    Classification of arrayCGH data using a fused SVM

    Authors: Franck Rapaport, Emmanuel Barillot, Jean-Philippe Vert

    Abstract: Motivation: Array-based comparative genomic hybridization (arrayCGH) has recently become a popular tool to identify DNA copy number variations along the genome. These profiles are starting to be used as markers to improve prognosis or diagnosis of cancer, which implies that methods for automated supervised classification of arrayCGH data are needed. Like gene expression profiles, arrayCGH profil…

    Submitted 18 January, 2008; originally announced January 2008.

  19. arXiv:0709.3931

    q-bio.QM

    Kernel methods for in silico chemogenomics

    Authors: Laurent Jacob, Jean-Philippe Vert

    Abstract: Predicting interactions between small molecules and proteins is a crucial ingredient of the drug discovery process. In particular, accurate predictive models are increasingly used to preselect potential lead compounds from large molecule databases, or to screen for side-effects. While classical in silico approaches focus on predicting interactions with a given specific target, new chemogenomics…

    Submitted 25 September, 2007; originally announced September 2007.

  20. arXiv:0708.0171

    q-bio.QM cs.LG

    Virtual screening with support vector machines and structure kernels

    Authors: Pierre Mahé, Jean-Philippe Vert

    Abstract: Support vector machines and kernel methods have recently gained considerable attention in chemoinformatics. They offer generally good performance for problems of supervised classification or regression, and provide a flexible and computationally efficient framework to include relevant information and prior knowledge about the data and problems to be handled. In particular, with kernel methods mo…

    Submitted 1 August, 2007; originally announced August 2007.

  21. arXiv:q-bio/0702054

    q-bio.QM math.ST

    Kernel matrix regression

    Authors: Yoshihiro Yamanishi, Jean-Philippe Vert

    Abstract: We address the problem of filling missing entries in a kernel Gram matrix, given a related full Gram matrix. We attack this problem from the viewpoint of regression, assuming that the two kernel matrices can be considered as explanatory variables and response variables, respectively. We propose a variant of the regression model based on the underlying features in the reproducing kernel Hilbert s…

    Submitted 26 February, 2007; originally announced February 2007.

  22. arXiv:q-bio/0702008

    q-bio.QM

    Epitope prediction improved by multitask support vector machines

    Authors: Laurent Jacob, Jean-Philippe Vert

    Abstract: Motivation: In silico methods for the prediction of antigenic peptides binding to MHC class I molecules play an increasingly important role in the identification of T-cell epitopes. Statistical and machine learning methods, in particular, are widely used to score candidate epitopes based on their similarity with known epitopes and non-epitopes. The genes coding for the MHC molecules, however, ar…

    Submitted 6 February, 2007; originally announced February 2007.

    Comments: We use various multitask kernels to improve MHC-I-peptide binding prediction, in particular for MHC alleles for which little training data is available. (05/02/2007)

  23. arXiv:q-bio/0610040

    q-bio.QM cs.LG

    Metric learning pairwise kernel for graph inference

    Authors: Jean-Philippe Vert, Jian Qiu, William Stafford Noble

    Abstract: Much recent work in bioinformatics has focused on the inference of various types of biological networks, representing gene regulation, metabolic processes, protein-protein interactions, etc. A common setting involves inferring network edges in a supervised fashion from a set of high-confidence edges, possibly characterized by multiple, heterogeneous data sets (protein sequence, gene expression,…

    Submitted 21 October, 2006; originally announced October 2006.

  24. arXiv:q-bio/0609024

    q-bio.QM

    Graph kernels based on tree patterns for molecules

    Authors: Pierre Mahé, Jean-Philippe Vert

    Abstract: Motivated by chemical applications, we revisit and extend a family of positive definite kernels for graphs based on the detection of common subtrees, initially proposed by Ramon et al. (2003). We propose new kernels with a parameter to control the complexity of the subtrees used as features to represent the graphs. This parameter allows one to smoothly interpolate between classical graph kernels bas…

    Submitted 15 September, 2006; originally announced September 2006.

  25. arXiv:q-bio/0603030

    q-bio.QM

    Spectral analysis of gene expression profiles using gene networks

    Authors: Franck Rapaport, Andrei Zinovyev, Marie Dutreix, Emmanuel Barillot, Jean-Philippe Vert

    Abstract: Microarrays have become extremely useful for analysing genetic phenomena, but establishing a relation between microarray analysis results (typically a list of genes) and their biological significance is often difficult. Currently, the standard approach is to map a posteriori the results onto gene networks to elucidate the functions perturbed at the level of pathways. However, integrating a prior…

    Submitted 26 March, 2006; originally announced March 2006.

  26. arXiv:q-bio/0603006

    q-bio.QM

    The pharmacophore kernel for virtual screening with support vector machines

    Authors: Pierre Mahé, Liva Ralaivola, Véronique Stoven, Jean-Philippe Vert

    Abstract: We introduce a family of positive definite kernels specifically optimized for the manipulation of 3D structures of molecules with kernel methods. The kernels are based on the comparison of the three-point pharmacophores present in the 3D structures of molecules, a set of molecular features known to be particularly relevant for virtual screening applications. We present a computationally demand…

    Submitted 3 March, 2006; originally announced March 2006.

  27. arXiv:q-bio/0510032

    q-bio.QM

    Kernel methods in genomics and computational biology

    Authors: Jean-Philippe Vert

    Abstract: Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and strong modularity that makes them suitable to a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimension, to process non-vectorial data, and the nat…

    Submitted 17 October, 2005; originally announced October 2005.

  28. A covariance kernel for proteins

    Authors: Marco Cuturi, Jean-Philippe Vert

    Abstract: We propose a new kernel for biological sequences which borrows ideas and techniques from information theory and data compression. This kernel can be used in combination with any kernel method, in particular Support Vector Machines for protein classification. By incorporating prior biological assumptions on the properties of amino-acid sequences and using a Bayesian averaging framework, we comput…

    Submitted 16 October, 2003; originally announced October 2003.

    Comments: 12 pages, 1 figure

  29. arXiv:physics/0206055

    physics.bio-ph physics.data-an q-bio.MN

    Graph-driven features extraction from microarray data

    Authors: Jean-Philippe Vert, Minoru Kanehisa

    Abstract: Gene function prediction from microarray data is a first step toward better understanding the machinery of the cell from relatively cheap and easy-to-produce data. In this paper we investigate whether the knowledge of many metabolic pathways and their catalyzing enzymes accumulated over the years can help improve the performance of classifiers for this problem. The complex network of known bio…

    Submitted 17 June, 2002; originally announced June 2002.

    Comments: 31 pages, 2 figures
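
The "quadratic explosion of candidate two-way interactions" mentioned in the WHInter abstract (arXiv:1802.05980) is plain counting: p features give p(p - 1)/2 distinct pairs x_i * x_j with i < j. The short Python sketch below only does that arithmetic to show the growth rate; it is an illustration, not part of WHInter, and the function name is hypothetical.

```python
def n_pairwise_interactions(p: int) -> int:
    """Number of distinct two-way interactions x_i * x_j (i < j) among p features."""
    return p * (p - 1) // 2

for p in (1_000, 10_000, 100_000):
    # Features grow 10x per step; candidate interactions grow roughly 100x.
    print(f"{p:>7} features -> {n_pairwise_interactions(p):>13,} interactions")
```

With 100,000 features this already gives 4,999,950,000 candidate pairs, which is the scale the abstract refers to for genetic data with hundreds of thousands of features.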
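
The Supervised Quantile Normalisation abstract (arXiv:1706.00244) describes quantile normalisation as a monotonic transformation of each sample's feature values so that every sample ends up with the same target distribution. Below is a minimal Python sketch of the classic unsupervised version, under stated assumptions: the default target is the mean of the sorted samples at each rank (the usual heuristic choice, not the supervised target learned in the paper), and ties are broken by feature position rather than averaged. The function name is hypothetical.

```python
from typing import List, Optional

def quantile_normalise(samples: List[List[float]],
                       target: Optional[List[float]] = None) -> List[List[float]]:
    """Give every sample the same set of values while keeping its own ranks.

    samples: one list of n feature values per sample (same n for all samples).
    target:  n target values; if omitted, use the classic heuristic, the mean
             of the sorted samples at each rank (an assumption of this sketch).
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of features")
    if target is None:
        # Average the r-th smallest value across samples, for each rank r.
        sorted_samples = [sorted(s) for s in samples]
        target = [sum(vals) / len(samples) for vals in zip(*sorted_samples)]
    else:
        target = sorted(target)
        if len(target) != n:
            raise ValueError("target must have one value per feature")
    normalised = []
    for s in samples:
        # Feature indices in increasing order of value (stable sort: ties by position).
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Monotonic map: the r-th smallest value of the sample becomes target[r].
            out[idx] = target[rank]
        normalised.append(out)
    return normalised

print(quantile_normalise([[5, 2, 3], [4, 1, 6]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalisation both samples contain exactly the values 1.5, 3.5 and 5.5, while each keeps its original ordering of features. The optional `target` argument is where a different choice of target distribution, such as one learned from data, could be plugged in.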
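
The group fused Lasso abstract (arXiv:1106.4199) detects change-points shared by several signals through a constraint on the multidimensional total variation. Assuming the unweighted form of that quantity for a p-dimensional approximation U over n positions, the sum over t of the Euclidean norm ||U_t - U_{t-1}||_2, the Python sketch below only evaluates this penalty and lists the positions where a piecewise-constant approximation jumps. It does not solve the group fused Lasso optimisation problem, and the function name is hypothetical.

```python
import math
from typing import List, Tuple

def group_total_variation(u: List[List[float]],
                          tol: float = 1e-12) -> Tuple[float, List[int]]:
    """Return (sum_t ||u[t] - u[t-1]||_2, positions t where a jump occurs).

    u[t] is the p-dimensional value of the approximation at position t.
    Unweighted penalty (an assumption); tol decides when a jump is non-zero.
    """
    total = 0.0
    change_points: List[int] = []
    for t in range(1, len(u)):
        # Euclidean norm of the jump, taken jointly across all p dimensions.
        jump = math.sqrt(sum((a - b) ** 2 for a, b in zip(u[t], u[t - 1])))
        total += jump
        if jump > tol:
            change_points.append(t)
    return total, change_points

shared = [[0.0, 0.0], [0.0, 0.0], [1.0, 2.0]]    # both dimensions jump at t = 2
separate = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]  # one dimension jumps at t = 1, the other at t = 2
print(group_total_variation(shared))    # total is sqrt(5), about 2.236; change-points [2]
print(group_total_variation(separate))  # (3.0, [1, 2])
```

The same jumps of size 1 and 2 cost sqrt(5) when they occur at one shared position but 1 + 2 = 3 when they occur at different positions, so a penalty built on the Euclidean norm of the joint jump favours change-points shared across the signals.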