Showing 1–50 of 59 results for author: Vert, J

  1. arXiv:2211.05641 [pdf, other]

    cs.LG cs.AI stat.ML

    Regression as Classification: Influence of Task Formulation on Neural Network Features

    Authors: Lawrence Stewart, Francis Bach, Quentin Berthet, Jean-Philippe Vert

    Abstract: Neural networks can be trained to solve regression problems by using gradient-based methods to minimize the square loss. However, practitioners often prefer to reformulate regression as a classification problem, observing that training on the cross entropy loss results in better performance. By focusing on two-layer ReLU networks, which can be fully characterized by measures over their feature spa…

    Submitted 1 March, 2023; v1 submitted 10 November, 2022; originally announced November 2022.

  2. arXiv:2206.06929 [pdf, other]

    cs.LG stat.ML

    Scaling ResNets in the Large-depth Regime

    Authors: Pierre Marion, Adeline Fermanian, Gérard Biau, Jean-Philippe Vert

    Abstract: Deep ResNets are recognized for achieving state-of-the-art results in complex machine learning tasks. However, the remarkable performance of these architectures relies on a training procedure that needs to be carefully crafted to avoid vanishing or exploding gradients, particularly as the depth $L$ increases. No consensus has been reached on how to mitigate this issue, although a widely discussed…

    Submitted 10 June, 2024; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: 44 pages, 9 figures. Updated with clarifications and additional references

  3. arXiv:2106.01202 [pdf, other]

    stat.ML cs.LG

    Framing RNN as a kernel method: A neural ODE approach

    Authors: Adeline Fermanian, Pierre Marion, Jean-Philippe Vert, Gérard Biau

    Abstract: Building on the interpretation of a recurrent neural network (RNN) as a continuous-time neural differential equation, we show, under appropriate conditions, that the solution of a RNN can be viewed as a linear function of a specific feature set of the input sequence, known as the signature. This connection allows us to frame a RNN as a kernel method in a suitable reproducing kernel Hilbert space.…

    Submitted 29 October, 2021; v1 submitted 2 June, 2021; originally announced June 2021.

    Comments: 33 pages, 7 figures, accepted for an oral presentation at NeurIPS 2021

  4. arXiv:2105.15183 [pdf, other]

    cs.LG math.NA stat.ML

    Efficient and Modular Implicit Differentiation

    Authors: Mathieu Blondel, Quentin Berthet, Marco Cuturi, Roy Frostig, Stephan Hoyer, Felipe Llinares-López, Fabian Pedregosa, Jean-Philippe Vert

    Abstract: Automatic differentiation (autodiff) has revolutionized machine learning. It allows to express complex computations by composing elementary ones in creative ways and removes the burden of computing their derivatives by hand. More recently, differentiation of optimization problem solutions has attracted widespread attention with applications such as optimization layers, and in bi-level problems suc…

    Submitted 12 October, 2022; v1 submitted 31 May, 2021; originally announced May 2021.

    Comments: V3: added more related work and Jacobian precision figure

  5. arXiv:2011.08047 [pdf, other]

    stat.ME

    Causal inference methods for combining randomized trials and observational studies: a review

    Authors: Bénédicte Colnet, Imke Mayer, Guanhua Chen, Awa Dieng, Ruohong Li, Gaël Varoquaux, Jean-Philippe Vert, Julie Josse, Shu Yang

    Abstract: With increasing data availability, causal effects can be evaluated across different data sets, both randomized controlled trials (RCTs) and observational studies. RCTs isolate the effect of the treatment from that of unwanted (confounding) co-occurring effects but they may suffer from unrepresentativeness, and thus lack external validity. On the other hand, large observational samples are often mo…

    Submitted 10 January, 2023; v1 submitted 16 November, 2020; originally announced November 2020.

  6. arXiv:2010.08354 [pdf, other]

    cs.LG stat.ML

    Differentiable Divergences Between Time Series

    Authors: Mathieu Blondel, Arthur Mensch, Jean-Philippe Vert

    Abstract: Computing the discrepancy between time series of variable sizes is notoriously challenging. While dynamic time warping (DTW) is popularly used for this purpose, it is not differentiable everywhere and is known to lead to bad local optima when used as a "loss". Soft-DTW addresses these issues, but it is not a positive definite divergence: due to the bias introduced by entropic regularization, it ca…

    Submitted 25 February, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

    Comments: V3: AISTATS 2021 camera-ready

  7. arXiv:2006.06049 [pdf, other]

    cs.LG stat.ML

    On Mixup Regularization

    Authors: Luigi Carratino, Moustapha Cissé, Rodolphe Jenatton, Jean-Philippe Vert

    Abstract: Mixup is a data augmentation technique that creates new examples as convex combinations of training points and labels. This simple technique has empirically shown to improve the accuracy of many state-of-the-art models in different settings and applications, but the reasons behind this empirical success remain poorly understood. In this paper we take a substantial step in explaining the theoretica…

    Submitted 17 October, 2022; v1 submitted 10 June, 2020; originally announced June 2020.

  8. arXiv:2004.12508 [pdf, other]

    stat.ME cs.LG stat.AP

    Noisy Adaptive Group Testing using Bayesian Sequential Experimental Design

    Authors: Marco Cuturi, Olivier Teboul, Quentin Berthet, Arnaud Doucet, Jean-Philippe Vert

    Abstract: When the infection prevalence of a disease is low, Dorfman showed 80 years ago that testing groups of people can prove more efficient than testing people individually. Our goal in this paper is to propose new group testing algorithms that can operate in a noisy setting (tests can be mistaken) to decide adaptively (looking at past results) which groups to test next, with the goal to converge to a g…

    Submitted 22 July, 2020; v1 submitted 26 April, 2020; originally announced April 2020.

    Comments: Latest version, with updated experiments, new conclusions on LBP vs SMC decoding and new approach

  9. arXiv:2002.10837 [pdf, ps, other]

    stat.ME cs.LG stat.ML

    MissDeepCausal: Causal Inference from Incomplete Data Using Deep Latent Variable Models

    Authors: Imke Mayer, Julie Josse, Félix Raimundo, Jean-Philippe Vert

    Abstract: Inferring causal effects of a treatment, intervention or policy from observational data is central to many applications. However, state-of-the-art methods for causal inference seldom consider the possibility that covariates have missing values, which is ubiquitous in many real-world analyses. Missing data greatly complicate causal inference procedures as they require an adapted unconfoundedness hy…

    Submitted 25 February, 2020; originally announced February 2020.

  10. arXiv:2002.08676 [pdf, other]

    cs.LG math.OC stat.ML

    Learning with Differentiable Perturbed Optimizers

    Authors: Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach

    Abstract: Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths). Although these discrete decisions are easily computed, they break the back-propagation of computational graphs. In order to expand the scope of learning problems that can be solved in an end-to-end fashion, we propose a systematic method to tran…

    Submitted 9 June, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

  11. arXiv:2002.03229 [pdf, other]

    cs.LG stat.ML

    Supervised Quantile Normalization for Low-rank Matrix Approximation

    Authors: Marco Cuturi, Olivier Teboul, Jonathan Niles-Weed, Jean-Philippe Vert

    Abstract: Low rank matrix factorization is a fundamental building block in machine learning, used for instance to summarize gene expression profile data or word-document counts. To be robust to outliers and differences in scale across features, a matrix factorization step is usually preceded by ad-hoc feature normalization steps, such as tf-idf scaling or data whitening. We propose in this work to…

    Submitted 3 July, 2020; v1 submitted 8 February, 2020; originally announced February 2020.

    Comments: new version with genomics experiments

    Journal ref: ICML 2020

  12. arXiv:1910.09036 [pdf, other]

    cs.LG stat.ML

    Differentiable Deep Clustering with Cluster Size Constraints

    Authors: Aude Genevay, Gabriel Dulac-Arnold, Jean-Philippe Vert

    Abstract: Clustering is a fundamental unsupervised learning approach. Many clustering algorithms -- such as $k$-means -- rely on the Euclidean distance as a similarity measure, which is often not the most relevant metric for high dimensional data such as images. Learning a lower-dimensional embedding that can better reflect the geometry of the dataset is therefore instrumental for performance. We propose a…

    Submitted 20 October, 2019; originally announced October 2019.

  13. arXiv:1909.09819 [pdf, other]

    stat.ML cs.LG

    ASNI: Adaptive Structured Noise Injection for shallow and deep neural networks

    Authors: Beyrem Khalfaoui, Joseph Boyd, Jean-Philippe Vert

    Abstract: Dropout is a regularisation technique in neural network training where unit activations are randomly set to zero with a given probability independently. In this work, we propose a generalisation of dropout and other multiplicative noise injection schemes for shallow and deep neural networks, where the random noise applied to different units is not independent but follows a joint distributio…

    Submitted 21 September, 2019; originally announced September 2019.

    Comments: All code concerning the real data experiments is available at https://github.com/BeyremKh/ASNI

  14. arXiv:1905.12909 [pdf, other]

    cs.LG stat.ML

    Deep multi-class learning from label proportions

    Authors: Gabriel Dulac-Arnold, Neil Zeghidour, Marco Cuturi, Lucas Beyer, Jean-Philippe Vert

    Abstract: We propose a learning algorithm capable of learning from label proportions instead of direct data labels. In this scenario, our data are arranged into various bags of a certain size, and only the proportions of each label within a given bag are known. This is a common situation in cases where per-data labeling is lengthy, but a more general label is easily accessible. Several approaches have been…

    Submitted 26 June, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

  15. arXiv:1905.11885 [pdf, other]

    cs.LG stat.ML

    Differentiable Ranks and Sorting using Optimal Transport

    Authors: Marco Cuturi, Olivier Teboul, Jean-Philippe Vert

    Abstract: Sorting an array is a fundamental routine in machine learning, one that is used to compute rank-based statistics, cumulative distribution functions (CDFs), quantiles, or to select closest neighbors and labels. The sorting function is however piece-wise constant (the sorting permutation of a vector does not change if the entries of that vector are infinitesimally perturbed) and therefore has no gra…

    Submitted 2 November, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

  16. arXiv:1806.00664 [pdf, other]

    math.OC

    Robust Seriation and Applications to Cancer Genomics

    Authors: Antoine Recanati, Nicolas Servant, Jean-Philippe Vert, Alexandre d'Aspremont

    Abstract: The seriation problem seeks to reorder a set of elements given pairwise similarity information, so that elements with higher similarity are closer in the resulting sequence. When a global ordering consistent with the similarity information exists, an exact spectral solution recovers it in the noiseless case and seriation is equivalent to the combinatorial 2-SUM problem over permutations, for which…

    Submitted 2 June, 2018; originally announced June 2018.

  17. arXiv:1805.07943 [pdf, other]

    cs.LG stat.ML

    Relating Leverage Scores and Density using Regularized Christoffel Functions

    Authors: Edouard Pauwels, Francis Bach, Jean-Philippe Vert

    Abstract: Statistical leverage scores emerged as a fundamental tool for matrix sketching and column sampling with applications to low rank approximation, regression, random feature learning and quadrature. Yet, the very nature of this quantity is barely understood. Borrowing ideas from the orthogonal polynomial literature, we introduce the regularized Christoffel function associated to a positive definite k…

    Submitted 21 November, 2018; v1 submitted 21 May, 2018; originally announced May 2018.

  18. arXiv:1802.09381 [pdf, other]

    q-bio.QM cs.CV q-bio.GN stat.ML

    DropLasso: A robust variant of Lasso for single cell RNA-seq data

    Authors: Beyrem Khalfaoui, Jean-Philippe Vert

    Abstract: Single-cell RNA sequencing (scRNA-seq) is a fast growing approach to measure the genome-wide transcriptome of many individual cells in parallel, but results in noisy data with many dropout events. Existing methods to learn molecular signatures from bulk transcriptomic data may therefore not be adapted to scRNA-seq data, in order to automatically classify individual cells into predefined classes. W…

    Submitted 26 February, 2018; originally announced February 2018.

  19. arXiv:1802.08526 [pdf, other]

    stat.ML cs.LG

    The Weighted Kendall and High-order Kernels for Permutations

    Authors: Yunlong Jiao, Jean-Philippe Vert

    Abstract: We propose new positive definite kernels for permutations. First we introduce a weighted version of the Kendall kernel, which allows to weight unequally the contributions of different item pairs in the permutations depending on their ranks. Like the Kendall kernel, we show that the weighted version is invariant to relabeling of items and can be computed efficiently in $O(n \ln(n))$ operations, whe…

    Submitted 12 June, 2018; v1 submitted 23 February, 2018; originally announced February 2018.

    Comments: Published in ICML 2018

  20. arXiv:1802.05980 [pdf, other]

    q-bio.QM cs.LG stat.ML

    WHInter: A Working set algorithm for High-dimensional sparse second order Interaction models

    Authors: Marine Le Morvan, Jean-Philippe Vert

    Abstract: Learning sparse linear models with two-way interactions is desirable in many application domains such as genomics. l1-regularised linear models are popular to estimate sparse models, yet standard implementations fail to address specifically the quadratic explosion of candidate two-way interactions in high dimensions, and typically do not scale to genetic data with hundreds of thousands of features…

    Submitted 16 February, 2018; originally announced February 2018.

  21. arXiv:1706.00244 [pdf, other]

    stat.ML cs.LG q-bio.QM

    Supervised Quantile Normalisation

    Authors: Marine Le Morvan, Jean-Philippe Vert

    Abstract: Quantile normalisation is a popular normalisation method for data subject to unwanted variations such as images, speech, or genomic data. It applies a monotonic transformation to the feature values of each sample to ensure that after normalisation, they follow the same target distribution for each sample. Choosing a "good" target distribution remains however largely empirical and heuristic, and is…

    Submitted 1 June, 2017; originally announced June 2017.

  22. arXiv:1506.07251 [pdf, other]

    stat.ML cs.LG q-bio.QM

    Benchmark of structured machine learning methods for microbial identification from mass-spectrometry data

    Authors: Kévin Vervier, Pierre Mahé, Jean-Baptiste Veyrieras, Jean-Philippe Vert

    Abstract: Microbial identification is a central issue in microbiology, in particular in the fields of infectious diseases diagnosis and industrial quality control. The concept of species is tightly linked to the concept of biological and clinical classification where the proximity between species is generally measured in terms of evolutionary distances and/or clinical phenotypes. Surprisingly, the informati…

    Submitted 24 June, 2015; originally announced June 2015.

  23. arXiv:1505.06915 [pdf, other]

    q-bio.QM cs.CE cs.LG q-bio.GN stat.ML

    Large-scale Machine Learning for Metagenomics Sequence Classification

    Authors: Kévin Vervier, Pierre Mahé, Maud Tournoud, Jean-Baptiste Veyrieras, Jean-Philippe Vert

    Abstract: Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Due to the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasona…

    Submitted 26 May, 2015; originally announced May 2015.

  24. arXiv:1407.5158 [pdf, ps, other]

    stat.ML cs.LG math.ST

    Tight convex relaxations for sparse matrix factorization

    Authors: Emile Richard, Guillaume Obozinski, Jean-Philippe Vert

    Abstract: Based on a new atomic norm, we propose a new convex formulation for sparse matrix factorization problems in which the number of nonzero elements of the factors is assumed fixed and known. The formulation counts sparse PCA with multiple factors, subspace clustering and low-rank sparse bilinear regression as potential applications. We compute slow rates and an upper bound on the statistical dimensio…

    Submitted 4 December, 2014; v1 submitted 19 July, 2014; originally announced July 2014.

  25. arXiv:1405.2881 [pdf, ps, other]

    math.ST stat.ML

    Consistency of random forests

    Authors: Erwan Scornet, Gérard Biau, Jean-Philippe Vert

    Abstract: Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45 (2001) 5–32] that combines several randomized decision trees and aggregates their predictions by averaging. Despite its wide usage and outstanding practical performance, little is known about the mathematical properties of the procedure. This disparity between theory and practice originates in the difficulty to simultane…

    Submitted 8 August, 2015; v1 submitted 12 May, 2014; originally announced May 2014.

    Journal ref: Annals of Statistics, Institute of Mathematical Statistics (IMS), 2015, 43 (4), pp.1716-1741

  26. arXiv:1205.1181 [pdf, other]

    stat.ML q-bio.QM

    TIGRESS: Trustful Inference of Gene REgulation using Stability Selection

    Authors: Anne-Claire Haury, Fantine Mordelet, Paola Vera-Licona, Jean-Philippe Vert

    Abstract: Inferring the structure of gene regulatory networks (GRN) from gene expression data has many applications, from the elucidation of complex biological processes to the identification of potential drug targets. It is however a notoriously difficult problem, for which the many existing methods reach limited accuracy. In this paper, we formulate GRN inference as a sparse regression problem and investi…

    Submitted 6 May, 2012; originally announced May 2012.

  27. arXiv:1110.0413 [pdf, other]

    stat.ML cs.LG

    Group Lasso with Overlaps: the Latent Group Lasso approach

    Authors: Guillaume Obozinski, Laurent Jacob, Jean-Philippe Vert

    Abstract: We study a norm for structured sparsity which leads to sparse linear predictors whose supports are unions of predefined overlapping groups of variables. We call the obtained formulation latent group Lasso, since it is based on applying the usual group Lasso penalty on a set of latent variables. A detailed analysis of the norm and its properties is presented and we characterize conditions under whic…

    Submitted 3 October, 2011; originally announced October 2011.

  28. arXiv:1108.2588 [pdf, ps, other]

    q-bio.MN physics.data-an

    Analysis of the impact degree distribution in metabolic networks using branching process approximation

    Authors: Kazuhiro Takemoto, Takeyuki Tamura, Yang Cong, Wai-Ki Ching, Jean-Philippe Vert, Tatsuya Akutsu

    Abstract: Theoretical frameworks to estimate the tolerance of metabolic networks to various failures are important to evaluate the robustness of biological complex systems in systems biology. In this paper, we focus on a measure for robustness in metabolic networks, namely, the impact degree, and propose an approximation method to predict the probability distribution of impact degrees from metabolic network…

    Submitted 12 August, 2011; originally announced August 2011.

    Comments: 17 pages, 4 figures, 4 tables

    Journal ref: Physica A 391, 379 (2012)

  29. arXiv:1106.4199 [pdf, ps, other]

    q-bio.QM stat.ML

    The group fused Lasso for multiple change-point detection

    Authors: Kevin Bleakley, Jean-Philippe Vert

    Abstract: We present the group fused Lasso for detection of multiple change-points shared by a set of co-occurring one-dimensional signals. Change-points are detected by approximating the original signals with a constraint on the multidimensional total variation, leading to piecewise-constant approximations. Fast algorithms are proposed to solve the resulting optimization problems, either exactly or approxi…

    Submitted 21 June, 2011; originally announced June 2011.

  30. arXiv:1106.0134 [pdf, ps, other]

    q-bio.QM stat.ML

    ProDiGe: PRioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples

    Authors: Fantine Mordelet, Jean-Philippe Vert

    Abstract: Elucidating the genetic basis of human diseases is a central goal of genetics and molecular biology. While traditional linkage analysis and modern high-throughput techniques often provide long lists of tens or hundreds of disease gene candidates, the identification of disease genes among the candidates remains time-consuming and expensive. Efficient computational methods are therefore needed to pr…

    Submitted 1 June, 2011; originally announced June 2011.

  31. arXiv:1101.5008 [pdf, ps, other]

    q-bio.QM stat.AP stat.ML

    The influence of feature selection methods on accuracy, stability and interpretability of molecular signatures

    Authors: Anne-Claire Haury, Pierre Gestraud, Jean-Philippe Vert

    Abstract: Motivation: Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biology and medicine. It is also extremely challenging from a statistical viewpoint, but surprisingly few studies have investigated the relative strengths and weaknesses of the plethora of existing feature selection methods. Methods: We compare 32 feature selection methods on 4 public gene…

    Submitted 23 June, 2011; v1 submitted 26 January, 2011; originally announced January 2011.

    Journal ref: PLoS ONE (2011) 6(12): e28210

  32. arXiv:1010.0772 [pdf, ps, other]

    stat.ML

    A bagging SVM to learn from positive and unlabeled examples

    Authors: Fantine Mordelet, Jean-Philippe Vert

    Abstract: We consider the problem of learning a binary classifier from a training set of positive and unlabeled examples, both in the inductive and in the transductive setting. This problem, often referred to as PU learning, differs from the standard supervised classification problem by the lack of negative examples in the training set. It corresponds to a ubiquitous situation in many applications s…

    Submitted 5 October, 2010; originally announced October 2010.

  33. arXiv:1004.4965 [pdf, ps, other]

    stat.ML cs.CV

    Many-to-Many Graph Matching: a Continuous Relaxation Approach

    Authors: Mikhail Zaslavskiy, Francis Bach, Jean-Philippe Vert

    Abstract: Graphs provide an efficient tool for object representation in various computer vision applications. Once graph-based representations are constructed, an important question is how to compare graphs. This problem is often formulated as a graph matching problem where one seeks a mapping between vertices of two graphs which optimally aligns their structure. In the classical formulation of graph matchi…

    Submitted 28 April, 2010; originally announced April 2010.

    Comments: 19

  34. arXiv:1001.3109 [pdf, ps, other]

    stat.ML q-bio.GN q-bio.QM stat.AP

    Increasing stability and interpretability of gene expression signatures

    Authors: Anne-Claire Haury, Laurent Jacob, Jean-Philippe Vert

    Abstract: Motivation: Molecular signatures for diagnosis or prognosis estimated from large-scale gene expression data often lack robustness and stability, rendering their biological interpretation challenging. Increasing the signature's interpretability and stability across perturbations of a given dataset and, if possible, across datasets, is urgently needed to ease the discovery of important biological…

    Submitted 18 January, 2010; originally announced January 2010.

  35. arXiv:0910.1167 [pdf, ps, other]

    q-bio.QM

    Joint segmentation of many aCGH profiles using fast group LARS

    Authors: Kevin Bleakley, Jean-Philippe Vert

    Abstract: Array-Based Comparative Genomic Hybridization (aCGH) is a method used to search for genomic regions with copy number variations. For a given aCGH profile, one challenge is to accurately segment it into regions of constant copy number. Subjects sharing the same disease status, for example a type of cancer, often have aCGH profiles with similar copy number variations, due to duplications and dele…

    Submitted 7 October, 2009; originally announced October 2009.

  36. arXiv:0907.1531 [pdf, ps, other]

    stat.ML q-bio.BM q-bio.QM

    A new protein binding pocket similarity measure based on comparison of 3D atom clouds: application to ligand prediction

    Authors: Brice Hoffmann, Mikhail Zaslavskiy, Jean-Philippe Vert, Véronique Stoven

    Abstract: Motivation: Prediction of ligands for proteins of known 3D structure is important to understand structure-function relationship, predict molecular function, or design new drugs. Results: We explore a new approach for ligand prediction in which binding pockets are represented by atom clouds. Each target pocket is compared to an ensemble of pockets of known ligands. Pockets are aligned in 3D space…

    Submitted 9 July, 2009; originally announced July 2009.

  37. arXiv:0905.1106 [pdf, ps, other]

    math.OC math.CO q-bio.MN q-bio.QM

    Global alignment of protein-protein interaction networks by graph matching methods

    Authors: Mikhail Zaslavskiy, Francis Bach, Jean-Philippe Vert

    Abstract: Aligning protein-protein interaction (PPI) networks of different species has drawn a considerable interest recently. This problem is important to investigate evolutionary conserved pathways or protein complexes across species, and to help in the identification of functional orthologs through the detection of conserved interactions. It is however a difficult combinatorial problem, for which only…

    Submitted 7 May, 2009; originally announced May 2009.

    Comments: Preprint version

  38. arXiv:0809.2085 [pdf, ps, other]

    cs.LG

    Clustered Multi-Task Learning: A Convex Formulation

    Authors: Laurent Jacob, Francis Bach, Jean-Philippe Vert

    Abstract: In multi-task learning several related tasks are considered simultaneously, with the hope that by an appropriate sharing of information across tasks, each task may benefit from the others. In the context of learning linear functions for supervised classification or regression, this can be achieved by including a priori information about the weight vectors associated with the tasks, and how they…

    Submitted 11 September, 2008; originally announced September 2008.

  39. arXiv:0806.0215 [pdf, ps, other]

    q-bio.QM

    Reconstruction of biological networks by supervised machine learning approaches

    Authors: Jean-Philippe Vert

    Abstract: We review a recent trend in computational systems biology which aims at using pattern recognition algorithms to infer the structure of large-scale biological networks from heterogeneous genomic data. We present several strategies that have been proposed and that lead to different pattern recognition problems and algorithms. The strength of these approaches is illustrated on the reconstruction of…

    Submitted 22 September, 2008; v1 submitted 2 June, 2008; originally announced June 2008.

  40. SIRENE: Supervised Inference of Regulatory Networks

    Authors: Fantine Mordelet, Jean-Philippe Vert

    Abstract: Living cells are the product of gene expression programs that involve the regulated transcription of thousands of genes. The elucidation of transcriptional regulatory networks is thus needed to understand the cell's working mechanism, and can for example be useful for the discovery of novel therapeutic targets. Although several methods have been proposed to infer gene regulatory networks from ge…

    Submitted 27 February, 2008; originally announced February 2008.

    Journal ref: Bioinformatics 24, 16 (2008) i76-82

  41. arXiv:0802.1430 [pdf, ps, other]

    cs.LG

    A New Approach to Collaborative Filtering: Operator Estimation with Spectral Regularization

    Authors: Jacob Abernethy, Francis Bach, Theodoros Evgeniou, Jean-Philippe Vert

    Abstract: We present a general approach for collaborative filtering (CF) using spectral regularization to learn linear operators from "users" to the "objects" they rate. Recent low-rank type matrix completion approaches to CF are shown to be special cases. However, unlike existing regularization based CF methods, our approach can be used to also incorporate information such as attributes of the users or t…

    Submitted 19 December, 2008; v1 submitted 11 February, 2008; originally announced February 2008.

  42. arXiv:0801.4301 [pdf, ps, other]

    q-bio.QM

    Virtual screening of GPCRs: an in silico chemogenomics approach

    Authors: Laurent Jacob, Brice Hoffmann, Véronique Stoven, Jean-Philippe Vert

    Abstract: The G-protein coupled receptor (GPCR) superfamily is currently the largest class of therapeutic targets. In silico prediction of interactions between GPCRs and small molecules is therefore a crucial step in the drug discovery process, which remains a daunting task due to the difficulty to characterize the 3D structure of most GPCRs, and to the limited amount of known ligands for some me…

    Submitted 28 January, 2008; originally announced January 2008.

  43. arXiv:0801.4061 [pdf, ps, other]

    cs.LG

    The optimal assignment kernel is not positive definite

    Authors: Jean-Philippe Vert

    Abstract: We prove that the optimal assignment kernel, proposed recently as an attempt to embed labeled graphs and more generally tuples of basic data to a Hilbert space, is in fact not always positive definite.

    Submitted 26 January, 2008; originally announced January 2008.

  44. arXiv:0801.3654 [pdf, ps, other]

    cs.CV cs.DM

    A path following algorithm for the graph matching problem

    Authors: Mikhail Zaslavskiy, Francis Bach, Jean-Philippe Vert

    Abstract: We propose a convex-concave programming approach for the labeled weighted graph matching problem. The convex-concave programming formulation is obtained by rewriting the weighted graph matching problem as a least-square problem on the set of permutation matrices and relaxing it to two different optimization problems: a quadratic convex and a quadratic concave optimization problem on the set of d…

    Submitted 27 October, 2008; v1 submitted 23 January, 2008; originally announced January 2008.

    Comments: 23 pages, 13 figures, typo correction, new results in sections 4, 5, 6

  45. arXiv:0801.3007  [pdf, other]

    q-bio.GN

    Classification of arrayCGH data using a fused SVM

    Authors: Franck Rapaport, Emmanuel Barillot, Jean-Philippe Vert

    Abstract: Motivation: Array-based comparative genomic hybridization (arrayCGH) has recently become a popular tool to identify DNA copy number variations along the genome. These profiles are starting to be used as markers to improve prognosis or diagnosis of cancer, which implies that methods for automated supervised classification of arrayCGH data are needed. Like gene expression profiles, arrayCGH profil…

    Submitted 18 January, 2008; originally announced January 2008.

  46. arXiv:0709.3931  [pdf, ps, other]

    q-bio.QM

    Kernel methods for in silico chemogenomics

    Authors: Laurent Jacob, Jean-Philippe Vert

    Abstract: Predicting interactions between small molecules and proteins is a crucial ingredient of the drug discovery process. In particular, accurate predictive models are increasingly used to preselect potential lead compounds from large molecule databases, or to screen for side-effects. While classical in silico approaches focus on predicting interactions with a given specific target, new chemogenomics…

    Submitted 25 September, 2007; originally announced September 2007.

  47. arXiv:0708.0171  [pdf, ps, other]

    q-bio.QM cs.LG

    Virtual screening with support vector machines and structure kernels

    Authors: Pierre Mahé, Jean-Philippe Vert

    Abstract: Support vector machines and kernel methods have recently gained considerable attention in chemoinformatics. They offer generally good performance for problems of supervised classification or regression, and provide a flexible and computationally efficient framework to include relevant information and prior knowledge about the data and problems to be handled. In particular, with kernel methods mo…

    Submitted 1 August, 2007; originally announced August 2007.

  48. arXiv:q-bio/0702054  [pdf, ps, other]

    q-bio.QM math.ST

    Kernel matrix regression

    Authors: Yoshihiro Yamanishi, Jean-Philippe Vert

    Abstract: We address the problem of filling missing entries in a kernel Gram matrix, given a related full Gram matrix. We attack this problem from the viewpoint of regression, assuming that the two kernel matrices can be considered as explanatory variables and response variables, respectively. We propose a variant of the regression model based on the underlying features in the reproducing kernel Hilbert s…

    Submitted 26 February, 2007; originally announced February 2007.

  49. arXiv:q-bio/0702008  [pdf, ps, other]

    q-bio.QM

    Epitope prediction improved by multitask support vector machines

    Authors: Laurent Jacob, Jean-Philippe Vert

    Abstract: Motivation: In silico methods for the prediction of antigenic peptides binding to MHC class I molecules play an increasingly important role in the identification of T-cell epitopes. Statistical and machine learning methods, in particular, are widely used to score candidate epitopes based on their similarity with known epitopes and non epitopes. The genes coding for the MHC molecules, however, ar…

    Submitted 6 February, 2007; originally announced February 2007.

    Journal ref: We use various multitask kernels in order to improve MHC-I-peptide binding prediction, in particular for MHC alleles for which few training data is available. (05/02/2007)

  50. arXiv:cs/0611124  [pdf, ps, other]

    cs.LG cs.AI cs.IR

    Low-rank matrix factorization with attributes

    Authors: Jacob Abernethy, Francis Bach, Theodoros Evgeniou, Jean-Philippe Vert

    Abstract: We develop a new collaborative filtering (CF) method that combines both previously known users' preferences, i.e. standard CF, as well as product/user attributes, i.e. classical function approximation, to predict a given user's interest in a particular product. Our method is a generalized low rank matrix completion problem, where we learn a function whose inputs are pairs of vectors -- the stand…

    Submitted 24 November, 2006; originally announced November 2006.

    Comments: 12 pages, 2 figures

    Report number: N-24/06/MM