
Sketch In, Sketch Out: Accelerating both Learning and Inference for Structured Prediction with Kernels

Authors: Tamim El Ahmad, Luc Brogat-Motte, Pierre Laforgue, Florence d'Alché-Buc

Abstract: Leveraging the kernel trick in both the input and output spaces, surrogate kernel methods are a flexible and theoretically grounded solution to structured output prediction. If they provide state-of-the-art performance on complex data sets of moderate size (e.g., in chemoinformatics), these approaches however fail to scale. We propose to equip surrogate kernel methods with sketching-based approximations, applied to both the input and output feature maps. We prove excess risk bounds on the original structured prediction problem, showing how to attain close-to-optimal rates with a reduced sketch size that depends on the eigendecay of the input/output covariance operators. From a computational perspective, we show that the two approximations have distinct but complementary impacts: sketching the input kernel mostly reduces training time, while sketching the output kernel decreases the inference time. Empirically, our approach is shown to scale, achieving state-of-the-art performance on benchmark data sets where non-sketched methods are intractable.

Submitted 6 May, 2024; v1 submitted 20 February, 2023; originally announced February 2023.

Journal ref: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR 238:109-117, 2024

arXiv:2211.08958

Vector-Valued Least-Squares Regression under Output Regularity Assumptions

Authors: Luc Brogat-Motte, Alessandro Rudi, Céline Brouard, Juho Rousu, Florence d'Alché-Buc

Abstract: We propose and analyse a reduced-rank method for solving least-squares regression problems with infinite dimensional output. We derive learning bounds for our method, and study under which setting statistical performance is improved in comparison to full-rank method. Our analysis extends the interest of reduced-rank regression beyond the standard low-rank setting to more general output regularity assumptions. We illustrate our theoretical insights on synthetic least-squares problems. Then, we propose a surrogate structured prediction method derived from this reduced-rank method. We assess its benefits on three different problems: image reconstruction, multi-label classification, and metabolite identification.

Submitted 16 November, 2022; originally announced November 2022.

arXiv:2202.03813

Learning to Predict Graphs with Fused Gromov-Wasserstein Barycenters

Authors: Luc Brogat-Motte, Rémi Flamary, Céline Brouard, Juho Rousu, Florence d'Alché-Buc

Abstract: This paper introduces a novel and generic framework to solve the flagship task of supervised labeled graph prediction by leveraging Optimal Transport tools. We formulate the problem as regression with the Fused Gromov-Wasserstein (FGW) loss and propose a predictive model relying on a FGW barycenter whose weights depend on inputs. First we introduce a non-parametric estimator based on kernel ridge regression for which theoretical results such as consistency and excess risk bound are proved. Next we propose an interpretable parametric model where the barycenter weights are modeled with a neural network and the graphs on which the FGW barycenter is calculated are additionally learned. Numerical experiments show the strength of the method and its ability to interpolate in the labeled graph space on simulated data and on a difficult metabolic identification problem where it can reach very good performance with very little engineering.

Submitted 24 June, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

arXiv:2007.14703

Learning Output Embeddings in Structured Prediction

Authors: Luc Brogat-Motte, Alessandro Rudi, Céline Brouard, Juho Rousu, Florence d'Alché-Buc

Abstract: A powerful and flexible approach to structured prediction consists in embedding the structured objects to be predicted into a feature space of possibly infinite dimension by means of output kernels, and then, solving a regression problem in this output space. A prediction in the original space is computed by solving a pre-image problem. In such an approach, the embedding, linked to the target loss, is defined prior to the learning phase. In this work, we propose to jointly learn a finite approximation of the output embedding and the regression function into the new feature space. For that purpose, we leverage a priori information on the outputs and also unexploited unsupervised output data, which are both often available in structured prediction problems. We prove that the resulting structured predictor is a consistent estimator, and derive an excess risk bound. Moreover, the novel structured prediction tool enjoys a significantly smaller computational complexity than former output kernel methods. The approach empirically tested on various structured prediction problems reveals to be versatile and able to handle large datasets.

Submitted 2 November, 2020; v1 submitted 29 July, 2020; originally announced July 2020.

arXiv:1910.04621

Duality in RKHSs with Infinite Dimensional Outputs: Application to Robust Losses

Authors: Pierre Laforgue, Alex Lambert, Luc Brogat-Motte, Florence d'Alché-Buc

Abstract: Operator-Valued Kernels (OVKs) and associated vector-valued Reproducing Kernel Hilbert Spaces provide an elegant way to extend scalar kernel methods when the output space is a Hilbert space. Although primarily used in finite dimension for problems like multi-task regression, the ability of this framework to deal with infinite dimensional output spaces unlocks many more applications, such as functional regression, structured output prediction, and structured data representation. However, these sophisticated schemes crucially rely on the kernel trick in the output space, so that most of previous works have focused on the square norm loss function, completely neglecting robustness issues that may arise in such surrogate problems. To overcome this limitation, this paper develops a duality approach that allows to solve OVK machines for a wide range of loss functions. The infinite dimensional Lagrange multipliers are handled through a Double Representer Theorem, and algorithms for $ε$-insensitive losses and the Huber loss are thoroughly detailed. Robustness benefits are emphasized by a theoretical stability analysis, as well as empirical improvements on structured data applications.

Submitted 21 August, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

Showing 1–5 of 5 results for author: Brogat-Motte, L