Search | arXiv e-print repository

Match-And-Deform: Time Series Domain Adaptation through Optimal Transport and Temporal Alignment

Authors: François Painblanc, Laetitia Chapel, Nicolas Courty, Chloé Friguet, Charlotte Pelletier, Romain Tavenard

Abstract: While large volumes of unlabeled data are usually available, associated labels are often scarce. The unsupervised domain adaptation problem aims at exploiting labels from a source domain to classify data from a related, yet different, target domain. When time series are at stake, new difficulties arise as temporal shifts may appear in addition to the standard feature distribution shift. In this paper, we introduce the Match-And-Deform (MAD) approach, which aims at finding correspondences between the source and target time series while allowing temporal distortions. The associated optimization problem simultaneously aligns the series thanks to an optimal transport loss and the time stamps through dynamic time warping. When embedded into a deep neural network, MAD helps learn new representations of time series that both align the domains and maximize the discriminative power of the network. Empirical studies on benchmark datasets and remote sensing data demonstrate that MAD produces meaningful sample-to-sample pairings and time shift estimates, reaching similar or better classification performance than state-of-the-art deep time series domain adaptation strategies.

Submitted 25 August, 2023; v1 submitted 24 August, 2023; originally announced August 2023.
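MAD couples an optimal transport loss with dynamic time warping (DTW). As a point of reference for the temporal-alignment half only, the following Python sketch computes classic DTW between two series with a squared Euclidean local cost. It is not the MAD objective itself; the function name `dtw` and the choice of local cost are assumptions made for illustration.

```python
import numpy as np

def dtw(x, y):
    """Classic DTW cost between series x (n, d) and y (m, d); 1D inputs are treated as d = 1.
    Illustrative sketch only, not the MAD objective."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)
    # Local cost: squared Euclidean distance between every pair of time stamps.
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    # acc[i, j] = cheapest cost of aligning the prefixes x[:i] and y[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # A warping path may advance in x, in y, or in both at each step.
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])
```

For example, `dtw([0, 1, 2, 3], [0, 0, 1, 2, 3])` returns 0.0, because the warping path can match the first value of the shorter series to both leading zeros of the longer one, absorbing the one-step time shift.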

arXiv:2307.01770 [pdf, other]

Fast Optimal Transport through Sliced Wasserstein Generalized Geodesics

Authors: Guillaume Mahey, Laetitia Chapel, Gilles Gasso, Clément Bonet, Nicolas Courty

Abstract: The Wasserstein distance (WD) and the associated optimal transport plan have proven useful in many applications where probability measures are at stake. In this paper, we propose a new proxy of the squared WD, coined min-SWGG, that is based on the transport map induced by an optimal one-dimensional projection of the two input distributions. We draw connections between min-SWGG and Wasserstein generalized geodesics in which the pivot measure is supported on a line. We notably provide a new closed form for the exact Wasserstein distance in the particular case where one of the distributions is supported on a line, allowing us to derive a fast computational scheme that is amenable to gradient descent optimization. We show that min-SWGG is an upper bound of WD and that it has a complexity similar to that of Sliced-Wasserstein, with the additional feature of providing an associated transport plan. We also investigate some theoretical properties such as metricity, weak convergence, computational and topological properties. Empirical evidence supports the benefits of min-SWGG in various contexts, including gradient flows, shape matching and image colorization.

Submitted 30 October, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

Comments: Main: 10 pages, 4 figures, tables; Supplementary: 19 pages, 13 figures, 1 table. Submitted to NeurIPS 2023

MSC Class: 62; 65 ACM Class: G.3
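The abstract describes min-SWGG as the cost of the transport map induced by an optimal one-dimensional projection. A minimal Python sketch, assuming two uniform point clouds with the same number of points: for a direction theta, sorting the projections pairs the points, and the cost of that pairing is evaluated in the original space, so it is the cost of a valid transport plan and therefore an upper bound on the squared Wasserstein distance. The paper optimizes theta (including by gradient descent); here a plain random search over directions stands in for that step, and the names `swgg_cost` and `min_swgg_random` are hypothetical.

```python
import numpy as np

def swgg_cost(X, Y, theta):
    """Cost of the pairing induced by sorting projections on direction theta.
    Assumes X and Y are (n, d) arrays with uniform weights and the same n."""
    theta = theta / np.linalg.norm(theta)
    ix = np.argsort(X @ theta)
    iy = np.argsort(Y @ theta)
    # Pair the k-th smallest projected source point with the k-th smallest projected
    # target point, but measure the cost in the original d-dimensional space.
    return float(np.mean(np.sum((X[ix] - Y[iy]) ** 2, axis=1)))

def min_swgg_random(X, Y, n_dirs=200, seed=0):
    """Random-search stand-in for the optimization over theta (an assumption)."""
    rng = np.random.default_rng(seed)
    thetas = rng.normal(size=(n_dirs, X.shape[1]))
    costs = [swgg_cost(X, Y, t) for t in thetas]
    best = int(np.argmin(costs))
    return costs[best], thetas[best]
```

Because each candidate is the cost of an actual permutation, more directions can only tighten the bound; for a pure translation Y = X + s, every direction already recovers the optimal pairing with cost ||s||^2.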

arXiv:2211.10066 [pdf, other]

Hyperbolic Sliced-Wasserstein via Geodesic and Horospherical Projections

Authors: Clément Bonet, Laetitia Chapel, Lucas Drumetz, Nicolas Courty

Abstract: Embedding data that present an underlying hierarchical structure in hyperbolic spaces has been shown to be beneficial for many types of data. Consequently, many machine learning tools have been extended to such spaces, but only a few discrepancies exist for comparing probability distributions defined over them. Among the possible candidates, optimal transport distances are well defined on such Riemannian manifolds and enjoy strong theoretical properties, but suffer from high computational cost. On Euclidean spaces, sliced-Wasserstein distances, which leverage a closed form of the Wasserstein distance in one dimension, are more computationally efficient, but are not readily available on hyperbolic spaces. In this work, we propose to derive novel hyperbolic sliced-Wasserstein discrepancies. These constructions use projections on the underlying geodesics either along horospheres or geodesics. We study and compare them on different tasks where hyperbolic representations are relevant, such as sampling or image classification.

Submitted 26 June, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

Comments: Accepted at the TAG-ML 2023 ICML Workshop
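For context, the Euclidean sliced-Wasserstein construction that the abstract contrasts with can be sketched in a few lines of Python: project both samples on random unit directions, apply the closed-form 1D Wasserstein distance (sorting) on each projection, and average. This assumes uniform point clouds with the same number of points and Monte Carlo sampling of directions. The hyperbolic variants proposed in the paper replace the linear projection with geodesic or horospherical projections, which this sketch does not implement; `sliced_wasserstein` is a hypothetical name.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=100, seed=0):
    """Monte Carlo estimate of the squared Euclidean SW_2 distance between
    uniform point clouds X (n, d) and Y (n, d) with the same n."""
    rng = np.random.default_rng(seed)
    thetas = rng.normal(size=(n_proj, X.shape[1]))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)  # unit directions
    # Project on every direction at once and sort each column: shapes (n, n_proj).
    xp = np.sort(X @ thetas.T, axis=0)
    yp = np.sort(Y @ thetas.T, axis=0)
    # 1D W2^2 between equal-size uniform samples = mean squared gap of sorted values;
    # the overall mean also averages over directions.
    return float(np.mean((xp - yp) ** 2))
```

In one dimension the only unit directions are +1 and -1, so the estimate coincides with the exact squared 1D Wasserstein distance.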

arXiv:2106.04145 [pdf, other]

Unbalanced Optimal Transport through Non-negative Penalized Linear Regression

Authors: Laetitia Chapel, Rémi Flamary, Haoran Wu, Cédric Févotte, Gilles Gasso

Abstract: This paper addresses the problem of Unbalanced Optimal Transport (UOT) in which the marginal conditions are relaxed (using weighted penalties in lieu of equality) and no additional regularization is enforced on the OT plan. In this context, we show that the corresponding optimization problem can be reformulated as a non-negative penalized linear regression problem. This reformulation allows us to propose novel algorithms inspired by inverse problems and nonnegative matrix factorization. In particular, we consider majorization-minimization, which leads in our setting to efficient multiplicative updates for a variety of penalties. Furthermore, we derive for the first time an efficient algorithm to compute the regularization path of UOT with quadratic penalties. The proposed algorithm provides a continuous, piecewise-linear path of OT plans converging to the solution of balanced OT (corresponding to infinite penalty weights). We perform several numerical experiments on simulated and real data illustrating the new algorithms, and provide a detailed discussion about more sophisticated optimization tools that can further be used to solve OT problems thanks to our reformulation.

Submitted 8 June, 2021; originally announced June 2021.

Comments: Laetitia Chapel and Rémi Flamary contributed equally
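With quadratic penalties, the UOT problem in the abstract reads: minimize over T >= 0 the quantity <C, T> + (lam/2)(||T1 - a||^2 + ||T^T 1 - b||^2), which is a non-negative least-squares problem in vec(T) with an extra linear term. The Python sketch below applies standard Lee-Seung-style multiplicative updates for that case, which form a majorization-minimization scheme when C, a and b are non-negative. It assumes this particular penalty and is not guaranteed to match the exact updates or stopping rules in the paper; the names `uot_quadratic_mm` and `uot_objective` are hypothetical.

```python
import numpy as np

def uot_objective(T, C, a, b, lam):
    """<C, T> + lam/2 * (||T 1 - a||^2 + ||T^T 1 - b||^2)."""
    return float(np.sum(C * T)
                 + 0.5 * lam * (np.sum((T.sum(axis=1) - a) ** 2)
                                + np.sum((T.sum(axis=0) - b) ** 2)))

def uot_quadratic_mm(C, a, b, lam, n_iter=500, eps=1e-16):
    """Multiplicative updates for quadratically penalized UOT (sketch).

    Each step multiplies T entrywise by the ratio of the negative part of the
    gradient, a_i + b_j, to its positive part, (T 1)_i + (T^T 1)_j + C_ij / lam.
    This keeps T non-negative and does not increase the objective."""
    T = np.outer(a, b)                        # strictly positive start (assumes a, b > 0)
    num = a[:, None] + b[None, :]
    for _ in range(n_iter):
        den = T.sum(axis=1)[:, None] + T.sum(axis=0)[None, :] + C / lam
        T = T * num / np.maximum(den, eps)
    return T
```

Larger values of `lam` push the marginals of T toward a and b, in line with the abstract's remark that balanced OT corresponds to infinite penalty weights.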

arXiv:2002.08276 [pdf, other]

Partial Optimal Transport with Applications on Positive-Unlabeled Learning

Authors: Laetitia Chapel, Mokhtar Z. Alaya, Gilles Gasso

Abstract: The classical optimal transport problem seeks a transportation map that preserves the total mass between two probability distributions, requiring their masses to be the same. This may be too restrictive in certain applications such as color or shape matching, since the distributions may have arbitrary masses and/or only a fraction of the total mass has to be transported. Several algorithms have been devised for computing partial Wasserstein metrics that rely on an entropic regularization, but when it comes to exact solutions, almost no partial formulation of either Wasserstein or Gromov-Wasserstein is available yet. This precludes working with distributions that do not lie in the same metric space or when invariance to rotation or translation is needed. In this paper, we address the partial Wasserstein and Gromov-Wasserstein problems and propose exact algorithms to solve them. We showcase the new formulation in a positive-unlabeled (PU) learning application. To the best of our knowledge, this is the first application of optimal transport in this context, and we first highlight that partial Wasserstein-based metrics prove effective in usual PU learning settings. We then demonstrate that partial Gromov-Wasserstein metrics are efficient in scenarios where point clouds come from different domains or have different features.

Submitted 12 June, 2020; v1 submitted 19 February, 2020; originally announced February 2020.
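Exact partial OT can be reduced to a balanced problem by adding dummy points that absorb the mass left untransported. A minimal Python sketch of that idea, assuming two uniform clouds of n points each, an integer number k of unit masses to move (a transported fraction k/n) and SciPy's `linear_sum_assignment` as the exact solver; dummy-to-dummy pairs are ruled out with a large cost. This illustrates the reduction only and is not the authors' code; `partial_assignment` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def partial_assignment(C, k):
    """Exact partial matching of k out of n unit masses between two uniform
    clouds of n points each, given their (n, n) cost matrix C."""
    n = C.shape[0]
    r = n - k                          # unit masses left untransported on each side
    big = C.max() * (n + 1) + 1.0      # exceeds the cost of any all-real matching
    M = np.zeros((n + r, n + r))       # real <-> dummy pairs cost nothing
    M[:n, :n] = C
    M[n:, n:] = big                    # forbid dummy source -> dummy target
    rows, cols = linear_sum_assignment(M)
    real = (rows < n) & (cols < n)     # keep only real-to-real pairs (exactly k of them)
    return rows[real], cols[real], float(C[rows[real], cols[real]].sum())
```

The returned cost sums C over the matched pairs; with masses 1/n per point, divide it by n to obtain the partial transport cost. Setting k = n recovers the usual balanced assignment.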

arXiv:2002.03848 [pdf, other]

Time Series Alignment with Global Invariances

Authors: Titouan Vayer, Romain Tavenard, Laetitia Chapel, Nicolas Courty, Rémi Flamary, Yann Soullard

Abstract: Multivariate time series are ubiquitous objects in signal processing. Measuring a distance or similarity between two such objects is of prime interest in a variety of applications, including machine learning, but can be very difficult as soon as the temporal dynamics and the representation of the time series, i.e. the nature of the observed quantities, differ from one another. In this work, we propose a novel distance accounting for both feature space and temporal variabilities by learning a latent global transformation of the feature space together with a temporal alignment, cast as a joint optimization problem. The versatility of our framework allows for several variants depending on the invariance class at stake. Among other contributions, we define a differentiable loss for time series and present two algorithms for the computation of time series barycenters under this new geometry. We illustrate the interest of our approach on both simulated and real-world data and show the robustness of our approach compared to state-of-the-art methods.

Submitted 1 November, 2022; v1 submitted 10 February, 2020; originally announced February 2020.

Comments: Published in Transactions on Machine Learning Research (Oct 2022)

arXiv:1905.10124 [pdf, other]

Sliced Gromov-Wasserstein

Authors: Titouan Vayer, Rémi Flamary, Romain Tavenard, Laetitia Chapel, Nicolas Courty

Abstract: Recently used in various machine learning contexts, the Gromov-Wasserstein distance (GW) allows for comparing distributions whose supports do not necessarily lie in the same metric space. However, this Optimal Transport (OT) distance requires solving a complex non-convex quadratic program which is, most of the time, very costly both in time and memory. Contrary to GW, the Wasserstein distance (W) enjoys several properties (e.g. duality) that permit large-scale optimization. Among those, the solution of W on the real line, which only requires sorting discrete samples in 1D, allows defining the Sliced Wasserstein (SW) distance. This paper proposes a new divergence based on GW akin to SW. We first derive a closed form for GW when dealing with 1D distributions, based on a new result for the related quadratic assignment problem. We then define a novel OT discrepancy that can deal with large-scale distributions via a slicing approach, and we show how it relates to the GW distance while being $O(n\log(n))$ to compute. We illustrate the behavior of this so-called Sliced Gromov-Wasserstein (SGW) discrepancy in experiments where we demonstrate its ability to tackle problems similar to those handled by GW while being several orders of magnitude faster to compute.

Submitted 20 October, 2022; v1 submitted 24 May, 2019; originally announced May 2019.
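The 1D closed form mentioned in the abstract states that, for two uniform 1D samples of the same size and the squared loss, an optimal GW coupling between the sorted samples is either the identity or the anti-identity permutation. The Python sketch below slices both clouds along the same random directions and keeps the better of those two couplings on each slice. It assumes equal sizes and equal dimensions (the paper also handles different dimensions), evaluates each 1D GW cost naively in O(n^2) rather than with the paper's O(n log n) scheme, and uses hypothetical names.

```python
import numpy as np

def gw_cost_1d(x, y):
    """GW objective (squared loss) of the coupling x[i] <-> y[i], uniform weights."""
    dx = (x[:, None] - x[None, :]) ** 2
    dy = (y[:, None] - y[None, :]) ** 2
    return float(np.mean((dx - dy) ** 2))

def sliced_gw(X, Y, n_proj=50, seed=0):
    """Monte Carlo SGW between uniform clouds X (n, d) and Y (n, d)."""
    rng = np.random.default_rng(seed)
    thetas = rng.normal(size=(n_proj, X.shape[1]))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    total = 0.0
    for t in thetas:
        x = np.sort(X @ t)
        y = np.sort(Y @ t)
        # On sorted 1D samples, the optimal permutation is identity or anti-identity.
        total += min(gw_cost_1d(x, y), gw_cost_1d(x, y[::-1]))
    return total / n_proj
```

Because only pairwise distances enter the cost, the result is unaffected by translating one cloud or reflecting it through the origin, which the anti-identity option accounts for on each slice.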

arXiv:1811.02834 [pdf, other]

Fused Gromov-Wasserstein distance for structured objects: theoretical foundations and mathematical properties

Authors: Titouan Vayer, Laetitia Chapel, Rémi Flamary, Romain Tavenard, Nicolas Courty

Abstract: Optimal transport theory has recently found many applications in machine learning thanks to its capacity to compare various machine learning objects considered as distributions. The Kantorovitch formulation, leading to the Wasserstein distance, focuses on the features of the elements of the objects but treats them independently, whereas the Gromov-Wasserstein distance focuses only on the relations between the elements, depicting the structure of the object, yet discarding its features. In this paper we propose to extend these distances in order to encode both the feature and structure information simultaneously, resulting in the Fused Gromov-Wasserstein distance. We develop the mathematical framework for this novel distance, prove its metric and interpolation properties and provide a concentration result for the convergence of finite samples. We also illustrate and interpret its use in various contexts where structured objects are involved.

Submitted 7 November, 2018; originally announced November 2018.

arXiv:1805.09114 [pdf, other]

Optimal Transport for structured data with application on graphs

Authors: Titouan Vayer, Laetitia Chapel, Rémi Flamary, Romain Tavenard, Nicolas Courty

Abstract: This work considers the problem of computing distances between structured objects such as undirected graphs, seen as probability distributions in a specific metric space. We consider a new transportation distance (i.e. one that minimizes a total cost of transporting probability masses) that unveils the geometric nature of the structured objects space. Unlike Wasserstein or Gromov-Wasserstein metrics that focus solely and respectively on features (by considering a metric in the feature space) or structure (by seeing structure as a metric space), our new distance jointly exploits both types of information, and is consequently called Fused Gromov-Wasserstein (FGW). After discussing its properties and computational aspects, we show results on a graph classification task, where our method outperforms both graph kernels and deep graph convolutional networks. Further exploiting the metric properties of FGW, we illustrate and discuss interesting geometric objects such as Fréchet means or barycenters of graphs in a clustering context.

Submitted 13 May, 2019; v1 submitted 23 May, 2018; originally announced May 2018.
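To make the fused cost concrete, the Python sketch below evaluates an FGW-type objective for a given coupling T between two attributed graphs: a feature term (1 - alpha) * <M, T>, where M holds pairwise feature distances, plus a structure term alpha * sum over i, j, k, l of (C1[i, k] - C2[j, l])^2 * T[i, j] * T[k, l], where C1 and C2 encode each graph's structure (adjacency matrices here). The squared structure loss, the adjacency-based C1/C2, the toy graphs and the name `fgw_objective` are assumptions; the sketch only scores a coupling and does not perform the optimization over T.

```python
import numpy as np

def fgw_objective(M, C1, C2, T, alpha):
    """FGW-type cost of coupling T (n1 x n2) with a squared structure loss."""
    # L[i, j, k, l] = (C1[i, k] - C2[j, l]) ** 2
    L = (C1[:, None, :, None] - C2[None, :, None, :]) ** 2
    structure = np.einsum("ijkl,ij,kl->", L, T, T)
    return float((1 - alpha) * np.sum(M * T) + alpha * structure)

# Hypothetical example: a 3-node path graph and a relabeled copy of it.
C1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
F1 = np.array([[0.0], [1.0], [2.0]])            # one scalar feature per node
p = np.array([1, 0, 2])                          # node j of graph 2 is node p[j] of graph 1
C2, F2 = C1[p][:, p], F1[p]
M = np.sum((F1[:, None, :] - F2[None, :, :]) ** 2, axis=-1)  # squared feature distances

T_match = np.zeros((3, 3))
T_match[p, np.arange(3)] = 1 / 3                 # couples node p[j] with node j
T_naive = np.eye(3) / 3                          # ignores the relabeling
```

With `alpha=0.5`, `fgw_objective(M, C1, C2, T_match, 0.5)` is 0 because the coupling follows the relabeling of both features and structure, whereas the naive identity coupling scores 5/9 (a feature term of 2/3 and a structure term of 4/9, each weighted by 0.5).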

arXiv:1607.02654 [pdf, other]

Combining multiple resolutions into hierarchical representations for kernel-based image classification

Authors: Yanwei Cui, Sébastien Lefevre, Laetitia Chapel, Anne Puissant

Abstract: The geographic object-based image analysis (GEOBIA) framework has gained increasing interest recently. Following this popular paradigm, we propose a novel multiscale classification approach operating on a hierarchical image representation built from two images at different resolutions. They capture the same scene with different sensors and are naturally fused together through the hierarchical representation, where coarser levels are built from a Low Spatial Resolution (LSR) or Medium Spatial Resolution (MSR) image while finer levels are generated from a High Spatial Resolution (HSR) or Very High Spatial Resolution (VHSR) image. Such a representation allows one to benefit from context information thanks to the coarser levels, and from the spatial arrangement of subregions thanks to the finer levels. Two dedicated structured kernels are then used to perform machine learning directly on the constructed hierarchical representation. This strategy overcomes the limits of conventional GEOBIA classification procedures that can handle only one or very few pre-selected scales. Experiments run on an urban classification task show that the proposed approach can substantially improve the classification accuracy w.r.t. conventional approaches working on a single scale.

Submitted 12 July, 2016; v1 submitted 9 July, 2016; originally announced July 2016.

Comments: International Conference on Geographic Object-Based Image Analysis (GEOBIA 2016), University of Twente in Enschede, The Netherlands

arXiv:1606.04985 [pdf, other]

Combining multiscale features for classification of hyperspectral images: a sequence based kernel approach

Authors: Yanwei Cui, Laetitia Chapel, Sébastien Lefèvre

Abstract: Nowadays, hyperspectral image classification widely relies on spatial information to improve accuracy. One of the most popular ways to integrate such information is to extract hierarchical features from a multiscale segmentation. In the classification context, the extracted features are commonly concatenated into a long vector (also called a stacked vector), to which a conventional vector-based machine learning technique (e.g. SVM with Gaussian kernel) is applied. In this paper, we instead propose to use a sequence structured kernel: the spectrum kernel. We show that the conventional stacked vector-based kernel is actually a special case of this kernel. Experiments conducted on various publicly available hyperspectral datasets illustrate the improvement of the proposed kernel w.r.t. conventional ones using the same hierarchical spatial features.

Submitted 15 June, 2016; originally announced June 2016.

Comments: 8th IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS 2016), UCLA in Los Angeles, California, U.S.
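One way to read the abstract's claim is through a numerical variant of the spectrum kernel: each pixel is described by the sequence of its feature vectors across segmentation scales, and the kernel sums, over all pairs of length-k contiguous subsequences, the product of Gaussian kernels between aligned elements. The Python sketch below is a hypothetical illustration under that assumption, not necessarily the paper's exact definition. When k equals the sequence length, only one pair of subsequences remains and the product of per-scale Gaussians equals a Gaussian kernel on the stacked vectors, which mirrors the special-case relation stated in the abstract.

```python
import numpy as np

def gaussian(u, v, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return np.exp(-gamma * np.sum((u - v) ** 2))

def spectrum_kernel(s, t, k, gamma=1.0):
    """Hypothetical numerical spectrum kernel between sequences of per-scale
    feature vectors s (L1, d) and t (L2, d), using length-k contiguous subsequences."""
    total = 0.0
    for i in range(len(s) - k + 1):
        for j in range(len(t) - k + 1):
            # Compare two length-k subsequences element by element.
            total += np.prod([gaussian(s[i + m], t[j + m], gamma) for m in range(k)])
    return float(total)
```

Smaller k lets the kernel match partial runs of scales at different positions, whereas k equal to the number of scales reduces to the usual stacked-vector Gaussian kernel.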

arXiv:1604.01787 [pdf, other]

doi 10.1007/978-3-319-18224-7_4

A Subpath Kernel for Learning Hierarchical Image Representations

Authors: Yanwei Cui, Laetitia Chapel, Sébastien Lefèvre

Abstract: Tree kernels have demonstrated their ability to deal with hierarchical data, as the intrinsic tree structure often plays a discriminative role. While such kernels have been successfully applied to various domains such as natural language processing and bioinformatics, they mostly concentrate on ordered trees whose nodes are described by symbolic data. Meanwhile, hierarchical representations have gained increasing interest to describe image content. This is particularly true in remote sensing, where such representations allow for revealing different objects of interest at various scales through a tree structure. However, the induced trees are unordered and the nodes are equipped with numerical features. In this paper, we propose a new structured kernel for hierarchical image representations which is built on the concept of subpath kernel. Experimental results on both artificial and remote sensing datasets show that the proposed kernel manages to deal with the hierarchical nature of the data, leading to better classification rates.

Submitted 6 April, 2016; originally announced April 2016.

Comments: 10th IAPR-TC-15 International Workshop, GbRPR 2015, Beijing, China, May 13-15, 2015. Proceedings

arXiv:1601.01799 [pdf, other]

Dense Bag-of-Temporal-SIFT-Words for Time Series Classification

Authors: Adeline Bailly, Simon Malinowski, Romain Tavenard, Thomas Guyet, Laetitia Chapel

Abstract: Time series classification is an application of particular interest given the growing amount of data to monitor. Classical techniques for time series classification rely on point-to-point distances. Recently, Bag-of-Words approaches have been used in this context. Words are quantized versions of simple features extracted from sliding windows. The SIFT framework has proved efficient for image classification. In this paper, we design a time series classification scheme that builds on the SIFT framework adapted to time series to feed a Bag-of-Words. We then refine our method by studying the impact of normalized Bag-of-Words, as well as of densely extracting point descriptors. The proposed adjustments achieve better performance. The evaluation shows that our method outperforms classical techniques in terms of classification.

Submitted 13 January, 2016; v1 submitted 8 January, 2016; originally announced January 2016.

Showing 1–13 of 13 results for author: Chapel, L