doi 10.1016/j.cviu.2019.102863

An Entropic Optimal Transport Loss for Learning Deep Neural Networks under Label Noise in Remote Sensing Images

Authors: Bharath Bhushan Damodaran, Rémi Flamary, Vivien Seguy, Nicolas Courty

Abstract: Deep neural networks have established as a powerful tool for large scale supervised classification tasks. The state-of-the-art performances of deep neural networks are conditioned to the availability of large number of accurately labeled samples. In practice, collecting large scale accurately labeled datasets is a challenging and tedious task in most scenarios of remote sensing image analysis, thus cheap surrogate procedures are employed to label the dataset. Training deep neural networks on such datasets with inaccurate labels easily overfits to the noisy training labels and degrades the performance of the classification tasks drastically. To mitigate this effect, we propose an original solution with entropic optimal transportation. It allows to learn in an end-to-end fashion deep neural networks that are, to some extent, robust to inaccurately labeled samples. We empirically demonstrate on several remote sensing datasets, where both scene and pixel-based hyperspectral images are considered for classification. Our method proves to be highly tolerant to significant amounts of label noise and achieves favorable results against state-of-the-art methods.

Submitted 2 October, 2018; originally announced October 2018.

Comments: Under Consideration at Computer Vision and Image Understanding

Journal ref: Computer Vision and Image Understanding, Volume 191, 2020, 102863, ISSN 1077-3142

arXiv:1802.07625 [pdf, other]

doi 10.1073/pnas.1712674115

Fast flow-based algorithm for creating density-equalizing map projections

Authors: Michael T. Gastner, Vivien Seguy, Pratyush More

Abstract: Cartograms are maps that rescale geographic regions (e.g., countries, districts) such that their areas are proportional to quantitative demographic data (e.g., population size, gross domestic product). Unlike conventional bar or pie charts, cartograms can represent correctly which regions share common borders, resulting in insightful visualizations that can be the basis for further spatial statistical analysis. Computer programs can assist data scientists in preparing cartograms, but developing an algorithm that can quickly transform every coordinate on the map (including points that are not exactly on a border) while generating recognizable images has remained a challenge. Methods that translate the cartographic deformations into physics-inspired equations of motion have become popular, but solving these equations with sufficient accuracy can still take several minutes on current hardware. Here we introduce a flow-based algorithm whose equations of motion are numerically easier to solve compared with previous methods. The equations allow straightforward parallelization so that the calculation takes only a few seconds even for complex and detailed input. Despite the speedup, the proposed algorithm still keeps the advantages of previous techniques: with comparable quantitative measures of shape distortion, it accurately scales all areas, correctly fits the regions together and generates a map projection for every point.
We demonstrate the use of our algorithm with applications to the 2016 US election results, the gross domestic products of Indian states and Chinese provinces, and the spatial distribution of deaths in the London borough of Kensington and Chelsea between 2011 and 2014.

Submitted 21 February, 2018; originally announced February 2018.

Comments: 16 pages (including supplementary text), 8 figures

Journal ref: Proc. Natl. Acad. Sci. U.S.A. 115(10):E2156-E2164 (2018)

arXiv:1802.05429 [pdf, ps, other]

doi 10.1186/s13634-018-0576-2

Blind Source Separation with Optimal Transport Non-negative Matrix Factorization

Authors: Antoine Rolet, Vivien Seguy, Mathieu Blondel, Hiroshi Sawada

Abstract: Optimal transport as a loss for machine learning optimization problems has recently gained a lot of attention. Building upon recent advances in computational optimal transport, we develop an optimal transport non-negative matrix factorization (NMF) algorithm for supervised speech blind source separation (BSS). Optimal transport allows us to design and leverage a cost between short-time Fourier transform (STFT) spectrogram frequencies, which takes into account how humans perceive sound. We give empirical evidence that using our proposed optimal transport NMF leads to perceptually better results than Euclidean NMF, for both isolated voice reconstruction and BSS tasks. Finally, we demonstrate how to use optimal transport for cross domain sound processing tasks, where frequencies represented in the input spectrograms may be different from one spectrogram to another.

Submitted 15 February, 2018; originally announced February 2018.

Comments: 22 pages, 7 figures, 2 additional files

arXiv:1711.02283 [pdf, other]

Large-Scale Optimal Transport and Mapping Estimation

Authors: Vivien Seguy, Bharath Bhushan Damodaran, Rémi Flamary, Nicolas Courty, Antoine Rolet, Mathieu Blondel

Abstract: This paper presents a novel two-step approach for the fundamental problem of learning an optimal map from one distribution to another. First, we learn an optimal transport (OT) plan, which can be thought as a one-to-many map between the two distributions. To that end, we propose a stochastic dual approach of regularized OT, and show empirically that it scales better than a recent related approach when the amount of samples is very large. Second, we estimate a \textit{Monge map} as a deep neural network learned by approximating the barycentric projection of the previously-obtained OT plan. This parameterization allows generalization of the mapping outside the support of the input measure. We prove two theoretical stability results of regularized OT which show that our estimations converge to the OT plan and Monge map between the underlying continuous measures. We showcase our proposed approach on two applications: domain adaptation and generative modeling.

Submitted 25 February, 2018; v1 submitted 6 November, 2017; originally announced November 2017.

Comments: 15 pages, 4 figures. To appear in the Proceedings of the International Conference on Learning Representations (ICLR) 2018

arXiv:1710.06276 [pdf, other]

Smooth and Sparse Optimal Transport

Authors: Mathieu Blondel, Vivien Seguy, Antoine Rolet

Abstract: Entropic regularization is quickly emerging as a new standard in optimal transport (OT). It enables to cast the OT computation as a differentiable and unconstrained convex optimization problem, which can be efficiently solved using the Sinkhorn algorithm. However, entropy keeps the transportation plan strictly positive and therefore completely dense, unlike unregularized OT. This lack of sparsity can be problematic in applications where the transportation plan itself is of interest. In this paper, we explore regularizing the primal and dual OT formulations with a strongly convex term, which corresponds to relaxing the dual and primal constraints with smooth approximations. We show how to incorporate squared $2$-norm and group lasso regularizations within that framework, leading to sparse and group-sparse transportation plans. On the theoretical side, we bound the approximation error introduced by regularizing the primal and dual formulations. Our results suggest that, for the regularized primal, the approximation error can often be smaller with squared $2$-norm than with entropic regularization. We showcase our proposed framework on the task of color transfer.

Submitted 20 February, 2018; v1 submitted 17 October, 2017; originally announced October 2017.

Comments: Accepted to AISTATS 2018

arXiv:1708.08143 [pdf, other]

Log-PCA versus Geodesic PCA of histograms in the Wasserstein space

Authors: Elsa Cazelles, Vivien Seguy, Jérémie Bigot, Marco Cuturi, Nicolas Papadakis

Abstract: This paper is concerned by the statistical analysis of data sets whose elements are random histograms. For the purpose of learning principal modes of variation from such data, we consider the issue of computing the PCA of histograms with respect to the 2-Wasserstein distance between probability measures. To this end, we propose to compare the methods of log-PCA and geodesic PCA in the Wasserstein space as introduced by Bigot et al. (2015) and Seguy and Cuturi (2015). Geodesic PCA involves solving a non-convex optimization problem. To solve it approximately, we propose a novel forward-backward algorithm. This allows a detailed comparison between log-PCA and geodesic PCA of one-dimensional histograms, which we carry out using various data sets, and stress the benefits and drawbacks of each method. We extend these results for two-dimensional data and compare both methods in that setting.

Submitted 27 August, 2017; originally announced August 2017.

Comments: 32 pages, 12 figures

arXiv:1506.07944 [pdf, other]

Principal Geodesic Analysis for Probability Measures under the Optimal Transport Metric

Authors: Vivien Seguy, Marco Cuturi

Abstract: Given a family of probability measures in P(X), the space of probability measures on a Hilbert space X, our goal in this paper is to highlight one or more curves in P(X) that summarize efficiently that family. We propose to study this problem under the optimal transport (Wasserstein) geometry, using curves that are restricted to be geodesic segments under that metric. We show that concepts that play a key role in Euclidean PCA, such as data centering or orthogonality of principal directions, find a natural equivalent in the optimal transport geometry, using Wasserstein means and differential geometry. The implementation of these ideas is, however, computationally challenging. To achieve scalable algorithms that can handle thousands of measures, we propose to use a relaxed definition for geodesics and regularized optimal transport distances. The interest of our approach is demonstrated on images seen either as shapes or color histograms.

Submitted 22 November, 2015; v1 submitted 25 June, 2015; originally announced June 2015.

Comments: 9 pages, 8 figures. To appear in Advances in Neural Information Processing Systems (NIPS) 2015

Showing 1–7 of 7 results for author: Seguy, V