Search | arXiv e-print repository

The Graph Cut Kernel for Ranked Data

Authors: Michelangelo Conserva, Marc Peter Deisenroth, K S Sesh Kumar

Abstract: Many algorithms for ranked data become computationally intractable as the number of objects grows due to the complex geometric structure induced by rankings. An additional challenge is posed by partial rankings, i.e. rankings in which the preference is only known for a subset of all objects. For these reasons, state-of-the-art methods cannot scale to real-world applications, such as recommender sy… ▽ More Many algorithms for ranked data become computationally intractable as the number of objects grows due to the complex geometric structure induced by rankings. An additional challenge is posed by partial rankings, i.e. rankings in which the preference is only known for a subset of all objects. For these reasons, state-of-the-art methods cannot scale to real-world applications, such as recommender systems. We address this challenge by exploiting the geometric structure of ranked data and additional available information about the objects to derive a kernel for ranking based on the graph cut function. The graph cut kernel combines the efficiency of submodular optimization with the theoretical properties of kernel-based methods. The graph cut kernel combines the efficiency of submodular optimization with the theoretical properties of kernel-based methods. △ Less

Submitted 17 July, 2022; v1 submitted 26 May, 2021; originally announced May 2021.

Journal ref: Transactions on Machine Learning Research (2022)

arXiv:2102.07115 [pdf, other]

Sliced Multi-Marginal Optimal Transport

Authors: Samuel Cohen, Alexander Terenin, Yannik Pitcan, Brandon Amos, Marc Peter Deisenroth, K S Sesh Kumar

Abstract: Multi-marginal optimal transport enables one to compare multiple probability measures, which increasingly finds application in multi-task learning problems. One practical limitation of multi-marginal transport is computational scalability in the number of measures, samples and dimensionality. In this work, we propose a multi-marginal optimal transport paradigm based on random one-dimensional proje… ▽ More Multi-marginal optimal transport enables one to compare multiple probability measures, which increasingly finds application in multi-task learning problems. One practical limitation of multi-marginal transport is computational scalability in the number of measures, samples and dimensionality. In this work, we propose a multi-marginal optimal transport paradigm based on random one-dimensional projections, whose (generalized) distance we term the sliced multi-marginal Wasserstein distance. To construct this distance, we introduce a characterization of the one-dimensional multi-marginal Kantorovich problem and use it to highlight a number of properties of the sliced multi-marginal Wasserstein distance. In particular, we show that (i) the sliced multi-marginal Wasserstein distance is a (generalized) metric that induces the same topology as the standard Wasserstein distance, (ii) it admits a dimension-free sample complexity, (iii) it is tightly connected with the problem of barycentric averaging under the sliced-Wasserstein metric. We conclude by illustrating the sliced multi-marginal Wasserstein on multi-task density estimation and multi-dynamics reinforcement learning problems. △ Less

Submitted 23 November, 2021; v1 submitted 14 February, 2021; originally announced February 2021.

Journal ref: NeurIPS Workshop on Optimal Transport and Machine Learning, 2021

arXiv:1905.11327 [pdf, ps, other]

Fast Decomposable Submodular Function Minimization using Constrained Total Variation

Authors: K S Sesh Kumar, Francis Bach, Thomas Pock

Abstract: We consider the problem of minimizing the sum of submodular set functions assuming minimization oracles of each summand function. Most existing approaches reformulate the problem as the convex minimization of the sum of the corresponding Lovász extensions and the squared Euclidean norm, leading to algorithms requiring total variation oracles of the summand functions; without further assumptions, t… ▽ More We consider the problem of minimizing the sum of submodular set functions assuming minimization oracles of each summand function. Most existing approaches reformulate the problem as the convex minimization of the sum of the corresponding Lovász extensions and the squared Euclidean norm, leading to algorithms requiring total variation oracles of the summand functions; without further assumptions, these more complex oracles require many calls to the simpler minimization oracles often available in practice. In this paper, we consider a modified convex problem requiring constrained version of the total variation oracles that can be solved with significantly fewer calls to the simple minimization oracles. We support our claims by showing results on graph cuts for 2D and 3D graphs △ Less

Submitted 27 May, 2019; originally announced May 2019.

arXiv:1905.04873 [pdf, ps, other]

Differentially Private Empirical Risk Minimization with Sparsity-Inducing Norms

Authors: K S Sesh Kumar, Marc Peter Deisenroth

Abstract: Differential privacy is concerned about the prediction quality while measuring the privacy impact on individuals whose information is contained in the data. We consider differentially private risk minimization problems with regularizers that induce structured sparsity. These regularizers are known to be convex but they are often non-differentiable. We analyze the standard differentially private al… ▽ More Differential privacy is concerned about the prediction quality while measuring the privacy impact on individuals whose information is contained in the data. We consider differentially private risk minimization problems with regularizers that induce structured sparsity. These regularizers are known to be convex but they are often non-differentiable. We analyze the standard differentially private algorithms, such as output perturbation, Frank-Wolfe and objective perturbation. Output perturbation is a differentially private algorithm that is known to perform well for minimizing risks that are strongly convex. Previous works have derived excess risk bounds that are independent of the dimensionality. In this paper, we assume a particular class of convex but non-smooth regularizers that induce structured sparsity and loss functions for generalized linear models. We also consider differentially private Frank-Wolfe algorithms to optimize the dual of the risk minimization problem. We derive excess risk bounds for both these algorithms. Both the bounds depend on the Gaussian width of the unit ball of the dual norm. We also show that objective perturbation of the risk minimization problems is equivalent to the output perturbation of a dual optimization problem. This is the first work that analyzes the dual optimization problems of risk minimization problems in the context of differential privacy. △ Less

Submitted 13 May, 2019; originally announced May 2019.

arXiv:1902.10675 [pdf, other]

High-dimensional Bayesian optimization using low-dimensional feature spaces

Authors: Riccardo Moriconi, Marc P. Deisenroth, K. S. Sesh Kumar

Abstract: Bayesian optimization (BO) is a powerful approach for seeking the global optimum of expensive black-box functions and has proven successful for fine tuning hyper-parameters of machine learning models. However, BO is practically limited to optimizing 10--20 parameters. To scale BO to high dimensions, we usually make structural assumptions on the decomposition of the objective and\slash or exploit t… ▽ More Bayesian optimization (BO) is a powerful approach for seeking the global optimum of expensive black-box functions and has proven successful for fine tuning hyper-parameters of machine learning models. However, BO is practically limited to optimizing 10--20 parameters. To scale BO to high dimensions, we usually make structural assumptions on the decomposition of the objective and\slash or exploit the intrinsic lower dimensionality of the problem, e.g. by using linear projections. We could achieve a higher compression rate with nonlinear projections, but learning these nonlinear embeddings typically requires much data. This contradicts the BO objective of a relatively small evaluation budget. To address this challenge, we propose to learn a low-dimensional feature space jointly with (a) the response surface and (b) a reconstruction map**. Our approach allows for optimization of BO's acquisition function in the lower-dimensional subspace, which significantly simplifies the optimization problem. We reconstruct the original parameter space from the lower-dimensional subspace for evaluating the black-box function. For meaningful exploration, we solve a constrained optimization problem. △ Less

Submitted 25 September, 2020; v1 submitted 27 February, 2019; originally announced February 2019.

arXiv:1506.02852 [pdf, ps, other]

Active-set Methods for Submodular Minimization Problems

Authors: K. S. Sesh Kumar, Francis Bach

Abstract: We consider the submodular function minimization (SFM) and the quadratic minimization problemsregularized by the Lov'asz extension of the submodular function. These optimization problemsare intimately related; for example,min-cut problems and total variation denoising problems, wherethe cut function is submodular and its Lov'asz extension is given by the associated total variation.When a quadratic… ▽ More We consider the submodular function minimization (SFM) and the quadratic minimization problemsregularized by the Lov'asz extension of the submodular function. These optimization problemsare intimately related; for example,min-cut problems and total variation denoising problems, wherethe cut function is submodular and its Lov'asz extension is given by the associated total variation.When a quadratic loss is regularized by the total variation of a cut function, it thus becomes atotal variation denoising problem and we use the same terminology in this paper for "general" submodularfunctions. We propose a new active-set algorithm for total variation denoising with theassumption of an oracle that solves the corresponding SFM problem. This can be seen as localdescent algorithm over ordered partitions with explicit convergence guarantees. It is more flexiblethan the existing algorithms with the ability for warm-restarts using the solution of a closely relatedproblem. Further, we also consider the case when a submodular function can be decomposed intothe sum of two submodular functions F1 and F2 and assume SFM oracles for these two functions.We propose a new active-set algorithm for total variation denoising (and hence SFM by thresholdingthe solution at zero). This algorithm also performs local descent over ordered partitions and itsability to warm start considerably improves the performance of the algorithm. In the experiments,we compare the performance of the proposed algorithms with state-of-the-art algorithms, showingthat it reduces the calls to SFM oracles. △ Less

Submitted 6 February, 2018; v1 submitted 9 June, 2015; originally announced June 2015.

Journal ref: Journal of Machine Learning Research, Journal of Machine Learning Research, 2017, pp.1-31

arXiv:1503.01563 [pdf, other]

Convex Optimization for Parallel Energy Minimization

Authors: K. S. Sesh Kumar, Alvaro Barbero, Stefanie Jegelka, Suvrit Sra, Francis Bach

Abstract: Energy minimization has been an intensely studied core problem in computer vision. With growing image sizes (2D and 3D), it is now highly desirable to run energy minimization algorithms in parallel. But many existing algorithms, in particular, some efficient combinatorial algorithms, are difficult to par-allelize. By exploiting results from convex and submodular theory, we reformulate the quadrati… ▽ More Energy minimization has been an intensely studied core problem in computer vision. With growing image sizes (2D and 3D), it is now highly desirable to run energy minimization algorithms in parallel. But many existing algorithms, in particular, some efficient combinatorial algorithms, are difficult to par-allelize. By exploiting results from convex and submodular theory, we reformulate the quadratic energy minimization problem as a total variation denoising problem, which, when viewed geometrically, enables the use of projection and reflection based convex methods. The resulting min-cut algorithm (and code) is conceptually very simple, and solves a sequence of TV denoising problems. We perform an extensive empirical evaluation comparing state-of-the-art combinatorial algorithms and convex optimization techniques. On small problems the iterative convex methods match the combinatorial max-flow algorithms, while on larger problems they offer other flexibility and important gains: (a) their memory footprint is small; (b) their straightforward parallelizability fits multi-core platforms; (c) they can easily be warm-started; and (d) they quickly reach approximately good solutions, thereby enabling faster "inexact" solutions. A key consequence of our approach based on submodularity and convexity is that it is allows to combine any arbitrary combinatorial or convex methods as subroutines, which allows one to obtain hybrid combinatorial and convex optimization algorithms that benefit from the strengths of both. △ Less

Submitted 5 March, 2015; originally announced March 2015.

arXiv:1309.2593 [pdf, ps, other]

Maximizing submodular functions using probabilistic graphical models

Authors: K. S. Sesh Kumar, Francis Bach

Abstract: We consider the problem of maximizing submodular functions; while this problem is known to be NP-hard, several numerically efficient local search techniques with approximation guarantees are available. In this paper, we propose a novel convex relaxation which is based on the relationship between submodular functions, entropies and probabilistic graphical models. In a graphical model, the entropy o… ▽ More We consider the problem of maximizing submodular functions; while this problem is known to be NP-hard, several numerically efficient local search techniques with approximation guarantees are available. In this paper, we propose a novel convex relaxation which is based on the relationship between submodular functions, entropies and probabilistic graphical models. In a graphical model, the entropy of the joint distribution decomposes as a sum of marginal entropies of subsets of variables; moreover, for any distribution, the entropy of the closest distribution factorizing in the graphical model provides an bound on the entropy. For directed graphical models, this last property turns out to be a direct consequence of the submodularity of the entropy function, and allows the generalization of graphical-model-based upper bounds to any submodular functions. These upper bounds may then be jointly maximized with respect to a set, while minimized with respect to the graph, leading to a convex variational inference scheme for maximizing submodular functions, based on outer approximations of the marginal polytope and maximum likelihood bounded treewidth structures. By considering graphs of increasing treewidths, we may then explore the trade-off between computational complexity and tightness of the relaxation. We also present extensions to constrained problems and maximizing the difference of submodular functions, which include all possible set functions. △ Less

Submitted 10 September, 2013; originally announced September 2013.

arXiv:1212.2573 [pdf, ps, other]

Convex Relaxations for Learning Bounded Treewidth Decomposable Graphs

Authors: K. S. Sesh Kumar, Francis Bach

Abstract: We consider the problem of learning the structure of undirected graphical models with bounded treewidth, within the maximum likelihood framework. This is an NP-hard problem and most approaches consider local search techniques. In this paper, we pose it as a combinatorial optimization problem, which is then relaxed to a convex optimization problem that involves searching over the forest and hyperfo… ▽ More We consider the problem of learning the structure of undirected graphical models with bounded treewidth, within the maximum likelihood framework. This is an NP-hard problem and most approaches consider local search techniques. In this paper, we pose it as a combinatorial optimization problem, which is then relaxed to a convex optimization problem that involves searching over the forest and hyperforest polytopes with special structures, independently. A supergradient method is used to solve the dual problem, with a run-time complexity of $O(k^3 n^{k+2} \log n)$ for each iteration, where $n$ is the number of variables and $k$ is a bound on the treewidth. We compare our approach to state-of-the-art methods on synthetic datasets and classical benchmarks, showing the gains of the novel convex approach. △ Less

Submitted 11 December, 2012; originally announced December 2012.

Showing 1–9 of 9 results for author: Kumar, K S S