-
Shrunk subspaces via operator Sinkhorn iteration
Authors:
Cole Franks,
Tasuku Soma,
Michel X. Goemans
Abstract:
A recent breakthrough in Edmonds' problem showed that the noncommutative rank can be computed in deterministic polynomial time, and various algorithms for it were devised. However, only quite complicated algorithms are known for finding a so-called shrunk subspace, which acts as a dual certificate for the value of the noncommutative rank. In particular, the operator Sinkhorn algorithm, perhaps the simplest algorithm to compute the noncommutative rank with operator scaling, does not find a shrunk subspace. Finding a shrunk subspace plays a key role in applications, such as separation in the Brascamp-Lieb polytope, one-parameter subgroups in the null-cone membership problem, and primal-dual algorithms for matroid intersection and fractional matroid matching.
In this paper, we provide a simple Sinkhorn-style algorithm to find the smallest shrunk subspace over the complex field in deterministic polynomial time. To this end, we introduce a generalization of the operator scaling problem, where the spectra of the marginals must be majorized by specified vectors. Then we design an efficient Sinkhorn-style algorithm for the generalized operator scaling problem. Applying this to the shrunk subspace problem, we show that a sufficiently long run of the algorithm also finds an approximate shrunk subspace close to the minimum exact shrunk subspace. Finally, we show that the approximate shrunk subspace can be rounded if it is sufficiently close. Along the way, we also provide a simple randomized algorithm to find the smallest shrunk subspace.
As applications, we design a faster algorithm for fractional linear matroid matching and efficient weak membership and optimization algorithms for the rank-2 Brascamp-Lieb polytope.
Submitted 17 July, 2022;
originally announced July 2022.
-
Near optimal sample complexity for matrix and tensor normal models via geodesic convexity
Authors:
Cole Franks,
Rafael Oliveira,
Akshay Ramachandran,
Michael Walter
Abstract:
The matrix normal model, the family of Gaussian matrix-variate distributions whose covariance matrix is the Kronecker product of two lower dimensional factors, is frequently used to model matrix-variate data. The tensor normal model generalizes this family to Kronecker products of three or more factors. We study the estimation of the Kronecker factors of the covariance matrix in the matrix and tensor models. We show nonasymptotic bounds for the error achieved by the maximum likelihood estimator (MLE) in several natural metrics. In contrast to existing bounds, our results do not rely on the factors being well-conditioned or sparse. For the matrix normal model, all our bounds are minimax optimal up to logarithmic factors, and for the tensor normal model our bounds for the largest factor and overall covariance matrix are minimax optimal up to constant factors provided there are enough samples for any estimator to obtain constant Frobenius error. In the same regimes as our sample complexity bounds, we show that an iterative procedure to compute the MLE known as the flip-flop algorithm converges linearly with high probability. Our main tool is geodesic strong convexity in the geometry on positive-definite matrices induced by the Fisher information metric. This strong convexity is determined by the expansion of certain random quantum channels. We also provide numerical evidence that combining the flip-flop algorithm with a simple shrinkage estimator can improve performance in the undersampled regime.
Submitted 11 November, 2021; v1 submitted 14 October, 2021;
originally announced October 2021.
-
Barriers for recent methods in geodesic optimization
Authors:
Cole Franks,
Philipp Reichenbach
Abstract:
We study a class of optimization problems including matrix scaling, matrix balancing, multidimensional array scaling, operator scaling, and tensor scaling that arise frequently in theory and in practice. Some of these problems, such as matrix and array scaling, are convex in the Euclidean sense, but others such as operator scaling and tensor scaling are geodesically convex on a different Riemannian manifold. Trust region methods, which include box-constrained Newton's method, are known to produce high precision solutions very quickly for matrix scaling and matrix balancing (Cohen et al., FOCS 2017; Allen-Zhu et al., FOCS 2017), and result in polynomial time algorithms for some geodesically convex problems like operator scaling (Garg et al., STOC 2018; Bürgisser et al., FOCS 2019). One is led to ask whether these guarantees also hold for multidimensional array scaling and tensor scaling.
We show that this is not the case by exhibiting instances with exponential diameter bound: we construct polynomial-size instances of 3-dimensional array scaling and 3-tensor scaling whose approximate solutions all have doubly exponential condition number. Moreover, we study convex-geometric notions of complexity known as margin and gap, which are used to bound the running times of all existing optimization algorithms for such problems. We show that margin and gap are exponentially small for several problems including array scaling, tensor scaling and polynomial scaling. Our results suggest that it is impossible to prove polynomial running time bounds for tensor scaling based on diameter bounds alone. Therefore, our work motivates the search for analogues of more sophisticated algorithms, such as interior point methods, for geodesically convex optimization that do not rely on polynomial diameter bounds.
Submitted 17 May, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Rigorous Guarantees for Tyler's M-estimator via quantum expansion
Authors:
Cole Franks,
Ankur Moitra
Abstract:
Estimating the shape of an elliptical distribution is a fundamental problem in statistics. One estimator for the shape matrix, Tyler's M-estimator, has been shown to have many appealing asymptotic properties. It performs well in numerical experiments and can be quickly computed in practice by a simple iterative procedure. Despite the many years the estimator has been studied in the statistics community, there was neither a tight non-asymptotic bound on the rate of the estimator nor a proof that the iterative procedure converges in polynomially many steps.
Here we observe a surprising connection between Tyler's M-estimator and operator scaling, which has been intensively studied in recent years in part because of its connections to the Brascamp-Lieb inequality in analysis. We use this connection, together with novel results on quantum expanders, to show that Tyler's M-estimator has the optimal rate up to factors logarithmic in the dimension, and that in the generative model the iterative procedure has a linear convergence rate even without regularization.
Submitted 14 September, 2021; v1 submitted 31 January, 2020;
originally announced February 2020.
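The "simple iterative procedure" mentioned in the abstract above can be illustrated with a minimal sketch, assuming it refers to the standard fixed-point iteration for Tyler's M-estimator: each sample's outer product is reweighted by the inverse of its current quadratic form, and the result is renormalized. The function names (`solve`, `tyler_step`, `tyler_estimate`), the trace-equals-dimension normalization, and the fixed iteration count are illustrative choices, not taken from the paper. The sketch also assumes the samples are nonzero and span the space, so the iterates stay invertible.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def tyler_step(sigma, samples):
    """One fixed-point update: sum of x x^T / (x^T sigma^{-1} x), rescaled."""
    d, n = len(sigma), len(samples)
    new = [[0.0] * d for _ in range(d)]
    for x in samples:
        # Quadratic form x^T sigma^{-1} x, computed without forming the inverse.
        q = sum(xi * yi for xi, yi in zip(x, solve(sigma, x)))
        for i in range(d):
            for j in range(d):
                new[i][j] += (d / n) * x[i] * x[j] / q
    # Assumed convention: fix the scale by normalizing the trace to d.
    trace = sum(new[i][i] for i in range(d))
    return [[d * v / trace for v in row] for row in new]


def tyler_estimate(samples, iterations=100):
    """Run the iteration from the identity for a fixed number of steps."""
    d = len(samples[0])
    sigma = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for _ in range(iterations):
        sigma = tyler_step(sigma, samples)
    return sigma


# Rescaling individual samples does not change the estimate: only the
# directions matter, so these four axis-aligned points give the identity.
shape = tyler_estimate([(3, 0), (0, 0.5), (-2, 0), (0, -7)])
```

A fixed iteration count keeps the sketch short; a practical implementation would more likely stop once successive iterates are close.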
-
Towards a theory of non-commutative optimization: geodesic first and second order methods for moment maps and polytopes
Authors:
Peter Bürgisser,
Cole Franks,
Ankit Garg,
Rafael Oliveira,
Michael Walter,
Avi Wigderson
Abstract:
This paper initiates a systematic development of a theory of non-commutative optimization. It aims to unify and generalize a growing body of work from the past few years which developed and analyzed algorithms for natural geodesically convex optimization problems on Riemannian manifolds that arise from the symmetries of non-commutative groups. These algorithms minimize the moment map (a non-commutative notion of the usual gradient) and test membership in null cones and moment polytopes (a vast class of polytopes, typically of exponential vertex and facet complexity, which arise from this a priori non-convex, non-linear setting). This setting captures a diverse set of problems in different areas of computer science, mathematics, and physics. Several of them were solved efficiently for the first time using non-commutative methods; the corresponding algorithms also lead to solutions of purely structural problems and to many new connections between disparate fields.
In the spirit of standard convex optimization, we develop two general methods in the geodesic setting, a first order and a second order method, which respectively receive first and second order information on the "derivatives" of the function to be optimized. These in particular subsume all past results. The main technical work goes into identifying the key parameters of the underlying group actions which control convergence to the optimum in each of these methods. These non-commutative analogues of "smoothness" are far more complex and require significant algebraic and analytic machinery. Despite this complexity, the way in which these parameters control convergence in both methods is quite simple and elegant. We show how to bound these parameters and hence obtain efficient algorithms for null cone membership in several concrete situations. Our work points to intriguing open problems and suggests further research directions.
Submitted 26 July, 2021; v1 submitted 27 October, 2019;
originally announced October 2019.
-
A simplified disproof of Beck's three permutations conjecture and an application to root-mean-squared discrepancy
Authors:
Cole Franks
Abstract:
A $k$-permutation family on $n$ vertices is a set system consisting of the intervals of $k$ permutations of the integers $1$ through $n$. The discrepancy of a set system is the minimum over all red-blue vertex colorings of the maximum difference between the number of red and blue vertices in any set in the system. In 2011, Newman and Nikolov disproved a conjecture of Beck that the discrepancy of any $3$-permutation family is at most a constant independent of $n$. Here we give a simpler proof that Newman and Nikolov's sequence of $3$-permutation families has discrepancy $\Omega(\log n)$. We also exhibit a sequence of $6$-permutation families with root-mean-squared discrepancy $\Omega(\sqrt{\log n})$; that is, in any red-blue vertex coloring, the square root of the expected squared difference between the number of red and blue vertices in an interval of the system is $\Omega(\sqrt{\log n})$.
Submitted 27 November, 2018; v1 submitted 2 November, 2018;
originally announced November 2018.
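The two definitions in the abstract above (discrepancy and root-mean-squared discrepancy of a $k$-permutation family) can be made concrete with a brute-force sketch for very small $n$. The function names are hypothetical. The sketch assumes that "intervals" means all contiguous runs of each permutation and that the expectation in the root-mean-squared version is uniform over the distinct sets of the system; the paper may use a different convention. The running time is exponential in $n$, so this is only a way to check the definitions by hand.

```python
import math
from itertools import product


def interval_sets(permutations):
    """All contiguous intervals of each permutation, as frozensets of vertices."""
    sets = set()
    for perm in permutations:
        n = len(perm)
        for i in range(n):
            for j in range(i + 1, n + 1):
                sets.add(frozenset(perm[i:j]))
    return sets


def colorings(permutations):
    """Yield every red/blue coloring as a dict vertex -> +1 (red) or -1 (blue)."""
    vertices = sorted(permutations[0])
    for signs in product((1, -1), repeat=len(vertices)):
        yield dict(zip(vertices, signs))


def discrepancy(permutations):
    """Min over colorings of the max |#red - #blue| over sets in the system."""
    sets = interval_sets(permutations)
    return min(
        max(abs(sum(color[v] for v in s)) for s in sets)
        for color in colorings(permutations)
    )


def rms_discrepancy(permutations):
    """Min over colorings of sqrt(mean over sets of (#red - #blue)^2)."""
    sets = interval_sets(permutations)
    best = min(
        sum(sum(color[v] for v in s) ** 2 for s in sets) / len(sets)
        for color in colorings(permutations)
    )
    return math.sqrt(best)


# A single permutation: alternating colors keep every interval within 1.
one_perm = [(1, 2, 3, 4)]
print(discrepancy(one_perm), rms_discrepancy(one_perm))
```

For a single permutation the alternating coloring is optimal, which is why interesting lower bounds need several permutations, as in the $3$- and $6$-permutation families the abstract describes.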
-
Efficient algorithms for tensor scaling, quantum marginals and moment polytopes
Authors:
Peter Bürgisser,
Cole Franks,
Ankit Garg,
Rafael Oliveira,
Michael Walter,
Avi Wigderson
Abstract:
We present a polynomial time algorithm to approximately scale tensors of any format to arbitrary prescribed marginals (whenever possible). This unifies and generalizes a sequence of past works on matrix, operator and tensor scaling. Our algorithm provides an efficient weak membership oracle for the associated moment polytopes, an important family of implicitly-defined convex polytopes with exponentially many facets and a wide range of applications. These include the entanglement polytopes from quantum information theory (in particular, we obtain an efficient solution to the notorious one-body quantum marginal problem) and the Kronecker polytopes from representation theory (which capture the asymptotic support of Kronecker coefficients). Our algorithm can be applied to succinct descriptions of the input tensor whenever the marginals can be efficiently computed, as in the important case of matrix product states or tensor-train decompositions, widely used in computational physics and numerical mathematics.
We strengthen and generalize the alternating minimization approach of previous papers by introducing the theory of highest weight vectors from representation theory into the numerical optimization framework. We show that highest weight vectors are natural potential functions for scaling algorithms and prove new bounds on their evaluations to obtain polynomial-time convergence. Our techniques are general and we believe that they will be instrumental in obtaining efficient algorithms for moment polytopes beyond the ones considered here, and more broadly, for other optimization problems possessing natural symmetries.
Submitted 15 April, 2018; v1 submitted 12 April, 2018;
originally announced April 2018.
-
Operator scaling with specified marginals
Authors:
Cole Franks
Abstract:
The completely positive maps, a generalization of the nonnegative matrices, are a well-studied class of maps from $n\times n$ matrices to $m\times m$ matrices. The existence of the operator analogues of doubly stochastic scalings of matrices is equivalent to a multitude of problems in computer science and mathematics, such as rational identity testing in non-commuting variables, noncommutative rank of symbolic matrices, and a basic problem in invariant theory (Garg, Gurvits, Oliveira and Wigderson, FOCS, 2016).
We study operator scaling with specified marginals, which is the operator analogue of scaling matrices to specified row and column sums. We characterize the operators which can be scaled to given marginals, much in the spirit of Gurvits' algorithmic characterization of the operators that can be scaled to doubly stochastic (Gurvits, Journal of Computer and System Sciences, 2004). Our algorithm produces approximate scalings in time poly(n,m) whenever scalings exist. A central ingredient in our analysis is a reduction from the specified marginals setting to the doubly stochastic setting.
Operator scaling with specified marginals arises in diverse areas of study such as the Brascamp-Lieb inequalities, communication complexity, eigenvalues of sums of Hermitian matrices, and quantum information theory. Some of the known theorems in these areas, several of which had no effective proof, are straightforward consequences of our characterization theorem. For instance, we obtain a simple algorithm to find, when they exist, a tuple of Hermitian matrices with given spectra whose sum has a given spectrum. We also prove new theorems such as a generalization of Forster's theorem (Forster, Journal of Computer and System Sciences, 2002) concerning radial isotropic position.
Submitted 25 June, 2018; v1 submitted 1 January, 2018;
originally announced January 2018.
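The abstract above describes its problem as the operator analogue of scaling matrices to specified row and column sums. As a point of reference, here is a minimal sketch of that classical matrix case only, using the usual alternating (Sinkhorn-style) normalization. It is not the paper's operator algorithm. The function name `scale_to_marginals` and the fixed iteration count are illustrative assumptions, and the sketch assumes a strictly positive input matrix and target row and column sums with equal totals, so that a scaling exists.

```python
def scale_to_marginals(matrix, row_sums, col_sums, iterations=200):
    """Alternately rescale rows and columns toward the target sums.

    Rescaling a row multiplies it by a positive constant, as does rescaling a
    column, so the result has the form diag(x) @ matrix @ diag(y).
    """
    a = [[float(v) for v in row] for row in matrix]
    for _ in range(iterations):
        # Row step: make every row sum exactly match its target.
        for i, row in enumerate(a):
            s = sum(row)
            a[i] = [v * row_sums[i] / s for v in row]
        # Column step: make every column sum exactly match its target.
        for j in range(len(col_sums)):
            s = sum(row[j] for row in a)
            for row in a:
                row[j] *= col_sums[j] / s
    return a


# Scale a positive 2x2 matrix to row sums (0.3, 0.7) and column sums (0.6, 0.4).
scaled = scale_to_marginals([[1, 2], [3, 4]], [0.3, 0.7], [0.6, 0.4])
row_totals = [sum(row) for row in scaled]
col_totals = [sum(row[j] for row in scaled) for j in range(2)]
```

Each step fixes one set of marginals exactly while disturbing the other. For a strictly positive matrix the disturbance shrinks geometrically, so both sets of sums end up close to their targets.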