-
$λ$-Regularized A-Optimal Design and its Approximation by $λ$-Regularized Proportional Volume Sampling
Authors:
Uthaipon Tantipongpipat
Abstract:
In this work, we study the $λ$-regularized $A$-optimal design problem and introduce the $λ$-regularized proportional volume sampling algorithm, generalized from [Nikolov, Singh, and Tantipongpipat, 2019], for this problem with the approximation guarantee that extends upon the previous work. In this problem, we are given vectors $v_1,\ldots,v_n\in\mathbb{R}^d$ in $d$ dimensions, a budget $k\leq n$,…
▽ More
In this work, we study the $λ$-regularized $A$-optimal design problem and introduce the $λ$-regularized proportional volume sampling algorithm, generalized from [Nikolov, Singh, and Tantipongpipat, 2019], for this problem with the approximation guarantee that extends upon the previous work. In this problem, we are given vectors $v_1,\ldots,v_n\in\mathbb{R}^d$ in $d$ dimensions, a budget $k\leq n$, and the regularizer parameter $λ\geq0$, and the goal is to find a subset $S\subseteq [n]$ of size $k$ that minimizes the trace of $\left(\sum_{i\in S}v_iv_i^\top + λI_d\right)^{-1}$ where $I_d$ is the $d\times d$ identity matrix. The problem is motivated from optimal design in ridge regression, where one tries to minimize the expected squared error of the ridge regression predictor from the true coefficient in the underlying linear model. We introduce $λ$-regularized proportional volume sampling and give its polynomial-time implementation to solve this problem. We show its $(1+\fracε{\sqrt{1+λ'}})$-approximation for $k=Ω\left(\frac dε+\frac{\log 1/ε}{ε^2}\right)$ where $λ'$ is proportional to $λ$, extending the previous bound in [Nikolov, Singh, and Tantipongpipat, 2019] to the case $λ>0$ and obtaining asymptotic optimality as $λ\rightarrow \infty$.
△ Less
Submitted 19 June, 2020;
originally announced June 2020.
-
Maximizing Determinants under Matroid Constraints
Authors:
Vivek Madan,
Aleksandar Nikolov,
Mohit Singh,
Uthaipon Tantipongpipat
Abstract:
Given vectors $v_1,\dots,v_n\in\mathbb{R}^d$ and a matroid $M=([n],I)$, we study the problem of finding a basis $S$ of $M$ such that $\det(\sum_{i \in S}v_i v_i^\top)$ is maximized. This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning. The current best results include an $e^{2k}$-estimation for any matroid of ran…
▽ More
Given vectors $v_1,\dots,v_n\in\mathbb{R}^d$ and a matroid $M=([n],I)$, we study the problem of finding a basis $S$ of $M$ such that $\det(\sum_{i \in S}v_i v_i^\top)$ is maximized. This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning. The current best results include an $e^{2k}$-estimation for any matroid of rank $k$ and a $(1+ε)^d$-approximation for a uniform matroid of rank $k\ge d+\frac dε$, where the rank $k\ge d$ denotes the desired size of the optimal set. Our main result is a new approximation algorithm with an approximation guarantee that depends only on the dimension $d$ of the vectors and not on the size $k$ of the output set. In particular, we show an $(O(d))^{d}$-estimation and an $(O(d))^{d^3}$-approximation for any matroid, giving a significant improvement over prior work when $k\gg d$.
Our result relies on the existence of an optimal solution to a convex programming relaxation for the problem which has sparse support; in particular, no more than $O(d^2)$ variables of the solution have fractional values. The sparsity results rely on the interplay between the first-order optimality conditions for the convex program and matroid theory. We believe that the techniques introduced to show sparsity of optimal solutions to convex programs will be of independent interest. We also give a randomized algorithm that rounds a sparse fractional solution to a feasible integral solution to the original problem. To show the approximation guarantee, we utilize recent works on strongly log-concave polynomials and show new relationships between different convex programs studied for the problem. Finally, we use the estimation algorithm and sparsity results to give an efficient deterministic approximation algorithm with an approximation guarantee that depends solely on the dimension $d$.
△ Less
Submitted 16 April, 2020;
originally announced April 2020.
-
Differentially Private Synthetic Mixed-Type Data Generation For Unsupervised Learning
Authors:
Uthaipon Tantipongpipat,
Chris Waites,
Digvijay Boob,
Amaresh Ankit Siva,
Rachel Cummings
Abstract:
We introduce the DP-auto-GAN framework for synthetic data generation, which combines the low dimensional representation of autoencoders with the flexibility of Generative Adversarial Networks (GANs). This framework can be used to take in raw sensitive data and privately train a model for generating synthetic data that will satisfy similar statistical properties as the original data. This learned m…
▽ More
We introduce the DP-auto-GAN framework for synthetic data generation, which combines the low dimensional representation of autoencoders with the flexibility of Generative Adversarial Networks (GANs). This framework can be used to take in raw sensitive data and privately train a model for generating synthetic data that will satisfy similar statistical properties as the original data. This learned model can generate an arbitrary amount of synthetic data, which can then be freely shared due to the post-processing guarantee of differential privacy. Our framework is applicable to unlabeled mixed-type data, that may include binary, categorical, and real-valued data. We implement this framework on both binary data (MIMIC-III) and mixed-type data (ADULT), and compare its performance with existing private algorithms on metrics in unsupervised settings. We also introduce a new quantitative metric able to detect diversity, or lack thereof, of synthetic data.
△ Less
Submitted 9 December, 2020; v1 submitted 6 December, 2019;
originally announced December 2019.
-
The Price of Fair PCA: One Extra Dimension
Authors:
Samira Samadi,
Uthaipon Tantipongpipat,
Jamie Morgenstern,
Mohit Singh,
Santosh Vempala
Abstract:
We investigate whether the standard dimensionality reduction technique of PCA inadvertently produces data representations with different fidelity for two different populations. We show on several real-world data sets, PCA has higher reconstruction error on population A than on B (for example, women versus men or lower- versus higher-educated individuals). This can happen even when the data set has…
▽ More
We investigate whether the standard dimensionality reduction technique of PCA inadvertently produces data representations with different fidelity for two different populations. We show on several real-world data sets, PCA has higher reconstruction error on population A than on B (for example, women versus men or lower- versus higher-educated individuals). This can happen even when the data set has a similar number of samples from A and B. This motivates our study of dimensionality reduction techniques which maintain similar fidelity for A and B. We define the notion of Fair PCA and give a polynomial-time algorithm for finding a low dimensional representation of the data which is nearly-optimal with respect to this measure. Finally, we show on real-world data sets that our algorithm can be used to efficiently generate a fair low dimensional representation of the data.
△ Less
Submitted 31 October, 2018;
originally announced November 2018.
-
Proportional Volume Sampling and Approximation Algorithms for A-Optimal Design
Authors:
Aleksandar Nikolov,
Mohit Singh,
Uthaipon Tao Tantipongpipat
Abstract:
We study the optimal design problems where the goal is to choose a set of linear measurements to obtain the most accurate estimate of an unknown vector in $d$ dimensions. We study the $A$-optimal design variant where the objective is to minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. The problem also finds applications in sensor placement…
▽ More
We study the optimal design problems where the goal is to choose a set of linear measurements to obtain the most accurate estimate of an unknown vector in $d$ dimensions. We study the $A$-optimal design variant where the objective is to minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. The problem also finds applications in sensor placement in wireless networks, sparse least squares regression, feature selection for $k$-means clustering, and matrix approximation. In this paper, we introduce proportional volume sampling to obtain improved approximation algorithms for $A$-optimal design. Our main result is to obtain improved approximation algorithms for the $A$-optimal design problem by introducing the proportional volume sampling algorithm. Our results nearly optimal bounds in the asymptotic regime when the number of measurements done, $k$, is significantly more than the dimension $d$. We also give first approximation algorithms when $k$ is small including when $k=d$. The proportional volume-sampling algorithm also gives approximation algorithms for other optimal design objectives such as $D$-optimal design and generalized ratio objective matching or improving previous best known results. Interestingly, we show that a similar guarantee cannot be obtained for the $E$-optimal design problem. We also show that the $A$-optimal design problem is NP-hard to approximate within a fixed constant when $k=d$.
△ Less
Submitted 17 July, 2018; v1 submitted 22 February, 2018;
originally announced February 2018.