-
Just Wing It: Optimal Estimation of Missing Mass in a Markovian Sequence
Authors:
Ashwin Pananjady,
Vidya Muthukumar,
Andrew Thangaraj
Abstract:
We study the problem of estimating the stationary mass -- also called the unigram mass -- that is missing from a single trajectory of a discrete-time, ergodic Markov chain. This problem has several applications -- for example, estimating the stationary missing mass is critical for accurately smoothing probability estimates in sequence models. While the classical Good--Turing estimator from the 1950s has appealing properties for i.i.d. data, it is known to be biased in the Markov setting, and other heuristic estimators do not come equipped with guarantees. Operating in the general setting in which the size of the state space may be much larger than the length $n$ of the trajectory, we develop a linear-runtime estimator called \emph{Windowed Good--Turing} (\textsc{WingIt}) and show that its risk decays as $\widetilde{\mathcal{O}}(\mathsf{T_{mix}}/n)$, where $\mathsf{T_{mix}}$ denotes the mixing time of the chain in total variation distance. Notably, this rate is independent of the size of the state space and minimax-optimal up to a logarithmic factor in $n / \mathsf{T_{mix}}$. We also present a bound on the variance of the missing mass random variable, which may be of independent interest. We extend our estimator to approximate the stationary mass placed on elements occurring with small frequency in $X^n$. Finally, we demonstrate the efficacy of our estimators both in simulations on canonical chains and on sequences constructed from a popular natural language corpus.
Submitted 8 April, 2024;
originally announced April 2024.
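As a point of reference for the abstract above, the classical Good--Turing estimator estimates the missing mass by the fraction of observations whose symbol occurs exactly once in the sample. The sketch below shows only this i.i.d. baseline; it is not the WingIt estimator, whose windowing scheme is not specified in the abstract. The function name `good_turing_missing_mass` is an illustrative choice, not taken from the paper.

```python
from collections import Counter


def good_turing_missing_mass(sequence):
    """Classical Good-Turing estimate of the missing mass.

    Returns (number of symbols seen exactly once) / (sequence length).
    This is the i.i.d. baseline, not the windowed estimator from the paper.
    """
    n = len(sequence)
    if n == 0:
        raise ValueError("sequence must be non-empty")
    counts = Counter(sequence)
    # Symbols that appear exactly once stand in for the unseen symbols.
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


if __name__ == "__main__":
    # In "abracadabra", only 'c' and 'd' occur once, so the estimate is 2/11.
    print(good_turing_missing_mass("abracadabra"))
```

On a Markovian trajectory, consecutive observations are dependent, which is the source of the bias the abstract refers to.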
-
Efficient reductions between some statistical models
Authors:
Mengqi Lou,
Guy Bresler,
Ashwin Pananjady
Abstract:
We study the problem of approximately transforming a sample from a source statistical model to a sample from a target statistical model without knowing the parameters of the source model, and construct several computationally efficient such reductions between statistical experiments. In particular, we provide computationally efficient procedures that approximately reduce uniform, Erlang, and Laplace location models to general target families. We illustrate our methodology by establishing nonasymptotic reductions between some canonical high-dimensional problems, spanning mixtures of experts, phase retrieval, and signal denoising. Notably, the reductions are structure preserving and can accommodate missing data. We also point to a possible application in transforming one differentially private mechanism to another.
Submitted 12 February, 2024;
originally announced February 2024.
-
Perceptual adjustment queries and an inverted measurement paradigm for low-rank metric learning
Authors:
Austin Xu,
Andrew D. McRae,
Jingyan Wang,
Mark A. Davenport,
Ashwin Pananjady
Abstract:
We introduce a new type of query mechanism for collecting human feedback, called the perceptual adjustment query (PAQ). Being both informative and cognitively lightweight, the PAQ adopts an inverted measurement scheme, and combines advantages from both cardinal and ordinal queries. We showcase the PAQ in the metric learning problem, where we collect PAQ measurements to learn an unknown Mahalanobis distance. This gives rise to a high-dimensional, low-rank matrix estimation problem to which standard matrix estimators cannot be applied. Consequently, we develop a two-stage estimator for metric learning from PAQs, and provide sample complexity guarantees for this estimator. We present numerical simulations demonstrating the performance of the estimator and its notable properties.
Submitted 8 September, 2023;
originally announced September 2023.
-
Do algorithms and barriers for sparse principal component analysis extend to other structured settings?
Authors:
Guanyi Wang,
Mengqi Lou,
Ashwin Pananjady
Abstract:
We study a principal component analysis problem under the spiked Wishart model in which the structure in the signal is captured by a class of union-of-subspace models. This general class includes vanilla sparse PCA as well as its variants with graph sparsity. With the goal of studying these problems under a unified statistical and computational lens, we establish fundamental limits that depend on the geometry of the problem instance, and show that a natural projected power method exhibits local convergence to the statistically near-optimal neighborhood of the solution. We complement these results with end-to-end analyses of two important special cases given by path and tree sparsity in a general basis, showing initialization methods and matching evidence of computational hardness. Overall, our results indicate that several of the phenomena observed for vanilla sparse PCA extend in a natural fashion to its structured counterparts.
Submitted 31 December, 2023; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Sharp analysis of EM for learning mixtures of pairwise differences
Authors:
Abhishek Dhawan,
Cheng Mao,
Ashwin Pananjady
Abstract:
We consider a symmetric mixture of linear regressions with random samples from the pairwise comparison design, which can be seen as a noisy version of a type of Euclidean distance geometry problem. We analyze the expectation-maximization (EM) algorithm locally around the ground truth and establish that the sequence converges linearly, providing an $\ell_\infty$-norm guarantee on the estimation error of the iterates. Furthermore, we show that the limit of the EM sequence achieves the sharp rate of estimation in the $\ell_2$-norm, matching the information-theoretically optimal constant. We also argue through simulation that convergence from a random initialization is much more delicate in this setting, and does not appear to occur in general. Our results show that the EM algorithm can exhibit several unique behaviors when the covariate distribution is suitably structured.
Submitted 22 June, 2023; v1 submitted 20 February, 2023;
originally announced February 2023.
-
A Dual Accelerated Method for Online Stochastic Distributed Averaging: From Consensus to Decentralized Policy Evaluation
Authors:
Sheng Zhang,
Ashwin Pananjady,
Justin Romberg
Abstract:
Motivated by decentralized sensing and policy evaluation problems, we consider a particular type of distributed stochastic optimization problem over a network, called the online stochastic distributed averaging problem. We design a dual-based method for this distributed consensus problem with Polyak--Ruppert averaging and analyze its behavior. We show that the proposed algorithm attains an accelerated deterministic error depending optimally on the condition number of the network, and also that it has an order-optimal stochastic error. This improves on the guarantees of state-of-the-art distributed stochastic optimization algorithms when specialized to this setting, and yields -- among other things -- corollaries for decentralized policy evaluation. Our proofs rely on explicitly studying the evolution of several relevant linear systems, and may be of independent interest. Numerical experiments are provided, which validate our theoretical results and demonstrate that our approach outperforms existing methods in finite-sample scenarios on several natural network topologies.
Submitted 8 August, 2022; v1 submitted 23 July, 2022;
originally announced July 2022.
-
Modeling and Correcting Bias in Sequential Evaluation
Authors:
Jingyan Wang,
Ashwin Pananjady
Abstract:
We consider the problem of sequential evaluation, in which an evaluator observes candidates in a sequence and assigns scores to these candidates in an online, irrevocable fashion. Motivated by the psychology literature that has studied sequential bias in such settings -- namely, dependencies between the evaluation outcome and the order in which the candidates appear -- we propose a natural model for the evaluator's rating process that captures the lack of calibration inherent to such a task. We conduct crowdsourcing experiments to demonstrate various facets of our model. We then proceed to study how to correct sequential bias under our model by posing this as a statistical inference problem. We propose a near-linear time, online algorithm for this task and prove guarantees in terms of two canonical ranking metrics. We also prove that our algorithm is information theoretically optimal, by establishing matching lower bounds in both metrics. Finally, we perform a host of numerical experiments to show that our algorithm often outperforms the de facto method of using the rankings induced by the reported scores, both in simulation and on the crowdsourcing data that we collected.
Submitted 16 November, 2023; v1 submitted 3 May, 2022;
originally announced May 2022.
-
Accelerated and instance-optimal policy evaluation with linear function approximation
Authors:
Tianjiao Li,
Guanghui Lan,
Ashwin Pananjady
Abstract:
We study the problem of policy evaluation with linear function approximation and present efficient and practical algorithms that come with strong optimality guarantees. We begin by proving lower bounds that establish baselines on both the deterministic error and stochastic error in this problem. In particular, we prove an oracle complexity lower bound on the deterministic error in an instance-dependent norm associated with the stationary distribution of the transition kernel, and use the local asymptotic minimax machinery to prove an instance-dependent lower bound on the stochastic error in the i.i.d. observation model. Existing algorithms fail to match at least one of these lower bounds: To illustrate, we analyze a variance-reduced variant of temporal difference learning, showing in particular that it fails to achieve the oracle complexity lower bound. To remedy this issue, we develop an accelerated, variance-reduced fast temporal difference algorithm (VRFTD) that simultaneously matches both lower bounds and attains a strong notion of instance-optimality. Finally, we extend the VRFTD algorithm to the setting with Markovian observations, and provide instance-dependent convergence results. Our theoretical guarantees of optimality are corroborated by numerical experiments.
Submitted 13 August, 2022; v1 submitted 24 December, 2021;
originally announced December 2021.
-
Optimal and instance-dependent guarantees for Markovian linear stochastic approximation
Authors:
Wenlong Mou,
Ashwin Pananjady,
Martin J. Wainwright,
Peter L. Bartlett
Abstract:
We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise -- covering the TD($λ$) family of algorithms for all $λ\in [0, 1)$ -- and linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $λ$ when running the TD($λ$) algorithm).
Submitted 11 May, 2024; v1 submitted 23 December, 2021;
originally announced December 2021.
-
Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits
Authors:
Wenshuo Guo,
Kumar Krishna Agrawal,
Aditya Grover,
Vidya Muthukumar,
Ashwin Pananjady
Abstract:
We introduce the "inverse bandit" problem of estimating the rewards of a multi-armed bandit instance from observing the learning process of a low-regret demonstrator. Existing approaches to the related problem of inverse reinforcement learning assume the execution of an optimal policy, and thereby suffer from an identifiability issue. In contrast, we propose to leverage the demonstrator's behavior en route to optimality, and in particular, the exploration phase, for reward estimation. We begin by establishing a general information-theoretic lower bound under this paradigm that applies to any demonstrator algorithm, which characterizes a fundamental tradeoff between reward estimation and the amount of exploration of the demonstrator. Then, we develop simple and efficient reward estimators for upper-confidence-based demonstrator algorithms that attain the optimal tradeoff, showing in particular that consistent reward estimation -- free of identifiability issues -- is possible under our paradigm. Extensive simulations on both synthetic and semi-synthetic data corroborate our theoretical results.
Submitted 22 February, 2022; v1 submitted 28 June, 2021;
originally announced June 2021.
-
Preference learning along multiple criteria: A game-theoretic perspective
Authors:
Kush Bhatia,
Ashwin Pananjady,
Peter L. Bartlett,
Anca D. Dragan,
Martin J. Wainwright
Abstract:
The literature on ranking from ordinal data is vast, and there are several ways to aggregate overall preferences from pairwise comparisons between objects. In particular, it is well known that any Nash equilibrium of the zero sum game induced by the preference matrix defines a natural solution concept (winning distribution over objects) known as a von Neumann winner. Many real-world problems, however, are inevitably multi-criteria, with different pairwise preferences governing the different criteria. In this work, we generalize the notion of a von Neumann winner to the multi-criteria setting by taking inspiration from Blackwell's approachability. Our framework allows for non-linear aggregation of preferences across criteria, and generalizes the linearization-based approach from multi-objective optimization.
From a theoretical standpoint, we show that the Blackwell winner of a multi-criteria problem instance can be computed as the solution to a convex optimization problem. Furthermore, given random samples of pairwise comparisons, we show that a simple plug-in estimator achieves near-optimal minimax sample complexity. Finally, we showcase the practical utility of our framework in a user study on autonomous driving, where we find that the Blackwell winner outperforms the von Neumann winner for the overall preferences.
Submitted 4 May, 2021;
originally announced May 2021.
-
Optimal oracle inequalities for solving projected fixed-point equations
Authors:
Wenlong Mou,
Ashwin Pananjady,
Martin J. Wainwright
Abstract:
Linear fixed point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning, and computational methods for solving differential and integral equations. We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space. First, we prove an instance-dependent upper bound on the mean-squared error for a linear stochastic approximation scheme that exploits Polyak--Ruppert averaging. This bound consists of two terms: an approximation error term with an instance-dependent approximation factor, and a statistical error term that captures the instance-specific complexity of the noise when projected onto the low-dimensional subspace. Using information theoretic methods, we also establish lower bounds showing that both of these terms cannot be improved, again in an instance-dependent sense. A concrete consequence of our characterization is that the optimal approximation factor in this problem can be much larger than a universal constant. We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation, establishing their optimality.
Submitted 9 December, 2020;
originally announced December 2020.
-
Isotonic regression with unknown permutations: Statistics, computation, and adaptation
Authors:
Ashwin Pananjady,
Richard J. Samworth
Abstract:
Motivated by models for multiway comparison data, we consider the problem of estimating a coordinate-wise isotonic function on the domain $[0, 1]^d$ from noisy observations collected on a uniform lattice, but where the design points have been permuted along each dimension. While the univariate and bivariate versions of this problem have received significant attention, our focus is on the multivariate case $d \geq 3$. We study both the minimax risk of estimation (in empirical $L_2$ loss) and the fundamental limits of adaptation (quantified by the adaptivity index) to a family of piecewise constant functions. We provide a computationally efficient Mirsky partition estimator that is minimax optimal while also achieving the smallest adaptivity index possible for polynomial time procedures. Thus, from a worst-case perspective and in sharp contrast to the bivariate case, the latent permutations in the model do not introduce significant computational difficulties over and above vanilla isotonic regression. On the other hand, the fundamental limits of adaptation are significantly different with and without unknown permutations: Assuming a hardness conjecture from average-case complexity theory, a statistical-computational gap manifests in the former case. In a complementary direction, we show that natural modifications of existing estimators fail to satisfy at least one of the desiderata of optimal worst-case statistical performance, computational efficiency, and fast adaptation. Along the way to showing our results, we improve adaptation results in the special case $d = 2$ and establish some properties of estimators for vanilla isotonic regression, both of which may be of independent interest.
Submitted 24 June, 2021; v1 submitted 5 September, 2020;
originally announced September 2020.
-
Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis
Authors:
Koulik Khamaru,
Ashwin Pananjady,
Feng Ruan,
Martin J. Wainwright,
Michael I. Jordan
Abstract:
We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model. We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms. Theory-inspired simulations show that the widely-used temporal difference (TD) algorithm is strictly suboptimal when evaluated in a non-asymptotic setting, even when combined with Polyak-Ruppert iterate averaging. We remedy this issue by introducing and analyzing variance-reduced forms of stochastic approximation, showing that they achieve non-asymptotic, instance-dependent optimality up to logarithmic factors.
Submitted 16 March, 2020;
originally announced March 2020.
-
Instance-dependent $\ell_\infty$-bounds for policy evaluation in tabular reinforcement learning
Authors:
Ashwin Pananjady,
Martin J. Wainwright
Abstract:
Markov reward processes (MRPs) are used to model stochastic phenomena arising in operations research, control engineering, robotics, and artificial intelligence, as well as communication and transportation networks. In many of these cases, such as in the policy evaluation problem encountered in reinforcement learning, the goal is to estimate the long-term value function of such a process without access to the underlying population transition and reward functions. Working with samples generated under the synchronous model, we study the problem of estimating the value function of an infinite-horizon, discounted MRP on finitely many states in the $\ell_\infty$-norm. We analyze both the standard plug-in approach to this problem and a more robust variant, and establish non-asymptotic bounds that depend on the (unknown) problem instance, as well as data-dependent bounds that can be evaluated based on the observations of state-transitions and rewards. We show that these approaches are minimax-optimal up to constant factors over natural sub-classes of MRPs. Our analysis makes use of a leave-one-out decoupling argument tailored to the policy evaluation problem, one which may be of independent interest.
Submitted 15 September, 2020; v1 submitted 18 September, 2019;
originally announced September 2019.
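To make the "standard plug-in approach" in the abstract above concrete: the idea is to form empirical estimates of the transition matrix and reward vector from samples, then compute the value function of the estimated MRP. The sketch below is a minimal illustration under assumptions of my own: `samples[s]` holds (reward, next state) draws from state `s`, and the fixed-point equation V = r + gamma * P V is solved by simple iteration rather than a direct linear solve. The function name and data layout are hypothetical, not taken from the paper.

```python
def plug_in_value(samples, num_states, gamma, num_iters=1000):
    """Plug-in value estimate for a discounted MRP on finitely many states.

    samples[s] is a list of (reward, next_state) pairs drawn from state s
    (assumed layout). Requires 0 <= gamma < 1 so the iteration converges.
    """
    p_hat = [[0.0] * num_states for _ in range(num_states)]
    r_hat = [0.0] * num_states
    for s in range(num_states):
        draws = samples[s]
        for reward, nxt in draws:
            r_hat[s] += reward / len(draws)    # empirical mean reward
            p_hat[s][nxt] += 1.0 / len(draws)  # empirical transition frequency
    # Iterate V <- r_hat + gamma * P_hat V, a contraction with factor gamma.
    v = [0.0] * num_states
    for _ in range(num_iters):
        v = [
            r_hat[s] + gamma * sum(p_hat[s][t] * v[t] for t in range(num_states))
            for s in range(num_states)
        ]
    return v


if __name__ == "__main__":
    # State 0 loops on itself with reward 1; state 1 moves to state 0 with reward 0.
    demo = {0: [(1.0, 0), (1.0, 0)], 1: [(0.0, 0), (0.0, 0)]}
    print(plug_in_value(demo, num_states=2, gamma=0.5))
```

The more robust variant and the data-dependent bounds mentioned in the abstract are not covered by this sketch.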
-
Max-Affine Regression: Provable, Tractable, and Near-Optimal Statistical Estimation
Authors:
Avishek Ghosh,
Ashwin Pananjady,
Adityanand Guntuboyina,
Kannan Ramchandran
Abstract:
Max-affine regression refers to a model where the unknown regression function is modeled as a maximum of $k$ unknown affine functions for a fixed $k \geq 1$. This generalizes linear regression and (real) phase retrieval, and is closely related to convex regression. Working within a non-asymptotic framework, we study this problem in the high-dimensional setting assuming that $k$ is a fixed constant, and focus on estimation of the unknown coefficients of the affine functions underlying the model. We analyze a natural alternating minimization (AM) algorithm for the non-convex least squares objective when the design is random. We show that the AM algorithm, when initialized suitably, converges with high probability and at a geometric rate to a small ball around the optimal coefficients. In order to initialize the algorithm, we propose and analyze a combination of a spectral method and a random search scheme in a low-dimensional space, which may be of independent interest. The final rate that we obtain is near-parametric and minimax optimal (up to a poly-logarithmic factor) as a function of the dimension, sample size, and noise variance. In that sense, our approach should be viewed as a direct and implementable method of enforcing regularization to alleviate the curse of dimensionality in problems of the convex regression type. As a by-product of our analysis, we also obtain guarantees on a classical algorithm for the phase retrieval problem under considerably weaker assumptions on the design distribution than was previously known. Numerical experiments illustrate the sharpness of our bounds in the various problem parameters.
Submitted 21 June, 2019;
originally announced June 2019.
-
A Family of Bayesian Cramér-Rao Bounds, and Consequences for Log-Concave Priors
Authors:
Efe Aras,
Kuan-Yun Lee,
Ashwin Pananjady,
Thomas A. Courtade
Abstract:
Under minimal regularity assumptions, we establish a family of information-theoretic Bayesian Cramér-Rao bounds, indexed by probability measures that satisfy a logarithmic Sobolev inequality. This family includes as a special case the known Bayesian Cramér-Rao bound (or van Trees inequality), and its less widely known entropic improvement due to Efroimovich. For the setting of a log-concave prior, we obtain a Bayesian Cramér-Rao bound which holds for any (possibly biased) estimator and, unlike the van Trees inequality, does not depend on the Fisher information of the prior.
Submitted 22 February, 2019;
originally announced February 2019.
-
Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems
Authors:
Dhruv Malik,
Ashwin Pananjady,
Kush Bhatia,
Koulik Khamaru,
Peter L. Bartlett,
Martin J. Wainwright
Abstract:
We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of these methods when applied to linear-quadratic systems, and study various settings of driving noise and reward feedback. We show that these methods provably converge to within any pre-specified tolerance of the optimal policy with a number of zero-order evaluations that is an explicit polynomial of the error tolerance, dimension, and curvature properties of the problem. Our analysis reveals some interesting differences between the settings of additive driving noise and random initialization, as well as the settings of one-point and two-point reward feedback. Our theory is corroborated by extensive simulations of derivative-free methods on these systems. Along the way, we derive convergence rates for stochastic zero-order optimization algorithms when applied to a certain class of non-convex problems.
Submitted 18 May, 2020; v1 submitted 19 December, 2018;
originally announced December 2018.
-
Towards Optimal Estimation of Bivariate Isotonic Matrices with Unknown Permutations
Authors:
Cheng Mao,
Ashwin Pananjady,
Martin J. Wainwright
Abstract:
Many applications, including rank aggregation, crowd-labeling, and graphon estimation, can be modeled in terms of a bivariate isotonic matrix with unknown permutations acting on its rows and/or columns. We consider the problem of estimating an unknown matrix in this class, based on noisy observations of (possibly, a subset of) its entries. We design and analyze polynomial-time algorithms that improve upon the state of the art in two distinct metrics, showing, in particular, that minimax optimal, computationally efficient estimation is achievable in certain settings. Along the way, we prove matching upper and lower bounds on the minimax radii of certain cone testing problems, which may be of independent interest.
Submitted 26 October, 2019; v1 submitted 25 June, 2018;
originally announced June 2018.
-
Breaking the $1/\sqrt{n}$ Barrier: Faster Rates for Permutation-based Models in Polynomial Time
Authors:
Cheng Mao,
Ashwin Pananjady,
Martin J. Wainwright
Abstract:
Many applications, including rank aggregation and crowd-labeling, can be modeled in terms of a bivariate isotonic matrix with unknown permutations acting on its rows and columns. We consider the problem of estimating such a matrix based on noisy observations of a subset of its entries, and design and analyze a polynomial-time algorithm that improves upon the state of the art. In particular, our results imply that any such $n \times n$ matrix can be estimated efficiently in the normalized Frobenius norm at rate $\widetilde{\mathcal O}(n^{-3/4})$, thus narrowing the gap between $\widetilde{\mathcal O}(n^{-1})$ and $\widetilde{\mathcal O}(n^{-1/2})$, which were hitherto the rates of the most statistically and computationally efficient methods, respectively.
Submitted 5 June, 2018; v1 submitted 27 February, 2018;
originally announced February 2018.
-
Worst-case vs Average-case Design for Estimation from Fixed Pairwise Comparisons
Authors:
Ashwin Pananjady,
Cheng Mao,
Vidya Muthukumar,
Martin J. Wainwright,
Thomas A. Courtade
Abstract:
Pairwise comparison data arises in many domains, including tournament rankings, web search, and preference elicitation. Given noisy comparisons of a fixed subset of pairs of items, we study the problem of estimating the underlying comparison probabilities under the assumption of strong stochastic transitivity (SST). We also consider the noisy sorting subclass of the SST model. We show that when the assignment of items to the topology is arbitrary, these permutation-based models, unlike their parametric counterparts, do not admit consistent estimation for most comparison topologies used in practice. We then demonstrate that consistent estimation is possible when the assignment of items to the topology is randomized, thus establishing a dichotomy between worst-case and average-case designs. We propose two estimators in the average-case setting and analyze their risk, showing that it depends on the comparison topology only through the degree sequence of the topology. The rates achieved by these estimators are shown to be optimal for a large class of graphs. Our results are corroborated by simulations on multiple comparison topologies.
Submitted 19 July, 2017;
originally announced July 2017.
-
Gradient Diversity: a Key Ingredient for Scalable Distributed Learning
Authors:
Dong Yin,
Ashwin Pananjady,
Max Lam,
Dimitris Papailiopoulos,
Kannan Ramchandran,
Peter Bartlett
Abstract:
It has been experimentally observed that distributed implementations of mini-batch stochastic gradient descent (SGD) algorithms exhibit speedup saturation and decaying generalization ability beyond a particular batch-size. In this work, we present an analysis hinting that high similarity between concurrently processed gradients may be a cause of this performance degradation. We introduce the notion of gradient diversity that measures the dissimilarity between concurrent gradient updates, and show its key role in the performance of mini-batch SGD. We prove that on problems with high gradient diversity, mini-batch SGD is amenable to better speedups, while maintaining the generalization performance of serial (one sample) SGD. We further establish lower bounds on convergence where mini-batch SGD slows down beyond a particular batch-size, solely due to the lack of gradient diversity. We provide experimental evidence indicating the key role of gradient diversity in distributed learning, and discuss how heuristics like dropout, Langevin dynamics, and quantization can improve it.
Submitted 6 January, 2018; v1 submitted 18 June, 2017;
originally announced June 2017.
-
Denoising Linear Models with Permuted Data
Authors:
Ashwin Pananjady,
Martin J. Wainwright,
Thomas A. Courtade
Abstract:
The multivariate linear regression model with shuffled data and additive Gaussian noise arises in various correspondence estimation and matching problems. Focusing on the denoising aspect of this problem, we provide a characterization of the minimax error rate that is sharp up to logarithmic factors. We also analyze the performance of two versions of a computationally efficient estimator, and establish their consistency for a large range of input parameters. Finally, we provide an exact algorithm for the noiseless problem and demonstrate its performance on an image point-cloud matching task. Our analysis also extends to datasets with outliers.
Submitted 24 April, 2017;
originally announced April 2017.
-
Existence of Stein Kernels under a Spectral Gap, and Discrepancy Bound
Authors:
Thomas A. Courtade,
Max Fathi,
Ashwin Pananjady
Abstract:
We establish existence of Stein kernels for probability measures on $\mathbb{R}^d$ satisfying a Poincaré inequality, and obtain bounds on the Stein discrepancy of such measures. Applications to quantitative central limit theorems are discussed, including a new CLT in Wasserstein distance $W_2$ with optimal rate and dependence on the dimension. As a byproduct, we obtain a stability version of an estimate of the Poincaré constant of probability measures under a second moment constraint. The results extend more generally to the setting of converse weighted Poincaré inequalities. The proof is based on simple arguments of calculus of variations.
Further, we establish two general properties enjoyed by the Stein discrepancy, holding whenever a Stein kernel exists: Stein discrepancy is strictly decreasing along the CLT, and it controls the skewness of a random vector.
Submitted 8 March, 2018; v1 submitted 22 March, 2017;
originally announced March 2017.
-
Wasserstein Stability of the Entropy Power Inequality for Log-Concave Densities
Authors:
Thomas A. Courtade,
Max Fathi,
Ashwin Pananjady
Abstract:
We establish quantitative stability results for the entropy power inequality (EPI). Specifically, we show that if uniformly log-concave densities nearly saturate the EPI, then they must be close to Gaussian densities in the quadratic Wasserstein distance. Further, if one of the densities is log-concave and the other is Gaussian, then the deficit in the EPI can be controlled in terms of the $L^1$-Wasserstein distance. As a counterpoint, an example shows that the EPI can be unstable with respect to the quadratic Wasserstein distance when densities are uniformly log-concave on sets of measure arbitrarily close to one. Our stability results can be extended to non-log-concave densities, provided certain regularity conditions are met. The proofs are based on optimal transportation.
Submitted 25 October, 2016;
originally announced October 2016.
-
Linear Regression with an Unknown Permutation: Statistical and Computational Limits
Authors:
Ashwin Pananjady,
Martin J. Wainwright,
Thomas A. Courtade
Abstract:
Consider a noisy linear observation model with an unknown permutation, based on observing $y = Π^* A x^* + w$, where $x^* \in \mathbb{R}^d$ is an unknown vector, $Π^*$ is an unknown $n \times n$ permutation matrix, and $w \in \mathbb{R}^n$ is additive Gaussian noise. We analyze the problem of permutation recovery in a random design setting in which the entries of the matrix $A$ are drawn i.i.d. from a standard Gaussian distribution, and establish sharp conditions on the SNR, sample size $n$, and dimension $d$ under which $Π^*$ is exactly and approximately recoverable. On the computational front, we show that the maximum likelihood estimate of $Π^*$ is NP-hard to compute, while also providing a polynomial time algorithm when $d =1$.
Submitted 9 August, 2016;
originally announced August 2016.
-
Compressing Sparse Sequences under Local Decodability Constraints
Authors:
Ashwin Pananjady,
Thomas A. Courtade
Abstract:
We consider a variable-length source coding problem subject to local decodability constraints. In particular, we investigate the blocklength scaling behavior attainable by encodings of $r$-sparse binary sequences, under the constraint that any source bit can be correctly decoded upon probing at most $d$ codeword bits. We consider both adaptive and non-adaptive access models, and derive upper and lower bounds that often coincide up to constant factors. Notably, such a characterization for the fixed-blocklength analog of our problem remains unknown, despite considerable research over the last three decades. Connections to communication complexity are also briefly discussed.
Submitted 8 April, 2015;
originally announced April 2015.
-
The Online Disjoint Set Cover Problem and its Applications
Authors:
Ashwin Pananjady,
Vivek Kumar Bagaria,
Rahul Vaze
Abstract:
Given a universe $U$ of $n$ elements and a collection of subsets $\mathcal{S}$ of $U$, the maximum disjoint set cover problem (DSCP) is to partition $\mathcal{S}$ into as many set covers as possible, where a set cover is defined as a collection of subsets whose union is $U$. We consider the online DSCP, in which the subsets arrive one by one (possibly in an order chosen by an adversary), and must be irrevocably assigned to some partition on arrival with the objective of minimizing the competitive ratio. The competitive ratio of an online DSCP algorithm $A$ is defined as the maximum ratio of the number of disjoint set covers obtained by the optimal offline algorithm to the number of disjoint set covers obtained by $A$ across all inputs. We propose an online algorithm for solving the DSCP with competitive ratio $\ln n$. We then show a lower bound of $Ω(\sqrt{\ln n})$ on the competitive ratio for any online DSCP algorithm. The online disjoint set cover problem has wide-ranging applications in practice, including the online crowd-sourcing problem, the online coverage lifetime maximization problem in wireless sensor networks, and online resource allocation problems.
Submitted 20 November, 2014;
originally announced November 2014.
-
On the Complexity of Making a Distinguished Vertex Minimum or Maximum Degree by Vertex Deletion
Authors:
Sounaka Mishra,
Ashwin Pananjady,
N Safina Devi
Abstract:
In this paper, we investigate the approximability of two node deletion problems. Given a vertex weighted graph $G=(V,E)$ and a specified, or "distinguished" vertex $p \in V$, MDD(min) is the problem of finding a minimum weight vertex set $S \subseteq V\setminus \{p\}$ such that $p$ becomes the minimum degree vertex in $G[V \setminus S]$; and MDD(max) is the problem of finding a minimum weight vertex set $S \subseteq V\setminus \{p\}$ such that $p$ becomes the maximum degree vertex in $G[V \setminus S]$. These are known $NP$-complete problems and have been studied from the parameterized complexity point of view in previous work. Here, we prove that for any $ε> 0$, both the problems cannot be approximated within a factor $(1 - ε)\log n$, unless $NP \subseteq DTIME(n^{\log\log n})$. We also show that for any $ε> 0$, MDD(min) cannot be approximated within a factor $(1 -ε)\log n$ on bipartite graphs, unless $NP \subseteq DTIME(n^{\log\log n})$, and that for any $ε> 0$, MDD(max) cannot be approximated within a factor $(1/2 - ε)\log n$ on bipartite graphs, unless $NP \subseteq DTIME(n^{\log\log n})$. We give an $O(\log n)$ factor approximation algorithm for MDD(max) on general graphs, provided the degree of $p$ is $O(\log n)$. We then show that if the degree of $p$ is $n-O(\log n)$, a similar result holds for MDD(min). We prove that MDD(max) is $APX$-complete on 3-regular unweighted graphs and provide an approximation algorithm with ratio $1.583$ when $G$ is a 3-regular unweighted graph. In addition, we show that MDD(min) can be solved in polynomial time when $G$ is a regular graph of constant degree.
Submitted 14 January, 2014; v1 submitted 13 December, 2013;
originally announced December 2013.
-
Maximizing Utility Among Selfish Users in Social Groups
Authors:
Ashwin Pananjady,
Vivek Kumar Bagaria,
Rahul Vaze
Abstract:
We consider the problem of a social group of users trying to obtain a "universe" of files, first from a server and then via exchange amongst themselves. We consider the selfish file-exchange paradigm of give-and-take, whereby two users can exchange files only if each has something unique to offer the other. We are interested in maximizing the number of users who can obtain the universe through a schedule of file-exchanges. We first present a practical paradigm of file acquisition. We then present an algorithm which ensures that at least half the users obtain the universe with high probability for $n$ files and $m=O(\log n)$ users when $n\rightarrow\infty$, thereby showing an approximation ratio of 2. Extending these ideas, we show a $1+ε_1$ - approximation algorithm for $m=O(n)$, $ε_1>0$ and a $(1+z)/2 +ε_2$ - approximation algorithm for $m=O(n^z)$, $z>1$, $ε_2>0$. Finally, we show that for any $m=O(e^{o(n)})$, there exists a schedule of file exchanges which ensures that at least half the users obtain the universe.
Submitted 30 September, 2013;
originally announced September 2013.
-
Optimally Approximating the Coverage Lifetime of Wireless Sensor Networks
Authors:
Vivek Kumar Bagaria,
Ashwin Pananjady,
Rahul Vaze
Abstract:
We consider the problem of maximizing the lifetime of coverage (MLCP) of targets in a wireless sensor network with battery-limited sensors. We first show that the MLCP cannot be approximated within a factor less than $\ln n$ by any polynomial time algorithm, where $n$ is the number of targets. This provides closure to the long-standing open problem of showing optimality of previously known $\ln n$ approximation algorithms. We also derive a new $\ln n$ approximation to the MLCP by showing a $\ln n$ approximation to the maximum disjoint set cover problem (DSCP), which has many advantages over previous MLCP algorithms, including an easy extension to the $k$-coverage problem. We then present an improvement (in certain cases) to the $\ln n$ algorithm in terms of a newly defined quantity "expansiveness" of the network. For the special one-dimensional case, where each sensor can monitor a contiguous region of possibly different lengths, we show that the MLCP solution is equal to the DSCP solution, and can be found in polynomial time. Finally, for the special two-dimensional case, where each sensor can monitor a circular area with a given radius around itself, we combine existing results to derive a $1+ε$ approximation algorithm for solving MLCP for any $ε>0$.
Submitted 28 June, 2014; v1 submitted 19 July, 2013;
originally announced July 2013.