-
Online Control in Population Dynamics
Authors:
Noah Golowich,
Elad Hazan,
Zhou Lu,
Dhruv Rohatgi,
Y. Jennifer Sun
Abstract:
The study of population dynamics originated with early sociological works but has since extended into many fields, including biology, epidemiology, evolutionary game theory, and economics. Most studies on population dynamics focus on the problem of prediction rather than control. Existing mathematical models for control in population dynamics are often restricted to specific, noise-free dynamics, while real-world population changes can be complex and adversarial.
To address this gap, we propose a new framework based on the paradigm of online control. We first characterize a set of linear dynamical systems that can naturally model evolving populations. We then give an efficient gradient-based controller for these systems, with near-optimal regret bounds with respect to a broad class of linear policies. Our empirical evaluations demonstrate the effectiveness of the proposed algorithm for control in population dynamics even for non-linear models such as SIR and replicator dynamics.
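To make the SIR model named above concrete, here is a minimal discrete-time SIR simulation with a control input. It is only an environment sketch and does not implement the paper's gradient-based controller. The control $u_t \in [0,1]$ is assumed to scale down transmission by a factor $(1-u_t)$. The proportional feedback rule `u = min(1, gain * i)` is a hypothetical placeholder policy. The function name and the parameter values (beta = 0.3, gamma = 0.1, gain = 5) are arbitrary illustrative choices.

```python
def simulate_sir(beta, gamma, gain, steps=200, i0=0.01):
    """Discrete-time SIR on population fractions (s, i, r) with a feedback control.

    Hypothetical placeholder policy: u_t = min(1, gain * i_t), which scales the
    transmission rate by (1 - u_t). It only marks where a controller plugs in.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    trajectory = [(s, i, r)]
    for _ in range(steps):
        u = min(1.0, gain * i)                 # control action in [0, 1]
        new_infections = beta * (1.0 - u) * s * i
        recoveries = gamma * i
        # Flows move mass between compartments, so s + i + r is conserved.
        s, i, r = s - new_infections, i + new_infections - recoveries, r + recoveries
        trajectory.append((s, i, r))
    return trajectory


uncontrolled = simulate_sir(beta=0.3, gamma=0.1, gain=0.0)
controlled = simulate_sir(beta=0.3, gamma=0.1, gain=5.0)
print(max(t[1] for t in uncontrolled), max(t[1] for t in controlled))
```

The dynamics are non-linear because of the $s \cdot i$ term, even though the control enters in a simple way. That is the kind of model on which the abstract reports empirical results.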
Submitted 6 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Lasso with Latents: Efficient Estimation, Covariate Rescaling, and Computational-Statistical Gaps
Authors:
Jonathan Kelner,
Frederic Koehler,
Raghu Meka,
Dhruv Rohatgi
Abstract:
It is well-known that the statistical performance of Lasso can suffer significantly when the covariates of interest have strong correlations. In particular, the prediction error of Lasso becomes much worse than that of computationally inefficient alternatives like Best Subset Selection. Due to a large conjectured computational-statistical tradeoff in the problem of sparse linear regression, it may be impossible to close this gap in general.
In this work, we propose a natural sparse linear regression setting where strong correlations between covariates arise from unobserved latent variables. In this setting, we analyze the problem caused by strong correlations and design a surprisingly simple fix. While Lasso with standard normalization of covariates fails, there exists a heterogeneous scaling of the covariates with which Lasso will suddenly obtain strong provable guarantees for estimation. Moreover, we design a simple, efficient procedure for computing such a "smart scaling."
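A standard change of variables shows what a heterogeneous scaling does to Lasso. Running Lasso on covariates rescaled by positive weights is the same as running Lasso on the original covariates with a weighted $\ell_1$ penalty. In the identity below, $X$ is the design matrix with $p$ columns and $D = \mathrm{diag}(d_1, \dots, d_p)$ with $d_j > 0$ is a generic rescaling; this notation is ours. The identity does not say how to choose the weights. Choosing them is what the "smart scaling" procedure addresses.

$$
\hat v \in \arg\min_{v}\ \frac{1}{2n}\|y - XDv\|_2^2 + \lambda\|v\|_1,\quad \hat w = D\hat v
\;\Longleftrightarrow\;
\hat w \in \arg\min_{w}\ \frac{1}{2n}\|y - Xw\|_2^2 + \lambda\sum_{j=1}^{p}\frac{|w_j|}{d_j}.
$$

Substituting $w = Dv$ gives $\|v\|_1 = \sum_j |w_j|/d_j$. A larger weight $d_j$ therefore makes it cheaper for the estimator to use covariate $j$.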
The sample complexity of the resulting "rescaled Lasso" algorithm incurs (in the worst case) quadratic dependence on the sparsity of the underlying signal. While this dependence is not information-theoretically necessary, we give evidence that it is optimal among the class of polynomial-time algorithms, via the method of low-degree polynomials. This argument reveals a new connection between sparse linear regression and a special version of sparse PCA with a near-critical negative spike. The latter problem can be thought of as a real-valued analogue of learning a sparse parity. Using it, we also establish the first computational-statistical gap for the closely related problem of learning a Gaussian Graphical Model.
Submitted 23 February, 2024;
originally announced February 2024.
-
Exploring and Learning in Sparse Linear MDPs without Computationally Intractable Oracles
Authors:
Noah Golowich,
Ankur Moitra,
Dhruv Rohatgi
Abstract:
The key assumption underlying linear Markov Decision Processes (MDPs) is that the learner has access to a known feature map $φ(x, a)$ that maps state-action pairs to $d$-dimensional vectors, and that the rewards and transitions are linear functions in this representation. But where do these features come from? In the absence of expert domain knowledge, a tempting strategy is to use the "kitchen sink" approach and hope that the true features are included in a much larger set of potential features. In this paper we revisit linear MDPs from the perspective of feature selection. In a $k$-sparse linear MDP, there is an unknown subset $S \subset [d]$ of size $k$ containing all the relevant features, and the goal is to learn a near-optimal policy in only poly$(k,\log d)$ interactions with the environment. Our main result is the first polynomial-time algorithm for this problem. In contrast, earlier works either made prohibitively strong assumptions that obviated the need for exploration, or required solving computationally intractable optimization problems.
Along the way we introduce the notion of an emulator: a succinct approximate representation of the transitions that suffices for computing certain Bellman backups. Since linear MDPs are a non-parametric model, it is not even obvious whether polynomial-sized emulators exist. We show that they do exist and can be computed efficiently via convex programming.
As a corollary of our main result, we give an algorithm for learning a near-optimal policy in block MDPs whose decoding function is a low-depth decision tree; the algorithm runs in quasi-polynomial time and takes a polynomial number of samples. This can be seen as a reinforcement learning analogue of classic results in computational learning theory. Furthermore, it gives a natural model where improving the sample complexity via representation learning is computationally feasible.
Submitted 18 September, 2023; v1 submitted 17 September, 2023;
originally announced September 2023.
-
Feature Adaptation for Sparse Linear Regression
Authors:
Jonathan Kelner,
Frederic Koehler,
Raghu Meka,
Dhruv Rohatgi
Abstract:
Sparse linear regression is a central problem in high-dimensional statistics. We study the correlated random design setting, where the covariates are drawn from a multivariate Gaussian $N(0,Σ)$, and we seek an estimator with small excess risk.
If the true signal is $t$-sparse, information-theoretically, it is possible to achieve strong recovery guarantees with only $O(t\log n)$ samples. However, computationally efficient algorithms have sample complexity linear in (some variant of) the condition number of $Σ$. Classical algorithms such as the Lasso can require significantly more samples than necessary even if there is only a single sparse approximate dependency among the covariates.
We provide a polynomial-time algorithm that, given $Σ$, automatically adapts the Lasso to tolerate a small number of approximate dependencies. In particular, we achieve near-optimal sample complexity for constant sparsity and if $Σ$ has few "outlier" eigenvalues. Our algorithm fits into a broader framework of feature adaptation for sparse linear regression with ill-conditioned covariates. With this framework, we additionally provide the first polynomial-factor improvement over brute-force search for constant sparsity $t$ and arbitrary covariance $Σ$.
Submitted 26 May, 2023;
originally announced May 2023.
-
Learning in Observable POMDPs, without Computationally Intractable Oracles
Authors:
Noah Golowich,
Ankur Moitra,
Dhruv Rohatgi
Abstract:
Much of reinforcement learning theory is built on top of oracles that are computationally hard to implement. Specifically for learning near-optimal policies in Partially Observable Markov Decision Processes (POMDPs), existing algorithms either need to make strong assumptions about the model dynamics (e.g. deterministic transitions) or assume access to an oracle for solving a hard optimistic planning or estimation problem as a subroutine. In this work we develop the first oracle-free learning algorithm for POMDPs under reasonable assumptions. Specifically, we give a quasipolynomial-time end-to-end algorithm for learning in "observable" POMDPs, where observability is the assumption that well-separated distributions over states induce well-separated distributions over observations. Our techniques circumvent the more traditional approach of using the principle of optimism under uncertainty to promote exploration, and instead give a novel application of barycentric spanners to constructing policy covers.
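Barycentric spanners are a standard tool due to Awerbuch and Kleinberg. Let $X$ be a finite set of vectors spanning $\mathbb{R}^d$. A $C$-approximate barycentric spanner is a basis $b_1, \dots, b_d$ drawn from $X$ such that every $x \in X$ equals $\sum_i λ_i b_i$ with $|λ_i| \le C$. The sketch below computes one with the classical determinant-swapping procedure. It shows only the spanner itself, not how the paper uses spanners to build policy covers in a POMDP. All function names are hypothetical.

```python
def det(rows):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in r] for r in rows]
    n, result = len(m), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-12:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            result = -result                    # a row swap flips the sign
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return result


def swap_row(B, i, x):
    """Copy of B with its i-th basis vector replaced by x."""
    return B[:i] + [list(x)] + B[i + 1:]


def barycentric_spanner(X, C=2.0):
    d = len(X[0])
    # Phase 1: start from the identity and greedily swap in vectors of X.
    B = [[1.0 if r == c else 0.0 for c in range(d)] for r in range(d)]
    for i in range(d):
        B = swap_row(B, i, max(X, key=lambda x: abs(det(swap_row(B, i, x)))))
    # Phase 2: swap while some replacement grows |det(B)| by more than a factor C.
    improved = True
    while improved:
        improved = False
        current = abs(det(B))
        for i in range(d):
            for x in X:
                if abs(det(swap_row(B, i, x))) > C * current:
                    B, improved = swap_row(B, i, x), True
                    break
            if improved:
                break
    return B


def coefficients(B, x):
    """Cramer's rule: the lam with x = sum_i lam[i] * B[i]."""
    base = det(B)
    return [det(swap_row(B, i, x)) / base for i in range(len(B))]


X = [[1, 0], [0, 1], [1, 1], [3, 1], [-2, 5], [4, -4]]
B = barycentric_spanner(X, C=2.0)
print(B, [coefficients(B, x) for x in X])
```

The procedure stops only when no single swap increases $|\det B|$ by more than a factor $C$. By Cramer's rule, that stopping condition is exactly $|λ_i| \le C$ for every $x \in X$.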
Submitted 7 June, 2022;
originally announced June 2022.
-
Distributional Hardness Against Preconditioned Lasso via Erasure-Robust Designs
Authors:
Jonathan A. Kelner,
Frederic Koehler,
Raghu Meka,
Dhruv Rohatgi
Abstract:
Sparse linear regression with ill-conditioned Gaussian random designs is widely believed to exhibit a statistical/computational gap, but there is surprisingly little formal evidence for this belief, even in the form of examples that are hard for restricted classes of algorithms. Recent work has shown that, for certain covariance matrices, the broad class of Preconditioned Lasso programs provably cannot succeed on polylogarithmically sparse signals with a sublinear number of samples. However, this lower bound only shows that for every preconditioner, there exists at least one signal that it fails to recover. This leaves open the possibility that, for example, trying multiple different preconditioners solves every sparse linear regression problem.
In this work, we prove a stronger lower bound that overcomes this issue. For an appropriate covariance matrix, we construct a single signal distribution on which any invertibly-preconditioned Lasso program fails with high probability, unless it receives a linear number of samples.
Surprisingly, at the heart of our lower bound is a new positive result in compressed sensing. We show that standard sparse random designs are with high probability robust to adversarial measurement erasures, in the sense that if $b$ measurements are erased, then all but $O(b)$ of the coordinates of the signal are still information-theoretically identifiable. To our knowledge, this is the first time that partial recoverability of arbitrary sparse signals under erasures has been studied in compressed sensing.
Submitted 5 March, 2022;
originally announced March 2022.
-
Planning in Observable POMDPs in Quasipolynomial Time
Authors:
Noah Golowich,
Ankur Moitra,
Dhruv Rohatgi
Abstract:
Partially Observable Markov Decision Processes (POMDPs) are a natural and general model in reinforcement learning that take into account the agent's uncertainty about its current state. In the literature on POMDPs, it is customary to assume access to a planning oracle that computes an optimal policy when the parameters are known, even though the problem is known to be computationally hard. Almost all existing planning algorithms either run in exponential time, lack provable performance guarantees, or require placing strong assumptions on the transition dynamics under every possible policy. In this work, we revisit the planning problem and ask: are there natural and well-motivated assumptions that make planning easy?
Our main result is a quasipolynomial-time algorithm for planning in (one-step) observable POMDPs. Specifically, we assume that well-separated distributions on states lead to well-separated distributions on observations, and thus the observations are at least somewhat informative in each step. Crucially, this assumption places no restrictions on the transition dynamics of the POMDP; nevertheless, it implies that near-optimal policies admit quasi-succinct descriptions, which is not true in general (under standard hardness assumptions). Our analysis is based on new quantitative bounds for filter stability -- i.e. the rate at which an optimal filter for the latent state forgets its initialization. Furthermore, we prove matching hardness for planning in observable POMDPs under the Exponential Time Hypothesis.
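Filter stability can be seen numerically in a small hidden Markov model, which is a POMDP with a single fixed action. The sketch runs the exact Bayes filter from two very different priors on the same observation sequence. It then tracks the total-variation distance between the two beliefs. The transition matrix `T`, observation matrix `O`, and observation sequence are arbitrary illustrative values, not taken from the paper. The sketch does not reproduce the paper's quantitative rates.

```python
def bayes_filter_step(belief, T, O, obs):
    """One predict-then-update step of the exact filter.

    T[s][s2] = P(next state s2 | state s);  O[s][o] = P(observation o | state s).
    """
    n = len(belief)
    predicted = [sum(belief[s] * T[s][s2] for s in range(n)) for s2 in range(n)]
    unnormalized = [predicted[s] * O[s][obs] for s in range(n)]
    z = sum(unnormalized)
    return [p / z for p in unnormalized]


def tv_distance(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


T = [[0.9, 0.1], [0.2, 0.8]]
O = [[0.8, 0.2], [0.3, 0.7]]
observations = [0, 1, 1, 0, 1, 0, 0, 1] * 8    # a fixed, arbitrary observation sequence

b1, b2 = [0.99, 0.01], [0.01, 0.99]             # two very different initial beliefs
distances = [tv_distance(b1, b2)]
for o in observations:
    b1 = bayes_filter_step(b1, T, O, o)
    b2 = bayes_filter_step(b2, T, O, o)
    distances.append(tv_distance(b1, b2))
print(distances[0], distances[-1])
```

Here every entry of `T` is positive, so the two beliefs merge geometrically fast. The point of the abstract is that the observability assumption yields such forgetting without any restriction on the transition dynamics.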
Submitted 23 March, 2022; v1 submitted 12 January, 2022;
originally announced January 2022.
-
On the Power of Preconditioning in Sparse Linear Regression
Authors:
Jonathan Kelner,
Frederic Koehler,
Raghu Meka,
Dhruv Rohatgi
Abstract:
Sparse linear regression is a fundamental problem in high-dimensional statistics, but strikingly little is known about how to efficiently solve it without restrictive conditions on the design matrix. We consider the (correlated) random design setting, where the covariates are independently drawn from a multivariate Gaussian $N(0,Σ)$ with $Σ$ an $n \times n$ matrix, and seek estimators $\hat{w}$ minimizing $(\hat{w}-w^*)^TΣ(\hat{w}-w^*)$, where $w^*$ is the $k$-sparse ground truth. Information-theoretically, one can achieve strong error bounds with $O(k \log n)$ samples for arbitrary $Σ$ and $w^*$; however, no efficient algorithms are known to match these guarantees even with $o(n)$ samples, without further assumptions on $Σ$ or $w^*$. As for hardness, computational lower bounds are known only for worst-case design matrices. Random-design instances are known that are hard for the Lasso, but these instances can generally be solved by the Lasso after a simple change of basis (i.e. preconditioning).
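Two standard identities help unpack this setup. First, for a fresh covariate $x \sim N(0,Σ)$, the error $(\hat{w}-w^*)^TΣ(\hat{w}-w^*)$ is exactly the excess squared prediction error of $\hat w$. Second, a Lasso preconditioned by an invertible matrix $S$ can be written as follows: run the Lasso in the new basis and map the result back. This amounts to an $\ell_1$ penalty on $S^{-1}w$. Here $X$ is the $m \times n$ design matrix with $m$ samples. The notation is ours, and the paper's exact formulation may differ.

$$
\mathbb{E}_{x\sim N(0,Σ)}\big[(x^{T}\hat w - x^{T}w^*)^2\big] = (\hat w - w^*)^{T}Σ(\hat w - w^*),
$$

$$
\hat v \in \arg\min_{v}\ \frac{1}{2m}\|y - XSv\|_2^2 + \lambda\|v\|_1,\quad \hat w = S\hat v
\;\Longleftrightarrow\;
\hat w \in \arg\min_{w}\ \frac{1}{2m}\|y - Xw\|_2^2 + \lambda\|S^{-1}w\|_1.
$$

The first identity follows from $\mathbb{E}[xx^T] = Σ$. The second follows from substituting $w = Sv$.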
In this work, we give upper and lower bounds clarifying the power of preconditioning in sparse linear regression. First, we show that the preconditioned Lasso can solve a large class of sparse linear regression problems nearly optimally: it succeeds whenever the dependency structure of the covariates, in the sense of the Markov property, has low treewidth -- even if $Σ$ is highly ill-conditioned. Second, we construct (for the first time) random-design instances which are provably hard for an optimally preconditioned Lasso. In fact, we complete our treewidth classification by proving that for any treewidth-$t$ graph, there exists a Gaussian Markov Random Field on this graph such that the preconditioned Lasso, with any choice of preconditioner, requires $Ω(t^{1/20})$ samples to recover $O(\log n)$-sparse signals when covariates are drawn from this model.
Submitted 16 June, 2021;
originally announced June 2021.
-
Truncated Linear Regression in High Dimensions
Authors:
Constantinos Daskalakis,
Dhruv Rohatgi,
Manolis Zampetakis
Abstract:
As in standard linear regression, in truncated linear regression we are given observations $(A_i, y_i)_i$ whose dependent variable equals $y_i= A_i^{\rm T} \cdot x^* + η_i$, where $x^*$ is some fixed unknown vector of interest and $η_i$ is independent noise; the difference is that an observation is revealed only if its dependent variable $y_i$ lies in some "truncation set" $S \subset \mathbb{R}$. The goal is to recover $x^*$ under some favorable conditions on the $A_i$'s and the noise distribution. We prove that there exists a computationally and statistically efficient method for recovering $k$-sparse $n$-dimensional vectors $x^*$ from $m$ truncated samples, which attains an optimal $\ell_2$ reconstruction error of $O(\sqrt{(k \log n)/m})$. As a corollary, our guarantees imply a computationally efficient and information-theoretically optimal algorithm for compressed sensing with truncation, which may arise from measurement saturation effects. Our result follows from a statistical and computational analysis of the Stochastic Gradient Descent (SGD) algorithm for solving a natural adaptation of the LASSO optimization problem that accommodates truncation. This generalizes both (1) [Daskalakis et al. 2018], where no regularization is needed due to the low dimensionality of the data, and (2) [Wainwright 2009], where the objective function is simple due to the absence of truncation. To deal with truncation and high dimensionality simultaneously, we develop new techniques that generalize the existing ones and that we believe are of independent interest.
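A small sketch can show how truncation changes the Lasso objective. Assume unit-variance Gaussian noise and the truncation set $S = (0, \infty)$. Then the negative log-likelihood of a surviving sample $(a, y)$ has gradient $(\mathbb{E}[z \mid z \in S] - y)\,a$ with $z \sim N(a^{\rm T} x, 1)$. For this $S$, the conditional mean has the closed form $μ + φ(μ)/Φ(μ)$. The sketch below adds an $\ell_1$ penalty and runs full-batch proximal gradient descent (ISTA), not the SGD analyzed in the paper. The function names, dimensions, $λ$, step size, and iteration count are hypothetical choices for a toy instance.

```python
import math
import random


def truncated_mean(mu):
    """E[z | z > 0] for z ~ N(mu, 1), i.e. mu + phi(mu) / Phi(mu)."""
    pdf = math.exp(-0.5 * mu * mu) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-mu / math.sqrt(2.0))   # accurate for negative mu too
    return mu + pdf / cdf


def soft_threshold(v, t):
    """Proximal operator of t * |v| (the l1 shrinkage step)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def sample_truncated(x_star, m, rng):
    """Draw a ~ N(0, I), y = a.x* + N(0, 1); keep only samples with y in S = (0, inf)."""
    A, y = [], []
    while len(A) < m:
        a = [rng.gauss(0.0, 1.0) for _ in x_star]
        yi = sum(aj * xj for aj, xj in zip(a, x_star)) + rng.gauss(0.0, 1.0)
        if yi > 0.0:                              # the sample survives truncation
            A.append(a)
            y.append(yi)
    return A, y


def fit_truncated_lasso(A, y, lam=0.02, step=0.5, iters=200):
    """ISTA on the truncated-Gaussian negative log-likelihood plus lam * ||x||_1."""
    n, d = len(A), len(A[0])
    x = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for a, yi in zip(A, y):
            mu = sum(aj * xj for aj, xj in zip(a, x))
            residual = truncated_mean(mu) - yi    # derivative of this sample's NLL in mu
            for j in range(d):
                grad[j] += residual * a[j] / n
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(d)]
    return x


rng = random.Random(0)
x_star = [1.0, -0.5, 0.0, 0.0, 0.0]
A, y = sample_truncated(x_star, 1000, rng)
x_hat = fit_truncated_lasso(A, y)
print([round(v, 2) for v in x_hat])
```

Ordinary least squares on the surviving samples would ignore the selection effect. The `truncated_mean` term is what corrects for it.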
Submitted 28 July, 2020;
originally announced July 2020.
-
Constant-Expansion Suffices for Compressed Sensing with Generative Priors
Authors:
Constantinos Daskalakis,
Dhruv Rohatgi,
Manolis Zampetakis
Abstract:
Generative neural networks have been found empirically to be very promising as structural priors for compressed sensing, since they can be trained to span low-dimensional data manifolds in high-dimensional signal spaces. Despite the non-convexity of the resulting optimization problem, it has also been shown theoretically that, for neural networks with random Gaussian weights, a signal in the range of the network can be efficiently and approximately recovered from a few noisy measurements. However, a major bottleneck of these theoretical guarantees is a network expansivity condition: each layer of the neural network must be larger than the previous one by a logarithmic factor. Our main contribution is to break this strong expansivity assumption, showing that constant expansivity suffices to get efficient recovery algorithms; constant expansivity is also information-theoretically necessary. To overcome the theoretical bottleneck in existing approaches, we prove a novel uniform concentration theorem for random functions that might not be Lipschitz but satisfy a relaxed notion which we call "pseudo-Lipschitzness." Using this theorem we show that a matrix concentration inequality known as the Weight Distribution Condition (WDC), which was previously only known to hold for Gaussian matrices with logarithmic aspect ratio, in fact holds for constant aspect ratios too. Since the WDC is a fundamental matrix concentration inequality at the heart of all existing theoretical guarantees on this problem, our tighter bound immediately yields improvements in all known results in the literature on compressed sensing with deep generative priors, including one-bit recovery, phase retrieval, low-rank matrix recovery, and more.
Submitted 26 June, 2020; v1 submitted 7 June, 2020;
originally announced June 2020.
-
Regarding two conjectures on clique and biclique partitions
Authors:
Dhruv Rohatgi,
John C. Urschel,
Jake Wellens
Abstract:
For a graph $G$, let $cp(G)$ denote the minimum number of cliques of $G$ needed to cover the edges of $G$ exactly once. Similarly, let $bp_k(G)$ denote the minimum number of bicliques (i.e. complete bipartite subgraphs of $G$) needed to cover each edge of $G$ exactly $k$ times. We consider two conjectures -- one regarding the maximum possible value of $cp(G) + cp(\overline{G})$ (due to de Caen, Erdős, Pullman and Wormald) and the other regarding $bp_k(K_n)$ (due to de Caen, Gregory and Pritikin). We disprove the first, obtaining improved lower and upper bounds on $\max_G cp(G) + cp(\overline{G})$, and we prove an asymptotic version of the second, showing that $bp_k(K_n) = (1+o(1))n$.
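The sketch below makes the definition of $bp_k$ concrete. It checks whether a family of bicliques covers every edge of $K_n$ exactly $k$ times. Each biclique is given as a pair of disjoint vertex sets $(L, R)$. The sketch then checks the classical star construction $(\{i\}, \{i+1, \dots, n\})$, which uses $n-1$ bicliques for $k = 1$. By the Graham–Pollak theorem, $n-1$ is optimal for $k = 1$. The function names are hypothetical. The paper's construction achieving $(1+o(1))n$ for larger $k$ is not reproduced.

```python
from collections import Counter
from itertools import combinations


def edge_multiplicities(bicliques):
    """Count how many times each edge {u, v} is covered by the bicliques."""
    counts = Counter()
    for left, right in bicliques:
        assert not set(left) & set(right), "the two sides of a biclique must be disjoint"
        for u in left:
            for v in right:
                counts[frozenset((u, v))] += 1
    return counts


def is_exact_k_cover_of_complete_graph(n, bicliques, k):
    """True iff every edge of K_n on vertices 1..n is covered exactly k times."""
    counts = edge_multiplicities(bicliques)
    edges = {frozenset(e) for e in combinations(range(1, n + 1), 2)}
    return set(counts) == edges and all(c == k for c in counts.values())


def star_partition(n):
    """The bicliques ({i}, {i+1, ..., n}) for i = 1..n-1."""
    return [({i}, set(range(i + 1, n + 1))) for i in range(1, n)]


n = 6
stars = star_partition(n)
print(len(stars), is_exact_k_cover_of_complete_graph(n, stars, 1))
```

Repeating the star family twice gives a trivial bound $bp_2(K_n) \le 2(n-1)$. The asymptotic result in the abstract shows that about $n$ bicliques already suffice for each fixed $k$.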
Submitted 5 May, 2020;
originally announced May 2020.
-
Off-diagonal ordered Ramsey numbers of matchings
Authors:
Dhruv Rohatgi
Abstract:
For ordered graphs $G$ and $H$, the ordered Ramsey number $r_<(G,H)$ is the smallest $n$ such that every red/blue edge coloring of the complete graph on vertices $\{1,\dots,n\}$ contains either a blue copy of $G$ or a red copy of $H$, where the embedding must preserve the relative order of vertices. One number of interest, first studied by Conlon, Fox, Lee, and Sudakov, is the "off-diagonal" ordered Ramsey number $r_<(M, K_3)$, where $M$ is an ordered matching on $n$ vertices. In particular, Conlon et al. asked what asymptotic bounds (in $n$) can be obtained for $\max r_<(M, K_3)$, where the maximum is over all ordered matchings $M$ on $n$ vertices. The best-known upper bound is $O(n^2/\log n)$, whereas the best-known lower bound is $Ω((n/\log n)^{4/3})$, and Conlon et al. hypothesize that $r_<(M, K_3) = O(n^{2-ε})$ for every ordered matching $M$. We resolve two special cases of this conjecture. We show that the off-diagonal ordered Ramsey numbers for matchings in which edges do not cross are nearly linear. We also prove a truly sub-quadratic upper bound for random matchings with interval chromatic number $2$.
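Both special classes of ordered matchings in the abstract are easy to test for. In the sketch below, an ordered matching on vertices $1, \dots, n$ is a list of pairs $(a, b)$ with $a < b$. Two edges $(a, b)$ and $(c, d)$ with $a < c$ cross when $a < c < b < d$. The interval chromatic number is the least number of blocks of consecutive vertices into which $1, \dots, n$ can be split so that no edge lies inside a block. A left-to-right scan that opens a new block only when forced computes it. The function names are hypothetical. So is the random model: a uniformly random matching between the two halves of $1, \dots, n$.

```python
import random
from itertools import combinations


def crosses(e, f):
    """Edges (a, b), (c, d) with a < c cross iff a < c < b < d."""
    (a, b), (c, d) = sorted([e, f])
    return a < c < b < d


def is_noncrossing(matching):
    return not any(crosses(e, f) for e, f in combinations(matching, 2))


def interval_chromatic_number(n, matching):
    """Greedy scan: open a new interval only when v's partner is in the current one."""
    partner = {}
    for a, b in matching:
        partner[a], partner[b] = b, a
    intervals, start = 1, 1
    for v in range(1, n + 1):
        w = partner.get(v)
        if w is not None and start <= w < v:   # edge (w, v) would lie inside [start, v]
            intervals += 1
            start = v
    return intervals


def random_half_to_half_matching(n, rng):
    """Hypothetical random model: match {1..n/2} to a shuffled {n/2+1..n}."""
    left = list(range(1, n // 2 + 1))
    right = list(range(n // 2 + 1, n + 1))
    rng.shuffle(right)
    return list(zip(left, right))


print(is_noncrossing([(1, 4), (2, 3)]), interval_chromatic_number(6, [(1, 2), (3, 4), (5, 6)]))
```

Every matching between the two halves has interval chromatic number $2$, because both halves are independent intervals.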
Submitted 12 August, 2018;
originally announced August 2018.
-
When Two-Holed Torus Graphs are Hamiltonian
Authors:
Dhruv Rohatgi
Abstract:
Trotter and Erdős found conditions for when a directed $m \times n$ grid graph on a torus is Hamiltonian. We consider the analogous graphs on a two-holed torus, and study their Hamiltonicity. We find an $\mathcal{O}(n^4)$ algorithm to determine the Hamiltonicity of one of these graphs and an $\mathcal{O}(\log(n))$ algorithm to find the number of diagonals, which are sets of vertices that force the directions of edges in any Hamiltonian cycle. We also show that there is a periodicity pattern in the graphs' Hamiltonicities if one of the sides of the grid is fixed; and we completely classify which graphs are Hamiltonian in the cases where $n=m$, $n=2$, the $m \times n$ graph has $1$ diagonal, or the $\frac{m}{2} \times \frac{n}{2}$ graph has $1$ diagonal.
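For the classical one-holed case mentioned at the start of this abstract, small instances can be checked by brute force. The sketch builds the directed $m \times n$ torus grid, in which $(i, j)$ has arcs to $(i+1 \bmod m, j)$ and $(i, j+1 \bmod n)$. It searches for a Hamiltonian cycle by depth-first search. It then compares the answer with the Trotter–Erdős criterion as usually stated: the graph is Hamiltonian iff $\gcd(m, n) = d_1 + d_2$ for some positive integers $d_1, d_2$ with $\gcd(m, d_1) = \gcd(n, d_2) = 1$. The two-holed graphs studied in the paper are not modeled. The exhaustive search is exponential, unlike the algorithms in the abstract, and the function names are hypothetical.

```python
from math import gcd


def torus_successors(m, n, v):
    """Out-neighbors of v = (i, j) in the directed m x n torus grid."""
    i, j = v
    return [((i + 1) % m, j), (i, (j + 1) % n)]


def has_hamiltonian_cycle(m, n):
    """Exhaustive DFS; the graph is vertex-transitive, so start at (0, 0)."""
    total, start = m * n, (0, 0)
    visited = {start}

    def dfs(v, count):
        for w in torus_successors(m, n, v):
            if w == start and count == total:   # closed a cycle through all vertices
                return True
            if w not in visited:
                visited.add(w)
                if dfs(w, count + 1):
                    return True
                visited.remove(w)
        return False

    return dfs(start, 1)


def trotter_erdos_criterion(m, n):
    d = gcd(m, n)
    return any(gcd(m, d1) == 1 and gcd(n, d - d1) == 1 for d1 in range(1, d))


for m in range(2, 5):
    print([has_hamiltonian_cycle(m, n) for n in range(2, 5)])
```

There is a short reason why the search fails when $\gcd(m, n) = 1$. Suppose a Hamiltonian cycle leaves $v$ through the $i$-arc. Then the vertex $v + (1, -1)$ must also leave through its $i$-arc, so arc directions are constant along "diagonals". With a single diagonal, every vertex moves in the same direction, and the cycle closes after $m$ or $n$ steps.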
Submitted 5 September, 2016;
originally announced September 2016.