Imperfect-Recall Games: Equilibrium Concepts and Their Complexity

Authors: Emanuel Tewolde, Brian Hu Zhang, Caspar Oesterheld, Manolis Zampetakis, Tuomas Sandholm, Paul W. Goldberg, Vincent Conitzer

Abstract: We investigate optimal decision making under imperfect recall, that is, when an agent forgets information it once held. Examples include the absentminded driver game and team games in which the members have limited communication capabilities. In the framework of extensive-form games with imperfect recall, we analyze the computational complexity of finding equilibria in multiplayer settings across three different solution concepts: Nash, multiselves based on evidential decision theory (EDT), and multiselves based on causal decision theory (CDT). We are interested in both exact and approximate solution computation. As special cases, we consider (1) single-player games, (2) two-player zero-sum games and relationships to maximin values, and (3) games without exogenous stochasticity (chance nodes). We relate these problems to the complexity classes P, PPAD, PLS, $\Sigma_2^P$, $\exists\mathbb{R}$, and $\exists\forall\mathbb{R}$.

Submitted 22 June, 2024; originally announced June 2024.

Comments: Long version of the paper accepted to the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI 2024). 35 pages, 10 figures, 1 table

MSC Class: 91A05; 91A06; 91A10; 91A11; 91A18; 91A35; 91A68; 68T37; 68Q17; 68Q25

ACM Class: I.2; J.4; F.2
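The absentminded driver example from the imperfect-recall abstract can be illustrated with a small sketch. This is not taken from the paper. The game is often stated with payoff 0 for exiting at the first intersection, 4 for exiting at the second, and 1 for driving past both, and those payoffs are an assumption here. The driver cannot tell the intersections apart, so a strategy is a single probability p of continuing. The Python sketch below evaluates the ex-ante expected payoff and maximizes it over a rational grid:

```python
from fractions import Fraction

# Assumed payoffs (classic textbook values, not stated in the abstract).
EXIT_FIRST = 0     # exit at the first intersection
EXIT_SECOND = 4    # exit at the second intersection
CONTINUE_BOTH = 1  # drive past both intersections


def expected_payoff(p):
    """Ex-ante expected payoff when the driver continues with probability p.
    The same p is used at both intersections because they look identical."""
    return ((1 - p) * EXIT_FIRST
            + p * (1 - p) * EXIT_SECOND
            + p * p * CONTINUE_BOTH)


def best_planning_strategy(steps=300):
    """Grid search over p in [0, 1] using exact rational arithmetic."""
    grid = [Fraction(i, steps) for i in range(steps + 1)]
    return max(grid, key=expected_payoff)


if __name__ == "__main__":
    p_star = best_planning_strategy()
    print(p_star, expected_payoff(p_star))  # 2/3 4/3
```

With these payoffs the expected value is 4p(1 - p) + p², that is 4p - 3p², which peaks at p = 2/3 with value 4/3. This is only the ex-ante (planning) optimum. The EDT and CDT multiselves concepts instead evaluate deviations by the driver at a single intersection, which this sketch does not model.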

arXiv:2406.05660

Injecting Undetectable Backdoors in Deep Learning and Language Models

Authors: Alkis Kalavasis, Amin Karbasi, Argyris Oikonomou, Katerina Sotiraki, Grigoris Velegkas, Manolis Zampetakis

Abstract: As ML models become increasingly complex and integral to high-stakes domains such as finance and healthcare, they also become more susceptible to sophisticated adversarial attacks. We investigate the threat posed by undetectable backdoors in models developed by insidious external expert firms. When such backdoors exist, they allow the designer of the model to sell information to users on how to carefully perturb the least significant bits of their input to change the classification outcome to a favorable one. We develop a general strategy to plant a backdoor in neural networks while ensuring that, even if the model's weights and architecture are accessible, the existence of the backdoor remains undetectable. To achieve this, we use techniques from cryptography such as cryptographic signatures and indistinguishability obfuscation. We further introduce the notion of undetectable backdoors for language models and extend our neural network backdoor attacks to such models, based on the existence of steganographic functions.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2403.11963

Transfer Learning Beyond Bounded Density Ratios

Authors: Alkis Kalavasis, Ilias Zadik, Manolis Zampetakis

Abstract: We study the fundamental problem of transfer learning, where a learning algorithm collects data from some source distribution $P$ but needs to perform well with respect to a different target distribution $Q$. A standard change of measure argument implies that transfer learning is possible when the density ratio $dQ/dP$ is bounded. Yet, prior thought-provoking works by Kpotufe and Martinet (COLT, 2018) and Hanneke and Kpotufe (NeurIPS, 2019) demonstrate cases where the ratio $dQ/dP$ is unbounded but transfer learning is still possible. In this work, we focus on transfer learning over the class of low-degree polynomial estimators. Our main result is a general transfer inequality over the domain $\mathbb{R}^n$, proving that non-trivial transfer learning for low-degree polynomials is possible under very mild assumptions, going well beyond the classical assumption that $dQ/dP$ is bounded. For instance, it always applies if $Q$ is a log-concave measure and the inverse ratio $dP/dQ$ is bounded. To demonstrate the applicability of our inequality, we obtain new results in (1) the classical truncated regression setting, where $dQ/dP$ equals infinity, and (2) the more recent out-of-distribution generalization setting for in-context learning of linear functions with transformers. We also provide a discrete analogue of our transfer inequality on the Boolean hypercube $\{-1,1\}^n$ and study its connections with the recent problem of Generalization on the Unseen of Abbe, Bengio, Lotfi and Rizk (ICML, 2023).
Our main conceptual contribution is that the maximum influence of the error of the estimator $\widehat{f}-f^*$ under $Q$, $\mathrm{I}_{\max}(\widehat{f}-f^*)$, acts as a sufficient condition for transferability: when $\mathrm{I}_{\max}(\widehat{f}-f^*)$ is appropriately bounded, transfer is possible over the Boolean domain.

Submitted 18 March, 2024; originally announced March 2024.

Comments: Abstract shortened to fit arXiv requirements
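The "standard change of measure argument" in the transfer-learning abstract can be made concrete for finite distributions. It states that E_Q[ℓ] = E_P[ℓ · dQ/dP] ≤ (sup dQ/dP) · E_P[ℓ] for any nonnegative loss ℓ. The Python sketch below is our own illustration with made-up distributions, not the paper's polynomial transfer inequality. It computes both sides and the ratio bound:

```python
import math


def expectation(dist, g):
    """E[g(X)] for a finite distribution given as {outcome: probability}."""
    return sum(prob * g(x) for x, prob in dist.items())


def reweighted_expectation(source, target, g):
    """E_Q[g] computed from the source as E_P[g * dQ/dP].
    Only correct when P(x) > 0 wherever Q(x) > 0."""
    return sum(p * g(x) * (target.get(x, 0.0) / p)
               for x, p in source.items() if p > 0)


def density_ratio_bound(source, target):
    """sup of dQ/dP over the support of Q; infinite if Q has mass where P has none."""
    worst = 0.0
    for x, q in target.items():
        if q == 0:
            continue
        p = source.get(x, 0.0)
        if p == 0:
            return math.inf
        worst = max(worst, q / p)
    return worst


if __name__ == "__main__":
    P = {0: 0.5, 1: 0.3, 2: 0.2}  # made-up source distribution
    Q = {0: 0.2, 1: 0.3, 2: 0.5}  # made-up target distribution

    def squared(x):
        return x ** 2  # a nonnegative loss

    B = density_ratio_bound(P, Q)
    # With a bounded ratio, E_Q[loss] <= B * E_P[loss].
    print(expectation(Q, squared), reweighted_expectation(P, Q, squared),
          B * expectation(P, squared))
```

If Q puts mass on an outcome that P never produces, `density_ratio_bound` returns infinity and this argument gives nothing. That unbounded regime is the one the abstract's transfer inequality addresses for low-degree polynomials.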

arXiv:2312.02119

Tree of Attacks: Jailbreaking Black-Box LLMs Automatically

Authors: Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, Amin Karbasi

Abstract: While Large Language Models (LLMs) display versatile functionality, they continue to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human-designed jailbreaks. In this work, we present Tree of Attacks with Pruning (TAP), an automated method for generating jailbreaks that requires only black-box access to the target LLM. TAP uses an LLM to iteratively refine candidate (attack) prompts using tree-of-thought reasoning until one of the generated prompts jailbreaks the target. Crucially, before sending prompts to the target, TAP assesses them and prunes the ones unlikely to result in jailbreaks. Tree-of-thought reasoning allows TAP to navigate a large search space of prompts, and pruning reduces the total number of queries sent to the target. In empirical evaluations, we observe that TAP generates prompts that jailbreak state-of-the-art LLMs (including GPT-4 and GPT-4 Turbo) for more than 80% of the prompts using only a small number of queries. Interestingly, TAP is also capable of jailbreaking LLMs protected by state-of-the-art guardrails, e.g., LlamaGuard. This significantly improves upon the previous state-of-the-art black-box method for generating jailbreaks.

Submitted 21 February, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: An implementation of the presented method is available at https://github.com/RICommunity/TAP

arXiv:2311.14869

On the Complexity of Computing Sparse Equilibria and Lower Bounds for No-Regret Learning in Games

Authors: Ioannis Anagnostides, Alkis Kalavasis, Tuomas Sandholm, Manolis Zampetakis

Abstract: Characterizing the performance of no-regret dynamics in multi-player games is a foundational problem at the interface of online learning and game theory. Recent results have revealed that when all players adopt specific learning algorithms, it is possible to improve exponentially over what is predicted by the overly pessimistic no-regret framework in the traditional adversarial regime, thereby leading to faster convergence to the set of coarse correlated equilibria (CCE). Yet, despite considerable recent progress, the fundamental complexity barriers for learning in normal- and extensive-form games are poorly understood. In this paper, we take a step towards closing this gap by first showing that, barring major complexity breakthroughs, any polynomial-time learning algorithm in extensive-form games needs at least $2^{\log^{1/2 - o(1)} |\mathcal{T}|}$ iterations for the average regret to fall below even an absolute constant, where $|\mathcal{T}|$ is the number of nodes in the game. This establishes a superpolynomial separation between no-regret learning in normal- and extensive-form games, as in the former class a logarithmic number of iterations suffices to achieve constant average regret. Furthermore, our results imply that algorithms such as multiplicative weights update, as well as its optimistic counterpart, require at least $2^{(\log \log m)^{1/2 - o(1)}}$ iterations to attain an $O(1)$-CCE in $m$-action normal-form games.
These are the first non-trivial (and dimension-dependent) lower bounds in that setting for the most well-studied algorithms in the literature. From a technical standpoint, we follow a beautiful connection recently made by Foster, Golowich, and Kakade (ICML '23) between sparse CCE and Nash equilibria in the context of Markov games. Consequently, our lower bounds rule out polynomial-time algorithms well beyond the traditional online learning framework.

Submitted 24 November, 2023; originally announced November 2023.

Comments: To appear at ITCS 2024
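The sparse-CCE abstract refers to "average regret" and "multiplicative weights update". To make both concrete, the Python sketch below has both players of a small bimatrix game run Hedge, the standard full-information multiplicative-weights rule, and reports each player's average external regret. The time-averaged joint play is then an approximate CCE, with approximation error equal to the larger of the two regrets. The payoff matrices and the fixed step size are our own illustrative assumptions. The sketch shows only the classical upper bound, not the paper's lower bounds:

```python
import math


def softmax(scores, eta):
    """Hedge weights: probabilities proportional to exp(eta * score)."""
    top = max(scores)
    weights = [math.exp(eta * (s - top)) for s in scores]  # shifted for stability
    total = sum(weights)
    return [w / total for w in weights]


def hedge_self_play(A, B, T=1000):
    """Row and column players of a bimatrix game both run Hedge.

    A[i][j] and B[i][j] are the row and column payoffs, assumed in [0, 1].
    Returns (row_avg_regret, col_avg_regret) after T rounds.
    """
    m, n = len(A), len(A[0])
    eta_r = math.sqrt(8 * math.log(m) / T)  # fixed step size for horizon T
    eta_c = math.sqrt(8 * math.log(n) / T)
    cum_r, cum_c = [0.0] * m, [0.0] * n  # cumulative utility of each action
    earned_r = earned_c = 0.0  # expected utility actually collected
    for _ in range(T):
        x = softmax(cum_r, eta_r)
        y = softmax(cum_c, eta_c)
        # Expected utility of each pure action against the opponent's mix.
        u_r = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
        u_c = [sum(B[i][j] * x[i] for i in range(m)) for j in range(n)]
        earned_r += sum(p * u for p, u in zip(x, u_r))
        earned_c += sum(q * u for q, u in zip(y, u_c))
        cum_r = [c + u for c, u in zip(cum_r, u_r)]
        cum_c = [c + u for c, u in zip(cum_c, u_c)]
    # External regret: best fixed action in hindsight minus what was earned.
    return (max(cum_r) - earned_r) / T, (max(cum_c) - earned_c) / T


if __name__ == "__main__":
    A = [[0.9, 0.1], [0.2, 0.6]]  # made-up row payoffs
    B = [[0.1, 0.8], [0.7, 0.3]]  # made-up column payoffs
    print(hedge_self_play(A, B, T=2000))
```

With step size η = sqrt(8 ln m / T), the textbook Hedge guarantee bounds each player's average regret by sqrt(ln m / (2T)) against any payoff sequence in [0, 1], so it holds here even though the opponent is adapting.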

arXiv:2310.14113

Sorting from Crowdsourced Comparisons using Expert Verifications

Authors: Ellen Vitercik, Manolis Zampetakis, David Zhang

Abstract: We introduce a novel noisy sorting model motivated by the Just Noticeable Difference (JND) model from experimental psychology. The goal of our model is to capture the low quality of the data collected in crowdsourcing environments. Compared to other celebrated models of noisy sorting, our model does not rely on precise data-generation assumptions and captures crowdsourced tasks' varying levels of difficulty, which can lead to different amounts of noise in the data. To handle this challenging task, we assume we can verify some of the collected data using expert advice. This verification procedure is costly; hence, we aim to minimize the number of verifications we use. We propose a new efficient algorithm called CandidateSort, which we prove uses the optimal number of verifications in the noisy sorting models we consider. We characterize this optimal number of verifications by showing that it is linear in a parameter $k$, which intuitively measures the maximum number of comparisons in the crowdsourced data that are wrong but not inconsistent.

Submitted 21 October, 2023; originally announced October 2023.

arXiv:2310.09974

Algorithmic Contract Design for Crowdsourced Ranking

Authors: Kiriaki Frangias, Andrew Lin, Ellen Vitercik, Manolis Zampetakis

Abstract: Ranking is fundamental to many areas, such as search engine optimization, human feedback for language models, and peer grading. Crowdsourcing, which is often used for these tasks, requires proper incentivization to ensure accurate inputs. In this work, we draw on the field of contract theory from economics to propose a novel mechanism that enables a principal to accurately rank a set of items by incentivizing agents to provide pairwise comparisons of the items. Our mechanism implements these incentives by verifying a subset of each agent's comparisons, a task we assume to be costly. The agent is compensated (for example, monetarily or with class credit) based on the accuracy of these comparisons. Our mechanism achieves the following guarantees: (1) it requires the principal to verify only $O(\log s)$ comparisons, where $s$ is the total number of agents, and (2) it provably achieves higher total utility for the principal than ranking the items herself with no crowdsourcing.

Submitted 24 January, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

arXiv:2310.09157

The Computational Complexity of Finding Stationary Points in Non-Convex Optimization

Authors: Alexandros Hollender, Manolis Zampetakis

Abstract: Finding approximate stationary points, i.e., points where the gradient is approximately zero, of non-convex but smooth objective functions $f$ over unrestricted $d$-dimensional domains is one of the most fundamental problems in classical non-convex optimization. Nevertheless, the computational and query complexity of this problem are still not well understood when the dimension $d$ of the problem is independent of the approximation error. In this paper, we show the following computational and query complexity results:
1. The problem of finding approximate stationary points over unrestricted domains is PLS-complete.
2. For $d = 2$, we provide a zero-order algorithm for finding $\varepsilon$-approximate stationary points that requires at most $O(1/\varepsilon)$ value queries to the objective function.
3. We show that any algorithm needs at least $Ω(1/\varepsilon)$ queries to the objective function and/or its gradient to find $\varepsilon$-approximate stationary points when $d=2$. Combined with the above, this characterizes the query complexity of this problem to be $Θ(1/\varepsilon)$.
4. For $d = 2$, we provide a zero-order algorithm for finding $\varepsilon$-KKT points in constrained optimization problems that requires at most $O(1/\sqrt{\varepsilon})$ value queries to the objective function. This closes the gap between the works of Bubeck and Mikulincer [2020] and Vavasis [1993] and characterizes the query complexity of this problem to be $Θ(1/\sqrt{\varepsilon})$.
5. Combining our results with the recent result of Fearnley et al. [2022], we show that finding approximate KKT points in constrained optimization is reducible to finding approximate stationary points in unconstrained optimization but the converse is impossible.

Submitted 13 October, 2023; originally announced October 2023.

Comments: Full version of COLT 2023 extended abstract
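The stationary-points abstract works in the zero-order model, where only function values can be queried. To make that access model concrete, the Python sketch below estimates gradients by central differences and runs plain gradient descent until the estimated gradient norm is at most ε. This is a generic baseline, not the paper's O(1/ε)-query algorithm for d = 2. The test function `double_well`, the step size, and the difference width `h` are illustrative assumptions:

```python
import math


def fd_gradient(f, x, h=1e-5):
    """Estimate the gradient of f at x by central differences.
    Uses 2 * len(x) value queries and never calls a gradient oracle."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def find_stationary_point(f, x0, eps=1e-3, step=0.1, max_iters=100_000):
    """Zero-order gradient descent: stop at the first iterate whose estimated
    gradient norm is at most eps. Returns (point, number of value queries)."""
    x, queries = list(x0), 0
    for _ in range(max_iters):
        g = fd_gradient(f, x)
        queries += 2 * len(x)
        if math.sqrt(sum(gi * gi for gi in g)) <= eps:
            return x, queries
        x = [xi - step * gi for xi, gi in zip(x, g)]
    raise RuntimeError("no eps-stationary point found within max_iters")


def double_well(p):
    """Hypothetical smooth non-convex test function on R^2:
    minima at (1, 0) and (-1, 0), saddle point at (0, 0)."""
    x, y = p
    return (x * x - 1) ** 2 + y * y


if __name__ == "__main__":
    point, used = find_stationary_point(double_well, [0.5, 1.0])
    print(point, used)
```

The returned query count can be compared with the Θ(1/ε) bound stated above. A single easy instance like this one says nothing about the worst case, which is what the paper's results characterize.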

arXiv:2310.03504

Interplay of Space Charge and Intra-Beam Scattering in the LHC ion injector chain

Authors: Michail Zampetakis, Fanouria Antoniou, Foteini Asvesta, Hannes Bartosik, Yannis Papaphilippou, Angela Saá Hernández

Abstract: The ion injectors of the CERN accelerator chain, in particular the Super Proton Synchrotron (SPS) and the Low Energy Ion Ring (LEIR), operate in a strong Space Charge (SC) and Intra-Beam Scattering (IBS) regime, which can degrade beam quality. Optimizing ion beam performance thus requires studying the interplay of these two effects in tracking simulations that incorporate both SC and IBS effects interleaved with lattice non-linearities. To this end, the kinetic theory approach to treating IBS effects has been deployed. A new, modified approach has been introduced using the formalism of the Bjorken and Mtingwa model and the complete elliptic integrals of the second kind for faster numerical evaluation. This IBS kick is implemented in PyORBIT, and extensive benchmarks against analytical models are shown. Results of combined space charge and intra-beam scattering simulations for the SPS and LEIR are presented and compared with observations from beam measurements.

Submitted 18 October, 2023; v1 submitted 5 October, 2023; originally announced October 2023.

Comments: 13 pages, 13 figures, to be submitted to Physical Review Accelerators and Beams

arXiv:2308.02196

doi 10.1103/PhysRevAccelBeams.27.064403

Interplay of space charge, intrabeam scattering and synchrotron radiation in the Compact Linear Collider damping rings

Authors: Michail Zampetakis, Fanouria Antoniou, Foteini Asvesta, Hannes Bartosik, Yannis Papaphilippou

Abstract: Future ultra-low emittance rings for electron/positron colliders require extremely high beam brightness and can thus be limited by collective effects. In this paper, the interplay of effects such as synchrotron radiation, intra-beam scattering (IBS) and space charge in the vicinity of excited betatron resonances is assessed. To this end, two algorithms were developed to simulate IBS and synchrotron radiation effects and were integrated into the PyORBIT tracking code, to be combined with its widely used space charge module. The impact of these effects on the achievable beam parameters of the Compact Linear Collider (CLIC) Damping Rings was studied, showing that synchrotron radiation damping mitigates the adverse effects of IBS and space-charge-induced resonance crossing. The studies also include a full dynamic simulation of the CLIC damping ring cycle starting from the injection beam parameters. It is demonstrated that a careful working point choice is necessary to accommodate the transition from a non-linear-lattice-induced detuning to a space-charge-dominated one and thereby avoid excessive losses and emittance growth generated in the vicinity of strong resonances.

Submitted 5 May, 2024; v1 submitted 4 August, 2023; originally announced August 2023.

Comments: To be published in Physical Review Accelerators and Beams

arXiv:2302.08300

Deterministic Nonsmooth Nonconvex Optimization

Authors: Michael I. Jordan, Guy Kornowski, Tianyi Lin, Ohad Shamir, Manolis Zampetakis

Abstract: We study the complexity of optimizing nonsmooth nonconvex Lipschitz functions by producing $(δ,ε)$-stationary points. Several recent works have presented randomized algorithms that produce such points using $\tilde O(δ^{-1}ε^{-3})$ first-order oracle calls, independent of the dimension $d$. It has been an open problem whether a similar result can be obtained via a deterministic algorithm. We resolve this open problem, showing that randomization is necessary to obtain a dimension-free rate. In particular, we prove a lower bound of $Ω(d)$ for any deterministic algorithm. Moreover, we show that, unlike in smooth or convex optimization, access to function values is required for any deterministic algorithm to halt within any finite time. On the other hand, we prove that if the function is even slightly smooth, then the dimension-free rate of $\tilde O(δ^{-1}ε^{-3})$ can be obtained by a deterministic algorithm with merely a logarithmic dependence on the smoothness parameter. Motivated by these findings, we turn to studying the complexity of deterministically smoothing Lipschitz functions. Though there are efficient black-box randomized smoothings, we start by showing that no such deterministic procedure can smooth functions in a meaningful manner, resolving an open question. We then bypass this impossibility result for the structured case of ReLU neural networks.
To that end, in a practical white-box setting in which the optimizer is granted access to the network's architecture, we propose a simple, dimension-free, deterministic smoothing that provably preserves $(δ,ε)$-stationary points. Our method applies to a variety of architectures of arbitrary depth, including ResNets and ConvNets. Combined with our algorithm, this yields the first deterministic dimension-free algorithm for optimizing ReLU networks, circumventing our lower bound.

Submitted 16 February, 2023; originally announced February 2023.

Comments: This work supersedes arXiv:2209.12463 and arXiv:2209.10346 [Section 3], with major additional results

arXiv:2210.13313

Learning and Covering Sums of Independent Random Variables with Unbounded Support

Authors: Alkis Kalavasis, Konstantinos Stavropoulos, Manolis Zampetakis

Abstract: We study the problem of covering and learning sums $X = X_1 + \cdots + X_n$ of independent integer-valued random variables $X_i$ (SIIRVs) with unbounded, or even infinite, support. De et al. (FOCS 2018) showed that the maximum value of the collective support of the $X_i$'s necessarily appears in the sample complexity of learning $X$. In this work, we address two questions: (i) Are there general families of SIIRVs with unbounded support that can be learned with sample complexity independent of both $n$ and the maximal element of the support? (ii) Are there general families of SIIRVs with unbounded support that admit proper sparse covers in total variation distance? For question (i), we provide a set of simple conditions that allow the unbounded SIIRV to be learned with complexity $\text{poly}(1/ε)$, bypassing the aforementioned lower bound. We further address question (ii) in the general setting where each variable $X_i$ has a unimodal probability mass function and is a different member of some, possibly multi-parameter, exponential family $\mathcal{E}$ that satisfies some structural properties. These properties allow $\mathcal{E}$ to contain heavy-tailed and non-log-concave distributions. Moreover, we show that for every $ε > 0$ and every $k$-parameter family $\mathcal{E}$ that satisfies some structural assumptions, there exists an algorithm with $\tilde{O}(k) \cdot \text{poly}(1/ε)$ samples that learns a sum of $n$ arbitrary members of $\mathcal{E}$ within $ε$ in TV distance.
The output of the learning algorithm is also a sum of random variables whose distribution lies in the family $\mathcal{E}$. En route, we prove that any discrete unimodal exponential family with bounded constant-degree central moments can be approximated by the family corresponding to a bounded subset of the initial (unbounded) parameter space.

Submitted 24 October, 2022; originally announced October 2022.

Comments: 60 pages, 0 figures. Accepted to the Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

arXiv:2210.09769

STay-ON-the-Ridge: Guaranteed Convergence to Local Minimax Equilibrium in Nonconvex-Nonconcave Games

Authors: Constantinos Daskalakis, Noah Golowich, Stratis Skoulakis, Manolis Zampetakis

Abstract: Min-max optimization problems involving nonconvex-nonconcave objectives have found important applications in adversarial training and other multi-agent learning settings. Yet, no known gradient descent-based method is guaranteed to converge to (even local notions of) min-max equilibrium in the nonconvex-nonconcave setting. For all known methods, there exist relatively simple objectives for which they cycle or exhibit other undesirable behavior different from converging to a point, let alone to some game-theoretically meaningful one [flokas2019poincare, hsieh2021limits]. The only known convergence guarantees hold under the strong assumption that the initialization is very close to a local min-max equilibrium [wang2019solving]. Moreover, the challenges described above are not just theoretical curiosities: all known methods are unstable in practice, even in simple settings. We propose the first method that is guaranteed to converge to a local min-max equilibrium for smooth nonconvex-nonconcave objectives. Our method is second-order and provably escapes limit cycles as long as it is initialized at an easy-to-find initial point. Both the definition of our method and its convergence analysis are motivated by the topological nature of the problem.
In particular, our method is not designed to decrease some potential function, such as the distance of its iterate from the set of local min-max equilibria or the projected gradient of the objective, but is designed to satisfy a topological property that guarantees the avoidance of cycles and implies its convergence.

Submitted 18 October, 2022; originally announced October 2022.

arXiv:2209.12463

On the Complexity of Deterministic Nonsmooth and Nonconvex Optimization

Authors: Michael I. Jordan, Tianyi Lin, Manolis Zampetakis

Abstract: In this paper, we present several new results on minimizing a nonsmooth and nonconvex function under a Lipschitz condition. Recent work shows that while the classical notion of Clarke stationarity is computationally intractable up to some sufficiently small constant tolerance, randomized first-order algorithms find a $(δ, ε)$-Goldstein stationary point with a complexity bound of $\tilde{O}(δ^{-1}ε^{-3})$, which is independent of the dimension $d \geq 1$ [Zhang-2020-Complexity, Davis-2022-Gradient, Tian-2022-Finite]. However, deterministic algorithms have not been fully explored, leaving open several problems in nonsmooth nonconvex optimization. Our first contribution is to demonstrate that randomization is necessary to obtain a dimension-independent guarantee, by proving a lower bound of $Ω(d)$ for any deterministic algorithm that has access to both $1^{st}$- and $0^{th}$-order oracles. Furthermore, we show that the $0^{th}$-order oracle is essential to obtain a finite-time convergence guarantee, by showing that any deterministic algorithm with only the $1^{st}$-order oracle cannot find an approximate Goldstein stationary point within a finite number of iterations, up to sufficiently small constant parameter and tolerance.
Finally, we propose a deterministic smoothing approach under the arithmetic circuit model, where the resulting smoothness parameter is exponential in a certain parameter $M > 0$ (e.g., the number of nodes in the representation of the function), and design a new deterministic first-order algorithm that achieves a dimension-independent complexity bound of $\tilde{O}(Mδ^{-1}ε^{-3})$.

Submitted 4 November, 2022; v1 submitted 26 September, 2022; originally announced September 2022.

Comments: 28 pages; fixes an error and adds relevant references

arXiv:2208.12042

Efficient Truncated Linear Regression with Unknown Noise Variance

Authors: Constantinos Daskalakis, Patroklos Stefanou, Rui Yao, Manolis Zampetakis

Abstract: Truncated linear regression is a classical challenge in statistics, wherein a label, $y = w^T x + \varepsilon$, and its corresponding feature vector, $x \in \mathbb{R}^k$, are only observed if the label falls in some subset $S \subseteq \mathbb{R}$; otherwise, the existence of the pair $(x, y)$ is hidden from observation. Linear regression with truncated observations has remained a challenge, in its general form, since the early works of Tobin (1958) and Amemiya (1973). When the distribution of the error is normal with known variance, recent work of Daskalakis et al. (2019) provides computationally and statistically efficient estimators of the linear model $w$. In this paper, we provide the first computationally and statistically efficient estimators for truncated linear regression when the noise variance is unknown, estimating both the linear model and the variance of the noise. Our estimator is based on an efficient implementation of Projected Stochastic Gradient Descent on the negative log-likelihood of the truncated sample. Importantly, we show that the error of our estimates is asymptotically normal, and we use this to provide explicit confidence regions for our estimates.

Submitted 25 August, 2022; originally announced August 2022.

arXiv:2207.07557 [pdf, other]

The Computational Complexity of Multi-player Concave Games and Kakutani Fixed Points

Authors: Christos H. Papadimitriou, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Manolis Zampetakis

Abstract: Kakutani's Fixed Point Theorem is a fundamental theorem in topology with numerous applications in game theory and economics. Computational formulations of Kakutani exist only in special cases and are too restrictive to be useful in reductions. In this paper, we provide a general computational formulation of Kakutani's Fixed Point Theorem and we prove that it is PPAD-complete. As an application of our theorem, we are able to characterize the computational complexity of the following fundamental problems: (1) Concave Games. Introduced by the celebrated works of Debreu and Rosen in the 1950s and 60s, concave $n$-person games have found many important applications in Economics and Game Theory. We characterize the computational complexity of finding an equilibrium in such games. We show that a general formulation of this problem belongs to PPAD, and that finding an equilibrium is PPAD-hard even for a rather restricted class of such games: strongly-concave utilities that can be expressed as multivariate polynomials of a constant degree with axis-aligned box constraints. (2) Walrasian Equilibrium. Building on Kakutani's fixed point theorem and the work of Arrow and Debreu, we resolve an open problem related to Walras's theorem on the existence of price equilibria in general economies. There are many results about the PPAD-hardness of Walrasian equilibria, but the inclusion in PPAD is only known for piecewise linear utilities. We show that the problem with general convex utilities is in PPAD.
Along the way, we provide a Lipschitz continuous version of Berge's maximum theorem that may be of independent interest.

Submitted 25 May, 2023; v1 submitted 15 July, 2022; originally announced July 2022.

arXiv:2205.03246 [pdf, other]

What Makes A Good Fisherman? Linear Regression under Self-Selection Bias

Authors: Yeshwanth Cherapanamjeri, Constantinos Daskalakis, Andrew Ilyas, Manolis Zampetakis

Abstract: In the classical setting of self-selection, the goal is to learn $k$ models simultaneously from observations $(x^{(i)}, y^{(i)})$, where $y^{(i)}$ is the output of one of $k$ underlying models on input $x^{(i)}$. In contrast to mixture models, where we observe the output of a randomly selected model, here the observed model depends on the outputs themselves, and is determined by some known selection criterion. For example, we might observe the highest output, the smallest output, or the median output of the $k$ models. In known-index self-selection, the index of the model whose output is observed is revealed; in unknown-index self-selection, it is not. Self-selection has a long history in Econometrics and applications in various theoretical and applied fields, including treatment effect estimation, imitation learning, learning from strategically reported data, and learning from markets at disequilibrium. In this work, we present the first computationally and statistically efficient estimation algorithms for the most standard setting of this problem, where the models are linear. In the known-index case, we require poly$(1/\varepsilon, k, d)$ sample and time complexity to estimate all model parameters to accuracy $\varepsilon$ in $d$ dimensions, and can accommodate quite general selection criteria. In the more challenging unknown-index case, even the identifiability of the linear models (from infinitely many samples) was not known.
We show three results in this case for the commonly studied $\max$ self-selection criterion: (1) we show that the linear models are indeed identifiable, (2) for general $k$ we provide an algorithm with poly$(d) \exp(\text{poly}(k))$ sample and time complexity to estimate the regression parameters up to error $1/\text{poly}(k)$, and (3) for $k = 2$ we provide an algorithm for any error $\varepsilon$ and poly$(d, 1/\varepsilon)$ sample and time complexity.
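To make the $\max$ self-selection observation model concrete, here is a minimal Python sketch of how one observation could be generated. It assumes independent Gaussian noise on each of the $k$ linear models; the helper name `max_self_selection_sample` and the noise model are illustrative assumptions, not taken from the paper, and the sketch only generates data rather than estimating the models.

```python
import random

def max_self_selection_sample(x, W, noise_std=1.0, known_index=True, rng=None):
    """Generate one observation under the max self-selection criterion (hypothetical helper).

    Each of the k linear models outputs y_j = <w_j, x> + Gaussian noise, and only the
    largest output is observed. In the known-index setting the index of the winning
    model is revealed as well; in the unknown-index setting it stays hidden.
    """
    if rng is None:
        rng = random.Random()
    outputs = [sum(wi * xi for wi, xi in zip(w, x)) + rng.gauss(0.0, noise_std) for w in W]
    j = max(range(len(W)), key=outputs.__getitem__)  # index of the largest output
    return (x, outputs[j], j) if known_index else (x, outputs[j])

# Usage: three linear models in two dimensions, one draw in each setting.
rng = random.Random(42)
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
print(max_self_selection_sample([0.3, -1.2], W, rng=rng))
print(max_self_selection_sample([0.3, -1.2], W, known_index=False, rng=rng))
```

The only difference between the two settings is whether the returned tuple carries the index `j`, which is exactly the information the unknown-index estimator must do without.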

Submitted 10 December, 2022; v1 submitted 6 May, 2022; originally announced May 2022.

arXiv:2205.02060 [pdf, ps, other]

Estimation of Standard Auction Models

Authors: Yeshwanth Cherapanamjeri, Constantinos Daskalakis, Andrew Ilyas, Manolis Zampetakis

Abstract: We provide efficient estimation methods for first- and second-price auctions under independent (asymmetric) private values and partial observability. Given a finite set of observations, each comprising the identity of the winner and the price they paid in a sequence of identical auctions, we provide algorithms for non-parametrically estimating the bid distribution of each bidder, as well as their value distributions under equilibrium assumptions. We provide finite-sample estimation bounds which are uniform in that their error rates do not depend on the bid/value distributions being estimated. Our estimation guarantees advance a body of work in Econometrics wherein only identification results have been obtained, unless the setting is symmetric, parametric, or all bids are observable. Our guarantees also provide computationally and statistically effective alternatives to classical techniques from reliability theory. Finally, our results are immediately applicable to Dutch and English auctions.

Submitted 4 May, 2022; originally announced May 2022.

arXiv:2204.03132 [pdf, ps, other]

First-Order Algorithms for Nonlinear Generalized Nash Equilibrium Problems

Authors: Michael I. Jordan, Tianyi Lin, Manolis Zampetakis

Abstract: We consider the problem of computing an equilibrium in a class of nonlinear generalized Nash equilibrium problems (NGNEPs) in which the strategy sets for each player are defined by equality and inequality constraints that may depend on the choices of rival players. While the asymptotic global convergence and local convergence rates of algorithms to solve this problem have been extensively investigated, the analysis of nonasymptotic iteration complexity is still in its infancy. This paper presents two first-order algorithms -- based on the quadratic penalty method (QPM) and augmented Lagrangian method (ALM), respectively -- with an accelerated mirror-prox algorithm as the solver in each inner loop. We establish a global convergence guarantee for solving monotone and strongly monotone NGNEPs and provide nonasymptotic complexity bounds expressed in terms of the number of gradient evaluations. Experimental results demonstrate the efficiency of our algorithms in practice.

Submitted 5 February, 2023; v1 submitted 6 April, 2022; originally announced April 2022.

Comments: Accepted by Journal of Machine Learning Research; Add the references and the funding information; 44 pages, 1 table

arXiv:2112.13826 [pdf, other]

Last-Iterate Convergence of Saddle-Point Optimizers via High-Resolution Differential Equations

Authors: Tatjana Chavdarova, Michael I. Jordan, Manolis Zampetakis

Abstract: When derived naively, several widely used first-order saddle-point optimization methods yield the same continuous-time ordinary differential equation (ODE), namely that of the Gradient Descent Ascent (GDA) method. However, the convergence properties of these methods are qualitatively different, even on simple bilinear games. Thus the ODE perspective, which has proved powerful in analyzing single-objective optimization methods, has not played a similar role in saddle-point optimization. We adopt a framework studied in fluid dynamics -- known as High-Resolution Differential Equations (HRDEs) -- to design differential equation models for several saddle-point optimization methods. Critically, these HRDEs differ across saddle-point optimization methods. Moreover, in bilinear games, the convergence properties of the HRDEs match the qualitative features of the corresponding discrete methods. Additionally, we show that the HRDE of Optimistic Gradient Descent Ascent (OGDA) exhibits last-iterate convergence for general monotone variational inequalities. Finally, we provide rates of convergence for the best-iterate convergence of the OGDA method, relying solely on the first-order smoothness of the monotone operator.
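The qualitative gap between GDA and OGDA on bilinear games can be checked numerically. The following Python sketch runs both discrete methods on the bilinear game $f(x, y) = xy$, whose unique equilibrium is $(0, 0)$; the game, the step size $\eta = 0.1$, and the iteration count are assumed example choices. It illustrates the discrete methods only, not the HRDEs constructed in the paper.

```python
import math

def grad(x, y):
    """Partial derivatives of the bilinear game f(x, y) = x * y: (df/dx, df/dy) = (y, x)."""
    return y, x

def gda(x, y, eta, steps):
    """Simultaneous Gradient Descent Ascent: x descends on f, y ascends on f."""
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - eta * gx, y + eta * gy
    return x, y

def ogda(x, y, eta, steps):
    """Optimistic GDA: each step uses the extrapolated gradient 2 * g_t - g_{t-1}."""
    prev_gx, prev_gy = grad(x, y)  # treat the initial gradient as the "previous" one
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - eta * (2 * gx - prev_gx), y + eta * (2 * gy - prev_gy)
        prev_gx, prev_gy = gx, gy
    return x, y

for name, method in (("GDA", gda), ("OGDA", ogda)):
    x, y = method(1.0, 1.0, eta=0.1, steps=1000)
    print(name, "distance to the equilibrium (0, 0):", math.hypot(x, y))
```

For GDA each step multiplies $x^2 + y^2$ by exactly $1 + \eta^2$, so the iterates spiral outward, while the OGDA iterates contract toward $(0, 0)$ for this small step size.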

Submitted 31 July, 2023; v1 submitted 27 December, 2021; originally announced December 2021.

Journal ref: Minimax Theory 8, Number 2 (2023) 333--380

arXiv:2107.06259 [pdf, other]

Robust Learning of Optimal Auctions

Authors: Wenshuo Guo, Michael I. Jordan, Manolis Zampetakis

Abstract: We study the problem of learning revenue-optimal multi-bidder auctions from samples when the samples of bidders' valuations can be adversarially corrupted or drawn from distributions that are adversarially perturbed. First, we prove tight upper bounds on the revenue we can obtain with a corrupted distribution under a population model, for both regular valuation distributions and distributions with monotone hazard rate (MHR). We then propose new algorithms that, given only an "approximate distribution" for the bidder's valuation, can learn a mechanism whose revenue is nearly optimal simultaneously for all "true distributions" that are $α$-close to the original distribution in Kolmogorov-Smirnov distance. The proposed algorithms operate beyond the setting of bounded distributions that have been studied in prior works, and are guaranteed to obtain a fraction $1-O(α)$ of the optimal revenue under the true distribution when the distributions are MHR. Moreover, they are guaranteed to yield at least a fraction $1-O(\sqrt{α})$ of the optimal revenue when the distributions are regular. We prove that these upper bounds cannot be further improved, by providing matching lower bounds. Lastly, we derive sample complexity upper bounds for learning a near-optimal auction for both MHR and regular distributions.
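As a rough illustration of the perturbation model, the Python sketch below computes the two-sample Kolmogorov-Smirnov distance between empirical distributions and, for contrast, the naive empirical revenue-maximizing posted price for a single bidder. Both helpers are hypothetical and assumed for illustration; the naive price is not the robust multi-bidder mechanism of the paper, and the example shows why robustness matters: a corruption at KS distance $0.25$ moves the naive price a lot.

```python
from bisect import bisect_right

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance sup_t |F_a(t) - F_b(t)| between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    # Both empirical CDFs are right-continuous step functions that only jump at
    # sample points, so the supremum is attained at one of those points.
    return max(abs(bisect_right(a, t) / len(a) - bisect_right(b, t) / len(b))
               for t in a + b)

def empirical_best_price(values):
    """Posted price p (among the samples) maximizing empirical revenue p * Pr[v >= p]."""
    n = len(values)
    return max(sorted(set(values)), key=lambda p: p * sum(v >= p for v in values) / n)

clean = [1.0, 5.0, 5.0, 6.0]
corrupted = [1.0, 5.0, 5.0, 60.0]  # one sample adversarially moved
print(ks_distance(clean, corrupted))  # 0.25
print(empirical_best_price(clean), empirical_best_price(corrupted))  # 5.0 60.0
```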

Submitted 13 July, 2021; originally announced July 2021.

arXiv:2106.15908 [pdf, other]

A Statistical Taylor Theorem and Extrapolation of Truncated Densities

Authors: Constantinos Daskalakis, Vasilis Kontonis, Christos Tzamos, Manolis Zampetakis

Abstract: We show a statistical version of Taylor's theorem and apply this result to non-parametric density estimation from truncated samples, which is a classical challenge in Statistics [Woodroofe 1985, Stute 1993]. The single-dimensional version of our theorem has the following implication: "For any distribution $P$ on $[0, 1]$ with a smooth log-density function, given samples from the conditional distribution of $P$ on $[a, a + \varepsilon] \subset [0, 1]$, we can efficiently identify an approximation to $P$ over the whole interval $[0, 1]$, with quality of approximation that improves with the smoothness of $P$." To the best of our knowledge, our result is the first in the area of non-parametric density estimation from truncated samples that works under the hard truncation model, where the samples outside some survival set $S$ are never observed, and applies to multiple dimensions. In contrast, previous works assume single-dimensional data where each sample has a different survival set $S$, so that samples from the whole support will ultimately be collected.

Submitted 30 June, 2021; originally announced June 2021.

Comments: Appeared at COLT2021

arXiv:2010.12000 [pdf, other]

Computationally and Statistically Efficient Truncated Regression

Authors: Constantinos Daskalakis, Themis Gouleakis, Christos Tzamos, Manolis Zampetakis

Abstract: We provide a computationally and statistically efficient estimator for the classical problem of truncated linear regression, where the dependent variable $y = w^T x + ε$ and its corresponding vector of covariates $x \in \mathbb{R}^k$ are only revealed if the dependent variable falls in some subset $S \subseteq \mathbb{R}$; otherwise the existence of the pair $(x, y)$ is hidden. This problem has remained a challenge since the early works of [Tobin 1958, Amemiya 1973, Hausman and Wise 1977], its applications are abundant, and its history dates back even further to the work of Galton, Pearson, Lee, and Fisher. While consistent estimators of the regression coefficients have been identified, the error rates are not well understood, especially in high dimensions. Under a thickness assumption about the covariance matrix of the covariates in the revealed sample, we provide a computationally efficient estimator for the coefficient vector $w$ from $n$ revealed samples that attains $\ell_2$ error $\tilde{O}(\sqrt{k/n})$. Our estimator uses Projected Stochastic Gradient Descent (PSGD) without replacement on the negative log-likelihood of the truncated sample. For the statistically efficient estimation we only need oracle access to the set $S$. In order to achieve computational efficiency we need to assume that $S$ is a union of a finite number of intervals, which can still be complicated.
PSGD without replacement must be restricted to an appropriately defined convex cone to guarantee that the negative log-likelihood is strongly convex, which in turn is established using concentration of matrices on variables with sub-exponential tails. We perform experiments on simulated data to illustrate the accuracy of our estimator. As a corollary, we show that SGD learns the parameters of single-layer neural networks with noisy activation functions.
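For unit-variance Gaussian noise, the per-sample negative log-likelihood of the truncated sample is $\tfrac12 (y - w^T x)^2 + \tfrac12 \log 2\pi + \log \Pr_{Y \sim N(w^T x, 1)}[Y \in S]$, and its gradient in $w$ is $(\mathbb{E}[Y \mid Y \in S] - y)\, x$. The Python sketch below implements this for a single interval $S = [a, b]$ and runs a simplified projected SGD on it. Several choices are assumptions made for the sketch, not the paper's method: $S$ is one interval so the truncated mean has a closed form, samples are drawn with replacement, and projection is onto a Euclidean ball rather than the paper's convex cone.

```python
import math
import random

def _pdf(t):
    # Standard normal density; evaluates to 0.0 at t = +/-inf.
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def _cdf(t):
    # Standard normal CDF via the error function; handles t = +/-inf.
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def _mean_and_mass(w, x, a, b):
    mu = sum(wi * xi for wi, xi in zip(w, x))
    # Probability that N(mu, 1) lands in S = [a, b]; floored to avoid dividing by zero far in the tails.
    mass = max(_cdf(b - mu) - _cdf(a - mu), 1e-300)
    return mu, mass

def truncated_nll(w, x, y, a, b):
    """-log density of y under N(w.x, 1) conditioned on y in [a, b]."""
    mu, mass = _mean_and_mass(w, x, a, b)
    return 0.5 * (y - mu) ** 2 + 0.5 * math.log(2.0 * math.pi) + math.log(mass)

def truncated_nll_grad(w, x, y, a, b):
    """Gradient in w: (E[Y | Y in [a, b]] - y) * x, with Y ~ N(w.x, 1)."""
    mu, mass = _mean_and_mass(w, x, a, b)
    cond_mean = mu + (_pdf(a - mu) - _pdf(b - mu)) / mass
    return [(cond_mean - y) * xi for xi in x]

def psgd(samples, a, b, dim, radius=10.0, steps=20000, lr=0.05, seed=0):
    """Simplified projected SGD on the truncated negative log-likelihood."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for t in range(1, steps + 1):
        x, y = rng.choice(samples)  # sampling WITH replacement (a simplification)
        g = truncated_nll_grad(w, x, y, a, b)
        step = lr / math.sqrt(t)
        w = [wi - step * gi for wi, gi in zip(w, g)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > radius:  # Euclidean-ball projection stands in for the paper's convex cone
            w = [wi * radius / norm for wi in w]
    return w
```

A typical use draws pairs $(x, y)$, keeps only those with $y \in S$ (for example $S = [0, \infty)$ via `a=0.0, b=math.inf`), and calls `psgd(samples, a, b, dim)`.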

Submitted 22 October, 2020; originally announced October 2020.

Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2019

arXiv:2010.11450 [pdf, other]

Optimal Approximation -- Smoothness Tradeoffs for Soft-Max Functions

Authors: Alessandro Epasto, Mohammad Mahdian, Vahab Mirrokni, Manolis Zampetakis

Abstract: A soft-max function has two main efficiency measures: (1) approximation, which corresponds to how well it approximates the maximum function, and (2) smoothness, which shows how sensitive it is to changes in its input. Our goal is to identify the optimal approximation-smoothness tradeoffs for different measures of approximation and smoothness. This leads to novel soft-max functions, each of which is optimal for a different application. The most commonly used soft-max function, called the exponential mechanism, has an optimal tradeoff between approximation, measured in terms of expected additive approximation, and smoothness, measured with respect to Rényi Divergence. We introduce a soft-max function, called "piecewise linear soft-max", with an optimal tradeoff between approximation, measured in terms of worst-case additive approximation, and smoothness, measured with respect to the $\ell_q$-norm. The worst-case approximation guarantee of the piecewise linear mechanism enforces sparsity in the output of our soft-max function, a property that is known to be important in Machine Learning applications [Martins et al. '16, Laha et al. '18] and is not satisfied by the exponential mechanism. Moreover, the $\ell_q$-smoothness is suitable for applications in Mechanism Design and Game Theory where the piecewise linear mechanism outperforms the exponential mechanism.
Finally, we investigate another soft-max function, called the power mechanism, with an optimal tradeoff between expected multiplicative approximation and smoothness with respect to the Rényi Divergence, which provides improved theoretical and practical results in differentially private submodular optimization.
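The exponential mechanism mentioned above selects index $i$ with probability proportional to $e^{\lambda x_i}$. A standard entropy argument bounds its expected additive approximation error by $\max_i x_i - \mathbb{E}_{i \sim p}[x_i] \le \ln(n)/\lambda$. The Python sketch below computes the mechanism and checks this bound on an assumed example score vector; it also shows that every probability stays positive, which is the lack of sparsity the abstract contrasts with the piecewise linear soft-max (not implemented here, since its definition is not given in this listing).

```python
import math

def exponential_softmax(scores, lam):
    """Exponential-mechanism soft-max: p_i proportional to exp(lam * x_i)."""
    top = max(scores)  # shift by the max for numerical stability; the distribution is unchanged
    weights = [math.exp(lam * (s - top)) for s in scores]
    total = sum(weights)
    return [wt / total for wt in weights]

def expected_additive_gap(scores, lam):
    """max_i x_i - E_{i ~ p}[x_i]: the expected additive approximation error."""
    p = exponential_softmax(scores, lam)
    return max(scores) - sum(pi * s for pi, s in zip(p, scores))

scores = [0.0, 0.5, 1.0, 0.2]
for lam in (1.0, 5.0, 50.0):
    gap = expected_additive_gap(scores, lam)
    print(lam, round(gap, 4), "<=", round(math.log(len(scores)) / lam, 4))
```

Larger $\lambda$ tightens the approximation but makes the output more sensitive to changes in the scores, which is the tradeoff the paper optimizes.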

Submitted 22 October, 2020; originally announced October 2020.

Comments: Accepted for spotlight presentation at NeurIPS 2020

arXiv:2009.09623 [pdf, other]

The Complexity of Constrained Min-Max Optimization

Authors: Constantinos Daskalakis, Stratis Skoulakis, Manolis Zampetakis

Abstract: Despite its important applications in Machine Learning, min-max optimization of nonconvex-nonconcave objectives remains elusive. Not only are there no known first-order methods converging even to approximate local min-max points, but the computational complexity of identifying them is also poorly understood. In this paper, we provide a characterization of the computational complexity of the problem, as well as of the limitations of first-order methods in constrained min-max optimization problems with nonconvex-nonconcave objectives and linear constraints. As a warm-up, we show that, even when the objective is a Lipschitz and smooth differentiable function, deciding whether a min-max point exists, in fact even deciding whether an approximate min-max point exists, is NP-hard. More importantly, we show that an approximate local min-max point of large enough approximation is guaranteed to exist, but finding one such point is PPAD-complete. The same is true of computing an approximate fixed point of Gradient Descent/Ascent. An important byproduct of our proof is an unconditional hardness result in the Nemirovsky-Yudin model.
We show that, given oracle access to some function $f : P \to [-1, 1]$ and its gradient $\nabla f$, where $P \subseteq [0, 1]^d$ is a known convex polytope, every algorithm that finds an $\varepsilon$-approximate local min-max point needs to make a number of queries that is exponential in at least one of $1/\varepsilon$, $L$, $G$, or $d$, where $L$ and $G$ are respectively the smoothness and Lipschitzness of $f$ and $d$ is the dimension. This comes in sharp contrast to minimization problems, where finding approximate local minima in the same setting can be done with Projected Gradient Descent using $O(L/\varepsilon)$ many queries. Our result is the first to show an exponential separation between these two fundamental optimization problems.
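To make "approximate fixed point of Gradient Descent/Ascent" concrete, the Python sketch below implements one simultaneous projected GDA step and measures how far a point moves under it. As assumptions for the sketch, the polytope is taken to be the box $[0, 1]^d$ for each player (so projection is coordinate-wise clipping), the residual is measured in Euclidean norm, and the objective $f(x, y) = (x_0 - 1/2)(y_0 - 1/2)$ is an arbitrary example; the paper's exact approximation notion may differ.

```python
import math

def _clip(v, lo=0.0, hi=1.0):
    # Euclidean projection onto a box is coordinate-wise clipping.
    return min(max(v, lo), hi)

def projected_gda_step(x, y, grad_x, grad_y, eta):
    """One simultaneous projected Gradient Descent/Ascent step on [0, 1]^d x [0, 1]^d."""
    gx, gy = grad_x(x, y), grad_y(x, y)
    new_x = [_clip(xi - eta * gi) for xi, gi in zip(x, gx)]  # minimizing player descends
    new_y = [_clip(yi + eta * gi) for yi, gi in zip(y, gy)]  # maximizing player ascends
    return new_x, new_y

def gda_fixed_point_residual(x, y, grad_x, grad_y, eta):
    """Euclidean distance between (x, y) and its image under the projected GDA map."""
    nx, ny = projected_gda_step(x, y, grad_x, grad_y, eta)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(nx + ny, x + y)))

# Example objective f(x, y) = (x_0 - 1/2) * (y_0 - 1/2).
def grad_x(x, y):
    return [y[0] - 0.5]

def grad_y(x, y):
    return [x[0] - 0.5]

print(gda_fixed_point_residual([0.5], [0.5], grad_x, grad_y, 0.1))  # saddle point: residual 0
print(gda_fixed_point_residual([1.0], [1.0], grad_x, grad_y, 0.1))  # corner: about 0.05
```

A point is an $\varepsilon$-approximate fixed point in this sense when the residual is at most $\varepsilon$; the hardness result says that, in general, no efficient algorithm is expected to find one.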

Submitted 21 September, 2020; originally announced September 2020.

arXiv:2007.14539 [pdf, other]

Truncated Linear Regression in High Dimensions

Authors: Constantinos Daskalakis, Dhruv Rohatgi, Manolis Zampetakis

Abstract: As in standard linear regression, in truncated linear regression we are given access to observations $(A_i, y_i)_i$ whose dependent variable equals $y_i= A_i^{\rm T} \cdot x^* + η_i$, where $x^*$ is some fixed unknown vector of interest and $η_i$ is independent noise, except that we are only given an observation if its dependent variable $y_i$ lies in some "truncation set" $S \subset \mathbb{R}$. The goal is to recover $x^*$ under some favorable conditions on the $A_i$'s and the noise distribution. We prove that there exists a computationally and statistically efficient method for recovering $k$-sparse $n$-dimensional vectors $x^*$ from $m$ truncated samples, which attains an optimal $\ell_2$ reconstruction error of $O(\sqrt{(k \log n)/m})$. As a corollary, our guarantees imply a computationally efficient and information-theoretically optimal algorithm for compressed sensing with truncation, which may arise from measurement saturation effects. Our result follows from a statistical and computational analysis of the Stochastic Gradient Descent (SGD) algorithm for solving a natural adaptation of the LASSO optimization problem that accommodates truncation. This generalizes the works of both: (1) [Daskalakis et al. 2018], where no regularization is needed due to the low dimensionality of the data, and (2) [Wainwright 2009], where the objective function is simple due to the absence of truncation.
In order to deal with both truncation and high dimensionality at the same time, we develop new techniques that not only generalize the existing ones but, we believe, are also of independent interest.

Submitted 28 July, 2020; originally announced July 2020.

Comments: 30 pages, 1 figure

arXiv:2007.03210 [pdf, ps, other]

Estimation and Inference with Trees and Forests in High Dimensions

Authors: Vasilis Syrgkanis, Manolis Zampetakis

Abstract: We analyze the finite-sample mean squared error (MSE) performance of regression trees and forests in the high-dimensional regime with binary features, under a sparsity constraint. We prove that if only $r$ of the $d$ features are relevant for the mean outcome function, then shallow trees built greedily via the CART empirical MSE criterion achieve MSE rates that depend only logarithmically on the ambient dimension $d$. We prove upper bounds whose exact dependence on the number of relevant variables $r$ depends on the correlation among the features and on the degree of relevance. For strongly relevant features, we also show that fully grown honest forests achieve fast MSE rates and their predictions are also asymptotically normal, enabling asymptotically valid inference that adapts to the sparsity of the regression function.
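The greedy CART empirical MSE criterion on binary features can be sketched in a few lines of Python: at each node, pick the feature whose 0/1 split minimizes the summed squared error of the two children, and stop at a fixed depth. This is an assumed, simplified illustration (no honesty, no sample splitting, no forests), not the estimator analyzed in the paper; the helper names are hypothetical.

```python
from itertools import product

def _sse(ys):
    """Sum of squared deviations from the mean (0 for an empty cell)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)

def best_binary_split(X, y):
    """Feature whose 0/1 split minimizes the empirical MSE (the CART criterion)."""
    best_j, best_err = None, float("inf")
    for j in range(len(X[0])):
        left = [yi for xi, yi in zip(X, y) if xi[j] == 0]
        right = [yi for xi, yi in zip(X, y) if xi[j] == 1]
        err = _sse(left) + _sse(right)
        if err < best_err:
            best_j, best_err = j, err
    return best_j, best_err / len(y)

def fit_tree(X, y, depth):
    """Greedy shallow tree: a leaf is a float (cell mean); an internal node is (j, left, right)."""
    mean = sum(y) / len(y)
    if depth == 0 or len(set(y)) == 1:
        return mean
    j, _ = best_binary_split(X, y)
    left = [(xi, yi) for xi, yi in zip(X, y) if xi[j] == 0]
    right = [(xi, yi) for xi, yi in zip(X, y) if xi[j] == 1]
    if not left or not right:
        return mean
    return (j,
            fit_tree([xi for xi, _ in left], [yi for _, yi in left], depth - 1),
            fit_tree([xi for xi, _ in right], [yi for _, yi in right], depth - 1))

def predict(tree, x):
    """Follow the splits from the root down to a leaf."""
    while isinstance(tree, tuple):
        j, left, right = tree
        tree = right if x[j] == 1 else left
    return tree

# Usage: only features 0 and 1 are relevant; feature 2 is noise-free but irrelevant.
X = list(product([0, 1], repeat=3))
y = [2.0 * x[1] + x[0] for x in X]
tree = fit_tree(X, y, depth=2)
print(tree, predict(tree, (1, 1, 0)))
```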

Submitted 21 October, 2020; v1 submitted 7 July, 2020; originally announced July 2020.

Comments: Accepted for presentation at the Conference on Learning Theory (COLT) 2020

arXiv:2006.04237 [pdf, other]

Constant-Expansion Suffices for Compressed Sensing with Generative Priors

Authors: Constantinos Daskalakis, Dhruv Rohatgi, Manolis Zampetakis

Abstract: Generative neural networks have been found empirically to be very promising in providing effective structural priors for compressed sensing, since they can be trained to span low-dimensional data manifolds in high-dimensional signal spaces. Despite the non-convexity of the resulting optimization problem, it has also been shown theoretically that, for neural networks with random Gaussian weights, a signal in the range of the network can be efficiently, approximately recovered from a few noisy measurements. However, a major bottleneck of these theoretical guarantees is a network expansivity condition: that each layer of the neural network must be larger than the previous by a logarithmic factor. Our main contribution is to break this strong expansivity assumption, showing that constant expansivity suffices to get efficient recovery algorithms and is, moreover, information-theoretically necessary. To overcome the theoretical bottleneck in existing approaches we prove a novel uniform concentration theorem for random functions that might not be Lipschitz but satisfy a relaxed notion which we call "pseudo-Lipschitzness." Using this theorem we can show that a matrix concentration inequality known as the Weight Distribution Condition (WDC), which was previously only known to hold for Gaussian matrices with logarithmic aspect ratio, in fact holds for constant aspect ratios too.
Since the WDC is a fundamental matrix concentration inequality at the heart of all existing theoretical guarantees on this problem, our tighter bound immediately yields improvements in all known results in the literature on compressed sensing with deep generative priors, including one-bit recovery, phase retrieval, low-rank matrix recovery, and more.

Submitted 26 June, 2020; v1 submitted 7 June, 2020; originally announced June 2020.

Comments: 21 pages, 1 figure; added an application

arXiv:2003.11974 [pdf, other]

A Topological Characterization of Modulo-$p$ Arguments and Implications for Necklace Splitting

Authors: Aris Filos-Ratsikas, Alexandros Hollender, Katerina Sotiraki, Manolis Zampetakis

Abstract: The classes PPA-$p$ have attracted attention lately, because they are the main candidates for capturing the complexity of Necklace Splitting with $p$ thieves, for prime $p$. However, these classes were not known to have complete problems of a topological nature, which impedes progress towards settling the complexity of the Necklace Splitting problem. In contrast, topological problems have been pivotal in obtaining completeness results for PPAD and PPA, such as the PPAD-completeness of finding a Nash equilibrium [Daskalakis et al., 2009, Chen et al., 2009b] and the PPA-completeness of Necklace Splitting with 2 thieves [Filos-Ratsikas and Goldberg, 2019]. In this paper, we provide the first topological characterization of the classes PPA-$p$. First, we show that the computational problem associated with a simple generalization of Tucker's Lemma, termed $p$-polygon-Tucker, as well as the associated Borsuk-Ulam-type theorem, $p$-polygon-Borsuk-Ulam, are PPA-$p$-complete. Then, we show that the computational version of the well-known BSS Theorem [Barany et al., 1981], as well as the associated BSS-Tucker problem, are PPA-$p$-complete. Finally, using a different generalization of Tucker's Lemma (termed $\mathbb{Z}_p$-star-Tucker), which we prove to be PPA-$p$-complete, we prove that $p$-thief Necklace Splitting is in PPA-$p$. This latter result gives a new combinatorial proof for the Necklace Splitting theorem, the only proof of this nature other than that of Meunier [2014].
All of our containment results are obtained through a new combinatorial proof for $\mathbb{Z}_p$-versions of Tucker's lemma that is a natural generalization of the standard combinatorial proof of Tucker's lemma by Freund and Todd [1981]. We believe that this new proof technique is of independent interest.

Submitted 18 January, 2021; v1 submitted 26 March, 2020; originally announced March 2020.

Comments: v2: improved presentation based on reviewer comments and suggestions

arXiv:2002.11437 [pdf, other]

doi 10.1137/20M1387493

Consensus-Halving: Does It Ever Get Easier?

Authors: Aris Filos-Ratsikas, Alexandros Hollender, Katerina Sotiraki, Manolis Zampetakis

Abstract: In the $\varepsilon$-Consensus-Halving problem, a fundamental problem in fair division, there are $n$ agents with valuations over the interval $[0,1]$, and the goal is to divide the interval into pieces and assign a label "$+$" or "$-$" to each piece, such that every agent values the total amount of "$+$" and the total amount of "$-$" almost equally. The problem was recently proven by Filos-Ratsikas and Goldberg [2019] to be the first "natural" complete problem for the computational class PPA, answering a decade-old open question. In this paper, we examine the extent to which the problem becomes easy to solve if one restricts the class of valuation functions. To this end, we provide the following contributions. First, we obtain a strengthening of the PPA-hardness result of [Filos-Ratsikas and Goldberg, 2019] to the case when agents have piecewise uniform valuations with only two blocks. We obtain this result via a new reduction, which is in fact conceptually much simpler than the corresponding one in [Filos-Ratsikas and Goldberg, 2019]. Then, we consider the case of single-block (uniform) valuations and provide a parameterized polynomial-time algorithm for solving $\varepsilon$-Consensus-Halving for any $\varepsilon$, as well as a polynomial-time algorithm for $\varepsilon=1/2$. Finally, an important application of our new techniques is the first hardness result for a generalization of Consensus-Halving, the Consensus-$1/k$-Division problem [Simmons and Su, 2003]. In particular, we prove that $\varepsilon$-Consensus-$1/3$-Division is PPAD-hard.
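The definition above can be turned into a simple verifier. The Python sketch below checks whether given cut points and "$+$"/"$-$" labels form an $\varepsilon$-consensus-halving when each agent has a single-block (uniform) valuation on an interval $[l, r] \subseteq [0, 1]$, normalized to total value 1. The helper names are hypothetical, and the sketch only verifies a candidate solution; it does not implement the paper's parameterized algorithm.

```python
def uniform_value(l, r, a, b):
    """Value that an agent with uniform density on [l, r] (total value 1) assigns to [a, b]."""
    return max(0.0, min(r, b) - max(l, a)) / (r - l)

def is_eps_consensus_halving(agents, cuts, labels, eps):
    """Check whether the cuts and +/- labels form an eps-consensus-halving for every agent."""
    points = [0.0] + sorted(cuts) + [1.0]
    if len(labels) != len(points) - 1:
        raise ValueError("need exactly one label per piece")
    for l, r in agents:
        pieces = [uniform_value(l, r, points[i], points[i + 1]) for i in range(len(labels))]
        plus = sum(v for v, s in zip(pieces, labels) if s == "+")
        minus = sum(v for v, s in zip(pieces, labels) if s == "-")
        if abs(plus - minus) > eps:
            return False
    return True

agents = [(0.0, 1.0), (0.0, 0.5)]  # single-block (uniform) valuations
print(is_eps_consensus_halving(agents, [0.25, 0.75], ["+", "-", "+"], eps=1e-9))  # True
print(is_eps_consensus_halving(agents, [0.5], ["+", "-"], eps=1e-9))              # False
```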

Submitted 24 April, 2023; v1 submitted 26 February, 2020; originally announced February 2020.

Comments: Journal version. Preliminary version appeared at EC '20

Journal ref: SIAM Journal on Computing, 52(2):412-451 (2023)
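To make the $\varepsilon$-Consensus-Halving condition concrete for piecewise uniform valuations, here is a minimal Python sketch of a solution checker (not a solving algorithm). It assumes each agent's valuation is given as a list of disjoint blocks in $[0,1]$ with uniform density, normalized so the whole interval is worth 1; the function names and input format are illustrative, not taken from the paper. The standard formulation with $n$ agents also allows at most $n$ cuts, which the checker enforces.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def overlap(a: Interval, b: Interval) -> float:
    """Length of the intersection of two intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def piece_value(blocks: List[Interval], piece: Interval) -> float:
    """Value of `piece` to an agent whose density is uniform on `blocks`
    (disjoint sub-intervals of [0, 1]), normalized so [0, 1] is worth 1."""
    total = sum(hi - lo for lo, hi in blocks)
    return sum(overlap(b, piece) for b in blocks) / total


def is_eps_consensus_halving(agents: List[List[Interval]], cuts: List[float],
                             labels: List[str], eps: float) -> bool:
    """True if the cuts and the '+'/'-' labels (one per piece, left to right)
    give every agent |value(+) - value(-)| <= eps."""
    if len(cuts) > len(agents):  # standard formulation: at most n cuts
        return False
    points = [0.0] + sorted(cuts) + [1.0]
    pieces = list(zip(points[:-1], points[1:]))
    if len(pieces) != len(labels):
        raise ValueError("need exactly one label per piece")
    for blocks in agents:
        plus = sum(piece_value(blocks, p) for p, s in zip(pieces, labels) if s == "+")
        minus = sum(piece_value(blocks, p) for p, s in zip(pieces, labels) if s == "-")
        if abs(plus - minus) > eps:
            return False
    return True


# Agent 1: uniform on [0, 1]; agent 2: two blocks, [0, 0.2] and [0.6, 0.8].
agents = [[(0.0, 1.0)], [(0.0, 0.2), (0.6, 0.8)]]
print(is_eps_consensus_halving(agents, [0.5], ["+", "-"], 0.01))  # True
print(is_eps_consensus_halving(agents, [0.3], ["+", "-"], 0.01))  # False
```

A single cut at 0.5 splits both agents' value evenly, while a cut at 0.3 leaves the uniform agent with 0.3 versus 0.7.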

arXiv:1912.04467 [pdf, ps, other]

On the Complexity of Modulo-q Arguments and the Chevalley-Warning Theorem

Authors: Mika Göös, Pritish Kamath, Katerina Sotiraki, Manolis Zampetakis

Abstract: We study the search problem class $\mathrm{PPA}_q$, defined as a modulo-$q$ analog of the well-known $\textit{polynomial parity argument}$ class $\mathrm{PPA}$ introduced by Papadimitriou '94. Our first result shows that this class can be characterized in terms of $\mathrm{PPA}_p$ for prime $p$. Our main result is to establish that an $\textit{explicit}$ version of a search problem associated to the Chevalley-Warning theorem is complete for $\mathrm{PPA}_p$ for prime $p$. This problem is $\textit{natural}$ in that it does not explicitly involve circuits as part of the input. It is the first such complete problem for $\mathrm{PPA}_p$ when $p \ge 3$. Finally, we discuss connections between the Chevalley-Warning theorem and the well-studied $\textit{short integer solution}$ problem and survey the structural properties of $\mathrm{PPA}_q$.

Submitted 5 July, 2020; v1 submitted 9 December, 2019; originally announced December 2019.

Comments: To appear at the Computational Complexity Conference (CCC) 2020
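The Chevalley-Warning theorem states that if polynomials $f_1, \dots, f_r$ in $n$ variables over a finite field of characteristic $p$ satisfy $\sum_i \deg f_i < n$, then the number of their common zeros is divisible by $p$. The brute-force Python sketch below (exponential in $n$, for illustration only; the helper name is hypothetical) checks this on a small example over $\mathbb{F}_3$. It does not address the efficiency of the associated search problem (roughly, finding another common zero given a known one), which is what the paper classifies.

```python
from itertools import product


def count_common_zeros(polys, n, p):
    """Count points x in F_p^n at which every polynomial in `polys` vanishes.
    Each polynomial is a Python function of an n-tuple of ints in range(p)."""
    return sum(
        1
        for x in product(range(p), repeat=n)
        if all(f(x) % p == 0 for f in polys)
    )


def quadric(x):
    # x0^2 + x1^2 + x2^2: total degree 2 < n = 3 variables
    return x[0] ** 2 + x[1] ** 2 + x[2] ** 2


# Chevalley-Warning predicts a zero count divisible by p = 3.
N = count_common_zeros([quadric], n=3, p=3)
print(N, N % 3)  # 9 0
```

Over $\mathbb{F}_3$ every nonzero square equals 1, so the zeros are the origin plus the 8 points with all coordinates nonzero, giving 9.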

arXiv:1908.01034 [pdf, other]

Efficient Truncated Statistics with Unknown Truncation

Authors: Vasilis Kontonis, Christos Tzamos, Manolis Zampetakis

Abstract: We study the problem of estimating the parameters of a Gaussian distribution when samples are only shown if they fall in some (unknown) subset $S \subseteq \mathbb{R}^d$. This core problem in truncated statistics has a long history going back to Galton, Lee, Pearson and Fisher. Recent work by Daskalakis et al. (FOCS'18) provides the first efficient algorithm that works for arbitrary sets in high dimension when the set is known, but leaves as an open problem the more challenging and relevant case of an unknown truncation set. Our main result is a computationally and sample-efficient algorithm for estimating the parameters of the Gaussian under arbitrary unknown truncation sets whose performance decays with a natural measure of complexity of the set, namely its Gaussian surface area. Notably, this algorithm works for large families of sets including intersections of halfspaces, polynomial threshold functions and general convex sets. We show that our algorithm closely captures the tradeoff between the complexity of the set and the number of samples needed to learn the parameters by exhibiting a set with small Gaussian surface area for which it is information-theoretically impossible to learn the true Gaussian with few samples.

Submitted 2 August, 2019; originally announced August 2019.

Comments: To appear at the 60th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2019

arXiv:1906.01009 [pdf, ps, other]

Optimal Learning of Mallows Block Model

Authors: Róbert Busa-Fekete, Dimitris Fotakis, Balázs Szörényi, Manolis Zampetakis

Abstract: The Mallows model, introduced in the seminal paper of Mallows 1957, is one of the most fundamental ranking distributions over the symmetric group $S_m$. To analyze more complex ranking data, several studies considered the Generalized Mallows model defined by Fligner and Verducci 1986. Despite the significant research interest in ranking distributions, the exact sample complexity of estimating the parameters of a Mallows and a Generalized Mallows model is not well understood. The main result of the paper is a tight sample complexity bound for learning the Mallows and Generalized Mallows models. We approach the learning problem by analyzing a more general model which interpolates between the single-parameter Mallows model and the $m$-parameter Mallows model. We call our model the Mallows Block Model, referring to the Block Models that are popular in theoretical statistics. Our sample complexity analysis gives tight bounds for learning the Mallows Block Model for any number of blocks. We provide essentially matching lower bounds for our sample complexity results. As a corollary of our analysis, it turns out that, if the central ranking is known, a single sample from the Mallows Block Model is sufficient to estimate the spread parameters with error that goes to zero as the size of the permutations goes to infinity. In addition, we calculate the exact rate of the parameter estimation error.

Submitted 3 June, 2019; originally announced June 2019.
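For readers new to the model, here is a minimal Python sketch of sampling from the single-parameter Mallows model, $P(\pi) \propto \phi^{d(\pi, \pi_0)}$ with Kendall tau distance $d$, central ranking $\pi_0$ and $0 \le \phi \le 1$, via the repeated insertion method. It illustrates the base model only; it is not the Mallows Block Model or the paper's estimator, and the function names are hypothetical.

```python
import random


def kendall_tau(a, b):
    """Number of item pairs that rankings a and b order differently."""
    pos = {item: i for i, item in enumerate(b)}
    r = [pos[item] for item in a]
    return sum(1 for i in range(len(r)) for j in range(i + 1, len(r)) if r[i] > r[j])


def sample_mallows(center, phi, rng=random):
    """Draw a ranking pi with P(pi) proportional to phi ** kendall_tau(pi, center),
    for 0 <= phi <= 1, using the repeated insertion method."""
    ranking = []
    for i, item in enumerate(center):
        # Inserting the i-th central item at position j (0..i) puts it ahead of
        # i - j items that precede it in `center`: i - j new discordant pairs.
        weights = [phi ** (i - j) for j in range(i + 1)]
        j = rng.choices(range(i + 1), weights=weights)[0]
        ranking.insert(j, item)
    return ranking


print(kendall_tau([0, 1, 2], [2, 1, 0]))  # 3
print(sample_mallows(list("abcde"), 0.3, random.Random(0)))
```

Because later insertions never reorder items already placed, the discordant pairs added at each step sum to the Kendall tau distance, so the product of the per-step probabilities is proportional to $\phi^{d(\pi,\pi_0)}$. With $\phi = 0$ the sampler always returns the central ranking, and with $\phi = 1$ it is uniform over $S_m$.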

arXiv:1809.03986 [pdf, other]

Efficient Statistics, in High Dimensions, from Truncated Samples

Authors: Constantinos Daskalakis, Themis Gouleakis, Christos Tzamos, Manolis Zampetakis

Abstract: We provide an efficient algorithm for the classical problem, going back to Galton, Pearson, and Fisher, of estimating, with arbitrary accuracy, the parameters of a multivariate normal distribution from truncated samples. Observing truncated samples from a $d$-variate normal $\mathcal{N}(\boldsymbol{\mu},\boldsymbol{\Sigma})$ means that a sample is only revealed if it falls in some subset $S \subseteq \mathbb{R}^d$; otherwise the sample is hidden, and the number of hidden samples relative to the revealed ones is also hidden. We show that the mean $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ can be estimated with arbitrary accuracy in polynomial time, as long as we have oracle access to $S$ and $S$ has non-trivial measure under the unknown $d$-variate normal distribution. Additionally, we show that without oracle access to $S$, any non-trivial estimation is impossible.

Submitted 22 October, 2020; v1 submitted 11 September, 2018; originally announced September 2018.

Comments: Appeared at the 59th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2018
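As an illustration of why truncation calls for more than the empirical mean, here is a simplified one-dimensional Python sketch under stated assumptions: unit variance, unknown mean, and a membership oracle `in_S` for the truncation set. It runs stochastic gradient descent on the negative log-likelihood and uses rejection sampling through the oracle to estimate each gradient. It is a sketch in the spirit of likelihood-based estimation, not the paper's $d$-dimensional algorithm (which also estimates $\boldsymbol{\Sigma}$); the step-size schedule, averaging rule and names are assumptions.

```python
import math
import random


def fit_truncated_mean(samples, in_S, steps=20000, seed=0):
    """Estimate mu of N(mu, 1) from samples that were kept only when in_S(x).

    For one observation x, the gradient of the negative log-likelihood in mu
    is E[z | z ~ N(mu, 1), z in S] - x; one z drawn by rejection sampling with
    the membership oracle gives an unbiased estimate of that expectation.
    """
    rng = random.Random(seed)
    mu = sum(samples) / len(samples)  # naive, biased starting point
    tail_sum, tail_count = 0.0, 0
    for t in range(steps):
        x = rng.choice(samples)
        z = rng.gauss(mu, 1.0)
        while not in_S(z):  # rejection sampling through the oracle
            z = rng.gauss(mu, 1.0)
        mu -= (0.5 / math.sqrt(t + 1)) * (z - x)  # stochastic gradient step
        if t >= steps // 2:  # average the second half of the iterates
            tail_sum += mu
            tail_count += 1
    return tail_sum / tail_count


# Data: N(0.5, 1) truncated to S = [0, inf); hidden samples are simply dropped.
gen = random.Random(1)
true_mu, data = 0.5, []
while len(data) < 5000:
    v = gen.gauss(true_mu, 1.0)
    if v >= 0.0:
        data.append(v)

naive = sum(data) / len(data)  # biased upward (about 1.0 in expectation)
est = fit_truncated_mean(data, lambda z: z >= 0.0)
print(round(naive, 3), round(est, 3))
```

With $S = [0, \infty)$ the empirical mean overshoots by roughly 0.5, while the SGD estimate should land close to the true mean. Rejection sampling stays cheap here because $S$ keeps non-trivial mass under the current estimate, which mirrors the measure condition in the abstract.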

arXiv:1808.06407 [pdf, other]

PPP-Completeness with Connections to Cryptography

Authors: Katerina Sotiraki, Manolis Zampetakis, Giorgos Zirdelis

Abstract: Polynomial Pigeonhole Principle (PPP) is an important subclass of TFNP with profound connections to the complexity of the fundamental cryptographic primitives: collision-resistant hash functions and one-way permutations. In contrast to most of the other subclasses of TFNP, no natural complete problem was known for PPP. Our work identifies the first PPP-complete problem without any circuit or Turing Machine given explicitly in the input, and thus we answer a longstanding open question from [Papadimitriou1994]. Specifically, we show that constrained-SIS (cSIS), a generalized version of the well-known Short Integer Solution problem (SIS) from lattice-based cryptography, is PPP-complete. In order to give intuition behind our reduction for constrained-SIS, we identify another PPP-complete problem with a circuit in the input but closely related to lattice problems. We call this problem BLICHFELDT and it is the computational problem associated with Blichfeldt's fundamental theorem in the theory of lattices. Building on the inherent connection of PPP with collision-resistant hash functions, we use our completeness result to construct the first natural hash function family that captures the hardness of all collision-resistant hash functions in a worst-case sense, i.e., it is natural and universal in the worst case. The close resemblance of our hash function family to SIS leads us to the first candidate collision-resistant hash function that is both natural and universal in an average-case sense.
Finally, our results enrich our understanding of the connections between PPP, lattice problems, and other concrete cryptographic assumptions, such as the discrete logarithm problem over general groups.

Submitted 20 August, 2018; originally announced August 2018.
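The pigeonhole flavor of SIS can be seen in a few lines. If $A \in \mathbb{Z}_q^{n \times m}$ and $2^m > q^n$, two distinct vectors $x, y \in \{0,1\}^m$ must satisfy $Ax \equiv Ay \pmod q$, so $z = x - y$ is a nonzero vector in $\{-1,0,1\}^m$ with $Az \equiv 0 \pmod q$. The Python sketch below finds such a $z$ by exhaustive search; it illustrates plain SIS only (not the constrained variant cSIS from the paper), takes exponential time, and uses a hypothetical function name.

```python
from itertools import product


def sis_by_pigeonhole(A, q):
    """Return a nonzero z in {-1, 0, 1}^m with A z = 0 (mod q), A given as n rows.

    If 2**m > q**n, two distinct 0/1 vectors x, y share the image A x mod q
    (pigeonhole), and z = x - y works. Exhaustive search: exponential time.
    """
    n, m = len(A), len(A[0])
    if 2 ** m <= q ** n:
        raise ValueError("pigeonhole guarantee needs 2^m > q^n")
    seen = {}
    for x in product((0, 1), repeat=m):
        image = tuple(sum(a * xi for a, xi in zip(row, x)) % q for row in A)
        if image in seen:
            return [xi - yi for xi, yi in zip(x, seen[image])]
        seen[image] = x
    raise AssertionError("unreachable when 2^m > q^n")


A = [[1, 2, 3, 4, 0],
     [2, 4, 1, 3, 1]]  # n = 2, m = 5, q = 5: 32 > 25
z = sis_by_pigeonhole(A, 5)
print(z, [sum(a * zi for a, zi in zip(row, z)) % 5 for row in A])  # second list: [0, 0]
```

The dictionary of images plays the role of the pigeonholes; the first repeated image is the collision that PPP guarantees.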

arXiv:1709.03926 [pdf, other]

Certified Computation from Unreliable Datasets

Authors: Themis Gouleakis, Christos Tzamos, Manolis Zampetakis

Abstract: A wide range of learning tasks require human input in labeling massive data. The collected data, though, are usually of low quality and contain inaccuracies and errors. As a result, modern science and business face the problem of learning from unreliable data sets. In this work, we provide a generic approach that is based on $\textit{verification}$ of only a few records of the data set to guarantee high-quality learning outcomes for various optimization objectives. Our method identifies small sets of critical records and verifies their validity. We show that many problems only need $\text{poly}(1/\varepsilon)$ verifications to ensure that the output of the computation is at most a factor of $(1 \pm \varepsilon)$ away from the truth. For any given instance, we provide an $\textit{instance optimal}$ solution that verifies the minimum possible number of records to approximately certify correctness. Then, using this instance-optimal formulation of the problem, we prove our main result: "every function that satisfies some Lipschitz continuity condition can be certified with a small number of verifications". We show that the required Lipschitz continuity condition is satisfied even by some NP-complete problems, which illustrates the generality and importance of this theorem. In case this certification step fails, an invalid record will be identified. Removing these records and repeating until success guarantees that the result will be accurate and will depend only on the verified records.
Surprisingly, as we show, for several computation tasks more efficient methods are possible. These methods always guarantee that the produced result is not affected by the invalid records, since any invalid record that affects the output will be detected and verified.

Submitted 12 June, 2018; v1 submitted 12 September, 2017; originally announced September 2017.

arXiv:1702.07339 [pdf, ps, other]

A Converse to Banach's Fixed Point Theorem and its CLS Completeness

Authors: Constantinos Daskalakis, Christos Tzamos, Manolis Zampetakis

Abstract: Banach's fixed point theorem for contraction maps has been widely used to analyze the convergence of iterative methods in non-convex problems. It is a common experience, however, that iterative maps fail to be globally contracting under the natural metric in their domain, making the applicability of Banach's theorem limited. We explore how generally we can apply Banach's fixed point theorem to establish the convergence of iterative methods when pairing it with carefully designed metrics. Our first result is a strong converse of Banach's theorem, showing that it is a universal analysis tool for establishing global convergence of iterative methods to unique fixed points, and for bounding their convergence rate. In other words, we show that, whenever an iterative map globally converges to a unique fixed point, there exists a metric under which the iterative map is contracting and which can be used to bound the number of iterations until convergence. We illustrate our approach on the widely used power method, providing a new way of bounding its convergence rate through contraction arguments. We next consider the computational complexity of Banach's fixed point theorem. Making the proof of our converse theorem constructive, we show that computing a fixed point whose existence is guaranteed by Banach's fixed point theorem is CLS-complete.
We thus provide the first natural complete problem for the class CLS, which was defined in [Daskalakis, Papadimitriou 2011] to capture the complexity of problems such as P-matrix LCP, computing KKT-points, and finding mixed Nash equilibria in congestion and network coordination games.

Submitted 13 February, 2018; v1 submitted 23 February, 2017; originally announced February 2017.
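A minimal Python sketch of the power method mentioned in the abstract, assuming a real square matrix with a strictly dominant eigenvalue and a starting vector with a nonzero component along its eigenvector. For a symmetric matrix, the tangent of the angle between the iterate and the top eigenvector shrinks by a factor $|\lambda_2/\lambda_1|$ per step, which is the kind of contraction, under a suitably chosen distance, that the converse result concerns; the specific metric constructed in the paper may differ.

```python
import math


def power_method(A, x0, iters=100):
    """Power iteration x <- A x / ||A x||.
    Returns the final unit vector and its Rayleigh quotient x^T A x."""
    def matvec(v):
        return [sum(a * vj for a, vj in zip(row, v)) for row in A]

    x = list(x0)
    for _ in range(iters):
        y = matvec(x)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x, sum(xi * yi for xi, yi in zip(x, matvec(x)))


# Eigenvalues 2 and 1: the tangent of the angle to e_1 halves at every step.
x, lam = power_method([[2.0, 0.0], [0.0, 1.0]], [1.0, 1.0], iters=60)
print(x, lam)
```

Starting from $(1,1)$, the $k$-th iterate is proportional to $(2^k, 1)$, so after 60 steps the estimate agrees with the top eigenvalue 2 to machine precision.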

arXiv:1609.00368 [pdf, other]

Ten Steps of EM Suffice for Mixtures of Two Gaussians

Authors: Constantinos Daskalakis, Christos Tzamos, Manolis Zampetakis

Abstract: The Expectation-Maximization (EM) algorithm is a widely used method for maximum likelihood estimation in models with latent variables. For estimating mixtures of Gaussians, its iteration can be viewed as a soft version of the k-means clustering algorithm. Despite its wide use and applications, there are essentially no known convergence guarantees for this method. We provide global convergence guarantees for mixtures of two Gaussians with known covariance matrices. We show that the population version of EM, where the algorithm is given access to infinitely many samples from the mixture, converges geometrically to the correct mean vectors, and provide simple, closed-form expressions for the convergence rate. As a simple illustration, we show that, in one dimension, ten steps of the EM algorithm initialized at infinity result in less than 1% error in estimating the means. In the finite-sample regime, we show that, under a random initialization, $\tilde{O}(d/\varepsilon^2)$ samples suffice to compute the unknown vectors to within $\varepsilon$ in Mahalanobis distance, where $d$ is the dimension. In particular, the error rate of the EM-based estimator is $\tilde{O}\left(\sqrt{d/n}\right)$, where $n$ is the number of samples, which is optimal up to logarithmic factors.

Submitted 5 June, 2017; v1 submitted 1 September, 2016; originally announced September 2016.

Comments: Accepted for presentation at Conference on Learning Theory (COLT) 2017
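To illustrate the EM iteration, here is a minimal Python sketch for the special case of a balanced one-dimensional mixture $\frac{1}{2}\mathcal{N}(\lambda,1) + \frac{1}{2}\mathcal{N}(-\lambda,1)$ with known unit variance (an assumption; the paper treats general means and known covariance in $d$ dimensions). Here the E-step weight of the $+\lambda$ component is $\sigma(2\lambda x)$, so the M-step reduces to $\lambda \leftarrow \frac{1}{n}\sum_i x_i \tanh(\lambda x_i)$. A very large starting value stands in for initialization at infinity; the function name is hypothetical.

```python
import math
import random


def em_symmetric_mixture(xs, lam0=1e3, steps=10):
    """EM for 0.5 * N(lam, 1) + 0.5 * N(-lam, 1) with known unit variance.

    E-step: the +lam component's posterior weight for x is sigmoid(2 * lam * x),
    so 2 * weight - 1 = tanh(lam * x).
    M-step: lam <- mean of x * tanh(lam * x).
    A large lam0 mimics initialization at infinity: tanh(lam0 * x) ~ sign(x).
    """
    lam = lam0
    for _ in range(steps):
        lam = sum(x * math.tanh(lam * x) for x in xs) / len(xs)
    return lam


gen = random.Random(0)
xs = [gen.choice((-2.0, 2.0)) + gen.gauss(0.0, 1.0) for _ in range(20000)]
lam = em_symmetric_mixture(xs)  # ten EM steps
print(round(lam, 3))  # close to the true value 2
```

From the "infinite" start, the first update is essentially the mean of $|x|$, and the remaining steps refine it toward the maximum-likelihood estimate.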

arXiv:1608.04759 [pdf, ps, other]

Faster Sublinear Algorithms using Conditional Sampling

Authors: Themistoklis Gouleakis, Christos Tzamos, Manolis Zampetakis

Abstract: A conditional sampling oracle for a probability distribution D returns samples from the conditional distribution of D restricted to a specified subset of the domain. A recent line of work (Chakraborty et al. 2013 and Canonne et al. 2014) has shown that, with access to such a conditional sampling oracle, only a polylogarithmic or even constant number of samples is needed to solve distribution testing problems like identity and uniformity. This significantly improves over the standard sampling model, where polynomially many samples are necessary. Inspired by these results, we introduce a computational model based on conditional sampling to develop sublinear algorithms with exponentially faster runtimes compared to standard sublinear algorithms. We focus on geometric optimization problems over points in high-dimensional Euclidean space. Access to these points is provided via a conditional sampling oracle that takes as input a succinct representation of a subset of the domain and outputs a uniformly random point in that subset. We study two well-studied problems: k-means clustering and estimating the weight of the minimum spanning tree. In contrast to prior algorithms for the classic model, our algorithms have time, space and sample complexity that is polynomial in the dimension and polylogarithmic in the number of points. Finally, we comment on the applicability of the model and compare it with existing ones like streaming, parallel and distributed computational models.

Submitted 16 August, 2016; originally announced August 2016.

Showing 1–39 of 39 results for author: Zampetakis, M