
Showing 1–12 of 12 results for author: Pittas, T

  1. arXiv:2403.10416  [pdf, other]

    cs.LG cs.DS math.ST stat.ML

    Robust Sparse Estimation for Gaussians with Optimal Error under Huber Contamination

    Authors: Ilias Diakonikolas, Daniel M. Kane, Sushrut Karmalkar, Ankit Pensia, Thanasis Pittas

    Abstract: We study Gaussian sparse estimation tasks in Huber's contamination model with a focus on mean estimation, PCA, and linear regression. For each of these tasks, we give the first sample and computationally efficient robust estimators with optimal error guarantees, within constant factors. All prior efficient algorithms for these tasks incur quantitatively suboptimal error. Concretely, for Gaussian r…

    Submitted 15 March, 2024; originally announced March 2024.

  2. arXiv:2403.02300  [pdf, other]

    cs.DS cs.LG math.ST stat.ML

    Statistical Query Lower Bounds for Learning Truncated Gaussians

    Authors: Ilias Diakonikolas, Daniel M. Kane, Thanasis Pittas, Nikos Zarifis

    Abstract: We study the problem of estimating the mean of an identity covariance Gaussian in the truncated setting, in the regime when the truncation set comes from a low-complexity family $\mathcal{C}$ of sets. Specifically, for a fixed but unknown truncation set $S \subseteq \mathbb{R}^d$, we are given access to samples from the distribution $\mathcal{N}(\boldsymbol{\mu}, \mathbf{I})$ truncated to the set…

    Submitted 4 March, 2024; originally announced March 2024.

  3. arXiv:2312.11769  [pdf, other]

    cs.LG cs.DS cs.IT math.ST stat.ML

    Clustering Mixtures of Bounded Covariance Distributions Under Optimal Separation

    Authors: Ilias Diakonikolas, Daniel M. Kane, Jasper C. H. Lee, Thanasis Pittas

    Abstract: We study the clustering problem for mixtures of bounded covariance distributions, under a fine-grained separation assumption. Specifically, given samples from a $k$-component mixture distribution $D = \sum_{i=1}^k w_i P_i$, where each $w_i \ge \alpha$ for some known parameter $\alpha$, and each $P_i$ has unknown covariance $\Sigma_i \preceq \sigma_i^2 \cdot I_d$ for some unknown $\sigma_i$, the goal is to cluster the sam…

    Submitted 18 December, 2023; originally announced December 2023.

  4. arXiv:2312.01547  [pdf, ps, other]

    cs.DS cs.LG stat.ML

    Near-Optimal Algorithms for Gaussians with Huber Contamination: Mean Estimation and Linear Regression

    Authors: Ilias Diakonikolas, Daniel M. Kane, Ankit Pensia, Thanasis Pittas

    Abstract: We study the fundamental problems of Gaussian mean estimation and linear regression with Gaussian covariates in the presence of Huber contamination. Our main contribution is the design of the first sample near-optimal and almost linear-time algorithms with optimal error guarantees for both of these problems. Specifically, for Gaussian robust mean estimation on $\mathbb{R}^d$ with contamination par…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: To appear in NeurIPS 2023

  5. arXiv:2306.13057  [pdf, other]

    cs.LG cs.DS math.ST stat.ML

    SQ Lower Bounds for Learning Bounded Covariance GMMs

    Authors: Ilias Diakonikolas, Daniel M. Kane, Thanasis Pittas, Nikos Zarifis

    Abstract: We study the complexity of learning mixtures of separated Gaussians with common unknown bounded covariance matrix. Specifically, we focus on learning Gaussian mixture models (GMMs) on $\mathbb{R}^d$ of the form $P = \sum_{i=1}^k w_i \mathcal{N}(\boldsymbol{\mu}_i, \mathbf{\Sigma}_i)$, where $\mathbf{\Sigma}_i = \mathbf{\Sigma} \preceq \mathbf{I}$ and $\min_{i \neq j} \| \boldsymbol{\mu}_i - \boldsymbol{\mu}_j \|_2 \geq k^{\epsilon}$ for…

    Submitted 22 June, 2023; originally announced June 2023.

  6. arXiv:2305.02544  [pdf, other]

    cs.LG cs.DS math.ST stat.ML

    Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA

    Authors: Ilias Diakonikolas, Daniel M. Kane, Ankit Pensia, Thanasis Pittas

    Abstract: We study principal component analysis (PCA), where given a dataset in $\mathbb{R}^d$ from a distribution, the task is to find a unit vector $v$ that approximately maximizes the variance of the distribution after being projected along $v$. Despite being a classical task, standard estimators fail drastically if the data contains even a small fraction of outliers, motivating the problem of robust PCA…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: To appear in ICML 2023

  7. arXiv:2305.00966  [pdf, other]

    cs.DS cs.LG math.ST stat.ML

    A Spectral Algorithm for List-Decodable Covariance Estimation in Relative Frobenius Norm

    Authors: Ilias Diakonikolas, Daniel M. Kane, Jasper C. H. Lee, Ankit Pensia, Thanasis Pittas

    Abstract: We study the problem of list-decodable Gaussian covariance estimation. Given a multiset $T$ of $n$ points in $\mathbb{R}^d$ such that an unknown $\alpha < 1/2$ fraction of points in $T$ are i.i.d. samples from an unknown Gaussian $\mathcal{N}(\mu, \Sigma)$, the goal is to output a list of $O(1/\alpha)$ hypotheses at least one of which is close to $\Sigma$ in relative Frobenius norm. Our main result is a…

    Submitted 1 May, 2023; originally announced May 2023.

  8. arXiv:2206.05245  [pdf, other]

    cs.DS cs.LG math.ST stat.ML

    List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering

    Authors: Ilias Diakonikolas, Daniel M. Kane, Sushrut Karmalkar, Ankit Pensia, Thanasis Pittas

    Abstract: We study the problem of list-decodable sparse mean estimation. Specifically, for a parameter $\alpha \in (0, 1/2)$, we are given $m$ points in $\mathbb{R}^n$, $\lfloor \alpha m \rfloor$ of which are i.i.d. samples from a distribution $D$ with unknown $k$-sparse mean $\mu$. No assumptions are made on the remaining points, which form the majority of the dataset. The goal is to return a small list of candidates co…

    Submitted 5 July, 2024; v1 submitted 10 June, 2022; originally announced June 2022.

    Comments: Added fact about taking roots in SoS proofs (Fact 2.9)

  9. arXiv:2206.03441  [pdf, other]

    cs.DS cs.LG math.ST stat.ML

    Robust Sparse Mean Estimation via Sum of Squares

    Authors: Ilias Diakonikolas, Daniel M. Kane, Sushrut Karmalkar, Ankit Pensia, Thanasis Pittas

    Abstract: We study the problem of high-dimensional sparse mean estimation in the presence of an $\epsilon$-fraction of adversarial outliers. Prior work obtained sample and computationally efficient algorithms for this task for identity-covariance subgaussian distributions. In this work, we develop the first efficient algorithms for robust sparse mean estimation without a priori knowledge of the covariance. For dis…

    Submitted 5 July, 2024; v1 submitted 7 June, 2022; originally announced June 2022.

    Comments: Fixed minor oversight in runtime calculation

  10. arXiv:2204.12399  [pdf, other]

    cs.DS cs.LG math.ST stat.ML

    Streaming Algorithms for High-Dimensional Robust Statistics

    Authors: Ilias Diakonikolas, Daniel M. Kane, Ankit Pensia, Thanasis Pittas

    Abstract: We study high-dimensional robust statistics tasks in the streaming model. A recent line of work obtained computationally efficient algorithms for a range of high-dimensional robust estimation tasks. Unfortunately, all previous algorithms require storing the entire dataset, incurring memory at least quadratic in the dimension. In this work, we develop the first efficient streaming algorithms for hi…

    Submitted 3 May, 2023; v1 submitted 26 April, 2022; originally announced April 2022.

  11. arXiv:2106.09689  [pdf, ps, other]

    cs.DS cs.LG math.ST stat.ML

    Statistical Query Lower Bounds for List-Decodable Linear Regression

    Authors: Ilias Diakonikolas, Daniel M. Kane, Ankit Pensia, Thanasis Pittas, Alistair Stewart

    Abstract: We study the problem of list-decodable linear regression, where an adversary can corrupt a majority of the examples. Specifically, we are given a set $T$ of labeled examples $(x, y) \in \mathbb{R}^d \times \mathbb{R}$ and a parameter $0 < \alpha < 1/2$ such that an $\alpha$-fraction of the points in $T$ are i.i.d. samples from a linear regression model with Gaussian covariates, and the remaining $(1-\alpha)$-fracti…

    Submitted 17 June, 2021; originally announced June 2021.

  12. arXiv:2102.04401  [pdf, ps, other]

    cs.LG cs.DS math.ST stat.ML

    The Optimality of Polynomial Regression for Agnostic Learning under Gaussian Marginals

    Authors: Ilias Diakonikolas, Daniel M. Kane, Thanasis Pittas, Nikos Zarifis

    Abstract: We study the problem of agnostic learning under the Gaussian distribution. We develop a method for finding hard families of examples for a wide class of problems by using LP duality. For Boolean-valued concept classes, we show that the $L^1$-regression algorithm is essentially best possible, and therefore that the computational difficulty of agnostically learning a concept class is closely related…

    Submitted 8 February, 2021; originally announced February 2021.