Computer Science > Data Structures and Algorithms
[Submitted on 4 Jun 2024]
Title:Coresets for Multiple $\ell_p$ Regression
View PDF HTML (experimental)Abstract:A coreset of a dataset with $n$ examples and $d$ features is a weighted subset of examples that is sufficient for solving downstream data analytic tasks. Nearly optimal constructions of coresets for least squares and $\ell_p$ linear regression with a single response are known in prior work. However, for multiple $\ell_p$ regression where there can be $m$ responses, there are no known constructions with size sublinear in $m$. In this work, we construct coresets of size $\tilde O(\varepsilon^{-2}d)$ for $p<2$ and $\tilde O(\varepsilon^{-p}d^{p/2})$ for $p>2$ independently of $m$ (i.e., dimension-free) that approximate the multiple $\ell_p$ regression objective at every point in the domain up to $(1\pm\varepsilon)$ relative error. If we only need to preserve the minimizer subject to a subspace constraint, we improve these bounds by an $\varepsilon$ factor for all $p>1$. All of our bounds are nearly tight.
We give two application of our results. First, we settle the number of uniform samples needed to approximate $\ell_p$ Euclidean power means up to a $(1+\varepsilon)$ factor, showing that $\tilde\Theta(\varepsilon^{-2})$ samples for $p = 1$, $\tilde\Theta(\varepsilon^{-1})$ samples for $1 < p < 2$, and $\tilde\Theta(\varepsilon^{1-p})$ samples for $p>2$ is tight, answering a question of Cohen-Addad, Saulpic, and Schwiegelshohn. Second, we show that for $1<p<2$, every matrix has a subset of $\tilde O(\varepsilon^{-1}k)$ rows which spans a $(1+\varepsilon)$-approximately optimal $k$-dimensional subspace for $\ell_p$ subspace approximation, which is also nearly optimal.
Current browse context:
cs
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
Connected Papers (What is Connected Papers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.