-
Covariance estimation with direction dependence accuracy
Authors:
Pedro Abdalla,
Shahar Mendelson
Abstract:
We construct an estimator $\widehat{Σ}$ for covariance matrices of unknown, centred random vectors $X$, with the given data consisting of $N$ independent measurements $X_1,...,X_N$ of $X$ and the desired confidence level. We show that under minimal assumptions on $X$, the estimator performs with the optimal accuracy with respect to the operator norm. In addition, the estimator is also optimal with respect to direction dependence accuracy: $\langle \widehat{Σ}u,u\rangle$ is an optimal estimator for $σ^2(u)=\mathbb{E}\langle X,u\rangle^2$ when $σ^2(u)$ is ``large''.
Submitted 13 February, 2024;
originally announced February 2024.
-
A uniform Dvoretzky-Kiefer-Wolfowitz inequality
Authors:
Daniel Bartl,
Shahar Mendelson
Abstract:
We show that under minimal assumptions on a class of functions $\mathcal{H}$ defined on a probability space $(\mathcal{X},μ)$, there is a threshold $Δ_0$ satisfying the following: for every $Δ\geq Δ_0$, with probability at least $1-2\exp(-cΔm)$ with respect to $μ^{\otimes m}$,
\[ \sup_{h\in\mathcal{H}} \sup_{t\in\mathbb{R}} \left| \mathbb{P}(h(X)\leq t) - \frac{1}{m}\sum_{i=1}^m 1_{(-\infty,t]}(h(X_i)) \right| \leq \sqrt{Δ};\]
here $X$ is distributed according to $μ$ and $(X_i)_{i=1}^m$ are independent copies of $X$.
The value of $Δ_0$ is determined by an unexpected complexity parameter of the class $\mathcal{H}$ that captures the set's geometry (Talagrand's $γ_1$-functional).
The bound, the probability estimate and the value of $Δ_0$ are all optimal up to a logarithmic factor.
Submitted 11 December, 2023;
originally announced December 2023.
-
Exact Synthesis of Multiqubit Clifford-Cyclotomic Circuits
Authors:
Matthew Amy,
Andrew N. Glaudell,
Shaun Kelso,
William Maxwell,
Samuel S. Mendelson,
Neil J. Ross
Abstract:
Let $n\geq 8$ be divisible by 4. The Clifford-cyclotomic gate set $\mathcal{G}_n$ is the universal gate set obtained by extending the Clifford gates with the $z$-rotation $T_n = \mathrm{diag}(1,ζ_n)$, where $ζ_n$ is a primitive $n$-th root of unity. In this note, we show that, when $n$ is a power of 2, a multiqubit unitary matrix $U$ can be exactly represented by a circuit over $\mathcal{G}_n$ if and only if the entries of $U$ belong to the ring $\mathbb{Z}[1/2,ζ_n]$. We moreover show that $\log(n)-2$ ancillas are always sufficient to construct a circuit for $U$. Our results generalize prior work to an infinite family of gate sets and show that the limitations that apply to single-qubit unitaries, for which the correspondence between Clifford-cyclotomic operators and matrices over $\mathbb{Z}[1/2,ζ_n]$ fails for all but finitely many values of $n$, can be overcome through the use of ancillas.
Submitted 12 April, 2024; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Spectral properties of random graphs with fixed equitable partition
Authors:
Matthew B. Crawford,
David J. Marchette,
William Maxwell,
Samuel S. Mendelson
Abstract:
We define a graph to be $S$-regular if it contains an equitable partition given by a matrix $S$. These graphs are generalizations of both regular and bipartite, biregular graphs. An $S$-regular matrix is then defined as a matrix on an $S$-regular graph consistent with the graph's equitable partition. In this paper we derive the limiting spectral density for large, random $S$-regular matrices as well as limiting functions of certain statistics for their eigenvector coordinates as a function of eigenvalue. These limiting functions are defined in terms of spectral measures on $S$-regular trees. In general, these spectral measures do not have a closed-form expression; however, we provide a defining system of polynomials for them. Finally, we explore eigenvalue bounds of $S$-regular graphs, proving an expander mixing lemma, an Alon-Boppana bound, and other eigenvalue inequalities in terms of the eigenvalues of the matrix $S$.
Submitted 13 November, 2023;
originally announced November 2023.
-
Optimal non-gaussian Dvoretzky-Milman embeddings
Authors:
Daniel Bartl,
Shahar Mendelson
Abstract:
We construct the first non-gaussian ensemble that yields the optimal estimate in the Dvoretzky-Milman Theorem: the ensemble exhibits almost Euclidean sections in arbitrary normed spaces of the same dimension as the gaussian embedding -- despite being very far from gaussian (in fact, it happens to be heavy-tailed).
Submitted 21 September, 2023;
originally announced September 2023.
-
Empirical approximation of the gaussian distribution in $\mathbb{R}^d$
Authors:
Daniel Bartl,
Shahar Mendelson
Abstract:
Let $G_1,\dots,G_m$ be independent copies of the standard gaussian random vector in $\mathbb{R}^d$. We show that there is an absolute constant $c$ such that for any $A \subset S^{d-1}$, with probability at least $1-2\exp(-cΔm)$, for every $t\in\mathbb{R}$, \[ \sup_{x \in A} \left| \frac{1}{m}\sum_{i=1}^m 1_{ \{\langle G_i,x\rangle \leq t \}} - \mathbb{P}(\langle G,x\rangle \leq t) \right| \leq Δ + σ(t) \sqrt{Δ}. \] Here $σ(t)$ is the variance of $1_{\{\langle G,x\rangle\leq t\}}$ and $Δ\geq Δ_0$, where $Δ_0$ is determined by an unexpected complexity parameter of $A$ that captures the set's geometry (Talagrand's $γ_1$ functional). The bound, the probability estimate, and the value of $Δ_0$ are all (almost) optimal.
We use this fact to show that if $Γ=\sum_{i=1}^m \langle G_i,x\rangle e_i$ is the random matrix that has $G_1,\dots,G_m$ as its rows, then the structure of $Γ(A)=\{Γx: x\in A\}$ is far more rigid and well-prescribed than was previously expected.
Submitted 5 September, 2023;
originally announced September 2023.
-
On a variance dependent Dvoretzky-Kiefer-Wolfowitz inequality
Authors:
Daniel Bartl,
Shahar Mendelson
Abstract:
Let $X$ be a real-valued random variable with distribution function $F$. Set $X_1,\dots, X_m$ to be independent copies of $X$ and let $F_m$ be the corresponding empirical distribution function. We show that there are absolute constants $c_0$ and $c_1$ such that if $Δ\geq c_0\frac{\log\log m}{m}$, then with probability at least $1-2\exp(-c_1Δm)$, for every $t\in\mathbb{R}$ that satisfies $F(t)\in[Δ,1-Δ]$, \[ |F_m(t) - F(t) | \leq \sqrt{Δ\min\{F(t),1-F(t)\} } .\] Moreover, this estimate is optimal up to the multiplicative constants $c_0$ and $c_1$.
Submitted 9 August, 2023;
originally announced August 2023.
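The inequality in this abstract can be probed numerically. The following is only a rough sketch, not the authors' method: it assumes $X$ is uniform on $[0,1]$ (so $F(t)=t$), replaces the unspecified constant $c_0$ by an arbitrary factor of 10, and checks the ratio $|F_m(t)-F(t)|/\sqrt{Δ\min\{F(t),1-F(t)\}}$ on a finite grid only. The function name `max_normalised_deviation` is hypothetical.

```python
import bisect
import math
import random


def max_normalised_deviation(sample, cdf, delta, grid):
    """Largest value of |F_m(t) - F(t)| / sqrt(delta * min(F(t), 1 - F(t)))
    over the grid points t that satisfy F(t) in [delta, 1 - delta]."""
    xs = sorted(sample)
    m = len(xs)
    worst = 0.0
    for t in grid:
        f = cdf(t)
        if not (delta <= f <= 1.0 - delta):
            continue  # the inequality is only stated on this range
        f_m = bisect.bisect_right(xs, t) / m  # empirical distribution function at t
        ratio = abs(f_m - f) / math.sqrt(delta * min(f, 1.0 - f))
        worst = max(worst, ratio)
    return worst


rng = random.Random(0)
m = 10_000
sample = [rng.random() for _ in range(m)]  # assumption: X uniform on [0, 1]
delta = 10 * math.log(math.log(m)) / m     # the factor 10 is an arbitrary stand-in for c_0
grid = [k / 1000 for k in range(1, 1000)]
ratio = max_normalised_deviation(sample, lambda t: t, delta, grid)
print(f"delta = {delta:.5f}, worst normalised deviation on the grid = {ratio:.3f}")
```

A ratio at most 1 on the grid is what the stated bound would predict if the stand-in constant were large enough; the sketch makes no claim about the true constants.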
-
Fitting an ellipsoid to a quadratic number of random points
Authors:
Afonso S. Bandeira,
Antoine Maillard,
Shahar Mendelson,
Elliot Paquette
Abstract:
We consider the problem $(\mathrm{P})$ of fitting $n$ standard Gaussian random vectors in $\mathbb{R}^d$ to the boundary of a centered ellipsoid, as $n, d \to \infty$. This problem is conjectured to have a sharp feasibility transition: for any $\varepsilon > 0$, if $n \leq (1 - \varepsilon) d^2 / 4$ then $(\mathrm{P})$ has a solution with high probability, while $(\mathrm{P})$ has no solutions with high probability if $n \geq (1 + \varepsilon) d^2 /4$. So far, only a trivial bound $n \geq d^2 / 2$ is known on the negative side, while the best results on the positive side assume $n \leq d^2 / \mathrm{polylog}(d)$. In this work, we improve over previous approaches using a key result of Bartl & Mendelson on the concentration of Gram matrices of random vectors under mild assumptions on their tail behavior. This allows us to give a simple proof that $(\mathrm{P})$ is feasible with high probability when $n \leq d^2 / C$, for a (possibly large) constant $C > 0$.
Submitted 3 July, 2023;
originally announced July 2023.
-
Catalytic Embeddings of Quantum Circuits
Authors:
M. Amy,
M. Crawford,
A. N. Glaudell,
M. L. Macasieb,
S. S. Mendelson,
N. J. Ross
Abstract:
If a set $\mathbb{G}$ of quantum gates is countable, then the operators that can be exactly represented by a circuit over $\mathbb{G}$ form a strict subset of the collection of all unitary operators. When $\mathbb{G}$ is universal, one circumvents this limitation by resorting to repeated gate approximations: every occurrence of a gate which cannot be exactly represented over $\mathbb{G}$ is replaced by an approximating circuit. Here, we introduce catalytic embeddings, which provide an alternative to repeated gate approximations. With catalytic embeddings, approximations are relegated to the preparation of a fixed number of reusable resource states called catalysts. Because the catalysts only need to be prepared once, when catalytic embeddings exist they always produce shorter circuits, in the limit of increasing gate count and target precision. In the present paper, we lay the foundations of the theory of catalytic embeddings and we establish several of their structural properties. In addition, we provide methods to design catalytic embeddings, showing that their construction can be reduced to that of a single fixed matrix when the gates involved have entries in well-behaved rings of algebraic numbers. Finally, we showcase some concrete examples and applications. Notably, we show that catalytic embeddings generalize a technique previously used to implement the Quantum Fourier Transform over the Clifford+$T$ gate set with $O(n)$ gate approximations.
Submitted 12 May, 2023;
originally announced May 2023.
-
Structure preservation via the Wasserstein distance
Authors:
Daniel Bartl,
Shahar Mendelson
Abstract:
We show that under minimal assumptions on a random vector $X\in\mathbb{R}^d$ and with high probability, given $m$ independent copies of $X$, the coordinate distribution of each vector $(\langle X_i,θ\rangle)_{i=1}^m$ is dictated by the distribution of the true marginal $\langle X,θ\rangle$. Specifically, we show that with high probability, \[\sup_{θ\in S^{d-1}} \left( \frac{1}{m}\sum_{i=1}^m \left|\langle X_i,θ\rangle^\sharp - λ^θ_i \right|^2 \right)^{1/2} \leq c \left( \frac{d}{m} \right)^{1/4},\] where $λ^θ_i = m\int_{(\frac{i-1}{m}, \frac{i}{m}]} F_{ \langle X,θ\rangle }^{-1}(u)\,du$ and $a^\sharp$ denotes the monotone non-decreasing rearrangement of $a$. Moreover, this estimate is optimal.
The proof follows from a sharp estimate on the worst Wasserstein distance between a marginal of $X$ and its empirical counterpart, $\frac{1}{m} \sum_{i=1}^m δ_{\langle X_i, θ\rangle}$.
Submitted 21 September, 2023; v1 submitted 15 September, 2022;
originally announced September 2022.
-
Fast metric embedding into the Hamming cube
Authors:
Sjoerd Dirksen,
Shahar Mendelson,
Alexander Stollenwerk
Abstract:
We consider the problem of embedding a subset of $\mathbb{R}^n$ into a low-dimensional Hamming cube in an almost isometric way. We construct a simple, data-oblivious, and computationally efficient map that achieves this task with high probability: we first apply a specific structured random matrix, which we call the double circulant matrix; using that matrix requires linear storage and matrix-vector multiplication can be performed in near-linear time. We then binarize each vector by comparing each of its entries to a random threshold, selected uniformly at random from a well-chosen interval.
We estimate the number of bits required for this encoding scheme in terms of two natural geometric complexity parameters of the set - its Euclidean covering numbers and its localized Gaussian complexity. The estimate we derive turns out to be the best that one can hope for - up to logarithmic terms.
The key to the proof is a phenomenon of independent interest: we show that the double circulant matrix mimics the behavior of a Gaussian matrix in two important ways. First, it maps an arbitrary set in $\mathbb{R}^n$ into a set of well-spread vectors. Second, it yields a fast near-isometric embedding of any finite subset of $\ell_2^n$ into $\ell_1^m$. This embedding achieves the same dimension reduction as a Gaussian matrix in near-linear time, under an optimal condition - up to logarithmic factors - on the number of points to be embedded. This improves a well-known construction due to Ailon and Chazelle.
Submitted 6 September, 2022; v1 submitted 8 April, 2022;
originally announced April 2022.
-
Sharp estimates on random hyperplane tessellations
Authors:
Sjoerd Dirksen,
Shahar Mendelson,
Alexander Stollenwerk
Abstract:
We study the problem of generating a hyperplane tessellation of an arbitrary set $T$ in $\mathbb{R}^n$, ensuring that the Euclidean distance between any two points corresponds to the fraction of hyperplanes separating them up to a pre-specified error $δ$. We focus on random gaussian tessellations with uniformly distributed shifts and derive sharp bounds on the number of hyperplanes $m$ that are required. Surprisingly, our lower estimates falsify the conjecture that $m\sim \ell_*^2(T)/δ^2$, where $\ell_*^2(T)$ is the gaussian width of $T$, is optimal.
Submitted 13 January, 2022;
originally announced January 2022.
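To make the object in this abstract concrete, here is a minimal sketch of a random gaussian tessellation with uniformly distributed shifts, and of the fraction of hyperplanes separating two points. It is an illustration under stated assumptions, not the paper's construction or analysis: the shift range `lam`, the dimensions, and the function names are all hypothetical choices, and the rescaling in the last step is a heuristic that is reasonable only when `lam` is well above the norms of the points.

```python
import math
import random


def draw_tessellation(n, m, lam, rng):
    """Draw m random hyperplanes {z : <g_i, z> + tau_i = 0} with standard
    gaussian normals g_i and shifts tau_i uniform on [-lam, lam]."""
    normals = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
    shifts = [rng.uniform(-lam, lam) for _ in range(m)]
    return normals, shifts


def separating_fraction(x, y, normals, shifts):
    """Fraction of hyperplanes that put x and y on opposite sides."""
    count = 0
    for g, tau in zip(normals, shifts):
        sx = sum(gi * xi for gi, xi in zip(g, x)) + tau
        sy = sum(gi * yi for gi, yi in zip(g, y)) + tau
        if (sx > 0) != (sy > 0):
            count += 1
    return count / len(shifts)


rng = random.Random(1)
n, m, lam = 5, 20_000, 10.0  # illustrative values only
normals, shifts = draw_tessellation(n, m, lam, rng)
x = [0.3, -0.2, 0.0, 0.1, 0.4]
y = [-0.1, 0.2, 0.5, 0.1, 0.0]
frac = separating_fraction(x, y, normals, shifts)
dist = math.dist(x, y)
# Heuristic rescaling, assuming |<g, x>| and |<g, y>| rarely exceed lam:
# the expected fraction is then about sqrt(2/pi) * ||x - y|| / (2 * lam).
estimate = frac * 2 * lam / math.sqrt(2 / math.pi)
print(f"distance = {dist:.3f}, rescaled separating fraction = {estimate:.3f}")
```

How large `m` must be for this agreement to hold uniformly over a set $T$, up to error $δ$, is exactly the question the abstract addresses; the sketch only compares a single pair of points.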
-
Random embeddings with an almost Gaussian distortion
Authors:
Daniel Bartl,
Shahar Mendelson
Abstract:
Let $X$ be a symmetric, isotropic random vector in $\mathbb{R}^m$ and let $X_1,\dots,X_n$ be independent copies of $X$. We show that under mild assumptions on $\|X\|_2$ (a suitable thin-shell bound) and on the tail-decay of the marginals $\langle X,u\rangle$, the random matrix $A$, whose columns are $X_i/\sqrt{m}$, exhibits a Gaussian-like behaviour in the following sense: for an arbitrary subset $T\subset \mathbb{R}^n$, the distortion $\sup_{t \in T} | \|At\|_2^2 - \|t\|_2^2 |$ is almost the same as if $A$ were a Gaussian matrix.
A simple outcome of our result is that if $X$ is a symmetric, isotropic, log-concave random vector and $n \leq m \leq c_1(α)n^α$ for some $α>1$, then with high probability, the extremal singular values of $A$ satisfy the optimal estimate: $1-c_2(α) \sqrt{n/m} \leq λ_{\rm min} \leq λ_{\rm max} \leq 1+c_2(α) \sqrt{n/m}$.
Submitted 4 February, 2022; v1 submitted 29 June, 2021;
originally announced June 2021.
-
Column randomization and almost-isometric embeddings
Authors:
Shahar Mendelson
Abstract:
The matrix $A:\mathbb{R}^n \to \mathbb{R}^m$ is $(δ,k)$-regular if for any $k$-sparse vector $x$, $$ \left| \|Ax\|_2^2-\|x\|_2^2\right| \leq δ\sqrt{k} \|x\|_2^2. $$ We show that if $A$ is $(δ,k)$-regular for $1 \leq k \leq 1/δ^2$, then by multiplying the columns of $A$ by independent random signs, the resulting random ensemble $A_ε$ acts on an arbitrary subset $T \subset \mathbb{R}^n$ (almost) as if it were gaussian, and with the optimal probability estimate: if $\ell_*(T)$ is the gaussian mean-width of $T$ and $d_T=\sup_{t \in T} \|t\|_2$, then with probability at least $1-2\exp(-c(\ell_*(T)/d_T)^2)$, $$ \sup_{t \in T} \left| \|A_εt\|_2^2-\|t\|_2^2 \right| \leq C\left(Λd_T δ\ell_*(T)+(δ\ell_*(T))^2 \right), $$ where $Λ=\max\{1,δ^2\log(nδ^2)\}$. This estimate is optimal for $0<δ\leq 1/\sqrt{\log n}$.
Submitted 9 March, 2021;
originally announced March 2021.
-
On Monte-Carlo methods in convex stochastic optimization
Authors:
Daniel Bartl,
Shahar Mendelson
Abstract:
We develop a novel procedure for estimating the optimizer of general convex stochastic optimization problems of the form $\min_{x\in\mathcal{X}} \mathbb{E}[F(x,ξ)]$, when the given data is a finite independent sample selected according to $ξ$. The procedure is based on a median-of-means tournament, and is the first procedure that exhibits the optimal statistical performance in heavy tailed situations: we recover the asymptotic rates dictated by the central limit theorem in a non-asymptotic manner once the sample size exceeds some explicitly computable threshold. Additionally, our results apply in the high-dimensional setup, as the threshold sample size exhibits the optimal dependence on the dimension (up to a logarithmic factor). The general setting allows us to recover recent results on multivariate mean estimation and linear regression in heavy-tailed situations and to prove the first sharp, non-asymptotic results for the portfolio optimization problem.
Submitted 25 January, 2022; v1 submitted 19 January, 2021;
originally announced January 2021.
-
An isomorphic Dvoretzky-Milman Theorem using general random ensembles
Authors:
Shahar Mendelson
Abstract:
We construct rather general random ensembles that yield the optimal (isomorphic) estimate in the Dvoretzky-Milman Theorem. This is the first construction of non gaussian/spherical ensembles that exhibit the optimal behaviour. The ensembles constructed here need not satisfy any rotation invariance and can be rather heavy-tailed.
Submitted 27 October, 2020;
originally announced October 2020.
-
Multivariate mean estimation with direction-dependent accuracy
Authors:
Gabor Lugosi,
Shahar Mendelson
Abstract:
We consider the problem of estimating the mean of a random vector based on $N$ independent, identically distributed observations. We prove the existence of an estimator that has a near-optimal error in all directions in which the variance of the one-dimensional marginal of the random vector is not too small: with probability $1-δ$, the procedure returns $\widehat{μ}_N$ which satisfies that for every direction $u \in S^{d-1}$, \[ \langle \widehat{μ}_N - μ, u\rangle \le \frac{C}{\sqrt{N}} \left( σ(u)\sqrt{\log(1/δ)} + \left(\mathbb{E}\|X-\mathbb{E} X\|_2^2\right)^{1/2} \right)~, \] where $σ^2(u) = \mathrm{Var}(\langle X,u\rangle)$ and $C$ is a constant. To achieve this, we require only slightly more than the existence of the covariance matrix, in the form of a certain moment-equivalence assumption.
The proof relies on novel bounds for the ratio of empirical and true probabilities that hold uniformly over certain classes of random variables.
Submitted 22 October, 2020;
originally announced October 2020.
-
Approximating $L_p$ unit balls via random sampling
Authors:
Shahar Mendelson
Abstract:
Let $X$ be an isotropic random vector in $\mathbb{R}^d$ that satisfies that for every $v \in S^{d-1}$, $\|\langle X,v\rangle\|_{L_q} \leq L \|\langle X,v\rangle\|_{L_p}$ for some $q \geq 2p$. We show that for $0<\varepsilon<1$, a set of $N = c(p,q,\varepsilon) d$ random points, selected independently according to $X$, can be used to construct a $1 \pm \varepsilon$ approximation of the $L_p$ unit ball endowed on $\mathbb{R}^d$ by $X$. Moreover, $c(p,q,\varepsilon) \leq c^p \varepsilon^{-2}\log(2/\varepsilon)$; when $q=2p$ the approximation is achieved with probability at least $1-2\exp(-cN \varepsilon^2/\log^2(2/\varepsilon))$ and if $q$ is much larger than $p$---say, $q=4p$---the approximation is achieved with probability at least $1-2\exp(-cN \varepsilon^2)$.
In particular, when $X$ is a log-concave random vector, this estimate improves the previous state-of-the-art---that $N=c^\prime(p,\varepsilon) d^{p/2}\log d$ random points are enough, and that the approximation is valid with constant probability.
Submitted 19 August, 2020;
originally announced August 2020.
-
Transfer Learning of Photometric Phenotypes in Agriculture Using Metadata
Authors:
Dan Halbersberg,
Aharon Bar Hillel,
Shon Mendelson,
Daniel Koster,
Lena Karol,
Boaz Lerner
Abstract:
Estimation of photometric plant phenotypes (e.g., hue, shine, chroma) in field conditions is important for decisions on the expected yield quality, fruit ripeness, and need for further breeding. Estimating these from images is difficult due to large variances in lighting conditions, shadows, and sensor properties. We combine the image and metadata regarding capturing conditions embedded into a network, enabling more accurate estimation and transfer between different conditions. Compared to a state-of-the-art deep CNN and a human expert, metadata embedding improves the estimation of the tomato's hue and chroma.
Submitted 1 April, 2020;
originally announced April 2020.
-
Learning bounded subsets of $L_p$
Authors:
Shahar Mendelson
Abstract:
We study learning problems in which the underlying class is a bounded subset of $L_p$ and the target $Y$ belongs to $L_p$. Previously, minimax sample complexity estimates were known under such boundedness assumptions only when $p=\infty$. We present a sharp sample complexity estimate that holds for any $p > 4$. It is based on a learning procedure that is suited for heavy-tailed problems.
Submitted 4 February, 2020;
originally announced February 2020.
-
Robust multivariate mean estimation: the optimality of trimmed mean
Authors:
Gabor Lugosi,
Shahar Mendelson
Abstract:
We consider the problem of estimating the mean of a random vector based on i.i.d. observations and adversarial contamination. We introduce a multivariate extension of the trimmed-mean estimator and show its optimal performance under minimal conditions.
Submitted 22 February, 2020; v1 submitted 26 July, 2019;
originally announced July 2019.
-
On the geometry of polytopes generated by heavy-tailed random vectors
Authors:
Olivier Guédon,
Felix Krahmer,
Christian Kümmerle,
Shahar Mendelson,
Holger Rauhut
Abstract:
We study the geometry of centrally-symmetric random polytopes, generated by $N$ independent copies of a random vector $X$ taking values in $\mathbb{R}^n$. We show that under minimal assumptions on $X$, for $N \gtrsim n$ and with high probability, the polytope contains a deterministic set that is naturally associated with the random vector---namely, the polar of a certain floating body. This solves the long-standing question on whether such a random polytope contains a canonical body. Moreover, by identifying the floating bodies associated with various random vectors we recover the estimates that have been obtained previously, and thanks to the minimal assumptions on $X$ we derive estimates in cases that had been out of reach, involving random polytopes generated by heavy-tailed random vectors (e.g., when $X$ is $q$-stable or when $X$ has an unconditional structure). Finally, the structural results are used for the study of a fundamental question in compressive sensing---noise blind sparse recovery.
Submitted 16 July, 2019;
originally announced July 2019.
-
On a special presentation of matrix algebras
Authors:
Geir Agnarsson,
Samuel S. Mendelson
Abstract:
Recognizing when a ring is a complete matrix ring is of significant importance in algebra. It is well-known folklore that a ring $R$ is a complete $n\times n$ matrix ring, so $R\cong M_{n}(S)$ for some ring $S$, if and only if it contains a set of $n\times n$ matrix units $\{e_{ij}\}_{i,j=1}^n$. A more recent and lesser-known result states that a ring $R$ is a complete $(m+n)\times(m+n)$ matrix ring if and only if $R$ contains three elements, $a$, $b$, and $f$, satisfying the two relations $af^m+f^nb=1$ and $f^{m+n}=0$. In many instances the two elements $a$ and $b$ can be replaced by appropriate powers $a^i$ and $a^j$ of a single element $a$, respectively. In general, very little is known about the structure of the ring $S$. In this article we study in depth the case $m=n=1$, when $R\cong M_2(S)$. More specifically, we study the universal algebra over a commutative ring $A$ with elements $x$ and $y$ that satisfy the relations $x^iy+yx^j=1$ and $y^2=0$. We describe completely the structure of these $A$-algebras and their underlying rings when $\gcd(i,j)=1$. Finally, we obtain results that fully determine when there are surjections onto $M_2({\mathbb F})$ when ${\mathbb F}$ is a base field ${\mathbb Q}$ or ${\mathbb Z}_p$ for a prime number $p$.
Submitted 11 July, 2019;
originally announced July 2019.
-
Mean estimation and regression under heavy-tailed distributions--a survey
Authors:
Gabor Lugosi,
Shahar Mendelson
Abstract:
We survey some of the recent advances in mean estimation and regression function estimation. In particular, we describe sub-Gaussian mean estimators for possibly heavy-tailed data both in the univariate and multivariate settings. We focus on estimators based on median-of-means techniques but other methods such as the trimmed mean and Catoni's estimator are also reviewed. We give detailed proofs for the cornerstone results. We dedicate a section to statistical learning problems--in particular, regression function estimation--in the presence of possibly heavy-tailed data.
Submitted 10 June, 2019;
originally announced June 2019.
-
Quantum-Assisted Clustering Algorithms for NISQ-Era Devices
Authors:
Samuel S. Mendelson,
Robert W. Strand,
Guy B. Oldaker IV,
Jacob M. Farinholt
Abstract:
In the NISQ-era of quantum computing, we should not expect to see quantum devices that provide an exponential improvement in runtime for practical problems, due to the lack of error correction and small number of qubits available. Nevertheless, these devices should be able to provide other performance improvements, particularly when combined with existing classical machines. In this article, we develop several hybrid quantum-classical clustering algorithms that can be employed as subroutines on small, NISQ-era devices. These new hybrid algorithms require a number of qubits that is at most logarithmic in the size of the data, provide performance improvement and/or runtime improvement over their classical counterparts, and do not require a black-box oracle. Consequently, we are able to provide a promising near-term application of NISQ-era devices.
Submitted 27 June, 2019; v1 submitted 18 April, 2019;
originally announced April 2019.
-
Stable recovery and the coordinate small-ball behaviour of random vectors
Authors:
Shahar Mendelson,
Grigoris Paouris
Abstract:
Recovery procedures in various applications in Data Science are based on \emph{stable point separation}. In its simplest form, stable point separation implies that if $f$ is "far away" from $0$, and one is given a random sample $(f(Z_i))_{i=1}^m$ where a proportional number of the sample points may be corrupted by noise, that information is still enough to exhibit that $f$ is far from $0$.
Stable point separation is well understood in the context of iid sampling, and to explore it for general sampling methods we introduce a new notion---the \emph{coordinate small-ball} of a random vector $X$. Roughly put, this feature captures the number of "relatively large coordinates" of $(|<TX,u_i>|)_{i=1}^m$, where $T:\mathbb{R}^n \to \mathbb{R}^m$ is an arbitrary linear operator and $(u_i)_{i=1}^m$ is any fixed orthonormal basis of $\mathbb{R}^m$.
We show that under the bare-minimum assumptions on $X$, and with high probability, many of the values $|<TX,u_i>|$ are at least of the order $\|T\|_{S_2}/\sqrt{m}$. As a result, the "coordinate structure" of $TX$ exhibits the typical Euclidean norm of $TX$ and does so in a stable way.
One outcome of our analysis is that random sub-sampled convolutions satisfy stable point separation under minimal assumptions on the generating random vector---a fact that was known previously only in a highly restrictive setup, namely, for random vectors with iid subgaussian coordinates.
Submitted 17 April, 2019;
originally announced April 2019.
-
On the geometry of random polytopes
Authors:
Shahar Mendelson
Abstract:
We present a simple proof of a fact recently established in [5]: let $ξ$ be a symmetric random variable that has variance $1$, let $Γ=(ξ_{ij})$ be an $N \times n$ random matrix whose entries are independent copies of $ξ$, and set $X_1,...,X_N$ to be the rows of $Γ$. Then under minimal assumptions on $ξ$ and as long as $N \geq c_1n$, $$ c_2 \bigl(B_\infty^n \cap \sqrt{\log(eN/n)} B_2^n \bigr) \subset {\rm absconv}(X_1,...,X_N) $$ with high probability.
Submitted 5 February, 2019;
originally announced February 2019.
-
Robust one-bit compressed sensing with partial circulant matrices
Authors:
Sjoerd Dirksen,
Shahar Mendelson
Abstract:
We present optimal sample complexity estimates for one-bit compressed sensing problems in a realistic scenario: the procedure uses a structured matrix (a randomly sub-sampled circulant matrix) and is robust to analog pre-quantization noise as well as to adversarial bit corruptions in the quantization process. Our results imply that quantization is not a statistically expensive procedure in the presence of nontrivial analog noise: recovery requires the same sample size one would have needed had the measurement matrix been Gaussian and the noisy analog measurements been given as data.
Submitted 17 December, 2018;
originally announced December 2018.
-
Robust covariance estimation under $L_4-L_2$ norm equivalence
Authors:
Shahar Mendelson,
Nikita Zhivotovskiy
Abstract:
Let $X$ be a centered random vector taking values in $\mathbb{R}^d$ and let $Σ= \mathbb{E}(X\otimes X)$ be its covariance matrix. We show that if $X$ satisfies an $L_4-L_2$ norm equivalence, there is a covariance estimator $\hatΣ$ that exhibits the optimal performance one would expect had $X$ been a gaussian vector. The procedure also improves the current state-of-the-art regarding high probability bounds in the subgaussian case (sharp results were only known in expectation or with constant probability). In both scenarios the new bound does not depend explicitly on the dimension $d$, but rather on the effective rank of the covariance matrix $Σ$.
Submitted 26 March, 2019; v1 submitted 27 September, 2018;
originally announced September 2018.
-
Near-optimal mean estimators with respect to general norms
Authors:
Gábor Lugosi,
Shahar Mendelson
Abstract:
We study the problem of estimating the mean of a random vector in $\mathbb{R}^d$ based on an i.i.d.\ sample, when the accuracy of the estimator is measured by a general norm on $\mathbb{R}^d$. We construct an estimator (that depends on the norm) that achieves an essentially optimal accuracy/confidence tradeoff under the only assumption that the random vector has a well-defined covariance matrix. The estimator is based on the construction of a uniform median-of-means estimator in a class of real valued functions that may be of independent interest.
Submitted 16 June, 2018;
originally announced June 2018.
-
Non-Gaussian Hyperplane Tessellations and Robust One-Bit Compressed Sensing
Authors:
Sjoerd Dirksen,
Shahar Mendelson
Abstract:
We show that a tessellation generated by a small number of random affine hyperplanes can be used to approximate Euclidean distances between any two points in an arbitrary bounded set $T$, where the random hyperplanes are generated by subgaussian or heavy-tailed normal vectors and uniformly distributed shifts. We derive quantitative bounds on the number of hyperplanes needed for constructing such tessellations in terms of natural metric complexity measures of $T$ and the desired approximation error. Our work extends significantly prior results in this direction, which were restricted to Gaussian hyperplane tessellations of subsets of the Euclidean unit sphere.
As an application, we obtain new reconstruction results in memoryless one-bit compressed sensing with non-Gaussian measurement matrices. We show that by quantizing at uniformly distributed thresholds, it is possible to accurately reconstruct low-complexity signals from a small number of one-bit quantized measurements, even if the measurement vectors are drawn from a heavy-tailed distribution. Our reconstruction results are uniform in nature and robust in the presence of pre-quantization noise on the analog measurements as well as adversarial bit corruptions in the quantization process. Moreover we show that if the measurement matrix is subgaussian then accurate recovery can be achieved via a convex program.
Submitted 13 August, 2018; v1 submitted 23 May, 2018;
originally announced May 2018.
-
Approximating the covariance ellipsoid
Authors:
Shahar Mendelson
Abstract:
We explore ways in which the covariance ellipsoid ${\cal B}=\{v \in \mathbb{R}^d : \mathbb{E} <X,v>^2 \leq 1\}$ of a centred random vector $X$ in $\mathbb{R}^d$ can be approximated by a simple set. The data one is given for constructing the approximating set consists of $X_1,...,X_N$ that are independent and distributed as $X$.
We present a general method that can be used to construct such approximations and implement it for two types of approximating sets. We first construct a (random) set ${\cal K}$ defined by a union of intersections of slabs $H_{z,α}=\{v \in \mathbb{R}^d : |<z,v>| \leq α\}$ (and therefore ${\cal K}$ is actually the output of a neural network with two hidden layers). The slabs are generated using $X_1,...,X_N$, and under minimal assumptions on $X$ (e.g., $X$ can be heavy-tailed) it suffices that $N = c_1d η^{-4}\log(2/η)$ to ensure that $(1-η) {\cal K} \subset {\cal B} \subset (1+η){\cal K}$. In some cases (e.g., if $X$ is rotation invariant and has marginals that are well behaved in some weak sense), a smaller sample size suffices: $N = c_1dη^{-2}\log(2/η)$.
We then show that if the slabs are replaced by randomly generated ellipsoids defined using $X_1,...,X_N$, the same degree of approximation is true when $N \geq c_2dη^{-2}\log(2/η)$.
The construction we use is based on the small-ball method.
Submitted 15 April, 2018;
originally announced April 2018.
-
Concentration of the spectral norm of Erdős-Rényi random graphs
Authors:
Gábor Lugosi,
Shahar Mendelson,
Nikita Zhivotovskiy
Abstract:
We present results on the concentration properties of the spectral norm $\|A_p\|$ of the adjacency matrix $A_p$ of an Erdős-Rényi random graph $G(n,p)$. First we consider the Erdős-Rényi random graph process and prove that $\|A_p\|$ is uniformly concentrated over the range $p\in [C\log n/n,1]$. The analysis is based on delocalization arguments, uniform laws of large numbers, together with the entropy method to prove concentration inequalities. As an application of our techniques we prove sharp sub-Gaussian moment inequalities for $\|A_p\|$ for all $p\in [c\log^3n/n,1]$ that improve the general bounds of Alon, Krivelevich, and Vu (2001) and some of the more recent results of Erdős et al. (2013). Both results are consistent with the asymptotic result of Füredi and Komlós (1981) that holds for fixed $p$ as $n\to \infty$.
Submitted 20 November, 2018; v1 submitted 7 January, 2018;
originally announced January 2018.
-
A remark on "Robust machine learning by median-of-means"
Authors:
Gabor Lugosi,
Shahar Mendelson
Abstract:
We explore the recent results announced in "Robust machine learning by median-of-means: theory and practice" by G. Lecué and M. Lerasle. We show that these results are, in fact, almost obvious outcomes of the machinery developed in [4] for the study of tournament procedures.
Submitted 19 December, 2017;
originally announced December 2017.
-
Extending the scope of the small-ball method
Authors:
Shahar Mendelson
Abstract:
The small-ball method was introduced as a way of obtaining a high probability, isomorphic lower bound on the quadratic empirical process, under weak assumptions on the indexing class. The key assumption was that class members satisfy a uniform small-ball estimate: that $Pr(|f| \geq κ\|f\|_{L_2}) \geq δ$ for given constants $κ$ and $δ$.
Here we extend the small-ball method and obtain a high probability, almost-isometric (rather than isomorphic) lower bound on the quadratic empirical process. The scope of the result is considerably wider than the small-ball method: there is no need for class members to satisfy a uniform small-ball condition, and moreover, motivated by the notion of tournament learning procedures, the result is stable under a `majority vote'.
Submitted 15 June, 2020; v1 submitted 4 September, 2017;
originally announced September 2017.
-
An optimal unrestricted learning procedure
Authors:
Shahar Mendelson
Abstract:
We study learning problems involving arbitrary classes of functions $F$, distributions $X$ and targets $Y$. Because proper learning procedures, i.e., procedures that are only allowed to select functions in $F$, tend to perform poorly unless the problem satisfies some additional structural property (e.g., that $F$ is convex), we consider unrestricted learning procedures that are free to choose functions outside the given class.
We present a new unrestricted procedure that is optimal in a very strong sense: the required sample complexity is essentially the best one can hope for, and the estimate holds for (almost) any problem, including heavy-tailed situations. Moreover, the sample complexity coincides with what one would expect if $F$ were convex, even when $F$ is not. And if $F$ is convex, the procedure turns out to be proper. Thus, the unrestricted procedure is actually optimal in both realms, for convex classes as a proper procedure and for arbitrary classes as an unrestricted procedure.
Submitted 14 April, 2018; v1 submitted 17 July, 2017;
originally announced July 2017.
-
Column normalization of a random measurement matrix
Authors:
Shahar Mendelson
Abstract:
In this note we answer a question of G. Lecué, by showing that column normalization of a random matrix with iid entries need not lead to good sparse recovery properties, even if the generating random variable has a reasonable moment growth. Specifically, for every $2 \leq p \leq c_1\log d$ we construct a random vector $X \in R^d$ with iid, mean-zero, variance $1$ coordinates, that satisfies $\sup_{t \in S^{d-1}} \|<X,t>\|_{L_q} \leq c_2\sqrt{q}$ for every $2\leq q \leq p$.
We show that if $m \leq c_3\sqrt{p}d^{1/p}$ and $\tildeΓ:R^d \to R^m$ is the column-normalized matrix generated by $m$ independent copies of $X$, then with probability at least $1-2\exp(-c_4m)$, $\tildeΓ$ does not satisfy the exact reconstruction property of order $2$.
Submitted 21 February, 2017;
originally announced February 2017.
-
Sub-Gaussian estimators of the mean of a random vector
Authors:
Gábor Lugosi,
Shahar Mendelson
Abstract:
We study the problem of estimating the mean of a random vector $X$ given a sample of $N$ independent, identically distributed points. We introduce a new estimator that achieves a purely sub-Gaussian performance under the only condition that the second moment of $X$ exists. The estimator is based on a novel concept of a multivariate median.
Submitted 1 February, 2017;
originally announced February 2017.
-
Regularization, sparse recovery, and median-of-means tournaments
Authors:
Gábor Lugosi,
Shahar Mendelson
Abstract:
A regularized risk minimization procedure for regression function estimation is introduced that achieves near optimal accuracy and confidence under general conditions, including heavy-tailed predictor and response variables. The procedure is based on median-of-means tournaments, introduced by the authors in [8]. It is shown that the new procedure outperforms standard regularized empirical risk minimization procedures such as lasso or slope in heavy-tailed problems.
Submitted 29 November, 2017; v1 submitted 15 January, 2017;
originally announced January 2017.
-
Generalized Dual Sudakov Minoration via Dimension Reduction - A Program
Authors:
Shahar Mendelson,
Emanuel Milman,
Grigoris Paouris
Abstract:
We propose a program for establishing a conjectural extension to the class of (origin-symmetric) log-concave probability measures $μ$, of the classical dual Sudakov Minoration on the expectation of the supremum of a Gaussian process: \begin{equation} \label{eq:abstract} M(Z_p(μ), C \int ||x||_K dμ\cdot K) \leq \exp(C p) \;\;\, \forall p \geq 1 . \end{equation} Here $K$ is an origin-symmetric convex body, $Z_p(μ)$ is the $L_p$-centroid body associated to $μ$, $M(A,B)$ is the packing-number of $B$ in $A$, and $C > 0$ is a universal constant. The Program consists of first establishing a Weak Generalized Dual Sudakov Minoration, involving the dimension $n$ of the ambient space, which is then self-improved to a dimension-free estimate after applying a dimension-reduction step. The latter step may be thought of as a conjectural "small-ball one-sided" variant of the Johnson--Lindenstrauss dimension-reduction lemma. We establish the Weak Generalized Dual Sudakov Minoration for a variety of log-concave probability measures and convex bodies (for instance, this step is fully resolved assuming a positive answer to the Slicing Problem). The Separation Dimension-Reduction step is fully established for ellipsoids and, up to logarithmic factors in the dimension, for cubes, resulting in a corresponding Generalized (regular) Dual Sudakov Minoration estimate for these bodies and arbitrary log-concave measures, which are shown to be (essentially) best-possible. Along the way, we establish a regular version of (\ref{eq:abstract}) for all $p \geq n$ and provide a new direct proof of Sudakov Minoration via The Program.
Submitted 6 May, 2018; v1 submitted 28 October, 2016;
originally announced October 2016.
-
Improved bounds for sparse recovery from subsampled random convolutions
Authors:
Shahar Mendelson,
Holger Rauhut,
Rachel Ward
Abstract:
We study the recovery of sparse vectors from subsampled random convolutions via $\ell_1$-minimization. We consider the setup in which both the subsampling locations as well as the generating vector are chosen at random. For a subgaussian generator with independent entries, we improve previously known estimates: if the sparsity $s$ is small enough, i.e., $s \lesssim \sqrt{n/\log(n)}$, we show that $m \gtrsim s \log(en/s)$ measurements are sufficient to recover $s$-sparse vectors in dimension $n$ with high probability, matching the well-known condition for recovery from standard Gaussian measurements. If $s$ is larger, then essentially $m \geq s \log^2(s) \log(\log(s)) \log(n)$ measurements are sufficient, again improving over previous estimates. Our results are shown via the so-called robust null space property which is weaker than the standard restricted isometry property. Our method of proof involves a novel combination of small ball estimates with chaining techniques which should be of independent interest.
Submitted 23 March, 2018; v1 submitted 17 October, 2016;
originally announced October 2016.
-
Regularization and the small-ball method II: complexity dependent error rates
Authors:
Guillaume Lecué,
Shahar Mendelson
Abstract:
For a convex class of functions $F$, a regularization function $Ψ(\cdot)$ and given the random data $(X_i, Y_i)_{i=1}^N$, we study estimation properties of regularization procedures of the form \begin{equation*}
\hat f \in {\rm argmin}_{f\in
F}\Big(\frac{1}{N}\sum_{i=1}^N\big(Y_i-f(X_i)\big)^2+λΨ(f)\Big) \end{equation*} for some well chosen regularization parameter $λ$.
We obtain bounds on the $L_2$ estimation error rate that depend on the complexity of the "true model" $F^*:=\{f\in F: Ψ(f)\leqΨ(f^*)\}$, where $f^*\in {\rm argmin}_{f\in F}\mathbb{E}(Y-f(X))^2$ and the $(X_i,Y_i)$'s are independent and distributed as $(X,Y)$. Our estimate holds under weak stochastic assumptions -- one of which being a small-ball condition satisfied by $F$ -- and for rather flexible choices of regularization functions $Ψ(\cdot)$. Moreover, the result holds in the learning theory framework: we do not assume any a-priori connection between the output $Y$ and the input $X$.
As a proof of concept, we apply our general estimation bound to various choices of $Ψ$, for example, the $\ell_p$ and $S_p$-norms (for $p\geq1$), weak-$\ell_p$, atomic norms, max-norm and SLOPE. In many cases, the estimation rate almost coincides with the minimax rate in the class $F^*$.
Submitted 27 August, 2016;
originally announced August 2016.
-
Risk minimization by median-of-means tournaments
Authors:
Gabor Lugosi,
Shahar Mendelson
Abstract:
We consider the classical statistical learning/regression problem, when the value of a real random variable Y is to be predicted based on the observation of another random variable X. Given a class of functions F and a sample of independent copies of (X, Y ), one needs to choose a function f from F such that f(X) approximates Y as well as possible, in the mean-squared sense. We introduce a new procedure, the so-called median-of-means tournament, that achieves the optimal tradeoff between accuracy and confidence under minimal assumptions, and in particular outperforms classical methods based on empirical risk minimization.
Submitted 2 August, 2016;
originally announced August 2016.
-
On multiplier processes under weak moment assumptions
Authors:
Shahar Mendelson
Abstract:
We show that if $V \subset \R^n$ satisfies a certain symmetry condition (closely related to unconditionality) and if $X$ is an isotropic random vector for which $\|\inr{X,t}\|_{L_p} \leq L \sqrt{p}$ for every $t \in S^{n-1}$ and $p \lesssim \log n$, then the corresponding empirical and multiplier processes indexed by $V$ behave as if $X$ were $L$-subgaussian.
Submitted 25 January, 2016;
originally announced January 2016.
-
Regularization and the small-ball method I: sparse recovery
Authors:
Guillaume Lecué,
Shahar Mendelson
Abstract:
We obtain bounds on estimation error rates for regularization procedures of the form \begin{equation*}
\hat f \in {\rm argmin}_{f\in
F}\left(\frac{1}{N}\sum_{i=1}^N\left(Y_i-f(X_i)\right)^2+λΨ(f)\right) \end{equation*} when $Ψ$ is a norm and $F$ is convex.
Our approach gives a common framework that may be used in the analysis of learning problems and regularization problems alike. In particular, it sheds some light on the role various notions of sparsity have in regularization and on their connection with the size of subdifferentials of $Ψ$ in a neighbourhood of the true minimizer.
As `proof of concept' we extend the known estimates for the LASSO, SLOPE and trace norm regularization.
Submitted 3 January, 2017; v1 submitted 21 January, 2016;
originally announced January 2016.
-
`local' vs. `global' parameters -- breaking the gaussian complexity barrier
Authors:
Shahar Mendelson
Abstract:
We show that if $F$ is a convex class of functions that is $L$-subgaussian, the error rate of learning problems generated by independent noise is equivalent to a fixed point determined by `local' covering estimates of the class, rather than by the gaussian averages. To that end, we establish new sharp upper and lower estimates on the error rate for such problems.
Submitted 9 April, 2015;
originally announced April 2015.
-
On aggregation for heavy-tailed classes
Authors:
Shahar Mendelson
Abstract:
We introduce an alternative to the notion of `fast rate' in Learning Theory, which coincides with the optimal error rate when the given class happens to be convex and regular in some sense. While it is well known that such a rate cannot always be attained by a learning procedure (i.e., a procedure that selects a function in the given class), we introduce an aggregation procedure that attains that rate under rather minimal assumptions -- for example, that the $L_q$ and $L_2$ norms are equivalent on the linear span of the class for some $q>2$, and the target random variable is square-integrable.
Submitted 25 February, 2015;
originally announced February 2015.
-
Upper bounds on product and multiplier empirical processes
Authors:
Shahar Mendelson
Abstract:
We study two empirical processes of special structure: firstly, the centred multiplier process indexed by a class $F$, $f \to \left|\sum_{i=1}^N (ξ_i f(X_i) - \mathbb{E} ξf)\right|$, where the i.i.d. multipliers $(ξ_i)_{i=1}^N$ need not be independent of $(X_i)_{i=1}^N$, and secondly, $(f,h) \to \left|\sum_{i=1}^N (f(X_i)h(X_i)-\mathbb{E} f h) \right|$, the centred product process indexed by the classes $F$ and $H$.
We use chaining methods to obtain high probability upper bounds on the suprema of the two processes using a natural variation of Talagrand's $γ$-functionals.
Submitted 2 October, 2015; v1 submitted 29 October, 2014;
originally announced October 2014.
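For readers who want to see the centred multiplier process from the abstract above in concrete terms, the following is a minimal sketch, not taken from the paper. It assumes a finite class of linear functionals $f_t(x)=\langle t,x\rangle$ (the rows of a hypothetical matrix `T`) and assumes the means $\mathbb{E} ξf$ are known and supplied by the caller; the function name `multiplier_process_sup` is illustrative only.

```python
import numpy as np

def multiplier_process_sup(X, xi, T, mean_xi_f):
    """Supremum over a finite class of |sum_i (xi_i f(X_i) - E[xi f])|.

    X         : (N, d) array, rows are the sample points X_1, ..., X_N
    xi        : (N,) array of multipliers xi_1, ..., xi_N
    T         : (m, d) array; row t defines the functional f_t(x) = <t, x>
                (an assumed, illustrative choice of the class F)
    mean_xi_f : (m,) array with the assumed-known values E[xi f_t]
    """
    values = X @ T.T                      # values[i, j] = f_j(X_i)
    sums = xi @ values                    # sums[j] = sum_i xi_i f_j(X_i)
    centred = sums - X.shape[0] * mean_xi_f
    return float(np.max(np.abs(centred)))

# Example: symmetric random signs independent of X, so E[xi f] = 0.
rng = np.random.default_rng(0)
N, d, m = 200, 5, 10
X = rng.standard_normal((N, d))
xi = rng.choice([-1.0, 1.0], size=N)
T = rng.standard_normal((m, d))
sup_value = multiplier_process_sup(X, xi, T, np.zeros(m))
```

The example uses multipliers that are independent of the sample only because the centring term is then trivially zero; the abstract itself does not require independence.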
-
Dvoretzky type theorems for subgaussian coordinate projections
Authors:
Shahar Mendelson
Abstract:
Given a class of functions $F$ on a probability space $(Ω,μ)$, we study the structure of a typical coordinate projection of the class, defined by $\{(f(X_i))_{i=1}^N : f \in F\}$, where $X_1,...,X_N$ are independent, selected according to $μ$. This notion of projection generalizes the standard linear random projection used in Asymptotic Geometric Analysis.
We show that when $F$ is a subgaussian class of functions, a typical coordinate projection satisfies a Dvoretzky type theorem.
Submitted 25 October, 2014;
originally announced October 2014.
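The coordinate projection $\{(f(X_i))_{i=1}^N : f \in F\}$ described in the abstract above can be illustrated with a small sketch. This is an assumption-laden toy, not the paper's construction: the class is finite, and the helper names `coordinate_projection` and `linear_class` are hypothetical. For linear functionals $f_t(x)=\langle t,x\rangle$ the projection reduces to applying the random matrix with rows $X_i$, which is the standard linear random projection the abstract says this notion generalizes.

```python
import numpy as np

def coordinate_projection(F, X):
    """Return an array whose j-th row is (f_j(X_1), ..., f_j(X_N)).

    F : list of callables, a finite stand-in for the class of functions
    X : (N, d) array, rows are the sample points X_1, ..., X_N
    """
    return np.array([[f(x) for x in X] for f in F])

def linear_class(T):
    """Build the functionals f_t(x) = <t, x> for the rows t of T (illustrative)."""
    return [lambda x, t=t: float(np.dot(t, x)) for t in T]

# Example: for linear functionals, the projection equals T @ X.T.
rng = np.random.default_rng(0)
T = rng.standard_normal((4, 3))      # four functionals on R^3
X = rng.standard_normal((6, 3))      # N = 6 sample points
P = coordinate_projection(linear_class(T), X)
```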
-
Learning without Concentration for General Loss Functions
Authors:
Shahar Mendelson
Abstract:
We study prediction and estimation problems using empirical risk minimization, relative to a general convex loss function. We obtain sharp error rates even when concentration is false or is very restricted, for example, in heavy-tailed scenarios. Our results show that the error rate depends on two parameters: one captures the intrinsic complexity of the class, and essentially leads to the error rate in a noise-free (or realizable) problem; the other measures interactions between class members, the target and the loss, and is dominant when the problem is far from realizable. We also explain how one may deal with outliers by choosing the loss in a way that is calibrated to the intrinsic complexity of the class and to the noise level of the problem (the latter is measured by the distance between the target and the class).
Submitted 13 October, 2014;
originally announced October 2014.