-
Statistically Near-Optimal Hypothesis Selection
Authors:
Olivier Bousquet,
Mark Braverman,
Klim Efremenko,
Gillat Kol,
Shay Moran
Abstract:
Hypothesis Selection is a fundamental distribution learning problem where, given a comparator class $Q=\{q_1,\ldots, q_n\}$ of distributions and sample access to an unknown target distribution $p$, the goal is to output a distribution $q$ such that $\mathsf{TV}(p,q)$ is close to $opt$, where $opt = \min_i\{\mathsf{TV}(p,q_i)\}$ and $\mathsf{TV}(\cdot, \cdot)$ denotes the total-variation distance. Although this problem has been studied since the 19th century, its complexity in terms of basic resources, such as number of samples and approximation guarantees, remains unsettled (this is discussed, e.g., in the charming book by Devroye and Lugosi '00). This is in stark contrast with other (younger) learning settings, such as PAC learning, for which these complexities are well understood.
We derive an optimal $2$-approximation learning strategy for the Hypothesis Selection problem, outputting $q$ such that $\mathsf{TV}(p,q) \leq 2 \cdot opt + ε$, with a (nearly) optimal sample complexity of $\tilde O(\log n/ε^2)$. This is the first algorithm that simultaneously achieves the best approximation factor and sample complexity: previously, Bousquet, Kane, and Moran (COLT '19) gave a learner achieving the optimal $2$-approximation, but with an exponentially worse sample complexity of $\tilde O(\sqrt{n}/ε^{2.5})$, and Yatracos (Annals of Statistics '85) gave a learner with optimal sample complexity of $O(\log n /ε^2)$ but with a sub-optimal approximation factor of $3$.
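As a point of reference for the sample-efficient factor-$3$ learner attributed to Yatracos in the abstract, a minimal sketch of a minimum-distance estimate over a finite domain might look as follows. This is not the $2$-approximation algorithm of the paper. The function name `minimum_distance_select`, the representation of each distribution as a list of probabilities over `{0, ..., k-1}`, and the toy inputs are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import permutations


def minimum_distance_select(candidates, samples):
    """Return the index of the candidate picked by a minimum-distance estimate.

    candidates: list of probability vectors over the domain {0, ..., k-1}
                (hypothetical representation, chosen for this sketch).
    samples:    list of draws from the unknown target distribution p.
    """
    k = len(candidates[0])
    m = len(samples)
    counts = Counter(samples)
    # Empirical distribution of the samples.
    empirical = [counts.get(x, 0) / m for x in range(k)]

    # Test sets A_ij = {x : q_i(x) > q_j(x)} for every ordered pair i != j.
    test_sets = []
    for i, j in permutations(range(len(candidates)), 2):
        test_sets.append(
            [x for x in range(k) if candidates[i][x] > candidates[j][x]]
        )

    def score(q):
        # Largest disagreement between q and the empirical measure on a test set.
        return max(
            (abs(sum(q[x] for x in a) - sum(empirical[x] for x in a))
             for a in test_sets),
            default=0.0,
        )

    scores = [score(q) for q in candidates]
    # The candidate with the smallest worst-case disagreement is selected.
    return min(range(len(candidates)), key=scores.__getitem__)


if __name__ == "__main__":
    qs = [[0.5, 0.5], [0.9, 0.1]]
    draws = [0] * 9 + [1]
    print(minimum_distance_select(qs, draws))
```

The sketch enumerates all ordered pairs of candidates, so it only illustrates the selection rule and makes no attempt at computational efficiency.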
Submitted 17 August, 2021;
originally announced August 2021.
-
Non-parametric Binary regression in metric spaces with KL loss
Authors:
Ariel Avital,
Klim Efremenko,
Aryeh Kontorovich,
David Toplin,
Bo Waggoner
Abstract:
We propose a non-parametric variant of binary regression, where the hypothesis is regularized to be a Lipschitz function taking a metric space to [0,1] and the loss is logarithmic. This setting presents novel computational and statistical challenges. On the computational front, we derive a novel efficient optimization algorithm based on interior point methods; an attractive feature is that it is parameter-free (i.e., does not require tuning an update step size). On the statistical front, the unbounded loss function presents a problem for classic generalization bounds, based on covering-number and Rademacher techniques. We get around this challenge via an adaptive truncation approach, and also present a lower bound indicating that the truncation is, in some sense, necessary.
Submitted 19 October, 2020;
originally announced October 2020.
-
Fast and Bayes-consistent nearest neighbors
Authors:
Klim Efremenko,
Aryeh Kontorovich,
Moshe Noivirt
Abstract:
Research on nearest-neighbor methods tends to focus somewhat dichotomously either on the statistical or the computational aspects -- either on, say, Bayes consistency and rates of convergence or on techniques for speeding up the proximity search. This paper aims at bridging these realms: to reap the advantages of fast evaluation time while maintaining Bayes consistency, and further without sacrificing too much in the risk decay rate. We combine the locality-sensitive hashing (LSH) technique with a novel missing-mass argument to obtain a fast and Bayes-consistent classifier. Our algorithm's prediction runtime compares favorably against state-of-the-art approximate NN methods, while maintaining Bayes consistency and attaining rates comparable to minimax. On samples of size $n$ in $\mathbb{R}^d$, our pre-processing phase has runtime $O(d n \log n)$, while the evaluation phase has runtime $O(d\log n)$ per query point.
Submitted 15 April, 2020; v1 submitted 7 October, 2019;
originally announced October 2019.
-
Optimal Short-Circuit Resilient Formulas
Authors:
Mark Braverman,
Klim Efremenko,
Ran Gelles,
Michael A. Yitayew
Abstract:
We consider fault-tolerant boolean formulas in which the output of a faulty gate is short-circuited to one of the gate's inputs. A recent result by Kalai et al. (FOCS 2012) converts any boolean formula into a resilient formula of polynomial size that works correctly if less than a fraction $1/6$ of the gates (on every input-to-output path) are faulty. We improve the result of Kalai et al., and show how to efficiently fortify any boolean formula against a fraction $1/5$ of short-circuit gates per path, with only a polynomial blowup in size. We additionally show that it is impossible to obtain formulas with higher resilience and sub-exponential growth in size.
Towards our results, we consider interactive coding schemes when noiseless feedback is present; these produce resilient boolean formulas via a Karchmer-Wigderson relation. We develop a coding scheme that resists up to a fraction $1/5$ of corrupted transmissions in each direction of the interactive channel. We further show that such a level of noise is maximal for coding schemes with sub-exponential blowup in communication. Our coding scheme takes a surprising inspiration from Blockchain technology.
Submitted 3 August, 2022; v1 submitted 13 July, 2018;
originally announced July 2018.
-
Barriers for Rank Methods in Arithmetic Complexity
Authors:
Klim Efremenko,
Ankit Garg,
Rafael Oliveira,
Avi Wigderson
Abstract:
Arithmetic complexity is considered simpler to understand than Boolean complexity, namely computing Boolean functions via logical gates. And indeed, we seem to have significantly more lower bound techniques and results in arithmetic complexity than in Boolean complexity. Despite many successes and rapid progress, however, challenges like proving super-polynomial lower bounds on circuit or formula size for explicit polynomials, or super-linear lower bounds on explicit 3-dimensional tensors, remain elusive.
At the same time, we have plenty more "barrier results" for failing to prove basic lower bounds in Boolean complexity than in arithmetic complexity. Finding barriers to arithmetic lower bound techniques seems harder, and despite some attempts we have no excuses of similar quality for these failures in arithmetic complexity. This paper aims to add to this study.
We address rank methods, which were long recognized as encompassing and abstracting almost all known arithmetic lower bounds to date, including the most recent impressive successes. Rank methods (or flattenings) are also in wide use in algebraic geometry for proving tensor rank and symmetric tensor rank lower bounds. Our main results are barriers to these methods. In particular,
1. Rank methods cannot prove a lower bound better than $Ω_d (n^{\lfloor d/2 \rfloor})$ on the tensor rank of any $d$-dimensional tensor of side $n$. (In particular, they cannot prove super-linear, indeed even $>8n$, tensor rank lower bounds for any 3-dimensional tensor.)
2. Rank methods cannot prove a lower bound better than $Ω_d (n^{\lfloor d/2 \rfloor})$ on the Waring rank of any $n$-variate polynomial of degree $d$. (In particular, they cannot prove such lower bounds on stronger models, including depth-3 circuits.)
Submitted 25 October, 2017;
originally announced October 2017.
-
MDS Code Constructions with Small Sub-packetization and Near-optimal Repair Bandwidth
Authors:
Ankit Singh Rawat,
Itzhak Tamo,
Venkatesan Guruswami,
Klim Efremenko
Abstract:
This paper addresses the problem of constructing MDS codes that enable exact repair of each code block with small repair bandwidth, which refers to the total amount of information flow from the remaining code blocks during the repair process. This problem naturally arises in the context of distributed storage systems as the node repair problem [7]. The constructions of exact-repairable MDS codes with optimal repair-bandwidth require working with large sub-packetization levels, which restricts their employment in practice.
This paper presents constructions for MDS codes that simultaneously provide both small repair bandwidth and small sub-packetization level. In particular, this paper presents two general approaches to construct exact-repairable MDS codes that aim at significantly reducing the required sub-packetization level at the cost of slightly sub-optimal repair bandwidth. The first approach gives MDS codes that have repair bandwidth at most twice the optimal repair bandwidth. Additionally, these codes have the smallest possible sub-packetization level $\ell = O(r)$, where $r$ denotes the number of parity blocks. This approach is then generalized to design codes whose repair bandwidth approaches the optimal repair bandwidth at the cost of a graceful increase in the required sub-packetization level. The second approach transforms an MDS code with optimal repair bandwidth and large sub-packetization level into a longer MDS code with small sub-packetization level and near-optimal repair bandwidth. For a given $r$, the obtained codes have their sub-packetization level scaling logarithmically with the code length. In addition, the obtained codes require field size only linear in the code length and ensure load balancing among the intact code blocks in terms of the information downloaded from these blocks during the exact reconstruction of a code block.
Submitted 24 September, 2017;
originally announced September 2017.
-
The method of shifted partial derivatives cannot separate the permanent from the determinant
Authors:
Klim Efremenko,
J. M. Landsberg,
Hal Schenck,
Jerzy Weyman
Abstract:
The method of shifted partial derivatives was used to prove a super-polynomial lower bound on the size of depth four circuits needed to compute the permanent. We show that this method alone cannot prove that the padded permanent $\ell^{n-m} \mathrm{perm}_m$ cannot be realized inside the $GL_{n^2}$-orbit closure of the determinant $\det_n$ when $n>2m^2+2m$. Our proof relies on several simple degenerations of the determinant polynomial, Macaulay's theorem that gives a lower bound on the growth of an ideal, and a lower bound estimate from Gupta et al. regarding the shifted partial derivatives of the determinant.
Submitted 7 September, 2016;
originally announced September 2016.
-
Testing Equality in Communication Graphs
Authors:
Noga Alon,
Klim Efremenko,
Benny Sudakov
Abstract:
Let $G=(V,E)$ be a connected undirected graph with $k$ vertices. Suppose that on each vertex of the graph there is a player having an $n$-bit string. Each player is allowed to communicate with its neighbors according to an agreed communication protocol, and the players must decide, deterministically, if their inputs are all equal. What is the minimum possible total number of bits transmitted in a protocol solving this problem? We determine this minimum up to a lower order additive term in many cases (but not for all graphs). In particular, we show that it is $kn/2+o(n)$ for any Hamiltonian $k$-vertex graph, and that for any $2$-edge connected graph with $m$ edges containing no two adjacent vertices of degree exceeding $2$ it is $mn/2+o(n)$. The proofs combine graph theoretic ideas with tools from additive number theory.
Submitted 5 May, 2016;
originally announced May 2016.
-
On minimal free resolutions of sub-permanents and other ideals arising in complexity theory
Authors:
Klim Efremenko,
J. M. Landsberg,
Hal Schenck,
Jerzy Weyman
Abstract:
We compute the linear strand of the minimal free resolution of the ideal generated by $k \times k$ sub-permanents of an $n \times n$ generic matrix and of the ideal generated by square-free monomials of degree $k$. The latter calculation gives the full minimal free resolution by work of Biagioli-Faridi-Rosas. Our motivation is to lay groundwork for the use of commutative algebra in algebraic complexity theory. We also compute several Hilbert functions relevant for complexity theory.
Submitted 3 December, 2017; v1 submitted 20 April, 2015;
originally announced April 2015.
-
Maximal Noise in Interactive Communication over Erasure Channels and Channels with Feedback
Authors:
Klim Efremenko,
Ran Gelles,
Bernhard Haeupler
Abstract:
We provide tight upper and lower bounds on the noise resilience of interactive communication over noisy channels with feedback. In this setting, we show that the maximal fraction of noise that any robust protocol can resist is 1/3. Additionally, we provide a simple and efficient robust protocol that succeeds as long as the fraction of noise is at most 1/3 - ε. Surprisingly, both bounds hold regardless of whether the parties send bits or symbols from an arbitrarily large alphabet. We also consider interactive communication over erasure channels. We provide a protocol that matches the optimal tolerable erasure rate of 1/2 - ε of previous protocols (Franklin et al., CRYPTO '13) but operates in a much simpler and more efficient way. Our protocol works with an alphabet of size 4, in contrast to prior protocols in which the alphabet size grows as ε goes to zero. Building on the above algorithm with a fixed alphabet size, we are able to devise a protocol for binary erasure channels that tolerates erasure rates of up to 1/3 - ε.
Submitted 3 January, 2015;
originally announced January 2015.
-
Approximating General Metric Distances Between a Pattern and a Text
Authors:
Klim Efremenko,
Ely Porat
Abstract:
Let $T=t_0 \ldots t_{n-1}$ be a text and $P = p_0 \ldots p_{m-1}$ a pattern taken from some finite alphabet set $Σ$, and let $\mathrm{dist}$ be a metric on $Σ$. We consider the problem of calculating the sum of distances between the symbols of $P$ and the symbols of substrings of $T$ of length $m$ for all possible offsets. We present an $ε$-approximation algorithm for this problem which runs in time $O(\frac{1}{ε^2}n\cdot \mathrm{polylog}(n,|Σ|))$.
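To make the quantity being approximated concrete, the following is a minimal sketch of the exact, quadratic-time baseline: for every offset $i$ it sums the distance between $p_j$ and $t_{i+j}$ over all $j$. It is not the approximation algorithm of the paper. The function name `distance_profile` and the example metric (absolute difference of character codes) are assumptions introduced only for illustration.

```python
def distance_profile(text, pattern, dist):
    """Exact sum of symbol distances between the pattern and every
    length-m window of the text, computed naively in O(n * m) time.

    dist is any metric on the alphabet, supplied by the caller.
    """
    n, m = len(text), len(pattern)
    profile = []
    for offset in range(n - m + 1):
        # Align pattern[j] with text[offset + j] and add up the distances.
        total = sum(dist(text[offset + j], pattern[j]) for j in range(m))
        profile.append(total)
    return profile


if __name__ == "__main__":
    # Hypothetical metric: absolute difference of character codes.
    char_gap = lambda a, b: abs(ord(a) - ord(b))
    print(distance_profile("abcab", "ab", char_gap))
```

An $ε$-approximation algorithm would return, for each offset, a value within a $(1 \pm ε)$ factor of the corresponding entry of this profile.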
Submitted 11 February, 2008;
originally announced February 2008.
-
Improved Deterministic Length Reduction
Authors:
Amihood Amir,
Klim Efremenko,
Oren Kapah,
Ely Porat,
Amir Rothschild
Abstract:
This paper presents a new technique for deterministic length reduction. This technique improves the running time of the algorithm presented in [LR07] for performing fast convolution in sparse data. While the regular fast convolution of vectors $V_1,V_2$, whose sizes are $N_1,N_2$ respectively, takes $O(N_1 \log N_2)$ using FFT, the algorithm proposed in [LR07] uses length reduction to perform the convolution in $O(n_1 \log^3 n_1)$, where $n_1$ is the number of non-zero values in $V_1$. The algorithm assumes that $V_1$ is given in advance, and $V_2$ is given at running time. The novel technique presented in this paper improves the convolution time to $O(n_1 \log^2 n_1)$ deterministically, which equals the best running time achieved by a randomized algorithm.
The preprocessing time of the new technique remains the same as the preprocessing time of [LR07], which is $O(n_1^2)$. This assumes, and deals with, the case where $N_1$ is polynomial in $n_1$. In the case where $N_1$ is exponential in $n_1$, a reduction to a polynomial case can be used. In this paper we also improve the preprocessing time of this reduction from $O(n_1^4)$ to $O(n_1^3\,\mathrm{polylog}(n_1))$.
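For orientation, the sparse convolution problem itself can be sketched with a direct baseline that touches only non-zero entries and costs time proportional to the product of the two non-zero counts. This is not the length-reduction technique of the paper or of [LR07]. The function name `sparse_convolve` and the dictionary representation (index mapped to non-zero value) are assumptions made for this illustration.

```python
from collections import defaultdict


def sparse_convolve(v1, v2):
    """Convolve two sparse vectors given as {index: non-zero value} dicts.

    Every pair of non-zero entries contributes one product, so the cost
    is proportional to len(v1) * len(v2), independent of the dense sizes.
    """
    out = defaultdict(int)
    for i, a in v1.items():
        for j, b in v2.items():
            # Entry i of v1 and entry j of v2 land at index i + j.
            out[i + j] += a * b
    # Drop entries that cancelled to zero so the result stays sparse.
    return {k: v for k, v in out.items() if v != 0}


if __name__ == "__main__":
    v1 = {0: 1, 1000: 2}   # dense length 1001, two non-zeros
    v2 = {0: 3, 5: 1}
    print(sparse_convolve(v1, v2))
```

Length reduction aims to beat this pairwise baseline by mapping the non-zero positions into a much shorter vector on which an FFT-based convolution can be run.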
Submitted 31 January, 2008;
originally announced February 2008.