Search | arXiv e-print repository

doi 10.1109/ICCVW60793.2023.00274

Semantic Parsing of Colonoscopy Videos with Multi-Label Temporal Networks

Authors: Ori Kelner, Or Weinstein, Ehud Rivlin, Roman Goldenberg

Abstract: Following the successful debut of polyp detection and characterization, more advanced automation tools are being developed for colonoscopy. The new automation tasks, such as quality metrics or report generation, require understanding of the procedure flow that includes activities, events, anatomical landmarks, etc. In this work we present a method for automatic semantic parsing of colonoscopy videos. The method uses a novel DL multi-label temporal segmentation model trained in supervised and unsupervised regimes. We evaluate the accuracy of the method on a test set of over 300 annotated colonoscopy videos, and use ablation to explore the relative importance of the method's various components.

Submitted 22 August, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

Journal ref: 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)

arXiv:2302.05966 [pdf, other]

Infinite Lewis Weights in Spectral Graph Theory

Authors: Amit Suliman, Omri Weinstein

Abstract: We study the spectral implications of re-weighting a graph by the $\ell_\infty$-Lewis weights of its edges. Our main motivation is the ER-Minimization problem (Saberi et al., SIAM'08): Given an undirected graph $G$, the goal is to find positive normalized edge-weights $w\in \mathbb{R}_+^m$ which minimize the sum of pairwise \emph{effective-resistances} of $G_w$ (Kirchhoff's index). By contrast, $\ell_\infty$-Lewis weights minimize the \emph{maximum} effective-resistance of \emph{edges}, but are much cheaper to approximate, especially for Laplacians. With this algorithmic motivation, we study the ER-approximation ratio obtained by Lewis weights. Our first main result is that $\ell_\infty$-Lewis weights provide a constant ($\approx 3.12$) approximation for ER-minimization on \emph{trees}. The proof introduces a new technique, a local polarization process for effective-resistances ($\ell_2$-congestion) on trees, which is of independent interest in electrical network analysis. For general graphs, we prove an upper bound $α(G)$ on the approximation ratio obtained by Lewis weights, which is always $\leq \min\{ \text{diam}(G), κ(L_{w_\infty})\}$, where $κ$ is the condition number of the weighted Laplacian. All our approximation algorithms run in \emph{input-sparsity} time $\tilde{O}(m)$, a major improvement over Saberi et al.'s $O(m^{3.5})$ SDP for exact ER-minimization. Finally, we demonstrate the favorable effects of $\ell_\infty$-LW reweighting on the \emph{spectral-gap} of graphs and on their \emph{spectral-thinness} (Anari and Gharan, 2015).
En route to our results, we prove a weighted analogue of Mohar's classical bound on $λ_2(G)$, and provide a new characterization of leverage-scores of a matrix, as the gradient (w.r.t. weights) of the volume of the enclosing ellipsoid.

Submitted 12 February, 2023; originally announced February 2023.

ACM Class: F.2

arXiv:2211.14825 [pdf, other]

Dynamic Kernel Sparsifiers

Authors: Yichuan Deng, Wenyu Jin, Zhao Song, Xiaorui Sun, Omri Weinstein

Abstract: A geometric graph associated with a set of points $P= \{x_1, x_2, \cdots, x_n \} \subset \mathbb{R}^d$ and a fixed kernel function $\mathsf{K}:\mathbb{R}^d\times \mathbb{R}^d\to\mathbb{R}_{\geq 0}$ is a complete graph on $P$ such that the weight of edge $(x_i, x_j)$ is $\mathsf{K}(x_i, x_j)$. We present a fully-dynamic data structure that maintains a spectral sparsifier of a geometric graph under updates that change the locations of points in $P$ one at a time. The update time of our data structure is $n^{o(1)}$ with high probability, and the initialization time is $n^{1+o(1)}$. Under certain assumptions, we can provide a fully dynamic spectral sparsifier that is robust to an adaptive adversary. We further show that, for the Laplacian matrices of these geometric graphs, it is possible to maintain random sketches for the results of matrix vector multiplication and inverse-matrix vector multiplication in $n^{o(1)}$ time, under updates that change the locations of points in $P$ or change the query vector by a sparse difference.

Submitted 27 November, 2022; originally announced November 2022.

arXiv:2210.12495 [pdf, other]

Quartic Samples Suffice for Fourier Interpolation

Authors: Zhao Song, Baocheng Sun, Omri Weinstein, Ruizhe Zhang

Abstract: We study the problem of interpolating a noisy Fourier-sparse signal in the time duration $[0, T]$ from noisy samples in the same range, where the ground truth signal can be any $k$-Fourier-sparse signal with band-limit $[-F, F]$. Our main result is an efficient Fourier Interpolation algorithm that improves the previous best algorithm by [Chen, Kane, Price, and Song, FOCS 2016] in the following three aspects: $\bullet$ The sample complexity is improved from $\widetilde{O}(k^{51})$ to $\widetilde{O}(k^{4})$. $\bullet$ The time complexity is improved from $ \widetilde{O}(k^{10ω+40})$ to $\widetilde{O}(k^{4 ω})$. $\bullet$ The output sparsity is improved from $\widetilde{O}(k^{10})$ to $\widetilde{O}(k^{4})$. Here, $ω$ denotes the exponent of fast matrix multiplication. The state-of-the-art sample complexity of this problem is $\sim k^4$, but was only known to be achieved by an *exponential-time* algorithm. Our algorithm uses the same number of samples but has a polynomial runtime, laying the groundwork for an efficient Fourier Interpolation algorithm. The centerpiece of our algorithm is a new sufficient condition for the frequency estimation task -- a high signal-to-noise (SNR) band condition -- which allows for efficient and accurate signal reconstruction.
Based on this condition together with a new structural decomposition of Fourier signals (Signal Equivalent Method), we design a cheap algorithm for estimating each "significant" frequency within a narrow range, which is then combined with a signal estimation algorithm into a new Fourier Interpolation framework to reconstruct the ground-truth signal.

Submitted 8 February, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

arXiv:2210.12468 [pdf, other]

Discrepancy Minimization in Input-Sparsity Time

Authors: Yichuan Deng, Zhao Song, Omri Weinstein

Abstract: A recent work of Larsen [Lar23] gave a faster combinatorial alternative to Bansal's SDP algorithm for finding a coloring $x\in\{-1,1\}^n$ that approximately minimizes the discrepancy $\mathrm{disc}(A,x) := \| A x \|_{\infty}$ of a general real-valued $m\times n$ matrix $A$. Larsen's algorithm runs in $\widetilde{O}(mn^2)$ time compared to Bansal's $\widetilde{O}(mn^{4.5})$-time algorithm, at the price of a slightly weaker logarithmic approximation ratio in terms of the hereditary discrepancy of $A$ [Ban10]. In this work we present a combinatorial $\widetilde{O}(\mathrm{nnz}(A) + n^3)$ time algorithm with the same approximation guarantee as Larsen, which is optimal for tall matrices $m=\mathrm{poly}(n)$. Using a more intricate analysis and fast matrix-multiplication, we achieve $\widetilde{O}(\mathrm{nnz}(A) + n^{2.53})$ time, which breaks cubic runtime for square matrices, and bypasses the barrier of linear-programming approaches [ES14] for which input-sparsity time is currently out of reach. Our algorithm relies on two main ideas: (i) A new sketching technique for finding a projection matrix with short $\ell_2$-basis using implicit leverage-score sampling; (ii) A data structure for faster implementation of the iterative Edge-Walk partial-coloring algorithm of Lovett-Meka, using an alternative analysis that enables ``lazy'' batch-updates with low-rank corrections. Our result nearly closes the computational gap between real-valued and binary matrices (set-systems), for which input-sparsity time coloring was very recently obtained [JSS23].

Submitted 22 October, 2022; originally announced October 2022.
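
Illustrative sketch for the entry above (not taken from the paper): the objective $\mathrm{disc}(A,x) = \| A x \|_{\infty}$ can be evaluated directly, and for very small $n$ the best coloring can be found by exhaustive search. The names `discrepancy` and `brute_force_coloring` and the example matrix are assumptions made for illustration. The search below costs $2^n$ evaluations and is unrelated to the input-sparsity-time algorithm the abstract describes.

```python
from itertools import product


def discrepancy(A, x):
    """Return disc(A, x) = ||A x||_inf for a coloring x in {-1, +1}^n."""
    return max(abs(sum(a * s for a, s in zip(row, x))) for row in A)


def brute_force_coloring(A):
    """Try all 2^n colorings and return one of minimum discrepancy.

    Hypothetical helper for tiny instances only; exponential in n.
    """
    n = len(A[0])
    return min(product((-1, 1), repeat=n), key=lambda x: discrepancy(A, x))


# Example set system: three elements, each row is a pair of elements.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
best = brute_force_coloring(A)
print(best, discrepancy(A, best))
```

In this example two of the three elements must share a color, so some pair sums to ±2 and no coloring achieves discrepancy below 2.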

arXiv:2210.10594 [pdf, other]

Motion-Based Weak Supervision for Video Parsing with Application to Colonoscopy

Authors: Ori Kelner, Or Weinstein, Ehud Rivlin, Roman Goldenberg

Abstract: We propose a two-stage unsupervised approach for parsing videos into phases. We use motion cues to divide the video into coarse segments. Noisy segment labels are then used to weakly supervise an appearance-based classifier. We show the effectiveness of the method for phase detection in colonoscopy videos.

Submitted 16 October, 2022; originally announced October 2022.

arXiv:2208.04508 [pdf, ps, other]

Training Overparametrized Neural Networks in Sublinear Time

Authors: Yichuan Deng, Hang Hu, Zhao Song, Omri Weinstein, Danyang Zhuo

Abstract: The success of deep learning comes at a tremendous computational and energy cost, and the scalability of training massively overparametrized neural networks is becoming a real barrier to the progress of artificial intelligence (AI). Despite the popularity and low cost-per-iteration of traditional backpropagation via gradient descent, stochastic gradient descent (SGD) has a prohibitive convergence rate in non-convex settings, both in theory and practice. To mitigate this cost, recent works have proposed to employ alternative (Newton-type) training methods with a much faster convergence rate, albeit with higher cost-per-iteration. For a typical neural network with $m=\mathrm{poly}(n)$ parameters and input batch of $n$ datapoints in $\mathbb{R}^d$, the previous work of [Brand, Peng, Song, and Weinstein, ITCS'2021] requires $\sim mnd + n^3$ time per iteration. In this paper, we present a novel training method that requires only $m^{1-α} n d + n^3$ amortized time in the same overparametrized regime, where $α\in (0.01,1)$ is some fixed constant. This method relies on a new and alternative view of neural networks, as a set of binary search trees, where each iteration corresponds to modifying a small subset of the nodes in the tree. We believe this view would have further applications in the design and analysis of deep neural networks (DNNs).

Submitted 7 February, 2024; v1 submitted 8 August, 2022; originally announced August 2022.

arXiv:2205.14816 [pdf, ps, other]

Fast Distance Oracles for Any Symmetric Norm

Authors: Yichuan Deng, Zhao Song, Omri Weinstein, Ruizhe Zhang

Abstract: In the Distance Oracle problem, the goal is to preprocess $n$ vectors $x_1, x_2, \cdots, x_n$ in a $d$-dimensional metric space $(\mathbb{X}^d, \| \cdot \|_l)$ into a cheap data structure, so that given a query vector $q \in \mathbb{X}^d$ and a subset $S\subseteq [n]$ of the input data points, all distances $\| q - x_i \|_l$ for $x_i\in S$ can be quickly approximated (faster than the trivial $\sim d|S|$ query time). This primitive is a basic subroutine in machine learning, data mining and similarity search applications. In the case of $\ell_p$ norms, the problem is well understood, and optimal data structures are known for most values of $p$. Our main contribution is a fast $(1+\varepsilon)$ distance oracle for any symmetric norm $\|\cdot\|_l$. This class includes $\ell_p$ norms and Orlicz norms as special cases, as well as other norms used in practice, e.g. top-$k$ norms, max-mixture and sum-mixture of $\ell_p$ norms, small-support norms and the box-norm. We propose a novel data structure with $\tilde{O}(n (d + \mathrm{mmc}(l)^2 ) )$ preprocessing time and space, and $t_q = \tilde{O}(d + |S| \cdot \mathrm{mmc}(l)^2)$ query time, for computing distances to a subset $S$ of data points, where $\mathrm{mmc}(l)$ is a complexity-measure (concentration modulus) of the symmetric norm. When $l = \ell_{p}$, this runtime matches the aforementioned state-of-the-art oracles.

Submitted 29 May, 2022; originally announced May 2022.

arXiv:2205.00658 [pdf, other]

Improved Reconstruction for Fourier-Sparse Signals

Authors: Yeqi Gao, Zhao Song, Baocheng Sun, Omri Weinstein, Ruizhe Zhang

Abstract: We revisit the classical problem of Fourier-sparse signal reconstruction -- a variant of the \emph{Set Query} problem -- which asks to efficiently reconstruct (a subset of) a $d$-dimensional Fourier-sparse signal ($\|\hat{x}(t)\|_0 \leq k$), from minimum \emph{noisy} samples of $x(t)$ in the time domain. We present a unified framework for this problem by developing a theory of sparse Fourier transforms (SFT) for frequencies lying on a \emph{lattice}, which can be viewed as a ``semi-continuous'' version of SFT in between discrete and continuous domains. Using this framework, we obtain the following results: $\bullet$ **Dimension-free Fourier sparse recovery** We present a sample-optimal discrete Fourier Set-Query algorithm with $O(k^{ω+1})$ reconstruction time in one dimension, \emph{independent} of the signal's length ($n$) and $\ell_\infty$-norm. This complements the state-of-the-art algorithm of [Kapralov, STOC 2017], whose reconstruction time is $\tilde{O}(k \log^2 n \log R^*)$, where $R^* \approx \|\hat{x}\|_\infty$ is a signal-dependent parameter, and the algorithm is limited to low dimensions. By contrast, our algorithm works for arbitrary $d$ dimensions, mitigating the $\exp(d)$ blowup in decoding time to merely linear in $d$. A key component in our algorithm is fast spectral sparsification of the Fourier basis. $\bullet$ **High-accuracy Fourier interpolation** In one dimension, we design a poly-time $(3+ \sqrt{2} +ε)$-approximation algorithm for continuous Fourier interpolation.
This bypasses a barrier of all previous algorithms [Price and Song, FOCS 2015, Chen, Kane, Price and Song, FOCS 2016], which only achieve $c>100$ approximation for this basic problem. Our main contribution is a new analytic tool for hierarchical frequency decomposition based on \emph{noise cancellation}.

Submitted 17 November, 2023; v1 submitted 2 May, 2022; originally announced May 2022.

arXiv:2202.12329 [pdf, other]

A Dynamic Low-Rank Fast Gaussian Transform

Authors: Baihe Huang, Zhao Song, Omri Weinstein, Junze Yin, Hengjie Zhang, Ruizhe Zhang

Abstract: The \emph{Fast Gaussian Transform} (FGT) enables subquadratic-time multiplication of an $n\times n$ Gaussian kernel matrix $\mathsf{K}_{i,j}= \exp ( - \| x_i - x_j \|_2^2 ) $ with an arbitrary vector $h \in \mathbb{R}^n$, where $x_1,\dots, x_n \in \mathbb{R}^d$ are a set of \emph{fixed} source points. This kernel plays a central role in machine learning and random feature maps. Nevertheless, in most modern data analysis applications, datasets are dynamically changing (yet often have low rank), and recomputing the FGT from scratch in (kernel-based) algorithms incurs a major computational overhead ($\gtrsim n$ time for a single source update $\in \mathbb{R}^d$). These applications motivate a \emph{dynamic FGT} algorithm, which maintains a dynamic set of sources under \emph{kernel-density estimation} (KDE) queries in \emph{sublinear time} while retaining Mat-Vec multiplication accuracy and speed. Assuming the dynamic data-points $x_i$ lie in a (possibly changing) $k$-dimensional subspace ($k\leq d$), our main result is an efficient dynamic FGT algorithm, supporting the following operations in $\log^{O(k)}(n/\varepsilon)$ time: (1) Adding or deleting a source point, and (2) Estimating the ``kernel-density'' of a query point with respect to sources with $\varepsilon$ additive accuracy. The core of the algorithm is a dynamic data structure for maintaining the \emph{projected} ``interaction rank'' between source and target boxes, decoupled into finite truncation of Taylor and Hermite expansions.

Submitted 5 February, 2024; v1 submitted 24 February, 2022; originally announced February 2022.
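
Illustrative sketch for the entry above (not taken from the paper): the quadratic-time baseline that the FGT is designed to beat is the direct product of the Gaussian kernel matrix $\mathsf{K}_{i,j}= \exp ( - \| x_i - x_j \|_2^2 )$ with a vector $h$. The function name `gaussian_kernel_matvec` and the sample points are assumptions made for illustration; this is the naive $O(n^2 d)$ computation, not the dynamic data structure from the abstract.

```python
import math


def gaussian_kernel_matvec(points, h):
    """Naive product K h, where K[i][j] = exp(-||x_i - x_j||_2^2).

    Costs O(n^2 d) time for n points in R^d; shown only as a baseline.
    """
    out = []
    for xi in points:
        total = 0.0
        for xj, hj in zip(points, h):
            sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
            total += math.exp(-sq_dist) * hj
        out.append(total)
    return out


# Two source points at unit distance in R^2.
points = [(0.0, 0.0), (1.0, 0.0)]
h = [1.0, 2.0]
print(gaussian_kernel_matvec(points, h))
```

Moving a single source point changes an entire row and column of $\mathsf{K}$, which is why recomputing this product from scratch after every update is expensive.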

arXiv:2202.08489 [pdf, ps, other]

A Faster Interior-Point Method for Sum-of-Squares Optimization

Authors: Shunhua Jiang, Bento Natura, Omri Weinstein

Abstract: We present a faster interior-point method for optimizing sum-of-squares (SOS) polynomials, which are a central tool in polynomial optimization and capture convex programming in the Lasserre hierarchy. Let $p = \sum_i q^2_i$ be an $n$-variate SOS polynomial of degree $2d$. Denoting by $L := \binom{n+d}{d}$ and $U := \binom{n+2d}{2d}$ the dimensions of the vector spaces in which $q_i$'s and $p$ live respectively, our algorithm runs in time $\tilde{O}(LU^{1.87})$. This is polynomially faster than state-of-the-art SOS and semidefinite programming solvers, which achieve runtime $\tilde{O}(L^{0.5}\min\{U^{2.37}, L^{4.24}\})$. The centerpiece of our algorithm is a dynamic data structure for maintaining the inverse of the Hessian of the SOS barrier function under the polynomial interpolant basis, which efficiently extends to multivariate SOS optimization, and requires maintaining spectral approximations to low-rank perturbations of elementwise (Hadamard) products. This is the main challenge and departure from recent IPM breakthroughs using inverse-maintenance, where low-rank updates to the slack matrix readily imply the same for the Hessian matrix.

Submitted 17 February, 2022; originally announced February 2022.

arXiv:2201.00228 [pdf, other]

The Complexity of Dynamic Least-Squares Regression

Authors: Shunhua Jiang, Binghui Peng, Omri Weinstein

Abstract: We settle the complexity of dynamic least-squares regression (LSR), where rows and labels $(\mathbf{A}^{(t)}, \mathbf{b}^{(t)})$ can be adaptively inserted and/or deleted, and the goal is to efficiently maintain an $ε$-approximate solution to $\min_{\mathbf{x}^{(t)}} \| \mathbf{A}^{(t)} \mathbf{x}^{(t)} - \mathbf{b}^{(t)} \|_2$ for all $t\in [T]$. We prove sharp separations ($d^{2-o(1)}$ vs. $\sim d$) between the amortized update time of: (i) Fully vs. Partially dynamic $0.01$-LSR; (ii) High vs. low-accuracy LSR in the partially-dynamic (insertion-only) setting. Our lower bounds follow from a gap-amplification reduction -- reminiscent of iterative refinement -- from the exact version of the Online Matrix Vector Conjecture (OMv) [HKNS15], to constant approximate OMv over the reals, where the $i$-th online product $\mathbf{H}\mathbf{v}^{(i)}$ only needs to be computed to $0.1$-relative error. All previous fine-grained reductions from OMv to its approximate versions only show hardness for inverse polynomial approximation $ε= n^{-ω(1)}$ (additive or multiplicative). This result is of independent interest in fine-grained complexity and for the investigation of the OMv Conjecture, which is still widely open.

Submitted 6 April, 2023; v1 submitted 1 January, 2022; originally announced January 2022.

arXiv:2008.09792 [pdf, ps, other]

doi 10.4064/fm847-1-2020

On the pointwise Lyapunov exponent of holomorphic maps

Authors: Israel Or Weinstein

Abstract: We prove that for any holomorphic map, and any bounded orbit which does not accumulate to a singular set or to an attracting cycle, its lower Lyapunov exponent is non-negative. The same result holds for unbounded orbits, for maps with a bounded singular set. Furthermore, the orbit may accumulate to infinity or to a singular set, as long as it is slow enough.

Submitted 22 August, 2020; originally announced August 2020.

Comments: To appear in Fundamenta Mathematicae

MSC Class: Primary 37F10; 37F15; 37F50

arXiv:2006.11648 [pdf, ps, other]

Training (Overparametrized) Neural Networks in Near-Linear Time

Authors: Jan van den Brand, Binghui Peng, Zhao Song, Omri Weinstein

Abstract: The slow convergence rate and pathological curvature issues of first-order gradient methods for training deep neural networks initiated an ongoing effort for developing faster $\mathit{second}$-$\mathit{order}$ optimization algorithms beyond SGD, without compromising the generalization error. Despite their remarkable convergence rate ($\mathit{independent}$ of the training batch size $n$), second-order algorithms incur a daunting slowdown in the $\mathit{cost}$ $\mathit{per}$ $\mathit{iteration}$ (inverting the Hessian matrix of the loss function), which renders them impractical. Very recently, this computational overhead was mitigated by the works of [ZMG19, CGH+19], yielding an $O(mn^2)$-time second-order algorithm for training two-layer overparametrized neural networks of polynomial width $m$. We show how to speed up the algorithm of [CGH+19], achieving an $\tilde{O}(mn)$-time backpropagation algorithm for training (mildly overparametrized) ReLU networks, which is near-linear in the dimension ($mn$) of the full gradient (Jacobian) matrix. The centerpiece of our algorithm is to reformulate the Gauss-Newton iteration as an $\ell_2$-regression problem, and then use a Fast-JL type dimension reduction to $\mathit{precondition}$ the underlying Gram matrix in time independent of $M$, making it possible to find a sufficiently good approximate solution via $\mathit{first}$-$\mathit{order}$ conjugate gradient.
Our result provides a proof-of-concept that advanced machinery from randomized linear algebra -- which led to recent breakthroughs in $\mathit{convex}$ $\mathit{optimization}$ (ERM, LPs, Regression) -- can be carried over to the realm of deep learning as well.

Submitted 8 December, 2020; v1 submitted 20 June, 2020; originally announced June 2020.

arXiv:2004.07470 [pdf, other]

Faster Dynamic Matrix Inverse for Faster LPs

Authors: Shunhua Jiang, Zhao Song, Omri Weinstein, Hengjie Zhang

Abstract: Motivated by recent Linear Programming solvers, we design dynamic data structures for maintaining the inverse of an $n\times n$ real matrix under $\textit{low-rank}$ updates, with polynomially faster amortized running time. Our data structure is based on a recursive application of the Woodbury-Morrison identity for implementing $\textit{cascading}$ low-rank updates, combined with recent sketching technology. Our techniques and amortized analysis of multi-level partial updates may be of broader interest to dynamic matrix problems. This data structure leads to the fastest known LP solver for general (dense) linear programs, improving the running time of the recent algorithms of (Cohen et al.'19, Lee et al.'19, Brand'20) from $O^*(n^{2+ \max\{\frac{1}{6}, ω-2, \frac{1-α}{2}\}})$ to $O^*(n^{2+\max\{\frac{1}{18}, ω-2, \frac{1-α}{2}\}})$, where $ω$ and $α$ are the fast matrix multiplication exponent and its dual. Hence, under the common belief that $ω\approx 2$ and $α\approx 1$, our LP solver runs in $O^*(n^{2.055})$ time instead of $O^*(n^{2.16})$.

Submitted 16 April, 2020; originally announced April 2020.
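
Illustrative sketch for the entry above (not taken from the paper): the simplest instance of the identity the abstract refers to is the rank-one (Sherman-Morrison) case, $(A + uv^\top)^{-1} = A^{-1} - \frac{A^{-1} u v^\top A^{-1}}{1 + v^\top A^{-1} u}$, which refreshes a stored inverse in $O(n^2)$ time instead of re-inverting. The function names and the singularity tolerance are assumptions made for illustration; the paper's recursive, sketching-based data structure is considerably more involved.

```python
def matvec(M, v):
    """Return the matrix-vector product M v for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sherman_morrison_update(A_inv, u, v, tol=1e-12):
    """Return (A + u v^T)^{-1} given A^{-1}, using O(n^2) arithmetic.

    Hypothetical helper: `tol` is an illustrative threshold for
    rejecting updates that make the matrix (numerically) singular.
    """
    n = len(A_inv)
    Au = matvec(A_inv, u)  # A^{-1} u
    # Row vector v^T A^{-1}.
    vA = [sum(v[i] * A_inv[i][j] for i in range(n)) for j in range(n)]
    denom = 1.0 + sum(vi * ai for vi, ai in zip(v, Au))
    if abs(denom) < tol:
        raise ValueError("rank-one update makes the matrix singular")
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(n)]
            for i in range(n)]


# Start from the 2x2 identity (its own inverse) and add u v^T.
identity = [[1.0, 0.0], [0.0, 1.0]]
updated = sherman_morrison_update(identity, [1.0, 0.0], [0.0, 1.0])
print(updated)
```

A rank-$k$ update can be handled by applying the general Woodbury form, or by $k$ successive rank-one updates at $O(kn^2)$ total cost.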

arXiv:1912.02858 [pdf, other]

Settling the relationship between Wilber's bounds for dynamic optimality

Authors: Victor Lecomte, Omri Weinstein

Abstract: In FOCS 1986, Wilber proposed two combinatorial lower bounds on the operational cost of any binary search tree (BST) for a given access sequence $X \in [n]^m$. Both bounds play a central role in the ongoing pursuit of the dynamic optimality conjecture (Sleator and Tarjan, 1985), but their relationship remained unknown for more than three decades. We show that Wilber's Funnel bound dominates his Alternation bound for all $X$, and give a tight $Θ(\lg\lg n)$ separation for some $X$, answering Wilber's conjecture and an open problem of Iacono, Demaine et al. The main ingredient of the proof is a new "symmetric" characterization of Wilber's Funnel bound, which proves that it is invariant under rotations of $X$. We use this characterization to provide initial indication that the Funnel bound matches the Independent Rectangle bound (Demaine et al., 2009), by proving that when the Funnel bound is constant, $\mathsf{IRB}_{\diagup\hspace{-.6em}\square}$ is linear. To the best of our knowledge, our results provide the first progress on Wilber's conjecture that the Funnel bound is dynamically optimal (1986).

Submitted 28 June, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

Comments: ESA 2020; 25 pages, 18 figures; v3 applies reviewers' comments

arXiv:1910.13543 [pdf, other]

An Adaptive Step Toward the Multiphase Conjecture

Authors: Young Kun Ko, Omri Weinstein

Abstract: In 2010, Pǎtraşcu proposed the following three-phase dynamic problem, as a candidate for proving polynomial lower bounds on the operational time of dynamic data structures: I: Preprocess a collection of sets $\vec{S} = S_1, \ldots , S_k \subseteq [n]$, where $k=\operatorname{poly}(n)$. II: A set $T\subseteq [n]$ is revealed, and the data structure updates its memory. III: An index… ▽ More In 2010, Pǎtraşcu proposed the following three-phase dynamic problem, as a candidate for proving polynomial lower bounds on the operational time of dynamic data structures: I: Preprocess a collection of sets $\vec{S} = S_1, \ldots , S_k \subseteq [n]$, where $k=\operatorname{poly}(n)$. II: A set $T\subseteq [n]$ is revealed, and the data structure updates its memory. III: An index $i \in [k]$ is revealed, and the data structure must determine if $S_i\cap T=^? \emptyset$. Pǎtraşcu conjectured that any data structure for the Multiphase problem must make $n^ε$ cell-probes in either Phase II or III, and showed that this would imply similar unconditional lower bounds on many important dynamic data structure problems. Alas, there has been almost no progress on this conjecture in the past decade since its introduction. We show an $\tildeΩ(\sqrt{n})$ cell-probe lower bound on the Multiphase problem for data structures with general (adaptive) updates, and queries with unbounded but "layered" adaptivity. This result captures all known set-intersection data structures and significantly strengthens previous Multiphase lower bounds, which only captured non-adaptive data structures. Our main technical result is a communication lower bound on a 4-party variant of Pǎtraşcu's Number-On-Forehead Multiphase game, using information complexity techniques. 
We also show that a lower bound on Pǎtraşcu's original NOF game would imply a polynomial ($n^{1+ε}$) lower bound on the number of wires of any constant-depth circuit with arbitrary gates computing a random $\tilde{O}(n)\times n$ linear operator $x \mapsto Ax$, a long-standing open problem in circuit complexity. This suggests that the NOF conjecture is much stronger than its data structure counterpart.

Submitted 29 October, 2019; originally announced October 2019.

Comments: 26 pages, 4 figures
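
To make the three phases of the Multiphase problem in the abstract above concrete, here is a minimal Python sketch of two trivial strategies. It is only an illustration of the problem statement, not a construction from the paper; the class and method names (`NaiveMultiphase`, `update_eager`, `update_lazy`, `query`) are hypothetical. The eager strategy spends its time in Phase II and the lazy one in Phase III, which is the trade-off the conjecture is about.

```python
class NaiveMultiphase:
    """Two trivial strategies for the three-phase problem (illustrative only)."""

    def __init__(self, sets):
        # Phase I: preprocess S_1..S_k (here we simply store them).
        self.sets = [frozenset(s) for s in sets]
        self.T = frozenset()
        self.answers = None

    def update_eager(self, T):
        # Phase II, eager: touch every set so that Phase III is one lookup.
        self.T = frozenset(T)
        self.answers = [s.isdisjoint(self.T) for s in self.sets]

    def update_lazy(self, T):
        # Phase II, lazy: only record T; all work is deferred to Phase III.
        self.T = frozenset(T)
        self.answers = None

    def query(self, i):
        # Phase III: report whether S_i and T are disjoint.
        if self.answers is not None:
            return self.answers[i]
        return self.sets[i].isdisjoint(self.T)


ds = NaiveMultiphase([{1, 2}, {3}, {4, 5}])
ds.update_lazy({2, 3})
print([ds.query(i) for i in range(3)])
```

Both strategies return the same answers; they differ only in which phase pays for the set intersections.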

arXiv:1907.10874 [pdf, ps, other]

How to Store a Random Walk

Authors: Emanuele Viola, Omri Weinstein, Huacheng Yu

Abstract: Motivated by storage applications, we study the following data structure problem: An encoder wishes to store a collection of jointly-distributed files $\overline{X}:=(X_1,X_2,\ldots, X_n) \sim μ$ which are \emph{correlated} ($H_μ(\overline{X}) \ll \sum_i H_μ(X_i)$), using as little (expected) memory as possible, such that each individual file $X_i$ can be recovered quickly with few (ideally constant) memory accesses. In the case of independent random files, a dramatic result by Pǎtraşcu (FOCS'08) and subsequently by Dodis, Pǎtraşcu and Thorup (STOC'10) shows that it is possible to store $\overline{X}$ using just a \emph{constant} number of extra bits beyond the information-theoretic minimum space, while at the same time decoding each $X_i$ in constant time. However, in the (realistic) case where the files are correlated, much weaker results are known, requiring at least $Ω(n/poly\lg n)$ extra bits for constant decoding time, even for "simple" joint distributions $μ$. We focus on the natural case of compressing \emph{Markov chains}, i.e., storing a length-$n$ random walk on any (possibly directed) graph $G$. Denoting by $κ(G,n)$ the number of length-$n$ walks on $G$, we show that there is a succinct data structure storing a random walk using $\lg_2 κ(G,n) + O(\lg n)$ bits of space, such that any vertex along the walk can be decoded in $O(1)$ time on a word-RAM.
For the harder task of matching the \emph{point-wise} optimal space of the walk, i.e., the empirical entropy $\sum_{i=1}^{n-1} \lg (deg(v_i))$, we present a data structure with $O(1)$ extra bits at the price of $O(\lg n)$ decoding time, and show that any improvement on this would lead to an improved solution on the long-standing Dictionary problem. All of our data structures support the \emph{online} version of the problem with constant update and query time.

Submitted 25 July, 2019; originally announced July 2019.

arXiv:1904.04828 [pdf, ps, other]

Lower Bounds for Oblivious Near-Neighbor Search

Authors: Kasper Green Larsen, Tal Malkin, Omri Weinstein, Kevin Yeo

Abstract: We prove an $Ω(d \lg n/ (\lg\lg n)^2)$ lower bound on the dynamic cell-probe complexity of statistically $\mathit{oblivious}$ approximate-near-neighbor search ($\mathsf{ANN}$) over the $d$-dimensional Hamming cube. For the natural setting of $d = Θ(\log n)$, our result implies an $\tilde{Ω}(\lg^2 n)$ lower bound, which is a quadratic improvement over the highest (non-oblivious) cell-probe lower bound for $\mathsf{ANN}$. This is the first super-logarithmic $\mathit{unconditional}$ lower bound for $\mathsf{ANN}$ against general (non black-box) data structures. We also show that any oblivious $\mathit{static}$ data structure for decomposable search problems (like $\mathsf{ANN}$) can be obliviously dynamized with $O(\log n)$ overhead in update and query time, strengthening a classic result of Bentley and Saxe (Algorithmica, 1980).

Submitted 9 April, 2019; originally announced April 2019.

Comments: 28 pages

arXiv:1811.02725 [pdf, ps, other]

Static Data Structure Lower Bounds Imply Rigidity

Authors: Zeev Dvir, Alexander Golovnev, Omri Weinstein

Abstract: We show that static data structure lower bounds in the group (linear) model imply semi-explicit lower bounds on matrix rigidity. In particular, we prove that an explicit lower bound of $t \geq ω(\log^2 n)$ on the cell-probe complexity of linear data structures in the group model, even against arbitrarily small linear space $(s= (1+\varepsilon)n)$, would already imply a semi-explicit ($\bf P^{NP}\rm$) construction of rigid matrices with significantly better parameters than the current state of the art (Alon, Panigrahy and Yekhanin, 2009). Our results further assert that polynomial ($t\geq n^δ$) data structure lower bounds against near-optimal space would imply super-linear circuit lower bounds for log-depth linear circuits (a four-decade open question). In the succinct space regime $(s=n+o(n))$, we show that any improvement on current cell-probe lower bounds in the linear model would also imply new rigidity bounds. Our results rely on a new connection between the "inner" and "outer" dimensions of a matrix (Paturi and Pudlak, 2006), and on a new reduction from worst-case to average-case rigidity, which is of independent interest.

Submitted 13 February, 2019; v1 submitted 6 November, 2018; originally announced November 2018.

arXiv:1808.03978 [pdf, other]

Local Decodability of the Burrows-Wheeler Transform

Authors: Sandip Sinha, Omri Weinstein

Abstract: The Burrows-Wheeler Transform (BWT) is among the most influential discoveries in text compression and DNA storage. It is a reversible preprocessing step that rearranges an $n$-letter string into runs of identical characters (by exploiting context regularities), resulting in highly compressible strings, and is the basis of the \texttt{bzip} compression program. Alas, the decoding process of BWT is inherently sequential and requires $Ω(n)$ time even to retrieve a \emph{single} character. We study the succinct data structure problem of locally decoding short substrings of a given text under its \emph{compressed} BWT, i.e., with small additive redundancy $r$ over the \emph{Move-To-Front} (\texttt{bzip}) compression. The celebrated BWT-based FM-index (FOCS '00), as well as other related literature, yield a trade-off of $r=\tilde{O}(n/\sqrt{t})$ bits, when a single character is to be decoded in $O(t)$ time. We give a near-quadratic improvement $r=\tilde{O}(n\lg(t)/t)$. As a by-product, we obtain an \emph{exponential} (in $t$) improvement on the redundancy of the FM-index for counting pattern-matches on compressed text. In the interesting regime where the text compresses to $n^{1-o(1)}$ bits, these results provide an $\exp(t)$ \emph{overall} space reduction. For the local decoding problem of BWT, we also prove an $Ω(n/t^2)$ cell-probe lower bound for "symmetric" data structures. We achieve our main result by designing a compressed partial-sums (Rank) data structure over BWT.
The key component is a \emph{locally-decodable} Move-to-Front (MTF) code: with only $O(1)$ extra bits per block of length $n^{Ω(1)}$, the decoding time of a single character can be decreased from $Ω(n)$ to $O(\lg n)$. This result is of independent interest in algorithmic information theory.

Submitted 5 December, 2018; v1 submitted 12 August, 2018; originally announced August 2018.

Comments: The following two technical typos were fixed: (1) On page 2, following Theorem 1, the decoding time of a contiguous substring of size $\ell$ was corrected from $O(t + \ell)$ to $O(t + \ell \cdot \lg t)$. (2) In the statement of Theorem 2, the query time to count occurrences of patterns of length $\ell$ was corrected to $O(t \ell)$, independent of the number of occurrences
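
As background for the abstract above, the following Python sketch shows the textbook Burrows-Wheeler Transform followed by Move-to-Front coding, together with a naive inverse that rebuilds the whole text. It is not the paper's locally-decodable data structure; the `$` end-of-text sentinel and all function names are assumptions made for the illustration. The naive inverse reconstructs every rotation, which is one way to see why retrieving even a single character from the plain transform is expensive.

```python
def bwt(text, end="$"):
    """Textbook BWT: last column of the sorted rotations of text + sentinel."""
    s = text + end  # assumes `end` does not occur in `text`
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last, end="$"):
    """Naive inverse: rebuild all sorted rotations column by column."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith(end))
    return row[:-1]


def move_to_front(s):
    """Move-to-Front coding over the sorted alphabet of s."""
    symbols = sorted(set(s))
    out = []
    for ch in s:
        idx = symbols.index(ch)
        out.append(idx)
        symbols.insert(0, symbols.pop(idx))  # move the symbol to the front
    return out


transformed = bwt("banana")
print(transformed)                 # annb$aa
print(move_to_front(transformed))
print(inverse_bwt(transformed))    # banana
```

Runs of identical characters in the transformed string become zeros after Move-to-Front, which is what makes the output easy to compress.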

arXiv:1805.02974 [pdf, ps, other]

Massively Parallel Algorithms for Finding Well-Connected Components in Sparse Graphs

Authors: Sepehr Assadi, Xiaorui Sun, Omri Weinstein

Abstract: A fundamental question that shrouds the emergence of massively parallel computing (MPC) platforms is how can the additional power of the MPC paradigm be leveraged to achieve faster algorithms compared to classical parallel models such as PRAM? Previous research has identified the sparse graph connectivity problem as a major obstacle to such improvement: While classical logarithmic-round PRAM algorithms for finding connected components in any $n$-vertex graph have been known for more than three decades, no $o(\log{n})$-round MPC algorithms are known for this task with truly sublinear in $n$ memory per machine. This problem arises when processing massive yet sparse graphs with $O(n)$ edges, for which the interesting setting of parameters is $n^{1-Ω(1)}$ memory per machine. It is conjectured that achieving an $o(\log{n})$-round algorithm for connectivity on general sparse graphs with $n^{1-Ω(1)}$ per-machine memory may not be possible, and this conjecture also forms the basis for multiple conditional hardness results on the round complexity of other problems in the MPC model. We take an opportunistic approach towards the sparse graph connectivity problem, by designing an algorithm with improved performance guarantees in terms of the connectivity structure of the input graph. Formally, we design an algorithm that finds all connected components with spectral gap at least $λ$ in a graph in $O(\log\log{n} + \log{(1/λ)})$ MPC rounds and $n^{Ω(1)}$ memory per machine.
As such, this algorithm achieves an exponential round reduction on sparse "well-connected" components (i.e., $λ\geq 1/\text{polylog}{(n)}$) using only $n^{Ω(1)}$ memory per machine and $\widetilde{O}(n)$ total memory, and still operates in $o(\log n)$ rounds even when $λ= 1/n^{o(1)}$.

Submitted 8 May, 2018; originally announced May 2018.

arXiv:1707.04875 [pdf, ps, other]

Coding sets with asymmetric information

Authors: Alexandr Andoni, Javad Ghaderi, Daniel Hsu, Dan Rubenstein, Omri Weinstein

Abstract: We study the following one-way asymmetric transmission problem, also a variant of model-based compressed sensing: a resource-limited encoder has to report a small set $S$ from a universe of $N$ items to a more powerful decoder (server). The distinguishing feature is asymmetric information: the subset $S$ is comprised of i.i.d. samples from a prior distribution $μ$, and $μ$ is only known to the decoder. The goal for the encoder is to encode $S$ obliviously, while achieving the information-theoretic bound of $|S| \cdot H(μ)$, i.e., the Shannon entropy bound. We first show that any such compression scheme must be {\em randomized}, if it gains non-trivially from the prior $μ$. This stands in contrast to the symmetric case (when both the encoder and decoder know $μ$), where the Huffman code provides a near-optimal deterministic solution. On the other hand, a rather simple argument shows that, when $|S|=k$, a random linear code achieves near-optimal communication rate of about $k\cdot H(μ)$ bits. Alas, the resulting scheme has prohibitive decoding time: about ${N\choose k} \approx (N/k)^k$. Our main result is a computationally efficient and linear coding scheme, which achieves an $O(\lg\lg N)$-competitive communication ratio compared to the optimal benchmark, and runs in $\text{poly}(N,k)$ time. Our "multi-level" coding scheme uses a combination of hashing and syndrome-decoding of Reed-Solomon codes, and relies on viewing the (unknown) prior $μ$ as a rather small convex combination of uniform ("flat") distributions.

Submitted 26 July, 2018; v1 submitted 16 July, 2017; originally announced July 2017.

arXiv:1703.03575 [pdf, other]

Crossing the Logarithmic Barrier for Dynamic Boolean Data Structure Lower Bounds

Authors: Kasper Green Larsen, Omri Weinstein, Huacheng Yu

Abstract: This paper proves the first super-logarithmic lower bounds on the cell probe complexity of dynamic boolean (a.k.a. decision) data structure problems, a long-standing milestone in data structure lower bounds. We introduce a new method for proving dynamic cell probe lower bounds and use it to prove a $\tilde{Ω}(\log^{1.5} n)$ lower bound on the operational time of a wide range of boolean data structure problems, most notably, on the query time of dynamic range counting over $\mathbb{F}_2$ ([Pat07]). Proving an $ω(\lg n)$ lower bound for this problem was explicitly posed as one of five important open problems in the late Mihai Pǎtraşcu's obituary [Tho13]. This result also implies the first $ω(\lg n)$ lower bound for the classical 2D range counting problem, one of the most fundamental data structure problems in computational geometry and spatial databases. We derive similar lower bounds for boolean versions of dynamic polynomial evaluation and 2D rectangle stabbing, and for the (non-boolean) problems of range selection and range median. Our technical centerpiece is a new way of "weakly" simulating dynamic data structures using efficient one-way communication protocols with small advantage over random guessing. This simulation involves a surprising excursion to low-degree (Chebychev) polynomials which may be of independent interest, and offers an entirely new algorithmic angle on the "cell sampling" method of Panigrahy et al. [PTW10].

Submitted 10 March, 2017; originally announced March 2017.

arXiv:1607.04842 [pdf, ps, other]

The Minrank of Random Graphs

Authors: Alexander Golovnev, Oded Regev, Omri Weinstein

Abstract: The minrank of a graph $G$ is the minimum rank of a matrix $M$ that can be obtained from the adjacency matrix of $G$ by switching some ones to zeros (i.e., deleting edges) and then setting all diagonal entries to one. This quantity is closely related to the fundamental information-theoretic problems of (linear) index coding (Bar-Yossef et al., FOCS'06), network coding and distributed storage, and to Valiant's approach for proving superlinear circuit lower bounds (Valiant, Boolean Function Complexity '92). We prove tight bounds on the minrank of random Erdős-Rényi graphs $G(n,p)$ for all regimes of $p\in[0,1]$. In particular, for any constant $p$, we show that $\mathsf{minrk}(G) = Θ(n/\log n)$ with high probability, where $G$ is chosen from $G(n,p)$. This bound gives a near quadratic improvement over the previous best lower bound of $Ω(\sqrt{n})$ (Haviv and Langberg, ISIT'12), and partially settles an open problem raised by Lubetzky and Stav (FOCS '07). Our lower bound matches the well-known upper bound obtained by the "clique covering" solution, and settles the linear index coding problem for random graphs. Finally, our result suggests a new avenue of attack, via derandomization, on Valiant's approach for proving superlinear lower bounds for logarithmic-depth semilinear circuits.

Submitted 16 February, 2017; v1 submitted 17 July, 2016; originally announced July 2016.
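
The minrank definition in the abstract above can be checked by brute force on very small graphs. The Python sketch below does this over GF(2); the choice of field, the function names, and the encoding of matrix rows as integer bitmasks are assumptions made for the illustration. The search is exponential in the number of edges and has nothing to do with the paper's proof technique.

```python
from itertools import product


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit of the pivot row
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def minrank_gf2(n, edges):
    """Brute-force minrank of an undirected graph on vertices 0..n-1."""
    # Each undirected edge contributes two off-diagonal ones, which may be
    # switched to zero independently of each other.
    entries = [(i, j) for i, j in edges] + [(j, i) for i, j in edges]
    best = n
    for keep in product((0, 1), repeat=len(entries)):
        rows = [1 << i for i in range(n)]  # diagonal entries set to one
        for bit, (i, j) in zip(keep, entries):
            if bit:
                rows[i] |= 1 << j
        best = min(best, gf2_rank(rows))
    return best


five_cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(minrank_gf2(5, five_cycle))
```

On a complete graph the all-ones matrix gives rank 1, and on an edgeless graph the only admissible matrix is the identity, so the two extremes are 1 and $n$.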

arXiv:1604.03030 [pdf, ps, other]

Amortized Dynamic Cell-Probe Lower Bounds from Four-Party Communication

Authors: Omri Weinstein, Huacheng Yu

Abstract: This paper develops a new technique for proving amortized, randomized cell-probe lower bounds on dynamic data structure problems. We introduce a new randomized nondeterministic four-party communication model that enables "accelerated", error-preserving simulations of dynamic data structures. We use this technique to prove an $Ω(n(\log n/\log\log n)^2)$ cell-probe lower bound for the dynamic 2D weighted orthogonal range counting problem (2D-ORC) with $n/\mathrm{poly}\log n$ updates and $n$ queries, that holds even for data structures with $\exp(-\tilde{Ω}(n))$ success probability. This result not only proves the highest amortized lower bound to date, but is also tight in the strongest possible sense, as a matching upper bound can be obtained by a deterministic data structure with worst-case operational time. This is the first demonstration of a "sharp threshold" phenomenon for dynamic data structures. Our broader motivation is that cell-probe lower bounds for exponentially small success facilitate reductions from dynamic to static data structures. As a proof-of-concept, we show that a slightly strengthened version of our lower bound would imply an $Ω((\log n /\log\log n)^2)$ lower bound for the static 3D-ORC problem with $O(n\log^{O(1)}n)$ space. Such a result would give a near quadratic improvement over the highest known static cell-probe lower bound, and break the long-standing $Ω(\log n)$ barrier for static data structures.

Submitted 11 April, 2016; originally announced April 2016.

arXiv:1505.05794 [pdf, ps, other]

An Improved Upper Bound for the Most Informative Boolean Function Conjecture

Authors: Or Ordentlich, Ofer Shayevitz, Omri Weinstein

Abstract: Suppose $X$ is a uniformly distributed $n$-dimensional binary vector and $Y$ is obtained by passing $X$ through a binary symmetric channel with crossover probability $α$. A recent conjecture by Courtade and Kumar postulates that $I(f(X);Y)\leq 1-h(α)$ for any Boolean function $f$. So far, the best known upper bound was $I(f(X);Y)\leq (1-2α)^2$. In this paper, we derive a new upper bound that holds for all balanced functions, and improves upon the best known bound for all $\tfrac{1}{3}<α<\tfrac{1}{2}$.

Submitted 31 May, 2015; v1 submitted 21 May, 2015; originally announced May 2015.
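
The two quantities named in the abstract above, the conjectured bound $1-h(α)$ and the previously best known bound $(1-2α)^2$, are easy to compare numerically. The Python sketch below tabulates both; the function names are hypothetical, and the paper's new bound is not reproduced because its formula does not appear in the abstract.

```python
import math


def binary_entropy(a):
    """Binary entropy h(a) in bits, with h(0) = h(1) = 0."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1.0 - a) * math.log2(1.0 - a)


def conjectured_bound(a):
    """Courtade-Kumar conjectured upper bound on I(f(X); Y)."""
    return 1.0 - binary_entropy(a)


def previous_bound(a):
    """Previously best known upper bound quoted in the abstract."""
    return (1.0 - 2.0 * a) ** 2


for alpha in (0.05, 0.10, 0.25, 0.40, 0.45):
    print(f"alpha={alpha:.2f}  1-h={conjectured_bound(alpha):.4f}  "
          f"(1-2a)^2={previous_bound(alpha):.4f}")
```

The gap between the two columns is the room that any improved upper bound has to close; both bounds vanish at $α = 1/2$.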

arXiv:1504.08352 [pdf, ps, other]

ETH Hardness for Densest-$k$-Subgraph with Perfect Completeness

Authors: Mark Braverman, Young Kun Ko, Aviad Rubinstein, Omri Weinstein

Abstract: We show that, assuming the (deterministic) Exponential Time Hypothesis, distinguishing between a graph with an induced $k$-clique and a graph in which all $k$-subgraphs have density at most $1-ε$, requires $n^{\tilde Ω(\log n)}$ time. Our result essentially matches the quasi-polynomial algorithms of Feige and Seltser [FS97] and Barman [Bar15] for this problem, and is the first one to rule out an additive PTAS for Densest $k$-Subgraph. We further strengthen this result by showing that our lower bound continues to hold when, in the soundness case, even subgraphs smaller by a near-polynomial factor ($k' = k 2^{-\tilde Ω(\log n)}$) are assumed to be at most ($1-ε$)-dense. Our reduction is inspired by recent applications of the "birthday repetition" technique [AIM14,BKW15]. Our analysis relies on information theoretical machinery and is similar in spirit to analyzing a parallel repetition of two-prover games in which the provers may choose to answer some challenges multiple times, while completely ignoring other challenges.

Submitted 30 April, 2015; originally announced April 2015.

arXiv:1504.06830 [pdf, other]

Information Complexity and the Quest for Interactive Compression (A Survey)

Authors: Omri Weinstein

Abstract: Information complexity is the interactive analogue of Shannon's classical information theory. In recent years this field has emerged as a powerful tool for proving strong communication lower bounds, and for addressing some of the major open problems in communication complexity and circuit complexity. A notable achievement of information complexity is the breakthrough in understanding of the fundamental direct sum and direct product conjectures, which aim to quantify the power of parallel computation. This survey provides a brief introduction to information complexity, and overviews some of the recent progress on these conjectures and their tight relationship with the fascinating problem of compressing interactive protocols.

Submitted 26 April, 2015; originally announced April 2015.

arXiv:1504.01780 [pdf, ps, other]

Welfare Maximization with Limited Interaction

Authors: Noga Alon, Noam Nisan, Ran Raz, Omri Weinstein

Abstract: We continue the study of welfare maximization in unit-demand (matching) markets, in a distributed information model where agents' valuations are unknown to the central planner, and therefore communication is required to determine an efficient allocation. Dobzinski, Nisan and Oren (STOC'14) showed that if the market size is $n$, then $r$ rounds of interaction (with logarithmic bandwidth) suffice to obtain an $n^{1/(r+1)}$-approximation to the optimal social welfare. In particular, this implies that such markets converge to a stable state (constant approximation) in time logarithmic in the market size. We obtain the first multi-round lower bound for this setup. We show that even if the allowable per-round bandwidth of each agent is $n^{ε(r)}$, the approximation ratio of any $r$-round (randomized) protocol is no better than $Ω(n^{1/5^{r+1}})$, implying an $Ω(\log \log n)$ lower bound on the rate of convergence of the market to equilibrium. Our construction and technique may be of interest to round-communication tradeoffs in the more general setting of combinatorial auctions, for which the only known lower bound is for simultaneous ($r=1$) protocols [DNO14].

Submitted 7 April, 2015; originally announced April 2015.

ACM Class: C.2.4; F.2.3; H.1.1

arXiv:1406.0576 [pdf, ps, other]

Welfare and Revenue Guarantees for Competitive Bundling Equilibrium

Authors: Shahar Dobzinski, Michal Feldman, Inbal Talgam-Cohen, Omri Weinstein

Abstract: We study equilibria of markets with $m$ heterogeneous indivisible goods and $n$ consumers with combinatorial preferences. It is well known that a competitive equilibrium is not guaranteed to exist when valuations are not gross substitutes. Given the widespread use of bundling in real-life markets, we study its role as a stabilizing and coordinating device by considering the notion of \emph{competitive bundling equilibrium}: a competitive equilibrium over the market induced by partitioning the goods for sale into fixed bundles. Compared to other equilibrium concepts involving bundles, this notion has the advantage of simultaneous succinctness ($O(m)$ prices) and market clearance. Our first set of results concerns welfare guarantees. We show that in markets where consumers care only about the number of goods they receive (known as multi-unit or homogeneous markets), even in the presence of complementarities, there always exists a competitive bundling equilibrium that guarantees a logarithmic fraction of the optimal welfare, and this guarantee is tight. We also establish non-trivial welfare guarantees for general markets, two-consumer markets, and markets where the consumer valuations are additive up to a fixed budget (budget-additive). Our second set of results concerns revenue guarantees.
Motivated by the fact that the revenue extracted in a standard competitive equilibrium may be zero (even with simple unit-demand consumers), we show that for natural subclasses of gross substitutes valuations, there always exists a competitive bundling equilibrium that extracts a logarithmic fraction of the optimal welfare, and this guarantee is tight. The notion of competitive bundling equilibrium can thus be useful even in markets which possess a standard competitive equilibrium.

Submitted 3 June, 2014; originally announced June 2014.

arXiv:1404.2861 [pdf, ps, other]

Distributed Signaling Games

Authors: Moran Feldman, Moshe Tennenholtz, Omri Weinstein

Abstract: A recurring theme in recent computer science literature is that proper design of signaling schemes is a crucial aspect of effective mechanisms aiming to optimize social welfare or revenue. One of the research endeavors of this line of work is understanding the algorithmic and computational complexity of designing efficient signaling schemes. In reality, however, information is typically not held by a central authority, but is distributed among multiple sources (third-party "mediators"), a fact that dramatically changes the strategic and combinatorial nature of the signaling problem, making it a game between information providers, as opposed to a traditional mechanism design problem. In this paper we introduce {\em distributed signaling games}, while using display advertising as a canonical example for introducing this foundational framework. A distributed signaling game may be a pure coordination game (i.e., a distributed optimization task), or a non-cooperative game. In the context of pure coordination games, we show a wide gap between the computational complexity of the centralized and distributed signaling problems. On the other hand, we show that if the information structure of each mediator is assumed to be "local", then there is an efficient algorithm that finds a near-optimal ($5$-approximation) distributed signaling scheme.
In the context of non-cooperative games, the outcome generated by the mediators' signals may have different value to each (due to the auctioneer's desire to align the incentives of the mediators with his own by relative compensations). We design a mechanism for this problem via a novel application of Shapley's value, and show that it possesses some interesting properties, in particular, it always admits a pure Nash equilibrium, and it never decreases the revenue of the auctioneer.

Submitted 5 July, 2015; v1 submitted 10 April, 2014; originally announced April 2014.

arXiv:1112.2000 [pdf, ps, other]

A discrepancy lower bound for information complexity

Authors: Mark Braverman, Omri Weinstein

Abstract: This paper provides the first general technique for proving information lower bounds on two-party unbounded-rounds communication problems. We show that the discrepancy lower bound, which applies to randomized communication complexity, also applies to information complexity. More precisely, if the discrepancy of a two-party function $f$ with respect to a distribution $μ$ is $Disc_μf$, then any two-party randomized protocol computing $f$ must reveal at least $Ω(\log (1/Disc_μf))$ bits of information to the participants. As a corollary, we obtain that any two-party protocol for computing a random function on $\{0,1\}^n\times\{0,1\}^n$ must reveal $Ω(n)$ bits of information to the participants. In addition, we prove that the discrepancy of the Greater-Than function is $Ω(1/\sqrt{n})$, which provides an alternative proof to the recent proof of Viola [Viola11] of the $Ω(\log n)$ lower bound on the communication complexity of this well-studied function and, combined with our main result, proves the tight $Ω(\log n)$ lower bound on its information complexity. The proof of our main result develops a new simulation procedure that may be of independent interest. In a very recent breakthrough work of Kerenidis et al. [kerenidis2012lower], this simulation procedure was the main building block for proving that almost all known lower bound techniques for communication complexity (and not just discrepancy) apply to information complexity.

Submitted 12 June, 2012; v1 submitted 8 December, 2011; originally announced December 2011.

arXiv:1107.1358 [pdf, ps, other]

On the Furthest Hyperplane Problem and Maximal Margin Clustering

Authors: Zohar Karnin, Edo Liberty, Shachar Lovett, Roy Schwartz, Omri Weinstein

Abstract: This paper introduces the Furthest Hyperplane Problem (FHP), which is an unsupervised counterpart of Support Vector Machines. Given a set of $n$ points in $\mathbb{R}^d$, the objective is to produce the hyperplane (passing through the origin) which maximizes the separation margin, that is, the minimal distance between the hyperplane and any input point. To the best of our knowledge, this is the first paper achieving provable results regarding FHP. We provide both lower and upper bounds to this NP-hard problem. First, we give a simple randomized algorithm whose running time is $n^{O(1/θ^2)}$, where $θ$ is the optimal separation margin. We show that its exponential dependency on $1/θ^2$ is tight, up to sub-polynomial factors, assuming SAT cannot be solved in sub-exponential time. Next, we give an efficient approximation algorithm. For any $α \in [0, 1]$, the algorithm produces a hyperplane whose distance from at least a $1 - 5α$ fraction of the points is at least $α$ times the optimal separation margin. Finally, we show that FHP does not admit a PTAS by presenting a gap-preserving reduction from a particular version of the PCP theorem.

Submitted 2 February, 2012; v1 submitted 7 July, 2011; originally announced July 2011.
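
The separation margin defined in the abstract above can be illustrated with a minimal sketch. It only evaluates the objective for a given candidate hyperplane through the origin; it is not the paper's randomized or approximation algorithm. The function name `separation_margin` and the representation of the hyperplane by a normal vector `w` are assumptions made for illustration.

```python
import math


def separation_margin(w, points):
    """Return the minimal distance from any point to the hyperplane {x : <w, x> = 0}.

    `w` is a (hypothetical) normal vector of a hyperplane through the origin;
    `points` is an iterable of vectors with the same dimension as `w`.
    """
    norm = math.sqrt(sum(c * c for c in w))
    if norm == 0:
        raise ValueError("normal vector must be non-zero")
    # The distance of a point p to the hyperplane is |<w, p>| / ||w||.
    return min(abs(sum(wi * pi for wi, pi in zip(w, p))) for p in points) / norm
```

FHP asks for the `w` that maximizes this quantity over all directions, which is what makes the problem hard; the sketch only scores a single candidate.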

arXiv:1101.5345 [pdf, ps, other]

Approximating the Influence of a monotone Boolean function in O(\sqrt{n}) query complexity

Authors: Dana Ron, Ronitt Rubinfeld, Muli Safra, Omri Weinstein

Abstract: The {\em Total Influence} ({\em Average Sensitivity}) of a discrete function is one of its fundamental measures. We study the problem of approximating the total influence of a monotone Boolean function $f: \{\pm1\}^n \longrightarrow \{\pm1\}$, which we denote by $I[f]$. We present a randomized algorithm that approximates the influence of such functions to within a multiplicative factor of $(1\pm \epsilon)$ by performing $O(\frac{\sqrt{n}\log n}{I[f]} \mathrm{poly}(1/\epsilon))$ queries. We also prove a lower bound of $Ω(\frac{\sqrt{n}}{\log n \cdot I[f]})$ on the query complexity of any constant-factor approximation algorithm for this problem (which holds for $I[f] = Ω(1)$), hence showing that our algorithm is almost optimal in terms of its dependence on $n$. For general functions we give a lower bound of $Ω(\frac{n}{I[f]})$, which matches the complexity of a simple sampling algorithm.

Submitted 27 January, 2011; originally announced January 2011.
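
The "simple sampling algorithm" mentioned at the end of the abstract above is not spelled out there. One standard reading, given here as a hedged sketch, uses the identity $I[f] = n \cdot \Pr_{x,i}[f(x) \neq f(x^{\oplus i})]$, where $x^{\oplus i}$ is $x$ with coordinate $i$ flipped. The function name `estimate_total_influence` and its parameters are assumptions for illustration, and this is not the paper's $O(\sqrt{n})$-type algorithm for monotone functions.

```python
import random


def estimate_total_influence(f, n, samples, rng=None):
    """Estimate I[f] for f: {-1, 1}^n -> {-1, 1} by naive sampling.

    Draws a uniform point x and a uniform coordinate i, and checks whether
    flipping coordinate i changes f. The hit rate times n estimates I[f].
    """
    if rng is None:
        rng = random.Random()
    hits = 0
    for _ in range(samples):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        i = rng.randrange(n)
        y = list(x)
        y[i] = -y[i]  # flip a single coordinate
        if f(x) != f(y):
            hits += 1
    return n * hits / samples
```

For example, a dictator function such as `lambda x: x[0]` has total influence 1, so the estimate should concentrate around 1 as `samples` grows. Each sample costs two queries to `f`.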

Showing 1–35 of 35 results for author: Weinstein, O