-
Aiming for Relevance
Authors:
Bar Eini Porat,
Danny Eytan,
Uri Shalit
Abstract:
Vital signs are crucial in intensive care units (ICUs). They are used to track the patient's state and to identify clinically significant changes. Predicting vital sign trajectories is valuable for early detection of adverse events. However, conventional machine learning metrics like RMSE often fail to capture the true clinical relevance of such predictions. We introduce novel vital sign prediction performance metrics that align with clinical contexts, focusing on deviations from clinical norms, overall trends, and trend deviations. These metrics are derived from empirical utility curves obtained in a previous study through interviews with ICU clinicians. We validate the metrics' usefulness using simulated and real clinical datasets (MIMIC and eICU). Furthermore, we employ these metrics as loss functions for neural networks, resulting in models that excel in predicting clinically significant events. This research paves the way for clinically relevant machine learning model evaluation and optimization, promising to improve ICU patient care. 10 pages, 9 figures.
Submitted 27 March, 2024;
originally announced March 2024.
-
Sinusoidal Transmission Grating Spectrometer for EUV Measure
Authors:
N. Kliss,
J. Wengrowicz,
J. Papeer,
E. Porat,
A. Zigler,
Y. Frank
Abstract:
Spectral measurements play a vital role in understanding laser-plasma interactions. The ability to accurately measure the spectrum of radiation sources is crucial for unraveling the underlying physics. In this article, we introduce a novel approach that significantly enhances the efficiency of binary Sinusoidal Transmission Grating Spectrometers (STGS). The grating was tailored especially for Extreme Ultraviolet (EUV) measurements. The new design, High Contrast Sinusoidal Transmission Grating (HCSTG), not only suppresses high diffraction orders and retains the advantageous properties of previous designs but also exhibits a fourfold improvement in first-order efficiency. In addition, the HCSTG offers exceptional purity in the first order due to effectively eliminating half-order contributions from the diffraction pattern. The HCSTG spectrometer was employed to measure the emission of laser-produced Sn plasma in the 1-50 nm spectral range, achieving a spectral resolution of $λ/Δλ=60$. We provide a comprehensive analysis comparing the diffraction patterns of different STGs, highlighting the advantages offered by the HCSTG design. This novel, enhanced-efficiency HCSTG spectrometer opens new possibilities for accurate and sensitive EUV spectral measurements.
Submitted 29 August, 2023;
originally announced August 2023.
-
Charge transfer-induced Lifshitz transition and magnetic symmetry breaking in ultrathin CrSBr crystals
Authors:
Marco Bianchi,
Kimberly Hsieh,
Esben Juel Porat,
Florian Dirnberger,
Julian Klein,
Kseniia Mosina,
Zdenek Sofer,
Alexander N. Rudenko,
Mikhail I. Katsnelson,
Yong P. Chen,
Malte Rösner,
Philip Hofmann
Abstract:
Ultrathin CrSBr flakes are exfoliated \emph{in situ} on Au(111) and Ag(111) and their electronic structure is studied by angle-resolved photoemission spectroscopy. The thin flakes' electronic properties are drastically different from those of the bulk material and also substrate-dependent. For both substrates, a strong charge transfer to the flakes is observed, partly populating the conduction band and giving rise to a highly anisotropic Fermi contour with an Ohmic contact to the substrate. The fundamental CrSBr band gap is strongly renormalized compared to the bulk. The charge transfer to the CrSBr flake is substantially larger for Ag(111) than for Au(111), but a rigid energy shift of the chemical potential is insufficient to describe the observed band structure modifications. In particular, the Fermi contour shows a Lifshitz transition, the fundamental band gap undergoes a transition from direct on Au(111) to indirect on Ag(111) and a doping-induced symmetry breaking between the intra-layer Cr magnetic moments further modifies the band structure. Electronic structure calculations can account for non-rigid Lifshitz-type band structure changes in thin CrSBr as a function of doping and strain. In contrast to undoped bulk band structure calculations that require self-consistent $GW$ theory, the doped thin film properties are well-approximated by density functional theory if local Coulomb interactions are taken into account on the mean-field level and the charge transfer is considered.
Submitted 24 July, 2023;
originally announced July 2023.
-
Simple Tests of Quantumness Also Certify Qubits
Authors:
Zvika Brakerski,
Alexandru Gheorghiu,
Gregory D. Kahanamoku-Meyer,
Eitan Porat,
Thomas Vidick
Abstract:
A test of quantumness is a protocol that allows a classical verifier to certify (only) that a prover is not classical. We show that tests of quantumness that follow a certain template, which captures recent proposals such as (Kalai et al., 2022), can in fact do much more. Namely, the same protocols can be used for certifying a qubit, a building-block that stands at the heart of applications such as certifiable randomness and classical delegation of quantum computation.
Certifying qubits was previously only known to be possible based on the hardness of the Learning with Errors problem and the use of adaptive hardcore (Brakerski et al., 2018). Our framework allows certification of qubits based only on the existence of post-quantum trapdoor claw-free functions, or on quantum fully homomorphic encryption. These can be instantiated, for example, from Ring Learning with Errors.
On the technical side, we show that the quantum soundness of any such protocol can be reduced to proving a bound on a simple algorithmic task: informally, answering ``two challenges simultaneously'' in the protocol. Our reduction formalizes the intuition that these protocols demonstrate quantumness by leveraging the impossibility of rewinding a general quantum prover. This allows us to prove tight bounds on the quantum soundness of (Kahanamoku-Meyer et al., 2021) and (Kalai et al., 2022), showing that no quantum polynomial-time prover can succeed with probability larger than $\cos^2 \frac{\pi}{8}\approx 0.853$. Previously, only an upper bound on the success probability of classical provers, and a lower bound on the success probability of quantum provers, were known. We then extend this proof of quantum soundness to show that provers that approach the quantum soundness bound must perform almost anti-commuting measurements. This certifies that the prover holds a qubit.
Submitted 18 May, 2023; v1 submitted 2 March, 2023;
originally announced March 2023.
-
ScrollTest: Evaluating Scrolling Speed and Accuracy
Authors:
Chaoran Chen,
Brad A. Myers,
Cem Ergin,
Emily Porat,
Sijia Li,
Chun Wang
Abstract:
Scrolling is an essential interaction technique enabling users to display previously off-screen content. Existing evaluation models for scrolling are often entangled with the selection of content, e.g., when scrolling on the phone for reading. Furthermore, some evaluation models overlook whether the user knows the target position. We have developed ScrollTest, a general-purpose evaluation tool for scrolling speed and accuracy that avoids the need for selection. We tested it across four dimensions: 11 different scrolling techniques/devices, 5 frame heights, 13 scrolling distances, and 2 scrolling conditions (i.e., with or without knowing the target position). The results show that flicking and two-finger scrolling are the fastest; flicking is also relatively precise for scrolling to targets already onscreen, but pressing arrow buttons on the scrollbar is the most accurate for scrolling to nearby targets. Mathematical models of scrolling are highly linear when the target position is unknown but like Fitts' law when known.
Submitted 3 October, 2022;
originally announced October 2022.
-
Iterative-Free Quantum Approximate Optimization Algorithm Using Neural Networks
Authors:
Ohad Amosy,
Tamuz Danzig,
Ely Porat,
Gal Chechik,
Adi Makmal
Abstract:
The quantum approximate optimization algorithm (QAOA) is a leading iterative variational quantum algorithm for heuristically solving combinatorial optimization problems. A large portion of the computational effort in QAOA is spent by the optimization steps, which require many executions of the quantum circuit. Therefore, there is active research focusing on finding better initial circuit parameters, which would reduce the number of required iterations and hence the overall execution time. While existing methods for parameter initialization have shown great success, they often offer a single set of parameters for all problem instances. We propose a practical method that uses a simple, fully connected neural network that leverages previous executions of QAOA to find better initialization parameters tailored to a new given problem instance. We benchmark state-of-the-art initialization methods for solving the MaxCut problem of Erdős-Rényi graphs using QAOA and show that our method is consistently the fastest to converge while also yielding the best final result. Furthermore, the parameters predicted by the neural network are shown to match very well with the fully optimized parameters, to the extent that no iterative steps are required, thereby effectively realizing an iterative-free QAOA scheme.
Submitted 21 August, 2022;
originally announced August 2022.
-
An Improved Algorithm for The $k$-Dyck Edit Distance Problem
Authors:
Dvir Fried,
Shay Golan,
Tomasz Kociumaka,
Tsvi Kopelowitz,
Ely Porat,
Tatiana Starikovskaya
Abstract:
A Dyck sequence is a sequence of opening and closing parentheses (of various types) that is balanced. The Dyck edit distance of a given sequence of parentheses $S$ is the smallest number of edit operations (insertions, deletions, and substitutions) needed to transform $S$ into a Dyck sequence. We consider the threshold Dyck edit distance problem, where the input is a sequence of parentheses $S$ and a positive integer $k$, and the goal is to compute the Dyck edit distance of $S$ only if the distance is at most $k$, and otherwise report that the distance is larger than $k$. Backurs and Onak [PODS'16] showed that the threshold Dyck edit distance problem can be solved in $O(n+k^{16})$ time.
In this work, we design new algorithms for the threshold Dyck edit distance problem that run in $O(n+k^{4.544184})$ time with high probability or $O(n+k^{4.853059})$ time deterministically. Our algorithms combine several new structural properties of the Dyck edit distance problem, a refined algorithm for fast $(\min,+)$ matrix product, and a careful modification of ideas used in Valiant's parsing algorithm.
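To make the definitions in this abstract concrete, here is a minimal brute-force sketch of the Dyck edit distance and its threshold variant. It is a textbook cubic-time dynamic program over substrings, not the algorithm from the paper; the bracket alphabet and all function names are assumptions chosen for illustration.

```python
from functools import lru_cache

# Hypothetical bracket alphabet; the abstract only says "parentheses of various types".
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def is_dyck(s):
    """Return True if s is a balanced sequence of the brackets in PAIRS."""
    stack = []
    for ch in s:
        if ch in PAIRS:
            stack.append(ch)
        elif not stack or PAIRS[stack.pop()] != ch:
            return False
    return not stack


def pair_cost(a, b):
    """Substitutions needed so that a (left) and b (right) form a matched pair."""
    if a in PAIRS and b in CLOSERS:
        return 0 if PAIRS[a] == b else 1  # already matched, or fix one type
    if a in CLOSERS and b in PAIRS:
        return 2  # both characters must be substituted
    return 1  # two openers or two closers: substitute one of them


def dyck_edit_distance(s):
    """Brute-force O(n^3) DP; suitable only for short inputs."""

    @lru_cache(maxsize=None)
    def ed(i, j):
        if i >= j:
            return 0
        # Delete s[i]; inserting a partner for s[i] has the same cost.
        best = 1 + ed(i + 1, j)
        # Or pair s[i] with some later s[k], solving inside and after the pair.
        for k in range(i + 1, j):
            best = min(best, pair_cost(s[i], s[k]) + ed(i + 1, k) + ed(k + 1, j))
        return best

    return ed(0, len(s))


def threshold_dyck_edit_distance(s, k):
    """Return the distance if it is at most k, otherwise None."""
    d = dyck_edit_distance(s)
    return d if d <= k else None
```

For example, `dyck_edit_distance("([)]")` evaluates to 2, since no single insertion, deletion, or substitution balances that string.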
Submitted 22 August, 2022; v1 submitted 3 November, 2021;
originally announced November 2021.
-
Incremental Edge Orientation in Forests
Authors:
Michael A. Bender,
Tsvi Kopelowitz,
William Kuszmaul,
Ely Porat,
Clifford Stein
Abstract:
For any forest $G = (V, E)$ it is possible to orient the edges $E$ so that no vertex in $V$ has out-degree greater than $1$. This paper considers the incremental edge-orientation problem, in which the edges $E$ arrive over time and the algorithm must maintain a low-out-degree edge orientation at all times. We give an algorithm that maintains a maximum out-degree of $3$ while flipping at most $O(\log \log n)$ edge orientations per edge insertion, with high probability in $n$. The algorithm requires worst-case time $O(\log n \log \log n)$ per insertion, and takes amortized time $O(1)$. The previous state of the art required up to $O(\log n / \log \log n)$ edge flips per insertion.
We then apply our edge-orientation results to the problem of dynamic Cuckoo hashing. The problem of designing simple families $\mathcal{H}$ of hash functions that are compatible with Cuckoo hashing has received extensive attention. These families $\mathcal{H}$ are known to satisfy \emph{static guarantees}, but do not come typically with \emph{dynamic guarantees} for the running time of inserts and deletes. We show how to transform static guarantees (for $1$-associativity) into near-state-of-the-art dynamic guarantees (for $O(1)$-associativity) in a black-box fashion. Rather than relying on the family $\mathcal{H}$ to supply randomness, as in past work, we instead rely on randomness within our table-maintenance algorithm.
Submitted 5 July, 2021;
originally announced July 2021.
-
Small space and streaming pattern matching with k edits
Authors:
Tomasz Kociumaka,
Ely Porat,
Tatiana Starikovskaya
Abstract:
In this work, we revisit the fundamental and well-studied problem of approximate pattern matching under edit distance. Given an integer $k$, a pattern $P$ of length $m$, and a text $T$ of length $n \ge m$, the task is to find substrings of $T$ that are within edit distance $k$ from $P$. Our main result is a streaming algorithm that solves the problem in $\tilde{O}(k^5)$ space and $\tilde{O}(k^8)$ amortised time per character of the text, providing answers correct with high probability. (Hereafter, $\tilde{O}(\cdot)$ hides a $\mathrm{poly}(\log n)$ factor.) This answers a decade-old question: since the discovery of a $\mathrm{poly}(k\log n)$-space streaming algorithm for pattern matching under Hamming distance by Porat and Porat [FOCS 2009], the existence of an analogous result for edit distance remained open. Up to this work, no $\mathrm{poly}(k\log n)$-space algorithm was known even in the simpler semi-streaming model, where $T$ comes as a stream but $P$ is available for read-only access. In this model, we give a deterministic algorithm that achieves slightly better complexity.
In order to develop the fully streaming algorithm, we introduce a new edit distance sketch parametrised by integers $n\ge k$. For any string of length at most $n$, the sketch is of size $\tilde{O}(k^2)$ and it can be computed with an $\tilde{O}(k^2)$-space streaming algorithm. Given the sketches of two strings, in $\tilde{O}(k^3)$ time we can compute their edit distance or certify that it is larger than $k$. This result improves upon $\tilde{O}(k^8)$-size sketches of Belazzougui and Zhu [FOCS 2016] and very recent $\tilde{O}(k^3)$-size sketches of Jin, Nelson, and Wu [STACS 2021].
Submitted 10 June, 2021;
originally announced June 2021.
-
Support Optimality and Adaptive Cuckoo Filters
Authors:
Tsvi Kopelowitz,
Samuel McCauley,
Ely Porat
Abstract:
Filters (such as Bloom Filters) are data structures that speed up network routing and measurement operations by storing a compressed representation of a set. Filters are space efficient, but can make bounded one-sided errors: with tunable probability epsilon, they may report that a query element is stored in the filter when it is not. This is called a false positive. Recent research has focused on designing methods for dynamically adapting filters to false positives, reducing the number of false positives when some elements are queried repeatedly.
Ideally, an adaptive filter would incur a false positive with bounded probability epsilon for each new query element, and would incur o(epsilon) total false positives over all repeated queries to that element. We call such a filter support optimal.
In this paper we design a new Adaptive Cuckoo Filter and show that it is support optimal (up to additive logarithmic terms) over any n queries when storing a set of size n. Our filter is simple: fixing previous false positives requires a simple cuckoo operation, and the filter does not need to store any additional metadata. This data structure is the first practical data structure that is support optimal, and the first filter that does not require additional space to fix false positives.
We complement these bounds with experiments showing that our data structure is effective at fixing false positives on network traces, outperforming previous Adaptive Cuckoo Filters.
Finally, we investigate adversarial adaptivity, a stronger notion of adaptivity in which an adaptive adversary repeatedly queries the filter, using the result of previous queries to drive the false positive rate as high as possible. We prove a lower bound showing that a broad family of filters, including all known Adaptive Cuckoo Filters, can be forced by such an adversary to incur a large number of false positives.
Submitted 21 May, 2021;
originally announced May 2021.
-
An $O(\log^{3/2}n)$ Parallel Time Population Protocol for Majority with $O(\log n)$ States
Authors:
Stav Ben-Nun,
Tsvi Kopelowitz,
Matan Kraus,
Ely Porat
Abstract:
In population protocols, the underlying distributed network consists of $n$ nodes (or agents), denoted by $V$, and a scheduler that continuously selects uniformly random pairs of nodes to interact. When two nodes interact, their states are updated by applying a state transition function that depends only on the states of the two nodes prior to the interaction. The efficiency of a population protocol is measured in terms of both time (which is the number of interactions until the nodes collectively have a valid output) and the number of possible states of nodes used by the protocol. By convention, we consider the parallel time cost, which is the time divided by $n$.
In this paper we consider the majority problem, where each node receives as input a color that is either black or white, and the goal is to have all of the nodes output the color that is the majority of the input colors. We design a population protocol that solves the majority problem in $O(\log^{3/2}n)$ parallel time, both with high probability and in expectation, while using $O(\log n)$ states. Our protocol improves on a recent protocol of Berenbrink et al. that runs in $O(\log^{5/3}n)$ parallel time, both with high probability and in expectation, using $O(\log n)$ states.
Submitted 25 November, 2020;
originally announced November 2020.
-
Plasma mirrors as a path to the Schwinger limit
Authors:
L. Chopineau,
A. Denoeud,
A. Leblanc,
E. Porat,
Ph. Martin,
H. Vincenti,
F. Quéré
Abstract:
Reaching light intensities above $10^{25}$ W/cm$^{2}$ and up to the Schwinger limit ($10^{29}$ W/cm$^{2}$) would enable testing decades-old fundamental predictions of Quantum Electrodynamics. A promising yet challenging approach to achieve such extreme fields consists in reflecting a high-power femtosecond laser pulse off a curved relativistic mirror. This enhances the intensity of the reflected beam by simultaneously compressing it in time down to the attosecond range, and focusing it to sub-micron focal spots. Here we show that such curved relativistic mirrors can be produced when an ultra-intense laser pulse ionizes a solid target and creates a dense plasma that specularly reflects the incident light. This is evidenced by measuring for the first time the temporal and spatial effects induced on the reflected beam by this so-called 'plasma mirror'. The all-optical measurement technique demonstrated here will be instrumental for the use of relativistic plasma mirrors with the emerging generation of Petawatt lasers, which constitutes a viable experimental path to the Schwinger limit.
Submitted 10 July, 2020;
originally announced July 2020.
-
Improved Circular $k$-Mismatch Sketches
Authors:
Shay Golan,
Tomasz Kociumaka,
Tsvi Kopelowitz,
Ely Porat,
Przemysław Uznański
Abstract:
The shift distance $\mathsf{sh}(S_1,S_2)$ between two strings $S_1$ and $S_2$ of the same length is defined as the minimum Hamming distance between $S_1$ and any rotation (cyclic shift) of $S_2$. We study the problem of sketching the shift distance, which is the following communication complexity problem: Strings $S_1$ and $S_2$ of length $n$ are given to two identical players (encoders), who independently compute sketches (summaries) $\mathtt{sk}(S_1)$ and $\mathtt{sk}(S_2)$, respectively, so that upon receiving the two sketches, a third player (decoder) is able to compute (or approximate) $\mathsf{sh}(S_1,S_2)$ with high probability.
This paper primarily focuses on the more general $k$-mismatch version of the problem, where the decoder is allowed to declare a failure if $\mathsf{sh}(S_1,S_2)>k$, where $k$ is a parameter known to all parties. Andoni et al. (STOC'13) introduced exact circular $k$-mismatch sketches of size $\widetilde{O}(k+D(n))$, where $D(n)$ is the number of divisors of $n$. Andoni et al. also showed that their sketch size is optimal in the class of linear homomorphic sketches.
We circumvent this lower bound by designing a (non-linear) exact circular $k$-mismatch sketch of size $\widetilde{O}(k)$; this size matches communication-complexity lower bounds. We also design $(1\pm \varepsilon)$-approximate circular $k$-mismatch sketch of size $\widetilde{O}(\min(\varepsilon^{-2}\sqrt{k}, \varepsilon^{-1.5}\sqrt{n}))$, which improves upon an $\widetilde{O}(\varepsilon^{-2}\sqrt{n})$-size sketch of Crouch and McGregor (APPROX'11).
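As a reference point for the definition of $\mathsf{sh}(S_1,S_2)$ given in this abstract, the following sketch computes the shift distance directly by trying every rotation. It is a quadratic-time baseline that reads both strings in full, not a sketching scheme; the function names are assumptions for illustration.

```python
def hamming(a, b):
    """Hamming distance between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def shift_distance(s1, s2):
    """Minimum Hamming distance between s1 and any cyclic shift of s2."""
    if len(s1) != len(s2):
        raise ValueError("strings must have the same length")
    n = len(s1)
    if n == 0:
        return 0
    # s2[r:] + s2[:r] is s2 rotated left by r positions.
    return min(hamming(s1, s2[r:] + s2[:r]) for r in range(n))


def k_mismatch_shift_distance(s1, s2, k):
    """k-mismatch variant: return the distance if it is at most k, else None."""
    d = shift_distance(s1, s2)
    return d if d <= k else None
```

A sketching protocol would replace the direct access to `s1` and `s2` with short summaries computed independently by the two encoders; this baseline only pins down the value the decoder must output.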
Submitted 24 June, 2020;
originally announced June 2020.
-
The Streaming k-Mismatch Problem: Tradeoffs between Space and Total Time
Authors:
Shay Golan,
Tomasz Kociumaka,
Tsvi Kopelowitz,
Ely Porat
Abstract:
We revisit the $k$-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$σ$ alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde O\big(\sqrt k\big)$ worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is $\tilde O(n\sqrt k)$, and the fastest known offline algorithm, which costs $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},σn\big)\big)$ time. Moreover, it is not known whether improvements over the $\tilde O(n\sqrt k)$ total time are possible when using more than $O(k)$ space.
We address these gaps by designing a randomized streaming algorithm for the $k$-mismatch problem that, given an integer parameter $k\le s \le m$, uses $\tilde O(s)$ space and costs $\tilde O\big(n+\min\big(\frac {nk^2}m,\frac{nk}{\sqrt s},\frac{σnm}s\big)\big)$ total time. For $s=m$, the total runtime becomes $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},σn\big)\big)$, which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still $\tilde O\big(\sqrt k\big)$.
Submitted 27 April, 2020;
originally announced April 2020.
-
Approximating Text-to-Pattern Hamming Distances
Authors:
Timothy M. Chan,
Shay Golan,
Tomasz Kociumaka,
Tsvi Kopelowitz,
Ely Porat
Abstract:
We revisit a fundamental problem in string matching: given a pattern of length m and a text of length n, both over an alphabet of size $σ$, compute the Hamming distance between the pattern and the text at every location. Several $(1+ε)$-approximation algorithms have been proposed in the literature, with running time of the form $O(ε^{-O(1)}n\log n\log m)$, all using fast Fourier transform (FFT). We describe a simple $(1+ε)$-approximation algorithm that is faster and does not need FFT. Combining our approach with additional ideas leads to numerous new results:
- We obtain the first linear-time approximation algorithm; the running time is $O(ε^{-2}n)$.
- We obtain a faster exact algorithm computing all Hamming distances up to a given threshold k; its running time improves previous results by logarithmic factors and is linear if $k\le\sqrt m$.
- We obtain approximation algorithms with better $ε$-dependence using rectangular matrix multiplication. The time-bound is $Õ(n)$ when the pattern is sufficiently long: $m\ge ε^{-28}$. Previous algorithms require $Õ(ε^{-1}n)$ time.
- When k is not too small, we obtain a truly sublinear-time algorithm to find all locations with Hamming distance approximately (up to a constant factor) less than k, in $O((n/k^{Ω(1)}+occ)n^{o(1)})$ time, where occ is the output size. The algorithm leads to a property tester, returning true if an exact match exists and false if the Hamming distance is more than $δm$ at every location, running in $Õ(δ^{-1/3}n^{2/3}+δ^{-1}n/m)$ time.
- We obtain a streaming algorithm to report all locations with Hamming distance approximately less than k, using $Õ(ε^{-2}\sqrt k)$ space. Previously, streaming algorithms were known for the exact problem with Õ(k) space or for the approximate problem with $Õ(ε^{-O(1)}\sqrt m)$ space.
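The problem statement at the start of this abstract can be illustrated with the naive exact baseline below, which takes O(nm) time. It is only meant to fix what "the Hamming distance between the pattern and the text at every location" and "all Hamming distances up to a given threshold k" refer to. It implements none of the approximation, sublinear, or streaming algorithms listed above, and the function names are assumptions.

```python
def text_to_pattern_hamming(text, pattern):
    """Exact Hamming distance between pattern and every length-m window of text."""
    n, m = len(text), len(pattern)
    return [
        sum(text[i + j] != pattern[j] for j in range(m))
        for i in range(n - m + 1)
    ]


def k_mismatch_occurrences(text, pattern, k):
    """Locations (with their distances) where the Hamming distance is at most k."""
    distances = text_to_pattern_hamming(text, pattern)
    return [(i, d) for i, d in enumerate(distances) if d <= k]
```

With `text = "abcabc"` and `pattern = "abd"`, the first function returns one distance per alignment, and the second keeps only the alignments within the threshold.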
Submitted 1 January, 2020;
originally announced January 2020.
-
$\{-1,0,1\}$-APSP and (min,max)-Product Problems
Authors:
Hodaya Barr,
Tsvi Kopelowitz,
Ely Porat,
Liam Roditty
Abstract:
In the $\{-1,0,1\}$-APSP problem the goal is to compute all-pairs shortest paths (APSP) on a directed graph whose edge weights are all from $\{-1,0,1\}$. In the (min,max)-product problem the input is two $n\times n$ matrices $A$ and $B$, and the goal is to output the (min,max)-product of $A$ and $B$.
This paper provides a new algorithm for the $\{-1,0,1\}$-APSP problem via a simple reduction to the target-(min,max)-product problem where the input is three $n\times n$ matrices $A,B$, and $T$, and the goal is to output a Boolean $n\times n$ matrix $C$ such that the $(i,j)$ entry of $C$ is 1 if and only if the $(i,j)$ entry of the (min,max)-product of $A$ and $B$ is exactly the $(i,j)$ entry of the target matrix $T$. If (min,max)-product can be solved in $T_{MM}(n) = Ω(n^2)$ time then it is straightforward to solve target-(min,max)-product in $O(T_{MM}(n))$ time. Thus, given the recent result of Bringmann, Künnemann, and Wegrzycki [STOC 2019], the $\{-1,0,1\}$-APSP problem can be solved in the same time needed for solving approximate APSP on graphs with positive weights.
Moreover, we design a simple algorithm for target-(min,max)-product when the inputs are restricted to the family of inputs generated by our reduction. Using fast rectangular matrix multiplication, the new algorithm is faster than the current best known algorithm for (min,max)-product.
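To make the two problem definitions concrete, here is a minimal sketch of a naive cubic-time baseline. It is not the paper's algorithm. It assumes the common convention that entry $(i,j)$ of the (min,max)-product is $\min_k \max(A[i][k], B[k][j])$; the function names are illustrative.

```python
def min_max_product(A, B):
    """Naive (min,max)-product of two n x n matrices.

    Assumed convention: C[i][j] = min over k of max(A[i][k], B[k][j]).
    """
    n = len(A)
    return [
        [min(max(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def target_min_max_product(A, B, T):
    """Boolean matrix: 1 where the (min,max)-product entry equals T's entry."""
    C = min_max_product(A, B)
    n = len(A)
    return [[1 if C[i][j] == T[i][j] else 0 for j in range(n)] for i in range(n)]
```

Computing the full product and comparing it with $T$ entry by entry is the straightforward route the abstract mentions for solving the target variant within the same time bound.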
Submitted 14 November, 2019;
originally announced November 2019.
-
On the Hardness of Set Disjointness and Set Intersection with Bounded Universe
Authors:
Isaac Goldstein,
Moshe Lewenstein,
Ely Porat
Abstract:
In the SetDisjointness problem, a collection of $m$ sets $S_1,S_2,...,S_m$ from some universe $U$ is preprocessed in order to answer queries on the emptiness of the intersection of some two query sets from the collection. In the SetIntersection variant, all the elements in the intersection of the query sets are required to be reported. These are two fundamental problems that were considered in several papers from both the upper bound and lower bound perspective.
Several conditional lower bounds for these problems were proven for the tradeoff between preprocessing and query time or the tradeoff between space and query time. Moreover, there are several unconditional hardness results for these problems in some specific computational models. The fundamental nature of the SetDisjointness and SetIntersection problems makes them useful for proving the conditional hardness of other problems from various areas. However, the universe of the elements in the sets may be very large, which may cause the reduction to some other problems to be inefficient and therefore it is not useful for proving their conditional hardness.
In this paper, we prove the conditional hardness of SetDisjointness and SetIntersection with bounded universe. This conditional hardness is shown for both the interplay between preprocessing and query time and the interplay between space and query time. Moreover, we present several applications of these new conditional lower bounds. These applications demonstrate the strength of our new conditional lower bounds as they exploit the limited universe size. We believe that this new framework of conditional lower bounds with bounded universe can be useful for further significant applications.
Submitted 2 October, 2019;
originally announced October 2019.
-
The Strong 3SUM-INDEXING Conjecture is False
Authors:
Tsvi Kopelowitz,
Ely Porat
Abstract:
In the 3SUM-Indexing problem the goal is to preprocess two lists of elements from $U$, $A=(a_1,a_2,\ldots,a_n)$ and $B=(b_1,b_2,\ldots,b_n)$, such that given an element $c\in U$ one can quickly determine whether there exists a pair $(a,b)\in A \times B$ where $a+b=c$. Goldstein et al. [WADS'2017] conjectured that there is no algorithm for 3SUM-Indexing which uses $n^{2-Ω(1)}$ space and $n^{1-Ω(1)}$ query time.
We show that the conjecture is false by reducing the 3SUM-Indexing problem to the problem of inverting functions, and then applying an algorithm of Fiat and Naor [SICOMP'1999] for inverting functions.
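For orientation, the problem interface can be sketched with a trivial baseline that sits at one extreme of the space/query tradeoff: it stores all pairwise sums, so it may use quadratic space but answers queries in expected constant time. This is only an illustration of the problem statement, not the function-inversion approach described above; the class name is hypothetical.

```python
class NaiveThreeSumIndex:
    """Trivial 3SUM-Indexing baseline: precompute every sum a + b."""

    def __init__(self, A, B):
        # Up to |A| * |B| distinct sums are stored.
        self._sums = {a + b for a in A for b in B}

    def query(self, c):
        """Return True if some a in A and b in B satisfy a + b == c."""
        return c in self._sums
```

The other trivial extreme stores only the two lists and scans them at query time, which takes roughly linear time per query. The conjecture concerned whether anything strictly between these extremes is possible.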
Submitted 25 July, 2019;
originally announced July 2019.
-
Locally Consistent Parsing for Text Indexing in Small Space
Authors:
Or Birenzwige,
Shay Golan,
Ely Porat
Abstract:
We consider two closely related problems of text indexing in a sub-linear working space. The first problem is the Sparse Suffix Tree (SST) construction of a set of suffixes $B$ using only $O(|B|)$ words of space. The second problem is the Longest Common Extension (LCE) problem, where for some parameter $1\leτ\le n$, the goal is to construct a data structure that uses $O(\frac {n}τ)$ words of space and can compute the longest common prefix length of any pair of suffixes. We show how to use ideas based on the Locally Consistent Parsing technique, which was introduced by Sahinalp and Vishkin [STOC '94], in some non-trivial ways in order to improve the known results for the above problems. We introduce new Las-Vegas and deterministic algorithms for both problems.
We introduce the first Las-Vegas SST construction algorithm that takes $O(n)$ time. This is an improvement over the last result of Gawrychowski and Kociumaka [SODA '17], who obtained $O(n)$ time for a Monte-Carlo algorithm and $O(n\sqrt{\log |B|})$ time for a Las-Vegas algorithm. In addition, we introduce a randomized Las-Vegas construction for an LCE data structure that can be constructed in linear time and answers queries in $O(τ)$ time.
For the deterministic algorithms, we introduce an SST construction algorithm that takes $O(n\log \frac{n}{|B|})$ time (for $|B|=Ω(\log n)$). This is the first almost linear time, $O(n\cdot poly\log{n})$, deterministic SST construction algorithm, where all previous algorithms take at least $Ω\left(\min\{n|B|,\frac{n^2}{|B|}\}\right)$ time. For the LCE problem, we introduce a data structure that answers LCE queries in $O(τ\sqrt{\log^*n})$ time, with $O(n\logτ)$ construction time (for $τ=O(\frac{n}{\log n})$). This data structure improves both query time and construction time upon the results of Tanimura et al. [CPM '16].
Submitted 1 January, 2020; v1 submitted 2 December, 2018;
originally announced December 2018.
-
Improved Space-Time Tradeoffs for kSUM
Authors:
Isaac Goldstein,
Moshe Lewenstein,
Ely Porat
Abstract:
In the kSUM problem we are given an array of numbers $a_1,a_2,...,a_n$ and we are required to determine if there are $k$ different elements in this array such that their sum is 0. This problem is a parameterized version of the well-studied SUBSET-SUM problem, and a special case is the 3SUM problem that is extensively used for proving conditional hardness. Several works investigated the interplay between time and space in the context of SUBSET-SUM. Recently, improved time-space tradeoffs were proven for kSUM using both randomized and deterministic algorithms.
In this paper we obtain an improvement over the best known results for the time-space tradeoff for kSUM. A major ingredient in achieving these results is a general self-reduction from kSUM to mSUM where $m<k$, and several useful observations that enable this reduction and its implications. The main results we prove in this paper include the following: (i) The best known Las Vegas solution to kSUM running in approximately $O(n^{k-δ\sqrt{2k}})$ time and using $O(n^δ)$ space, for $0 \leq δ\leq 1$. (ii) The best known deterministic solution to kSUM running in approximately $O(n^{k-δ\sqrt{k}})$ time and using $O(n^δ)$ space, for $0 \leq δ\leq 1$. (iii) A space-time tradeoff for solving kSUM using $O(n^δ)$ space, for $δ>1$. (iv) An algorithm for 6SUM running in $O(n^4)$ time using just $O(n^{2/3})$ space. (v) A solution to 3SUM on random input using $O(n^2)$ time and $O(n^{1/3})$ space, under the assumption of a random read-only access to random bits.
Submitted 10 July, 2018;
originally announced July 2018.
-
Improved Worst-Case Deterministic Parallel Dynamic Minimum Spanning Forest
Authors:
Tsvi Kopelowitz,
Ely Porat,
Yair Rosenmutter
Abstract:
This paper gives a new deterministic algorithm for the dynamic Minimum Spanning Forest (MSF) problem in the EREW PRAM model, where the goal is to maintain an MSF of a weighted graph with $n$ vertices and $m$ edges while supporting edge insertions and deletions. We show that one can solve the dynamic MSF problem using $O(\sqrt n)$ processors and $O(\log n)$ worst-case update time, for a total of $O(\sqrt n \log n)$ work. This improves on the work of Ferragina [IPPS 1995] which costs $O(\log n)$ worst-case update time and $O(n^{2/3} \log{\frac{m}{n}})$ work.
Submitted 16 May, 2018;
originally announced May 2018.
-
Orthogonal Vectors Indexing
Authors:
Isaac Goldstein,
Moshe Lewenstein,
Ely Porat
Abstract:
In recent years, intensive research work has been dedicated to proving conditional lower bounds in order to reveal the inner structure of the class P. These conditional lower bounds are based on many popular conjectures on well-studied problems. One of the most heavily used conjectures is the celebrated Strong Exponential Time Hypothesis (SETH). It turns out that conditional hardness proved based on SETH goes, in many cases, through an intermediate problem - the Orthogonal Vectors (OV) problem.
Almost all research work regarding conditional lower bounds has concentrated on time complexity. Very little attention has been directed toward space complexity. In a recent work, Goldstein et al. [WADS 2017] set the stage for proving conditional lower bounds regarding space and its interplay with time. In this spirit, it is tempting to investigate the space complexity of a data structure variant of OV which is called \emph{OV indexing}. In this problem $n$ boolean vectors of size $c\log{n}$ are given for preprocessing. As a query, a vector $v$ is given and we are required to verify whether there is an input vector that is orthogonal to it.
This OV indexing problem is interesting in its own right, but it is also likely to have strong implications for problems known to be conditionally hard, in terms of time complexity, based on OV. With this in mind, we study OV indexing in this paper from many aspects. We give some space-efficient algorithms for the problem, show a tradeoff between space and query time, describe how to solve its reporting variant, shed light on an interesting connection between this problem and the well-studied SetDisjointness problem, and demonstrate how it can be solved more efficiently on random input.
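As an illustration of the OV indexing interface only, the following minimal sketch stores each Boolean vector as an integer bitmask and answers a query by a linear scan. It is a naive baseline and none of the paper's space-efficient structures; the class and method names are assumptions.

```python
class NaiveOVIndex:
    """Naive OV indexing: keep every input vector, scan on each query."""

    def __init__(self, vectors):
        # Pack each Boolean vector into one integer bitmask.
        self._masks = [self._pack(v) for v in vectors]

    @staticmethod
    def _pack(bits):
        mask = 0
        for b in bits:
            mask = (mask << 1) | (1 if b else 0)
        return mask

    def query(self, v):
        """Return True if some stored vector is orthogonal to v."""
        q = self._pack(v)
        # Two Boolean vectors are orthogonal iff they share no 1-position.
        return any((m & q) == 0 for m in self._masks)
```

Since the vectors have $c\log n$ coordinates, another trivial baseline tabulates the answer for every possible query vector, using $n^{c}$ table entries with constant query time. The scan above is the opposite extreme.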
Submitted 3 October, 2017; v1 submitted 2 October, 2017;
originally announced October 2017.
-
The streaming $k$-mismatch problem
Authors:
Raphaël Clifford,
Tomasz Kociumaka,
Ely Porat
Abstract:
We consider the streaming complexity of a fundamental task in approximate pattern matching: the $k$-mismatch problem. It asks to compute Hamming distances between a pattern of length $n$ and all length-$n$ substrings of a text for which the Hamming distance does not exceed a given threshold $k$. In our problem formulation, we report not only the Hamming distance but also, on demand, the full \emph{mismatch information}, that is the list of mismatched pairs of symbols and their indices. The twin challenges of streaming pattern matching derive from the need both to achieve small working space and also to guarantee that every arriving input symbol is processed quickly.
We present a streaming algorithm for the $k$-mismatch problem which uses $O(k\log{n}\log\frac{n}{k})$ bits of space and spends \ourcomplexity time on each symbol of the input stream, which consists of the pattern followed by the text. The running time almost matches the classic offline solution and the space usage is within a logarithmic factor of optimal.
Our new algorithm therefore effectively resolves and also extends an open problem first posed in FOCS'09. En route to this solution, we also give a deterministic $O( k (\log \frac{n}{k} + \log |Σ|) )$-bit encoding of all the alignments with Hamming distance at most $k$ of a length-$n$ pattern within a text of length $O(n)$. This secondary result provides an optimal solution to a natural communication complexity problem which may be of independent interest.
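The required output of the $k$-mismatch problem, including the mismatch information, can be sketched with a naive offline baseline. This is not the streaming algorithm: it keeps the whole text in memory and takes time proportional to the pattern length per alignment in the worst case. The function name and the choice of `None` for alignments above the threshold are illustrative assumptions.

```python
def k_mismatch(pattern, text, k):
    """For each alignment, return the mismatch information or None.

    The entry for an alignment is a list of (offset, pattern_symbol,
    text_symbol) triples if the Hamming distance is at most k, else None.
    """
    n = len(pattern)
    results = []
    for start in range(len(text) - n + 1):
        mismatches = []
        for offset in range(n):
            p, t = pattern[offset], text[start + offset]
            if p != t:
                mismatches.append((offset, p, t))
                if len(mismatches) > k:
                    break  # threshold exceeded, stop early
        results.append(mismatches if len(mismatches) <= k else None)
    return results
```

The Hamming distance at an alignment is the length of the returned list, so the same output covers both the distance and the on-demand mismatch information.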
Submitted 9 April, 2018; v1 submitted 17 August, 2017;
originally announced August 2017.
-
Conditional Lower Bounds for Space/Time Tradeoffs
Authors:
Isaac Goldstein,
Tsvi Kopelowitz,
Moshe Lewenstein,
Ely Porat
Abstract:
In recent years much effort has been concentrated towards achieving polynomial time lower bounds on algorithms for solving various well-known problems. A useful technique for showing such lower bounds is to prove them conditionally based on well-studied hardness assumptions such as 3SUM, APSP, SETH, etc. This line of research helps to obtain a better understanding of the complexity inside P.
A related question asks for conditional space lower bounds on data structures that are constructed to solve certain algorithmic tasks after an initial preprocessing stage. This question has received little attention in previous research even though it has a potentially strong impact.
In this paper we address this question and show that surprisingly many of the well-studied hard problems that are known to have conditional polynomial time lower bounds are also hard when concerning space. This hardness is shown as a tradeoff between the space consumed by the data structure and the time needed to answer queries. The tradeoff may be either smooth or admit one or more singularity points.
We reveal interesting connections between different space hardness conjectures and present matching upper bounds. We also apply these hardness conjectures to both static and dynamic problems and prove their conditional space hardness.
We believe that this novel framework of polynomial space conjectures can play an important role in expressing polynomial space lower bounds of many important algorithmic problems. Moreover, it seems that it can also help in achieving a better understanding of the hardness of their corresponding problems in terms of time.
Submitted 25 July, 2017; v1 submitted 19 June, 2017;
originally announced June 2017.
-
How Hard is it to Find (Honest) Witnesses?
Authors:
Isaac Goldstein,
Tsvi Kopelowitz,
Moshe Lewenstein,
Ely Porat
Abstract:
In recent years much effort was put into developing polynomial-time conditional lower bounds for algorithms and data structures in both static and dynamic settings. Along these lines we suggest a framework for proving conditional lower bounds based on the well-known 3SUM conjecture. Our framework creates a \emph{compact representation} of an instance of the 3SUM problem using hashing and domain specific encoding. This compact representation admits false solutions to the original 3SUM problem instance which we reveal and eliminate until we find a true solution. In other words, from all \emph{witnesses} (candidate solutions) we figure out if an \emph{honest} one (a true solution) exists. This enumeration of witnesses is used to prove conditional lower bounds on \emph{reporting} problems that generate all witnesses. In turn, these reporting problems are reduced to various decision problems. These help to enumerate the witnesses by constructing appropriate search data structures. Hence, 3SUM-hardness of the decision problems is deduced.
We utilize this framework to show conditional lower bounds for several variants of convolutions, matrix multiplication and string problems. Our framework uses a strong connection between all of these problems and the ability to find \emph{witnesses}.
While these specific applications are used to demonstrate the techniques of our framework, we believe that this novel framework is useful for many other problems as well.
Submitted 19 June, 2017;
originally announced June 2017.
-
Streaming Pattern Matching with d Wildcards
Authors:
Shay Golan,
Tsvi Kopelowitz,
Ely Porat
Abstract:
In the pattern matching with $d$ wildcards problem one is given a text $T$ of length $n$ and a pattern $P$ of length $m$ that contains $d$ wildcard characters, each denoted by a special symbol $'?'$. A wildcard character matches any other character. The goal is to establish for each $m$-length substring of $T$ whether it matches $P$. In the streaming model variant of the pattern matching with $d$ wildcards problem the text $T$ arrives one character at a time and the goal is to report, before the next character arrives, if the last $m$ characters match $P$ while using only $o(m)$ words of space.
In this paper we introduce two new algorithms for the $d$ wildcard pattern matching problem in the streaming model. The first is a randomized Monte Carlo algorithm that is parameterized by a constant $0\leq δ\leq 1$. This algorithm uses $\tilde{O}(d^{1-δ})$ amortized time per character and $\tilde{O}(d^{1+δ})$ words of space. The second algorithm, which is used as a black box in the first algorithm, is a randomized Monte Carlo algorithm which uses $O(d+\log m)$ worst-case time per character and $O(d\log m)$ words of space.
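The streaming interface described above (one character arrives, one yes/no answer is reported before the next arrives) can be sketched with a naive sliding-window baseline. Note that this sketch stores the last $m$ characters, so it uses $Θ(m)$ space and does not meet the $o(m)$ budget that the streaming model requires. It only illustrates what must be reported. The class name and the `feed` method are hypothetical.

```python
from collections import deque


class NaiveWildcardStream:
    """Report, after each character, whether the last m characters match P."""

    def __init__(self, pattern, wildcard="?"):
        self._pattern = pattern
        self._wildcard = wildcard
        # Keeps only the most recent len(pattern) characters.
        self._window = deque(maxlen=len(pattern))

    def feed(self, ch):
        self._window.append(ch)
        if len(self._window) < len(self._pattern):
            return False  # fewer than m characters seen so far
        return all(
            p == self._wildcard or p == c
            for p, c in zip(self._pattern, self._window)
        )
```

Each call to `feed` costs time proportional to $m$, which is the cost the algorithms above reduce to a function of the number of wildcards $d$.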
Submitted 5 April, 2017;
originally announced April 2017.
-
Identification of Parallel Passages Across a Large Hebrew/Aramaic Corpus
Authors:
Avi Shmidman,
Moshe Koppel,
Ely Porat
Abstract:
We propose a method for efficiently finding all parallel passages in a large corpus, even if the passages are not quite identical due to rephrasing and orthographic variation. The key ideas are the representation of each word in the corpus by its two most infrequent letters, finding matched pairs of strings of four or five words that differ by at most one word and then identifying clusters of such matched pairs. Using this method, over 4600 parallel pairs of passages were identified in the Babylonian Talmud, a Hebrew-Aramaic corpus of over 1.8 million words, in just over 30 seconds. Empirical comparisons on sample data indicate that the coverage obtained by our method is essentially the same as that obtained using slow exhaustive methods.
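The first key idea, representing each word by its two most infrequent letters, might be sketched as follows. The abstract does not specify the details, so several choices here are assumptions made for illustration: letter frequencies are counted over the whole corpus, repeated letters within a word are counted once, ties are broken alphabetically, and the two chosen letters are stored in sorted order. The function name is hypothetical.

```python
from collections import Counter


def two_rarest_letter_keys(words):
    """Map each word to a key made of its two rarest letters.

    Rarity is measured by letter frequency over the whole input corpus
    (an assumption; the abstract does not say how it is measured).
    """
    freq = Counter(ch for w in words for ch in w)

    def key(word):
        # Distinct letters, rarest first; ties broken alphabetically.
        letters = sorted(set(word), key=lambda ch: (freq[ch], ch))
        return "".join(sorted(letters[:2]))

    return [key(w) for w in words]
```

Under a representation of this kind, spelling variants that differ only in common letters can map to the same key, which is plausibly what makes the subsequent matching of short word sequences tolerant of orthographic variation.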
Submitted 31 December, 2017; v1 submitted 28 February, 2016;
originally announced February 2016.
-
Breaking the Variance: Approximating the Hamming Distance in $\tilde O(1/ε)$ Time Per Alignment
Authors:
Tsvi Kopelowitz,
Ely Porat
Abstract:
The algorithmic task of computing the Hamming distance between a given pattern of length $m$ and each location in a text of length $n$ is one of the most fundamental tasks in string algorithms. Unfortunately, there is evidence that for a text $T$ of size $n$ and a pattern $P$ of size $m$, one cannot compute the exact Hamming distance for all locations in $T$ in time which is less than $\tilde O(n\sqrt m)$. However, Karloff showed that if one is willing to suffer a $1\pmε$ approximation, then it is possible to solve the problem with high probability, in $\tilde O(\frac n {ε^2})$ time.
Due to related lower bounds for computing the Hamming distance of two strings in the one-way communication complexity model, it is strongly believed that obtaining an algorithm for solving the approximation version cannot be done much faster as a function of $\frac 1 ε$. We show here that this belief is false by introducing a new $\tilde O(\frac{n}ε)$ time algorithm that succeeds with high probability.
The main idea behind our algorithm, which is common in sparse recovery problems, is to reduce the variance of a specific randomized experiment by (approximately) separating heavy hitters from non-heavy hitters. However, while known sparse recovery techniques work very well on vectors, they do not seem to apply here, where we are dealing with mismatches between pairs of characters. We introduce two main algorithmic ingredients. The first is a new sparse recovery method that applies to pair inputs (such as in our setting). The second is a new construction of hash/projection functions, which allows counting the number of projections that induce mismatches between two characters exponentially faster than brute force. We expect that these algorithmic techniques will be of independent interest.
Submitted 14 December, 2015;
originally announced December 2015.
-
The k-mismatch problem revisited
Authors:
Raphaël Clifford,
Allyx Fontaine,
Ely Porat,
Benjamin Sach,
Tatiana Starikovskaya
Abstract:
We revisit the complexity of one of the most basic problems in pattern matching. In the k-mismatch problem we must compute the Hamming distance between a pattern of length m and every m-length substring of a text of length n, as long as that Hamming distance is at most k. Where the Hamming distance is greater than k at some alignment of the pattern and text, we simply output "No".
We study this problem in both the standard offline setting and also as a streaming problem. In the streaming k-mismatch problem the text arrives one symbol at a time and we must give an output before processing any future symbols. Our main results are as follows:
1) Our first result is a deterministic $O(n k^2\log{k} / m+n \text{polylog} m)$ time offline algorithm for k-mismatch on a text of length n. This is a factor of k improvement over the fastest previous result of this form from SODA 2000 by Amihood Amir et al.
2) We then give a randomised and online algorithm which runs in the same time complexity but requires only $O(k^2\text{polylog} {m})$ space in total.
3) Next we give a randomised $(1+ε)$-approximation algorithm for the streaming k-mismatch problem which uses $O(k^2\text{polylog} m / ε^2)$ space and runs in $O(\text{polylog} m / ε^2)$ worst-case time per arriving symbol.
4) Finally we combine our new results to derive a randomised $O(k^2\text{polylog} {m})$ space algorithm for the streaming k-mismatch problem which runs in $O(\sqrt{k}\log{k} + \text{polylog} {m})$ worst-case time per arriving symbol. This improves the best previous space complexity for streaming k-mismatch from FOCS 2009 by Benny Porat and Ely Porat by a factor of k. We also improve the time complexity of this previous result by an even greater factor to match the fastest known offline algorithm (up to logarithmic factors).
Submitted 27 August, 2015; v1 submitted 4 August, 2015;
originally announced August 2015.
-
Distance labeling schemes for trees
Authors:
Stephen Alstrup,
Inge Li Gørtz,
Esben Bistrup Halvorsen,
Ely Porat
Abstract:
We consider distance labeling schemes for trees: given a tree with $n$ nodes, label the nodes with binary strings such that, given the labels of any two nodes, one can determine, by looking only at the labels, the distance in the tree between the two nodes.
A lower bound by Gavoille et al. (J. Alg. 2004) and an upper bound by Peleg (J. Graph Theory 2000) establish that labels must use $Θ(\log^2 n)$ bits (throughout this paper we use $\log$ for $\log_2$). Gavoille et al. (ESA 2001) show that for very small approximate stretch, labels use $Θ(\log n \log \log n)$ bits. Several other papers investigate various variants such as, for example, small distances in trees (Alstrup et al., SODA'03).
We improve the known upper and lower bounds of exact distance labeling by showing that $\frac{1}{4} \log^2 n$ bits are needed and that $\frac{1}{2} \log^2 n$ bits are sufficient. We also give ($1+ε$)-stretch labeling schemes using $Θ(\log n)$ bits for constant $ε>0$. ($1+ε$)-stretch labeling schemes with polylogarithmic label size have previously been established for doubling dimension graphs by Talwar (STOC 2004).
In addition, we present matching upper and lower bounds for distance labeling for caterpillars, showing that labels must have size $2\log n - Θ(\log\log n)$. For simple paths with $k$ nodes and edge weights in $[1,n]$, we show that labels must have size $\frac{k-1}{k}\log n+Θ(\log k)$.
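To illustrate what a distance labeling scheme must provide, here is a minimal sketch of a naive scheme for unweighted rooted trees: each node's label is its root-to-node path, and the distance is recovered from the two labels alone via their common prefix. The labels can be much longer than the $Θ(\log^2 n)$ bits discussed above, so this shows only the interface and not an efficient encoding. The function names and the parent-map input format are assumptions.

```python
def path_labels(parent):
    """Label every node with the tuple of node ids on its root-to-node path.

    `parent` maps each node to its parent; the root maps to None.
    Recursion depth equals tree depth, so very deep trees would need
    an iterative rewrite.
    """
    labels = {}

    def label(v):
        if v not in labels:
            p = parent[v]
            labels[v] = (v,) if p is None else label(p) + (v,)
        return labels[v]

    for v in parent:
        label(v)
    return labels


def distance(label_u, label_v):
    """Tree distance computed from the two labels only."""
    common = 0
    for a, b in zip(label_u, label_v):
        if a != b:
            break
        common += 1
    # Each node is (len(label) - common) edges below the lowest common ancestor.
    return len(label_u) + len(label_v) - 2 * common
```

The decoder never consults the tree itself, which is the defining property of a labeling scheme. The results above concern how few bits per label suffice.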
Submitted 14 July, 2015;
originally announced July 2015.
-
Sublinear Distance Labeling
Authors:
Stephen Alstrup,
Søren Dahlgaard,
Mathias Bæk Tejs Knudsen,
Ely Porat
Abstract:
A distance labeling scheme labels the $n$ nodes of a graph with binary strings such that, given the labels of any two nodes, one can determine the distance in the graph between the two nodes by looking only at the labels. A $D$-preserving distance labeling scheme only returns precise distances between pairs of nodes that are at distance at least $D$ from each other. In this paper we consider distance labeling schemes for the classical case of unweighted graphs with both directed and undirected edges.
We present an $O(\frac{n}{D}\log^2 D)$ bit $D$-preserving distance labeling scheme, improving the previous bound by Bollobás et al. [SIAM J. Discrete Math. 2005]. We also give an almost matching lower bound of $Ω(\frac{n}{D})$. With our $D$-preserving distance labeling scheme as a building block, we additionally achieve the following results:
1. We present the first distance labeling scheme of size $o(n)$ for sparse graphs (and hence bounded degree graphs). This addresses an open problem by Gavoille et al. [J. Algo. 2004], thereby separating the complexity from distance labeling in general graphs, which requires $Ω(n)$ bits as shown by Moon [Proc. of Glasgow Math. Association 1965].
2. For approximate $r$-additive labeling schemes, which return distances within an additive error of $r$, we show a scheme of size $O\left ( \frac{n}{r} \cdot\frac{\operatorname{polylog} (r\log n)}{\log n} \right )$ for $r \ge 2$. This improves on the current best bound of $O\left(\frac{n}{r}\right)$ by Alstrup et al. [SODA 2016] for sub-polynomial $r$, and is a generalization of a result by Gawrychowski et al. [arXiv preprint 2015] who showed this for $r=2$.
Submitted 8 September, 2016; v1 submitted 9 July, 2015;
originally announced July 2015.
-
Dictionary matching in a stream
Authors:
Raphael Clifford,
Allyx Fontaine,
Ely Porat,
Benjamin Sach,
Tatiana Starikovskaya
Abstract:
We consider the problem of dictionary matching in a stream. Given a set of strings, known as a dictionary, and a stream of characters arriving one at a time, the task is to report each time some string in our dictionary occurs in the stream. We present a randomised algorithm which takes O(log log(k + m)) time per arriving character and uses O(k log m) words of space, where k is the number of strings in the dictionary and m is the length of the longest string in the dictionary.
Submitted 23 April, 2015;
originally announced April 2015.
-
Mind the Gap
Authors:
Amihood Amir,
Tsvi Kopelowitz,
Avivit Levy,
Seth Pettie,
Ely Porat,
B. Riva Shalom
Abstract:
We examine the complexity of the online Dictionary Matching with One Gap Problem (DMOG) which is the following. Preprocess a dictionary $D$ of $d$ patterns, where each pattern contains a special gap symbol that can match any string, so that given a text that arrives online, a character at a time, we can report all of the patterns from $D$ that are suffixes of the text that has arrived so far, before the next character arrives. In more general versions the gap symbols are associated with bounds determining the possible lengths of matching strings. Finding efficient algorithmic solutions for (online) DMOG has proven to be a difficult algorithmic challenge. We demonstrate that the difficulty in obtaining efficient solutions for the DMOG problem, even in the offline setting, can be traced back to the infamous 3SUM conjecture. Interestingly, our reduction deviates from the known reduction paths that follow from 3SUM. In particular, most reductions from 3SUM go through the set-disjointness problem, which corresponds to the problem of preprocessing a graph to answer edge-triangles queries. We use a new path of reductions by considering the complementary, although structurally very different, vertex-triangles queries. Using this new path we show a conditional lower bound of $Ω(δ(G_D)+op)$ time per text character, where $G_D$ is a bipartite graph that captures the structure of $D$, $δ(G_D)$ is the degeneracy of this graph, and $op$ is the output size. We also provide matching upper-bounds (up to sub-polynomial factors) for the vertex-triangles problem, and then extend these techniques to the online DMOG problem. In particular, we introduce algorithms whose time cost depends linearly on $δ(G_D)$. Our algorithms make use of graph orientations, together with some additional techniques. Finally, when $δ(G_D)$ is large we are able to obtain even more efficient solutions.
Submitted 9 July, 2015; v1 submitted 25 March, 2015;
originally announced March 2015.
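The abstract above relies on the degeneracy $δ(G_D)$ and on graph orientations. As a hedged illustration of how the two relate, the sketch below uses the textbook peeling procedure (repeatedly remove a minimum-degree vertex), which yields the degeneracy and an orientation in which every out-degree is at most the degeneracy. It is a simple quadratic-time version, not the paper's algorithm; the function name `degeneracy_orientation` and the adjacency-dict input format are assumptions made for this example.

```python
def degeneracy_orientation(adj):
    """Peel minimum-degree vertices to get the degeneracy and an orientation.

    adj: dict mapping each vertex to the set of its neighbours
         (undirected graph, no self-loops).
    Returns (degeneracy, out), where out[v] lists the out-neighbours of v
    and every out-degree is at most the degeneracy.
    """
    remaining = {v: set(ns) for v, ns in adj.items()}
    out = {v: [] for v in adj}
    degeneracy = 0
    while remaining:
        # Pick a vertex of minimum degree in the graph that is left.
        v = min(remaining, key=lambda u: len(remaining[u]))
        degeneracy = max(degeneracy, len(remaining[v]))
        # Orient all of v's remaining edges away from v, then delete v.
        for u in remaining[v]:
            out[v].append(u)
            remaining[u].discard(v)
        del remaining[v]
    return degeneracy, out
```

For the complete bipartite graph $K_{2,3}$, for instance, this returns degeneracy 2 and orients each of the six edges exactly once.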
-
Polynomials: a new tool for length reduction in binary discrete convolutions
Authors:
Amihood Amir,
Oren Kapah,
Ely Porat,
Amir Rothschild
Abstract:
Efficient handling of sparse data is a key challenge in Computer Science. Binary convolutions, such as polynomial multiplication or the Walsh Transform, are a useful tool in many applications and are efficiently solved.
In the last decade, several problems required efficient solution of sparse binary convolutions. Both randomized and deterministic algorithms were developed for efficiently computing the sparse polynomial multiplication. The key operation in all these algorithms was length reduction. The sparse data is mapped into small vectors that preserve the convolution result. The reduction method used to date was the modulo function since it preserves location (of the "1" bits) up to cyclic shift.
To date there is no known efficient algorithm for computing the sparse Walsh transform. Since the modulo function does not preserve the Walsh transform, a new method for length reduction is needed. In this paper we present such a new method - polynomials. This method enables the development of an efficient algorithm for computing the binary sparse Walsh transform. To our knowledge, this is the first such algorithm. We also show that this method allows a faster deterministic computation of sparse polynomial multiplication than currently known in the literature.
Submitted 21 October, 2014;
originally announced October 2014.
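The modulo-based length reduction that this abstract describes as the earlier method can be checked on a small example. The sketch below illustrates only that older technique, not the polynomial method the paper introduces: reducing indices modulo $q$ commutes with convolution, provided the reduced vectors are convolved cyclically. The function names `sparse_convolve` and `reduce_mod` and the `{index: value}` representation are assumptions made for this example.

```python
from collections import defaultdict


def sparse_convolve(a, b, modulus=None):
    """Convolve two sparse vectors given as {index: value} dicts.

    If modulus is given, output indices wrap around (cyclic convolution).
    """
    out = defaultdict(int)
    for i, x in a.items():
        for j, y in b.items():
            k = i + j if modulus is None else (i + j) % modulus
            out[k] += x * y
    return dict(out)


def reduce_mod(a, q):
    """Length reduction: fold index i onto i mod q, summing collisions."""
    out = defaultdict(int)
    for i, x in a.items():
        out[i % q] += x
    return dict(out)


a = {3: 1, 1000: 1}
b = {0: 1, 7: 1, 500: 1}
q = 11
# Reducing after convolving equals convolving the reduced vectors cyclically.
lhs = reduce_mod(sparse_convolve(a, b), q)
rhs = sparse_convolve(reduce_mod(a, q), reduce_mod(b, q), modulus=q)
```

The identity holds because $(i + j) \bmod q = ((i \bmod q) + (j \bmod q)) \bmod q$, so the work can be done on vectors of length $q$ instead of the original length.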
-
Magnetoresistance Anisotropy in Amorphous Superconducting Thin Films: A Site-Bond Percolation Approach
Authors:
Elkana Porat,
Yigal Meir
Abstract:
Recent measurements of the magnetoresistance (MR) of amorphous superconducting thin films in tilted magnetic fields have displayed several surprising experimental details, in particular a strong dependence of the MR on field angle at low magnetic fields, which diminishes and then changes sign at large fields. Using a generalized site-bond percolation model that takes into account both the orbital and Zeeman effects of the magnetic field, we show that the resulting MR curves reproduce the main experimental features. Such measurements, accompanied by the corresponding theory, may be crucial in pinpointing the correct theory of the superconductor-insulator transition and of the MR peak in thin disordered films.
Submitted 9 October, 2014;
originally announced October 2014.
-
Dictionary Matching with One Gap
Authors:
Amihood Amir,
Avivit Levy,
Ely Porat,
B. Riva Shalom
Abstract:
The dictionary matching with gaps problem is to preprocess a dictionary $D$ of $d$ gapped patterns $P_1,\ldots,P_d$ over alphabet $Σ$, where each gapped pattern $P_i$ is a sequence of subpatterns separated by bounded sequences of don't cares. Then, given a query text $T$ of length $n$ over alphabet $Σ$, the goal is to output all locations in $T$ in which a pattern $P_i\in D$, $1\leq i\leq d$, ends. There is a renewed current interest in the gapped matching problem stemming from cyber security. In this paper we solve the problem where all patterns in the dictionary have one gap with at least $α$ and at most $β$ don't cares, where $α$ and $β$ are given parameters. Specifically, we show that the dictionary matching with a single gap problem can be solved either with $O(d\log d + |D|)$ preprocessing time, $O(d\log^{\varepsilon} d + |D|)$ space, and query time $O(n(β-α)\log\log d \log ^2 \min \{ d, \log |D| \} + occ)$, or with $O(d^2 + |D|)$ preprocessing time and space and query time $O(n(β-α) + occ)$, where $occ$ is the number of patterns found. As far as we know, this is the best solution for this setting of the problem, where many overlaps may exist in the dictionary.
Submitted 11 August, 2014;
originally announced August 2014.
-
The Family Holiday Gathering Problem or Fair and Periodic Scheduling of Independent Sets
Authors:
Amihood Amir,
Oren Kapah,
Tsvi Kopelowitz,
Moni Naor,
Ely Porat
Abstract:
We introduce and examine the Holiday Gathering Problem, which models the difficulty that couples have when trying to decide which parents they should spend the holiday with. Our goal is to schedule the family gatherings so that parents will be happy, i.e., all their children will be home simultaneously for the holiday festivities, while minimizing the number of consecutive holidays in which parents are not happy.
The holiday gathering problem is closely related to several classical problems in computer science, such as the dining philosophers problem on a general graph and periodic scheduling, and has applications in scheduling of transmissions made by cellular radios. We also show interesting connections between periodic scheduling, coloring, and universal prefix free encodings.
The combinatorial definition of the Holiday Gathering Problem is: given a graph $G$, find an infinite sequence of independent-sets of $G$. The objective function is to minimize, for every node $v$, the maximal gap between two appearances of $v$. In good solutions this gap depends on local properties of the node (i.e., its degree) and the solution should be periodic, i.e., a node appears every fixed number of periods. We show a coloring-based construction where the period of each node colored with color $c$ is at most $2^{1+\log^*c}\cdot\prod_{i=0}^{\log^*c} \log^{(i)}c$ (where $\log^{(i)}$ means iterating the $\log$ function $i$ times). This is achieved via a connection with prefix-free encodings. We prove that this is the best possible for coloring-based solutions. We also show a construction with period at most $2d$ for a node of degree $d$.
Submitted 10 August, 2014;
originally announced August 2014.
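The connection between coloring, prefix-free encodings, and periodic schedules mentioned in this abstract can be sketched as follows. This example deliberately swaps in the simple Elias gamma code, which is an assumption of this sketch and not the encoding behind the paper's bound: a node of color $c$ gets period $2^{2\lfloor \log_2 c \rfloor + 1}$, which is weaker than the period stated in the abstract. A node is scheduled at holiday $t$ when the low-order bits of $t$, read least significant bit first, spell its color's codeword. Because no codeword is a prefix of another, two differently colored nodes are never scheduled together, so each gathering is an independent set under a proper coloring. The function names are hypothetical.

```python
def elias_gamma(c):
    """Prefix-free codeword (as a bit string) for an integer color c >= 1."""
    binary = bin(c)[2:]
    return "0" * (len(binary) - 1) + binary


def slot(color):
    """Return (period, offset): the color is scheduled when t % period == offset."""
    code = elias_gamma(color)
    # The codeword is matched against the bits of t, least significant first.
    offset = sum(int(bit) << i for i, bit in enumerate(code))
    return 1 << len(code), offset


def gathering(coloring, t):
    """Nodes scheduled at holiday t, given a proper coloring {node: color}."""
    scheduled = set()
    for node, color in coloring.items():
        period, offset = slot(color)
        if t % period == offset:
            scheduled.add(node)
    return scheduled
```

With colors 1, 2, and 3, this gives color 1 every second holiday, and colors 2 and 3 one holiday each out of every eight, at different offsets.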
-
Higher Lower Bounds from the 3SUM Conjecture
Authors:
Tsvi Kopelowitz,
Seth Pettie,
Ely Porat
Abstract:
The 3SUM conjecture has proven to be a valuable tool for proving conditional lower bounds on dynamic data structures and graph problems. This line of work was initiated by Pǎtraşcu (STOC 2010) who reduced 3SUM to an offline SetDisjointness problem. However, the reduction introduced by Pǎtraşcu suffers from several inefficiencies, making it difficult to obtain tight conditional lower bounds from the 3SUM conjecture.
In this paper we address many of the deficiencies of Pǎtraşcu's framework. We give new and efficient reductions from 3SUM to offline SetDisjointness and offline SetIntersection (the reporting version of SetDisjointness) which lead to polynomially higher lower bounds on several problems. Using our reductions, we are able to show the essential optimality of several algorithms, assuming the 3SUM conjecture.
- Chiba and Nishizeki's $O(mα)$-time algorithm (SICOMP 1985) for enumerating all triangles in a graph with arboricity/degeneracy $α$ is essentially optimal, for any $α$.
- Bjørklund, Pagh, Williams, and Zwick's algorithm (ICALP 2014) for listing $t$ triangles is essentially optimal (assuming the matrix multiplication exponent is $ω=2$).
- Any static data structure for SetDisjointness that answers queries in constant time must spend $Ω(N^{2-o(1)})$ time in preprocessing, where $N$ is the size of the set system.
These statements were unattainable via Pǎtraşcu's reductions.
We also introduce several new reductions from 3SUM to pattern matching problems and dynamic graph problems. Of particular interest are new conditional lower bounds for dynamic versions of Maximum Cardinality Matching, which introduce a new technique for obtaining amortized lower bounds.
Submitted 12 January, 2019; v1 submitted 24 July, 2014;
originally announced July 2014.
-
Dynamic Set Intersection
Authors:
Tsvi Kopelowitz,
Seth Pettie,
Ely Porat
Abstract:
Consider the problem of maintaining a family $F$ of dynamic sets subject to insertions, deletions, and set-intersection reporting queries: given $S,S'\in F$, report every member of $S\cap S'$ in any order. We show that in the word RAM model, where $w$ is the word size, given a cap $d$ on the maximum size of any set, we can support set intersection queries in $O(\frac{d}{w/\log^2 w})$ expected time, and updates in $O(\log w)$ expected time. Using this algorithm we can list all $t$ triangles of a graph $G=(V,E)$ in $O(m+\frac{mα}{w/\log^2 w} +t)$ expected time, where $m=|E|$ and $α$ is the arboricity of $G$. This improves a 30-year old triangle enumeration algorithm of Chiba and Nishizeki running in $O(m α)$ time.
We provide an incremental data structure on $F$ that supports intersection witness queries, where we only need to find one $e\in S\cap S'$. Both queries and insertions take $O\left(\sqrt{\frac{N}{w/\log^2 w}}\right)$ expected time, where $N=\sum_{S\in F} |S|$. Finally, we provide time/space tradeoffs for the fully dynamic set intersection reporting problem. Using $M$ words of space, each update costs $O(\sqrt {M \log N})$ expected time, each reporting query costs $O(\frac{N\sqrt{\log N}}{\sqrt M}\sqrt{op+1})$ expected time where $op$ is the size of the output, and each witness query costs $O(\frac{N\sqrt{\log N}}{\sqrt M} + \log N)$ expected time.
Submitted 4 May, 2015; v1 submitted 24 July, 2014;
originally announced July 2014.
-
For-all Sparse Recovery in Near-Optimal Time
Authors:
Anna C. Gilbert,
Yi Li,
Ely Porat,
Martin J. Strauss
Abstract:
An approximate sparse recovery system in $\ell_1$ norm consists of parameters $k$, $ε$, $N$, an $m$-by-$N$ measurement matrix $Φ$, and a recovery algorithm, $\mathcal{R}$. Given a vector, $\mathbf{x}$, the system approximates $\mathbf{x}$ by $\widehat{\mathbf{x}} = \mathcal{R}(Φ\mathbf{x})$, which must satisfy $\|\widehat{\mathbf{x}}-\mathbf{x}\|_1 \leq (1+ε)\|\mathbf{x}-\mathbf{x}_k\|_1$. We consider the 'for all' model, in which a single matrix $Φ$, possibly 'constructed' non-explicitly using the probabilistic method, is used for all signals $\mathbf{x}$. The best existing sublinear algorithm by Porat and Strauss (SODA'12) uses $O(ε^{-3} k\log(N/k))$ measurements and runs in time $O(k^{1-α}N^α)$ for any constant $α> 0$.
In this paper, we improve the number of measurements to $O(ε^{-2} k \log(N/k))$, matching the best existing upper bound (attained by super-linear algorithms), and the runtime to $O(k^{1+β}\textrm{poly}(\log N,1/ε))$, with a modest restriction that $ε\leq (\log k/\log N)^γ$, for any constants $β,γ> 0$. When $k\leq \log^c N$ for some $c>0$, the runtime is reduced to $O(k\textrm{poly}(\log N,1/ε))$. With no restrictions on $ε$, we have an approximation recovery system with $m = O(k/ε\log(N/k)((\log N/\log k)^γ+ 1/ε))$ measurements.
Submitted 7 March, 2017; v1 submitted 7 February, 2014;
originally announced February 2014.
-
Sharing Rewards in Cooperative Connectivity Games
Authors:
Yoram Bachrach,
Ely Porat,
Jeffrey S. Rosenschein
Abstract:
We consider how selfish agents are likely to share revenues derived from maintaining connectivity between important network servers. We model a network where a failure of one node may disrupt communication between other nodes as a cooperative game called the vertex Connectivity Game (CG). In this game, each agent owns a vertex, and controls all the edges going to and from that vertex. A coalition of agents wins if it fully connects a certain subset of vertices in the graph, called the primary vertices. Power indices measure an agent's ability to affect the outcome of the game. We show that in our domain, such indices can be used to both determine the fair share of the revenues an agent is entitled to, and identify significant possible points of failure affecting the reliability of communication in the network. We show that in general graphs, calculating the Shapley and Banzhaf power indices is #P-complete, but suggest a polynomial algorithm for calculating them in trees. We also investigate finding stable payoff divisions of the revenues in CGs, captured by the game theoretic solution of the core, and its relaxations, the epsilon-core and least core. We show a polynomial algorithm for computing the core of a CG, but show that testing whether an imputation is in the epsilon-core is coNP-complete. Finally, we show that for trees, it is possible to test for epsilon-core imputations in polynomial time.
Submitted 3 February, 2014;
originally announced February 2014.
-
Orienting Fully Dynamic Graphs with Worst-Case Time Bounds
Authors:
Tsvi Kopelowitz,
Robert Krauthgamer,
Ely Porat,
Shay Solomon
Abstract:
In edge orientations, the goal is usually to orient (direct) the edges of an undirected $n$-vertex graph $G$ such that all out-degrees are bounded. When the graph $G$ is fully dynamic, i.e., admits edge insertions and deletions, we wish to maintain such an orientation while keeping a tab on the update time. Low out-degree orientations turned out to be a surprisingly useful tool, with several algorithmic applications involving static or dynamic graphs.
Brodal and Fagerberg (1999) initiated the study of the edge orientation problem in terms of the graph's arboricity, which is very natural in this context. They provided a solution with constant out-degree and \emph{amortized} logarithmic update time for all graphs with constant arboricity, which include all planar and excluded-minor graphs. However, it remained an open question (first proposed by Brodal and Fagerberg, later by others) to obtain similar bounds with worst-case update time.
We resolve this 15-year-old question in the affirmative, by providing a simple algorithm with worst-case bounds that nearly match the previous amortized bounds. Our algorithm is based on a new approach of a combinatorial invariant, and achieves a logarithmic out-degree with logarithmic worst-case update times. This result has applications in various dynamic graph problems such as maintaining a maximal matching, where we obtain $O(\log n)$ worst-case update time compared to the $O(\frac{\log n}{\log\log n})$ amortized update time of Neiman and Solomon (2013).
Submitted 4 December, 2013;
originally announced December 2013.
-
L2/L2-foreach sparse recovery with low risk
Authors:
Anna C. Gilbert,
Hung Q. Ngo,
Ely Porat,
Atri Rudra,
Martin J. Strauss
Abstract:
In this paper, we consider the "foreach" sparse recovery problem with failure probability $p$. The goal is to design a distribution over $m \times N$ matrices $Φ$ and a decoding algorithm $\mathcal{A}$ such that for every $\mathbf{x}\in\mathbb{R}^N$, we have the following error guarantee with probability at least $1-p$: \[\|\mathbf{x}-\mathcal{A}(Φ\mathbf{x})\|_2\le C\|\mathbf{x}-\mathbf{x}_k\|_2,\] where $C$ is a constant (ideally arbitrarily close to 1) and $\mathbf{x}_k$ is the best $k$-sparse approximation of $\mathbf{x}$.
Much of the sparse recovery or compressive sensing literature has focused on the case of either $p = 0$ or $p = Ω(1)$. We initiate the study of this problem for the entire range of failure probability. Our two main results are as follows: (1) We prove a lower bound on $m$, the number of measurements, of $Ω(k\log(N/k)+\log(1/p))$ for $2^{-Θ(N)}\le p <1$. Cohen, Dahmen, and DeVore \cite{CDD2007:NearOptimall2l2} prove that this bound is tight. (2) We prove nearly matching upper bounds for sub-linear time decoding. Previous such results addressed only $p = Ω(1)$.
Our results and techniques lead to the following corollaries: (i) the first ever sub-linear time decoding $\ell_2/\ell_2$ "forall" sparse recovery system that requires a $\log^γ{N}$ extra factor (for some $γ<1$) over the optimal $O(k\log(N/k))$ number of measurements, and (ii) extensions of the results of Gilbert et al. \cite{GHRSW12:SimpleSignals} for information-theoretically bounded adversaries.
Submitted 23 April, 2013;
originally announced April 2013.
-
Feasible Sampling of Non-strict Turnstile Data Streams
Authors:
Neta Barkay,
Ely Porat,
Bar Shalem
Abstract:
We present the first feasible method for sampling a dynamic data stream with deletions, where the sample consists of pairs $(k,C_k)$ of a value $k$ and its exact total count $C_k$. Our algorithms are for both Strict Turnstile data streams and the most general Non-strict Turnstile data streams, where each element may have a negative total count. Our method improves by an order of magnitude the known processing time of each element in the stream, which is crucial for data stream applications. For example, for a sample of size $O(ε^{-2} \log{(1/δ)})$ in Non-strict streams, our solution requires $O((\log\log(1/ε))^2 + (\log\log(1/δ)) ^ 2)$ operations per stream element, whereas the best previous solution requires $O(ε^{-2} \log^2(1/δ))$ evaluations of a fully independent hash function per element. Here $1-δ$ is the success probability and $ε$ is the additive approximation error.
We achieve this improvement by constructing a single data structure from which multiple elements can be extracted with very high success probability. The sample we generate is useful for calculating both forward and inverse distribution statistics, within an additive error, with provable guarantees on the success probability. Furthermore, our algorithms can run on distributed systems and extract statistics on the union or difference between data streams. They can be used to calculate the Jaccard similarity coefficient as well.
Submitted 25 September, 2012;
originally announced September 2012.
-
Worst-case Optimal Join Algorithms
Authors:
Hung Q. Ngo,
Ely Porat,
Christopher Ré,
Atri Rudra
Abstract:
Efficient join processing is one of the most fundamental and well-studied tasks in database research. In this work, we examine algorithms for natural join queries over many relations and describe a novel algorithm to process these queries optimally in terms of worst-case data complexity. Our result builds on recent work by Atserias, Grohe, and Marx, who gave bounds on the size of a full conjunctive query in terms of the sizes of the individual relations in the body of the query. These bounds, however, are not constructive: they rely on Shearer's entropy inequality which is information-theoretic. Thus, the previous results leave open the question of whether there exist algorithms whose running time achieves these optimal bounds. An answer to this question may be interesting to database practice, as it is known that any algorithm based on the traditional select-project-join style plans typically employed in an RDBMS is asymptotically slower than the optimal for some queries. We construct an algorithm whose running time is worst-case optimal for all natural join queries. Our result may be of independent interest, as our algorithm also yields a constructive proof of the general fractional cover bound by Atserias, Grohe, and Marx without using Shearer's inequality. This bound implies two famous inequalities in geometry: the Loomis-Whitney inequality and the Bollobás-Thomason inequality. Hence, our results algorithmically prove these inequalities as well. Finally, we discuss how our algorithm can be used to compute a relaxed notion of joins.
Submitted 8 March, 2012;
originally announced March 2012.
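To make the setting of this abstract concrete, here is a minimal sketch of an attribute-at-a-time join for the triangle query $R(a,b) \bowtie S(b,c) \bowtie T(a,c)$, a standard example where pairwise join plans can build intermediate results far larger than the final output. The sketch is an illustration in the spirit of such algorithms, not the paper's algorithm as written, and it makes no optimality claim; the function name `triangle_join` and the pair-list input format are assumptions made for this example.

```python
from collections import defaultdict


def triangle_join(R, S, T):
    """Join R(a, b), S(b, c), T(a, c), each given as an iterable of pairs.

    Binds one attribute at a time (a, then b, then c) and intersects the
    candidate sets at each step instead of materializing a pairwise join.
    """
    r_by_a = defaultdict(set)
    s_by_b = defaultdict(set)
    t_by_a = defaultdict(set)
    for a, b in R:
        r_by_a[a].add(b)
    for b, c in S:
        s_by_b[b].add(c)
    for a, c in T:
        t_by_a[a].add(c)

    result = []
    # a must appear in both R and T.
    for a in r_by_a.keys() & t_by_a.keys():
        # b must extend a in R and appear in S.
        for b in s_by_b.keys() & r_by_a[a]:
            # c must extend b in S and a in T.
            for c in s_by_b[b] & t_by_a[a]:
                result.append((a, b, c))
    return result
```

On `R = [(1, 2), (1, 3), (2, 3)]`, `S = [(2, 3), (3, 1)]`, `T = [(1, 3), (2, 1)]`, the join contains exactly the tuples `(1, 2, 3)` and `(2, 3, 1)`.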
-
Pattern Matching in Multiple Streams
Authors:
Raphael Clifford,
Markus Jalsenius,
Ely Porat,
Benjamin Sach
Abstract:
We investigate the problem of deterministic pattern matching in multiple streams. In this model, one symbol arrives at a time and is associated with one of s streaming texts. The task at each time step is to report if there is a new match between a fixed pattern of length m and a newly updated stream. As is usual in the streaming context, the goal is to use as little space as possible while still reporting matches quickly. We give almost matching upper and lower space bounds for three distinct pattern matching problems. For exact matching we show that the problem can be solved in constant time per arriving symbol and O(m+s) words of space. For the k-mismatch and k-difference problems we give O(k) time solutions that require O(m+ks) words of space. In all three cases we also give space lower bounds which show our methods are optimal up to a single logarithmic factor. Finally we set out a number of open problems related to this new model for pattern matching.
Submitted 25 April, 2012; v1 submitted 15 February, 2012;
originally announced February 2012.
-
Pattern Matching under Polynomial Transformation
Authors:
Ayelet Butman,
Peter Clifford,
Raphael Clifford,
Markus Jalsenius,
Noa Lewenstein,
Benny Porat,
Ely Porat,
Benjamin Sach
Abstract:
We consider a class of pattern matching problems where a normalising transformation is applied at every alignment. Normalised pattern matching plays a key role in fields as diverse as image processing and musical information processing where application specific transformations are often applied to the input. By considering the class of polynomial transformations of the input, we provide fast algorithms and the first lower bounds for both new and old problems. Given a pattern of length m and a longer text of length n where both are assumed to contain integer values only, we first show O(n log m) time algorithms for pattern matching under linear transformations even when wildcard symbols can occur in the input. We then show how to extend the technique to polynomial transformations of arbitrary degree. Next we consider the problem of finding the minimum Hamming distance under polynomial transformation. We show that, for any epsilon>0, there cannot exist an O(n m^(1-epsilon)) time algorithm for additive and linear transformations conditional on the hardness of the classic 3SUM problem. Finally, we consider a version of the Hamming distance problem under additive transformations with a bound k on the maximum distance that need be reported. We give a deterministic O(nk log k) time solution which we then improve by careful use of randomisation to O(n sqrt(k log k) log n) time for sufficiently small k. Our randomised solution outputs the correct answer at every position with high probability.
Submitted 28 October, 2011; v1 submitted 7 September, 2011;
originally announced September 2011.
-
Space Lower Bounds for Online Pattern Matching
Authors:
Raphael Clifford,
Markus Jalsenius,
Ely Porat,
Benjamin Sach
Abstract:
We present space lower bounds for online pattern matching under a number of different distance measures. Given a pattern of length m and a text that arrives one character at a time, the online pattern matching problem is to report the distance between the pattern and a sliding window of the text as soon as the new character arrives. We require that the correct answer is given at each position with constant probability. We give Omega(m) bit space lower bounds for L_1, L_2, L_\infty, Hamming, edit and swap distances as well as for any algorithm that computes the cross-correlation/convolution. We then show a dichotomy between distance functions that have wildcard-like properties and those that do not. In the former case which includes, as an example, pattern matching with character classes, we give Omega(m) bit space lower bounds. For other distance functions, we show that there exist space bounds of Omega(log m) and O(log^2 m) bits. Finally we discuss space lower bounds for non-binary inputs and show how in some cases they can be improved.
Submitted 22 June, 2011;
originally announced June 2011.
-
A cuckoo hashing variant with improved memory utilization and insertion time
Authors:
Ely Porat,
Bar Shalem
Abstract:
Cuckoo hashing [4] is a multiple choice hashing scheme in which each item can be placed in multiple locations, and collisions are resolved by moving items to their alternative locations. In the classical implementation of two-way cuckoo hashing, the memory is partitioned into contiguous disjoint fixed-size buckets. Each item is hashed to two buckets, and may be stored in any of the positions within those buckets. Ref. [2] analyzed a variation in which the buckets are contiguous and overlap. However, many systems retrieve data from secondary storage in same-size blocks called pages. Fetching a page is a relatively expensive process; but once a page is fetched, its contents can be accessed orders of magnitude faster. We utilize this property of memory retrieval, presenting a variant of cuckoo hashing incorporating the following constraint: each bucket must be fully contained in a single page, but buckets are not necessarily contiguous. Empirical results show that this modification increases memory utilization and decreases the number of iterations required to insert an item. If each item is hashed to two buckets of capacity two, the page size is 8, and each bucket is fully contained in a single page, the memory utilization equals 89.71% in the classical contiguous disjoint bucket variant, 93.78% in the contiguous overlapping bucket variant, and increases to 97.46% in our new non-contiguous bucket variant. When the memory utilization is 92% and we use breadth first search to look for a vacant position, the number of iterations required to insert a new item is dramatically reduced from 545 in the contiguous overlapping buckets variant to 52 in our new non-contiguous bucket variant. In addition to the empirical results, we present a theoretical lower bound on the memory utilization of our variation as a function of the page size.
Submitted 14 November, 2011; v1 submitted 28 April, 2011;
originally announced April 2011.
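The page constraint described in the abstract above can be illustrated with a minimal sketch. It assumes two buckets of capacity two and a page size of 8, as in the abstract; the class name `PagedCuckooTable`, its parameters, and the salted-hash construction are hypothetical. For brevity it uses random-walk eviction rather than the breadth first search mentioned in the abstract, so it is not the authors' implementation and will not reproduce their reported figures.

```python
import random


class PagedCuckooTable:
    """Illustrative two-way cuckoo table: each bucket lies inside one page,
    but its positions need not be contiguous. All names are hypothetical."""

    def __init__(self, num_pages, page_size=8, bucket_size=2,
                 max_kicks=500, seed=0):
        self.num_pages = num_pages
        self.page_size = page_size
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.slots = [None] * (num_pages * page_size)
        self.rng = random.Random(seed)
        # One salt per hash function (assumption: salted built-in hash).
        self.salts = (self.rng.getrandbits(64), self.rng.getrandbits(64))

    def _bucket(self, item, which):
        # Derive a page and `bucket_size` distinct offsets within that page,
        # deterministically from the item and the chosen salt.
        r = random.Random(hash((self.salts[which], item)))
        page = r.randrange(self.num_pages)
        offsets = r.sample(range(self.page_size), self.bucket_size)
        return [page * self.page_size + o for o in offsets]

    def candidates(self, item):
        # All positions where the item may live (two buckets).
        return self._bucket(item, 0) + self._bucket(item, 1)

    def lookup(self, item):
        return any(self.slots[p] == item for p in self.candidates(item))

    def insert(self, item):
        if self.lookup(item):
            return True
        cur = item
        for _ in range(self.max_kicks):
            cand = self.candidates(cur)
            for p in cand:
                if self.slots[p] is None:
                    self.slots[p] = cur
                    return True
            # No vacancy: evict a random occupant and try to re-place it.
            victim = self.rng.choice(cand)
            cur, self.slots[victim] = self.slots[victim], cur
        # Give up; the item held in `cur` is left unplaced. A real table
        # would rehash or resize at this point.
        return False


table = PagedCuckooTable(num_pages=64)
for key in range(200):
    table.insert(key)
print(table.lookup(42), table.lookup(10_000))
```

A lookup touches at most two pages, one per bucket, which is the property the page constraint is meant to preserve.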
-
Even Better Framework for min-wise Based Algorithms
Authors:
Guy Feigenblat,
Ely Porat,
Ariel Shiftan
Abstract:
In a recent paper from SODA11 \cite{kminwise} the authors introduced a general framework for exponential time improvement of min-wise based algorithms by defining and constructing an almost k-min-wise independent family of hash functions. Here we take it a step forward and reduce the space and the independence needed for representing the functions, by defining and constructing a d-k-min-wise independent family of hash functions. Surprisingly, for most cases only 8-wise independence is needed for exponential time and space improvement. Moreover, we bypass the $O(\log\frac{1}{ε})$ independence lower bound for approximately min-wise functions \cite{patrascu10kwise-lb}, as we use an alternative definition. In addition, as the degree of independence is a small constant, it can be implemented efficiently.
Informally, under this definition, all subsets of size $d$ of any fixed set $X$ have an equal probability to have hash values among the minimal $k$ values in $X$, where the probability is over the random choice of hash function from the family. This property measures the randomness of the family, as choosing a truly random function, obviously, satisfies the definition for $d=k=|X|$. We define and give a time- and space-efficient construction of an approximately d-k-min-wise independent family of hash functions. The degree of independence required is optimal, i.e. only $O(d)$ for $2 \le d < k=O(\frac{d}{ε^2})$, where $ε \in (0,1)$ is the desired error bound. This construction can be used to improve many min-wise based algorithms, such as \cite{sizeEstimationFramework,Datar02estimatingrarity,NearDuplicate,SimilaritySearch,DBLP:conf/podc/CohenK07}, as will be discussed here. To our knowledge such definitions, for hash functions, were never studied and no construction was given before.
Submitted 17 February, 2011;
originally announced February 2011.
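As a rough illustration of the kind of min-wise based algorithm the abstract above refers to, the following sketch estimates Jaccard similarity from the $k$ smallest hash values of each set, using a hash function drawn from a constant-degree independent family. The polynomial construction over a prime field is the textbook way to obtain $t$-wise independence and is an assumption here, not the construction from the paper; the names `make_poly_hash`, `bottom_k`, and `jaccard_estimate` are hypothetical.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; keys are assumed to be integers in [0, P)


def make_poly_hash(independence, rng):
    """Random polynomial of degree independence-1 over GF(P).

    Such a family is `independence`-wise independent on keys in [0, P).
    """
    coeffs = [rng.randrange(P) for _ in range(independence)]

    def h(x):
        acc = 0
        for c in coeffs:  # Horner evaluation
            acc = (acc * x + c) % P
        return acc

    return h


def bottom_k(items, h, k):
    """The k smallest distinct hash values of the set."""
    return sorted({h(x) for x in items})[:k]


def jaccard_estimate(a, b, h, k):
    """Bottom-k estimate of |A ∩ B| / |A ∪ B|."""
    sa, sb = bottom_k(a, h, k), bottom_k(b, h, k)
    union_k = sorted(set(sa) | set(sb))[:k]  # k smallest values of the union
    if not union_k:
        return 0.0
    both = set(sa) & set(sb)
    return sum(1 for v in union_k if v in both) / len(union_k)


rng = random.Random(1)
h = make_poly_hash(8, rng)  # 8-wise independence, echoing the abstract
a = set(range(0, 1000))
b = set(range(500, 1500))
print(jaccard_estimate(a, b, h, k=200))  # true Jaccard similarity is 1/3
```

The estimate is only as good as the hash family allows; the guarantees stated in the abstract apply to the authors' construction, not to this sketch.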