-
The NFA Acceptance Hypothesis: Non-Combinatorial and Dynamic Lower Bounds
Authors:
Karl Bringmann,
Allan Grønlund,
Marvin Künnemann,
Kasper Green Larsen
Abstract:
We pose the fine-grained hardness hypothesis that the textbook algorithm for the NFA Acceptance problem is optimal up to subpolynomial factors, even for dense NFAs and fixed alphabets.
We show that this barrier appears in many variations throughout the algorithmic literature by introducing a framework of Colored Walk problems. These yield fine-grained equivalent formulations of the NFA Acceptance problem as problems concerning detection of an $s$-$t$-walk with a prescribed color sequence in a given edge- or node-colored graph. For NFA Acceptance on sparse NFAs (or equivalently, Colored Walk in sparse graphs), a tight lower bound under the Strong Exponential Time Hypothesis has been rediscovered several times in recent years. We show that our hardness hypothesis, which concerns dense NFAs, has several interesting implications:
- It gives a tight lower bound for Context-Free Language Reachability. This proves conditional optimality for the class of 2NPDA-complete problems, explaining the cubic bottleneck of interprocedural program analysis.
- It gives a tight $(n+nm^{1/3})^{1-o(1)}$ lower bound for the Word Break problem on strings of length $n$ and dictionaries of total size $m$.
- It implies the popular OMv hypothesis. Since the NFA acceptance problem is a static (i.e., non-dynamic) problem, this provides a static reason for the hardness of many dynamic problems.
Thus, a proof of the NFA Acceptance hypothesis would resolve several interesting barriers. Conversely, a refutation of the NFA Acceptance hypothesis may lead the way to attacking the current barriers observed for Context-Free Language Reachability, the Word Break problem and the growing list of dynamic problems proven hard under the OMv hypothesis.
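For orientation, the following is a minimal sketch of what is commonly taught as the textbook approach to NFA Acceptance: simulate the automaton by maintaining the set of states reachable after each input symbol. The representation (a dictionary keyed by `(state, symbol)`) and the function name `nfa_accepts` are illustrative assumptions, not taken from the paper, and epsilon transitions are not handled.

```python
def nfa_accepts(transitions, start, accepting, word):
    """Simulate an NFA by tracking the set of currently reachable states.

    transitions: dict mapping (state, symbol) -> iterable of successor states
                 (hypothetical representation chosen for this sketch).
    """
    current = {start}
    for symbol in word:
        following = set()
        for state in current:
            # Collect every state reachable from `state` on `symbol`.
            following.update(transitions.get((state, symbol), ()))
        current = following
        if not current:
            # No state is reachable, so no accepting run can exist.
            return False
    return bool(current & set(accepting))


if __name__ == "__main__":
    delta = {(0, "a"): [0, 1], (1, "b"): [2]}
    print(nfa_accepts(delta, 0, {2}, "aab"))
```

Each input symbol touches every transition at most once, so on a dense NFA the work per symbol is proportional to the number of transitions.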
Submitted 16 November, 2023;
originally announced November 2023.
-
Sublinear Time Shortest Path in Expander Graphs
Authors:
Noga Alon,
Allan Grønlund,
Søren Fuglede Jørgensen,
Kasper Green Larsen
Abstract:
Computing a shortest path between two nodes in an undirected unweighted graph is among the most basic algorithmic tasks. Breadth first search solves this problem in linear time, which is clearly also a lower bound in the worst case. However, several works have shown how to solve this problem in sublinear time in expectation when the input graph is drawn from one of several classes of random graphs. In this work, we extend these results by giving sublinear time shortest path (and short path) algorithms for expander graphs. We thus identify a natural deterministic property of a graph (that is satisfied by typical random regular graphs) which suffices for sublinear time shortest paths. The algorithms are very simple, involving only bidirectional breadth first search and short random walks. We also complement our new algorithms by near-matching lower bounds.
Submitted 31 July, 2023; v1 submitted 12 July, 2023;
originally announced July 2023.
-
Compression Implies Generalization
Authors:
Allan Grønlund,
Mikael Høgsgaard,
Lior Kamma,
Kasper Green Larsen
Abstract:
Explaining the surprising generalization performance of deep neural networks is an active and important line of research in theoretical machine learning. Influential work by Arora et al. (ICML'18) showed that noise stability properties of deep nets occurring in practice can be used to provably compress model representations. They then argued that the small representations of compressed networks imply good generalization performance, albeit only of the compressed nets. Extending their compression framework to yield generalization bounds for the original uncompressed networks remains elusive.
Our main contribution is the establishment of a compression-based framework for proving generalization bounds. The framework is simple and powerful enough to extend the generalization bounds by Arora et al. to also hold for the original network. To demonstrate the flexibility of the framework, we also show that it allows us to give simple proofs of the strongest known generalization bounds for other popular machine learning models, namely Support Vector Machines and Boosting.
Submitted 1 July, 2021; v1 submitted 15 June, 2021;
originally announced June 2021.
-
Learning to Detect Fortified Areas
Authors:
Allan Grønlund,
Jonas Tranberg
Abstract:
High resolution data models like grid terrain models made from LiDAR data are a prerequisite for modern day Geographic Information Systems applications. Besides providing the foundation for the very accurate digital terrain models, LiDAR data is also extensively used to classify which parts of the considered surface comprise relevant elements like water, buildings and vegetation. In this paper we consider the problem of classifying which areas of a given surface are fortified by, for instance, roads, sidewalks, parking spaces, paved driveways and terraces. We consider using LiDAR data and orthophotos, combined and alone, to show how well the modern machine learning algorithms Gradient Boosted Trees and Convolutional Neural Networks are able to detect fortified areas on large real world data. The LiDAR data features that we consider in this project, in particular the intensity feature that measures the signal strength of the return, are heavily dependent on the actual LiDAR sensor that made the measurement. This is highly problematic, in particular for the generalisation capability of pattern matching algorithms, as this means that data features for test data may be very different from the data the model is trained on. We propose an algorithmic solution to this problem by designing a neural net embedding architecture that transforms data from all the different sensor systems into a new common representation that works as well as if the training data and test data originated from the same sensor. The final algorithm result has an accuracy above 96 percent, and an AUC score above 0.99.
Submitted 26 May, 2021;
originally announced May 2021.
-
Desperately seeking the impact of learning analytics in education at scale: Marrying data analysis with teaching and learning
Authors:
Olga Viberg,
Ake Gronlund
Abstract:
Learning analytics (LA) is argued to be able to improve learning outcomes, learner support and teaching. However, despite an increasingly expanding amount of student (digital) data accessible from various online education and learning platforms and the growing interest in LA worldwide as well as considerable research efforts already made, there is still little empirical evidence of impact on practice that shows the effectiveness of LA in education settings. Based on a selection of theoretical and empirical research, this chapter provides a critical discussion about the possibilities of collecting and using student data as well as barriers and challenges to overcome in providing data-informed support to educators' everyday teaching practices. We argue that in order to increase the impact of data-driven decision-making aimed at students' improved learning in education at scale, we need to better understand educators' needs, their teaching practices and the context in which these practices occur, and how to support them in developing relevant knowledge, strategies and skills to facilitate the data-informed process of digitalization of education.
Submitted 14 May, 2021;
originally announced May 2021.
-
Margins are Insufficient for Explaining Gradient Boosting
Authors:
Allan Grønlund,
Lior Kamma,
Kasper Green Larsen
Abstract:
Boosting is one of the most successful ideas in machine learning, achieving great practical performance with little fine-tuning. The success of boosted classifiers is most often attributed to improvements in margins. The focus on margin explanations was pioneered in the seminal work by Schapire et al. (1998) and has culminated in the $k$'th margin generalization bound by Gao and Zhou (2013), which was recently proved to be near-tight for some data distributions (Grønlund et al. 2019). In this work, we first demonstrate that the $k$'th margin bound is inadequate in explaining the performance of state-of-the-art gradient boosters. We then explain the shortcomings of the $k$'th margin bound and prove a stronger and more refined margin-based generalization bound for boosted classifiers that indeed succeeds in explaining the performance of modern gradient boosters. Finally, we improve upon the recent generalization lower bound by Grønlund et al. (2019).
Submitted 10 November, 2020;
originally announced November 2020.
-
Near-Tight Margin-Based Generalization Bounds for Support Vector Machines
Authors:
Allan Grønlund,
Lior Kamma,
Kasper Green Larsen
Abstract:
Support Vector Machines (SVMs) are among the most fundamental tools for binary classification. In its simplest formulation, an SVM produces a hyperplane separating two classes of data using the largest possible margin to the data. The focus on maximizing the margin has been well motivated through numerous generalization bounds. In this paper, we revisit and improve the classic generalization bounds in terms of margins. Furthermore, we complement our new generalization bound by a nearly matching lower bound, thus almost settling the generalization performance of SVMs in terms of margins.
Submitted 3 June, 2020;
originally announced June 2020.
-
Explaining the poor performance of the KASS algorithm implementation
Authors:
Allan Grønlund
Abstract:
By investigating the code for the KASS algorithm implementation used in the paper "Exploring the quantum speed limit with computer games" [1, arXiv:1506.09091] by Sørensen et al. (provided by the authors), we describe how the poor performance of the KASS algorithm reported in [1] is entirely caused by a simple sign error in a derivative calculation. Changing only this one sign in the KASS implementation, we show that the algorithm provides results comparable to all other algorithms considered for the problem [2,7], and performs better than all player solutions of [1]. Furthermore, we show that the player solutions were optimized with a different algorithm before being compared to the results from the KASS algorithm. The authors of [1] have acknowledged both findings. Finally, we show that in contrast to the claims in [1], the players did not explore two different strategies. In fact, all the players followed the same strategy.
Submitted 12 March, 2020;
originally announced March 2020.
-
Margin-Based Generalization Lower Bounds for Boosted Classifiers
Authors:
Allan Grønlund,
Lior Kamma,
Kasper Green Larsen,
Alexander Mathiasen,
Jelani Nelson
Abstract:
Boosting is one of the most successful ideas in machine learning. The most well-accepted explanations for the low generalization error of boosting algorithms such as AdaBoost stem from margin theory. The study of margins in the context of boosting algorithms was initiated by Schapire, Freund, Bartlett and Lee (1998) and has inspired numerous boosting algorithms and generalization bounds. To date, the strongest known generalization (upper bound) is the $k$th margin bound of Gao and Zhou (2013). Despite the numerous generalization upper bounds that have been proved over the last two decades, nothing is known about the tightness of these bounds. In this paper, we give the first margin-based lower bounds on the generalization error of boosted classifiers. Our lower bounds nearly match the $k$th margin bound and thus almost settle the generalization performance of boosted classifiers in terms of margins.
Submitted 7 May, 2020; v1 submitted 27 September, 2019;
originally announced September 2019.
-
Learning to Find Hydrological Corrections
Authors:
Lars Arge,
Allan Grønlund,
Svend Christian Svendsen,
Jonas Tranberg
Abstract:
High resolution Digital Elevation models, such as the (Big) grid terrain model of Denmark with more than 200 billion measurements, are a basic requirement for water flow modelling and flood risk analysis. However, a large number of modifications often need to be made to even very accurate terrain models, such as the Danish model, before they can be used in realistic flow modeling. These modifications include removal of bridges, which otherwise will act as dams in flow modeling, and inclusion of culverts that transport water underneath roads. In fact, the Danish model is accompanied by a detailed set of hydrological corrections for the digital elevation model. However, producing these hydrological corrections is a very slow and expensive process, since it is to a large extent done manually and often with local input. This also means that corrections can be of varying quality. In this paper we propose a new algorithmic approach based on machine learning and convolutional neural networks for automatically detecting hydrological corrections for such large terrain data. Our model is able to detect most hydrological corrections known for the Danish model and quite a few more that should have been included in the original list.
Submitted 17 September, 2019;
originally announced September 2019.
-
Algorithms Clearly Beat Gamers at Quantum Moves. A Verification
Authors:
Allan Grønlund
Abstract:
The paper [Sørensen et al., Nature 532] considers how human players compare to algorithms for solving the Quantum Moves game BringHomeWater and designs new algorithms based on the intuition extracted from players. The claim by [Sørensen et al., Nature 532] is that players outperform widely used algorithms, in particular the KASS algorithm, based on the Krotov algorithm, and that player intuition is crucial to develop improved methods. However, as initially discussed by D. Sels [D. Sels, Phys. Rev. A 97], a standard Coordinate Ascent algorithm outperforms all players by a large margin. Although D. Sels only compares to player solutions, the simple algorithm outperforms all algorithms based on player solutions and Krotov, and it does so using much less time and fewer iterations. In this paper we elaborate on the methods discussed by D. Sels and verify that the presented algorithm solves the problem better than all players and algorithms derived from player solutions in [Sørensen et al., Nature 532]. We also verify the theoretical analysis presented by D. Sels, which gives a theoretically derived protocol that outperforms all players. We add a comparison with gradient ascent or GRAPE. Starting from uniform random values, GRAPE outperforms all players by a large margin. GRAPE works at least as well as the methods from [Sørensen et al., Nature 532] initialized with player solutions. A standard analysis of the results from GRAPE provides a starting point for GRAPE that outperforms all algorithms from [Sørensen et al., Nature 532]. We compare with a basic Krotov algorithm, and get results similar to GRAPE, clearly outperforming players and the KASS algorithm. These experiments verify and underline the result in [D. Sels, Phys. Rev. A 97] that the conclusions from [Sørensen et al., Nature 532] regarding algorithms are untenable. In fact the opposite conclusions are true.
Submitted 2 July, 2019; v1 submitted 1 April, 2019;
originally announced April 2019.
-
Optimal Minimal Margin Maximization with Boosting
Authors:
Allan Grønlund,
Kasper Green Larsen,
Alexander Mathiasen
Abstract:
Boosting algorithms produce a classifier by iteratively combining base hypotheses. It has been observed experimentally that the generalization error keeps improving even after achieving zero training error. One popular explanation attributes this to improvements in margins. A common goal in a long line of research is to maximize the smallest margin using as few base hypotheses as possible, culminating with the AdaBoostV algorithm by Rätsch and Warmuth [JMLR'04]. The AdaBoostV algorithm was later conjectured to yield an optimal trade-off between number of hypotheses trained and the minimal margin over all training points (Nie et al. [JMLR'13]). Our main contribution is a new algorithm refuting this conjecture. Furthermore, we prove a lower bound which implies that our new algorithm is optimal.
Submitted 30 January, 2019;
originally announced January 2019.
-
Upper and lower bounds for dynamic data structures on strings
Authors:
Raphael Clifford,
Allan Grønlund,
Kasper Green Larsen,
Tatiana Starikovskaya
Abstract:
We consider a range of simply stated dynamic data structure problems on strings. An update changes one symbol in the input and a query asks us to compute some function of the pattern of length $m$ and a substring of a longer text. We give both conditional and unconditional lower bounds for variants of exact matching with wildcards, inner product, and Hamming distance computation via a sequence of reductions. As an example, we show that there does not exist an $O(m^{1/2-\varepsilon})$ time algorithm for a large range of these problems unless the online Boolean matrix-vector multiplication conjecture is false. We also provide nearly matching upper bounds for most of the problems we consider.
Submitted 19 February, 2018;
originally announced February 2018.
-
Fast Exact k-Means, k-Medians and Bregman Divergence Clustering in 1D
Authors:
Allan Grønlund,
Kasper Green Larsen,
Alexander Mathiasen,
Jesper Sindahl Nielsen,
Stefan Schneider,
Mingzhou Song
Abstract:
The $k$-Means clustering problem on $n$ points is NP-Hard for any dimension $d\ge 2$; however, for the 1D case there exist exact polynomial time algorithms. Previous literature reported an $O(kn^2)$ time dynamic programming algorithm that uses $O(kn)$ space. It turns out that the problem has been considered under a different name more than twenty years ago. We present all the existing work that had been overlooked and compare the various solutions theoretically. Moreover, we show how to reduce the space usage for some of them, as well as generalize them to data structures that can quickly report an optimal $k$-Means clustering for any $k$. Finally we also generalize all the algorithms to work for the absolute distance and to work for any Bregman Divergence. We complement our theoretical contributions by experiments that compare the practical performance of the various algorithms.
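As a point of reference for the $O(kn^2)$ time, $O(kn)$ space dynamic program mentioned in the abstract, here is a minimal sketch of one such baseline. It is not one of the faster algorithms the paper surveys; the function name `kmeans_1d_cost` and the use of prefix sums to evaluate a cluster's cost in constant time are assumptions of this sketch. It returns only the optimal cost and expects $1 \le k \le n$.

```python
def kmeans_1d_cost(points, k):
    """Optimal 1D k-means cost via a straightforward O(k n^2) dynamic program."""
    xs = sorted(points)
    n = len(xs)
    # Prefix sums of values and squared values give O(1) cluster costs.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):
        # Sum of squared distances to the mean for the sorted slice xs[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # dp[c][j]: best cost of splitting the first j sorted points into c clusters.
    dp = [[inf] * (n + 1) for _ in range(k + 1)]
    dp[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(1, n + 1):
            best = inf
            for i in range(c - 1, j):
                if dp[c - 1][i] < inf:
                    best = min(best, dp[c - 1][i] + cost(i, j))
            dp[c][j] = best
    return dp[k][n]


if __name__ == "__main__":
    print(kmeans_1d_cost([1, 2, 10, 11], 2))
```

The sketch relies on the fact that, in one dimension, an optimal clustering groups points that are contiguous in sorted order.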
Submitted 25 April, 2018; v1 submitted 25 January, 2017;
originally announced January 2017.
-
A Dichotomy for Regular Expression Membership Testing
Authors:
Karl Bringmann,
Allan Grønlund,
Kasper Green Larsen
Abstract:
We study regular expression membership testing: Given a regular expression of size $m$ and a string of size $n$, decide whether the string is in the language described by the regular expression. Its classic $O(nm)$ algorithm is one of the big success stories of the 70s, which allowed pattern matching to develop into the standard tool that it is today.
Many special cases of pattern matching have been studied that can be solved faster than in quadratic time. However, a systematic study of tractable cases was made possible only recently, with the first conditional lower bounds reported by Backurs and Indyk [FOCS'16]. Restricted to any "type" of homogeneous regular expressions of depth 2 or 3, they either presented a near-linear time algorithm or a quadratic conditional lower bound, with one exception known as the Word Break problem.
In this paper we complete their work as follows:
1) We present two almost-linear time algorithms that generalize all known almost-linear time algorithms for special cases of regular expression membership testing.
2) We classify all types, except for the Word Break problem, into almost-linear time or quadratic time assuming the Strong Exponential Time Hypothesis. This extends the classification from depth 2 and 3 to any constant depth.
3) For the Word Break problem we give an improved $\tilde{O}(n m^{1/3} + m)$ algorithm. Surprisingly, we also prove a matching conditional lower bound for combinatorial algorithms. This establishes Word Break as the only intermediate problem.
In total, we prove matching upper and lower bounds for any type of bounded-depth homogeneous regular expressions, which yields a full dichotomy for regular expression membership testing.
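To make the Word Break problem concrete (decide whether a string can be written as a concatenation of dictionary words), here is a minimal sketch of a naive dynamic program. This is a simple baseline for illustration only, not the $\tilde{O}(n m^{1/3} + m)$ algorithm from the paper; the function name `word_break` is an assumption of this sketch.

```python
def word_break(text, dictionary):
    """Return True if `text` is a concatenation of words from `dictionary`."""
    words = {w for w in dictionary if w}
    lengths = sorted({len(w) for w in words})
    n = len(text)
    # reachable[i] is True when text[:i] can be split into dictionary words.
    reachable = [False] * (n + 1)
    reachable[0] = True  # the empty prefix needs no words
    for end in range(1, n + 1):
        for length in lengths:
            if length > end:
                break  # lengths are sorted, so longer words cannot fit either
            if reachable[end - length] and text[end - length:end] in words:
                reachable[end] = True
                break
    return reachable[n]


if __name__ == "__main__":
    print(word_break("applepenapple", ["apple", "pen"]))
```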
Submitted 7 November, 2016; v1 submitted 3 November, 2016;
originally announced November 2016.
-
New Unconditional Hardness Results for Dynamic and Online Problems
Authors:
Raphael Clifford,
Allan Grønlund,
Kasper Green Larsen
Abstract:
There has been a resurgence of interest in lower bounds whose truth rests on the conjectured hardness of well known computational problems. These conditional lower bounds have become important and popular due to the painfully slow progress on proving strong unconditional lower bounds. Nevertheless, the long term goal is to replace these conditional bounds with unconditional ones. In this paper we make progress in this direction by studying the cell probe complexity of two conjectured to be hard problems of particular importance: matrix-vector multiplication and a version of dynamic set disjointness known as Patrascu's Multiphase Problem. We give improved unconditional lower bounds for these problems as well as introducing new proof techniques of independent interest. These include a technique capable of proving strong threshold lower bounds of the following form: If we insist on having a very fast query time, then the update time has to be slow enough to compute a lookup table with the answer to every possible query. This is the first time a lower bound of this type has been proven.
Submitted 8 April, 2015;
originally announced April 2015.
-
Towards Tight Lower Bounds for Range Reporting on the RAM
Authors:
Allan Grønlund,
Kasper Green Larsen
Abstract:
In the orthogonal range reporting problem, we are to preprocess a set of $n$ points with integer coordinates on a $U \times U$ grid. The goal is to support reporting all $k$ points inside an axis-aligned query rectangle. This is one of the most fundamental data structure problems in databases and computational geometry. Despite the importance of the problem, its complexity remains unresolved in the word-RAM. On the upper bound side, three best tradeoffs exist: (1.) Query time $O(\lg \lg n + k)$ with $O(n \lg^{\varepsilon} n)$ words of space for any constant $\varepsilon>0$. (2.) Query time $O((1 + k) \lg \lg n)$ with $O(n \lg \lg n)$ words of space. (3.) Query time $O((1+k)\lg^{\varepsilon} n)$ with optimal $O(n)$ words of space. However, the only known query time lower bound is $Ω(\log \log n + k)$, even for linear space data structures.
All three current best upper bound tradeoffs are derived by reducing range reporting to a ball-inheritance problem. Ball-inheritance is a problem that essentially encapsulates all previous attempts at solving range reporting in the word-RAM. In this paper we make progress towards closing the gap between the upper and lower bounds for range reporting by proving cell probe lower bounds for ball-inheritance. Our lower bounds are tight for a large range of parameters, excluding any further progress for range reporting using the ball-inheritance reduction.
Submitted 3 November, 2014;
originally announced November 2014.
-
Approximate Range Emptiness in Constant Time and Optimal Space
Authors:
Mayank Goswami,
Allan Grønlund,
Kasper Green Larsen,
Rasmus Pagh
Abstract:
This paper studies the \emph{$\varepsilon$-approximate range emptiness} problem, where the task is to represent a set $S$ of $n$ points from $\{0,\ldots,U-1\}$ and answer emptiness queries of the form "$[a ; b]\cap S \neq \emptyset$ ?" with a probability of \emph{false positives} allowed. This generalizes the functionality of \emph{Bloom filters} from single point queries to any interval length $L$. Setting the false positive rate to $\varepsilon/L$ and performing $L$ queries, Bloom filters yield a solution to this problem with space $O(n \lg(L/\varepsilon))$ bits, false positive probability bounded by $\varepsilon$ for intervals of length up to $L$, using query time $O(L \lg(L/\varepsilon))$. Our first contribution is to show that the space/error trade-off cannot be improved asymptotically: Any data structure for answering approximate range emptiness queries on intervals of length up to $L$ with false positive probability $\varepsilon$, must use space $Ω(n \lg(L/\varepsilon)) - O(n)$ bits. On the positive side we show that the query time can be improved greatly, to constant time, while matching our space lower bound up to a lower order additive term. This result is achieved through a succinct data structure for (non-approximate 1d) range emptiness/reporting queries, which may be of independent interest.
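The Bloom filter baseline described in the abstract (set the per-point false positive rate to $\varepsilon/L$ and issue up to $L$ point queries) can be sketched as follows. This illustrates only that baseline, not the paper's constant-time data structure. The class and function names, the SHA-256-based hashing, and the standard sizing formulas for the bit array and the number of hash functions are assumptions of this sketch.

```python
import hashlib
import math


class BloomFilter:
    """A small Bloom filter sized with the standard formulas (sketch only)."""

    def __init__(self, capacity, fp_rate):
        ln2 = math.log(2)
        self.num_bits = max(1, math.ceil(-capacity * math.log(fp_rate) / ln2 ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * ln2))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Derive one bit position per seed from a salted SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_range_filter(points, max_len, eps):
    """Store the points with per-point false positive rate eps / max_len."""
    bloom = BloomFilter(len(points), eps / max_len)
    for p in points:
        bloom.add(p)
    return bloom


def maybe_nonempty(bloom, a, b):
    """Answer "[a; b] intersects S?" with one point query per integer in [a, b]."""
    return any(x in bloom for x in range(a, b + 1))


if __name__ == "__main__":
    f = build_range_filter([3, 17, 42], max_len=8, eps=0.01)
    print(maybe_nonempty(f, 40, 45))
```

A nonempty range is always reported as nonempty, since Bloom filters have no false negatives; an empty range of length at most `max_len` is wrongly reported as nonempty with probability at most about `eps`, by a union bound over the point queries.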
Submitted 10 July, 2014;
originally announced July 2014.
-
Threesomes, Degenerates, and Love Triangles
Authors:
Allan Grønlund,
Seth Pettie
Abstract:
The 3SUM problem is to decide, given a set of $n$ real numbers, whether any three sum to zero. It is widely conjectured that a trivial $O(n^2)$-time algorithm is optimal and over the years the consequences of this conjecture have been revealed. This 3SUM conjecture implies $\Omega(n^2)$ lower bounds on numerous problems in computational geometry and a variant of the conjecture implies strong lower bounds on triangle enumeration, dynamic graph algorithms, and string matching data structures.
In this paper we refute the 3SUM conjecture. We prove that the decision tree complexity of 3SUM is $O(n^{3/2}\sqrt{\log n})$ and give two subquadratic 3SUM algorithms, a deterministic one running in $O(n^2 / (\log n/\log\log n)^{2/3})$ time and a randomized one running in $O(n^2 (\log\log n)^2 / \log n)$ time with high probability. Our results lead directly to improved bounds for $k$-variate linear degeneracy testing for all odd $k\ge 3$. The problem is to decide, given a linear function $f(x_1,\ldots,x_k) = \alpha_0 + \sum_{1\le i\le k} \alpha_i x_i$ and a set $A \subset \mathbb{R}$, whether $0\in f(A^k)$. We show the decision tree complexity of this problem is $O(n^{k/2}\sqrt{\log n})$.
Finally, we give a subcubic algorithm for a generalization of the $(\min,+)$-product over real-valued matrices and apply it to the problem of finding zero-weight triangles in weighted graphs. We give a depth-$O(n^{5/2}\sqrt{\log n})$ decision tree for this problem, as well as an algorithm running in time $O(n^3 (\log\log n)^2/\log n)$.
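For reference, the quadratic baseline that the 3SUM conjecture concerns can be sketched as below. This is the standard sort-and-two-pointers method, not either of the subquadratic algorithms from the paper. The function name is an assumption, and the exact comparison with zero is only reliable for integer (or otherwise exactly representable) inputs.

```python
def three_sum_exists(values):
    """Decide whether three elements at distinct positions sum to zero.

    Sorting costs O(n log n); the two-pointer scan for each fixed
    smallest element is linear, giving O(n^2) time overall.
    """
    a = sorted(values)
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return True
            if s < 0:
                lo += 1   # sum too small: move to a larger middle element
            else:
                hi -= 1   # sum too large: move to a smaller top element
    return False
```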
Submitted 30 May, 2014; v1 submitted 3 April, 2014;
originally announced April 2014.
-
Fractal Profit Landscape of the Stock Market
Authors:
Andreas Gronlund,
Il Gu Yi,
Beom Jun Kim
Abstract:
We investigate the structure of the profit landscape obtained from the most basic, fluctuation-based trading strategy applied to daily stock price data. The strategy is parameterized by only two variables, p and q. Stocks are sold and bought if the log return is bigger than p and less than -q, respectively. Repetition of this simple strategy for a long time gives the profit defined in the underlying two-dimensional parameter space of p and q. It is revealed that the local maxima in the profit landscape are spread in the form of a fractal structure. The fractal structure implies that successful strategies are not localized to any region of the profit landscape, nor are they spaced evenly throughout it, which makes the optimization notoriously hard and hypersensitive to partial or limited information. The concrete implication of this property is demonstrated by showing that optimization of one stock for future values or other stocks renders worse profit than a strategy that ignores fluctuations, i.e., a long-term buy-and-hold strategy.
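A minimal sketch of the (p, q) rule described in this abstract is given below. The abstract does not specify position sizing or how profit is normalized, so this sketch assumes an all-in/all-out trader who starts in cash, and it reports profit as the relative change in wealth. The function names and these conventions are hypothetical and may differ from the paper's definitions.

```python
import math


def strategy_profit(prices, p, q, start_cash=1.0):
    """Relative profit of the (p, q) rule on a non-empty daily price series.

    Assumed convention: sell everything when the daily log return
    exceeds p, buy with all cash when it falls below -q.
    """
    cash, shares = start_cash, 0.0
    for prev, cur in zip(prices, prices[1:]):
        log_return = math.log(cur / prev)
        if log_return > p and shares > 0:
            cash, shares = cash + shares * cur, 0.0   # sell all holdings
        elif log_return < -q and cash > 0:
            shares, cash = shares + cash / cur, 0.0   # buy with all cash
    final_wealth = cash + shares * prices[-1]
    return final_wealth / start_cash - 1.0


def profit_landscape(prices, p_values, q_values):
    """Evaluate the profit on a grid of (p, q) pairs; rows follow p_values."""
    return [[strategy_profit(prices, p, q) for q in q_values] for p in p_values]
```

Scanning `profit_landscape` over a fine grid is one way to visualize the two-dimensional landscape the abstract refers to.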
Submitted 2 May, 2012;
originally announced May 2012.
-
Evolution of Rogue Waves in Interacting Wave Systems
Authors:
A. Grönlund,
B. Eliasson,
M. Marklund
Abstract:
Large-amplitude water waves on deep water have long been known in the seafaring community, and are a cause of great concern for, e.g., oil platform constructions. The concept of such freak waves is nowadays, thanks to satellite and radar measurements, well established within the scientific community. There are a number of important models and approaches for the theoretical description of such waves. By analyzing the scaling behavior of freak wave formation in a model of two interacting waves, described by two coupled nonlinear Schroedinger equations, we show that there are two different dynamical scaling behaviors above and below a critical angle theta_c of the direction of the interacting waves. Below theta_c, all wave systems evolve and display statistics similar to a wave system of non-interacting waves. The results equally apply to other systems described by the nonlinear Schroedinger equations, and should be of interest when designing optical wave guides.
Submitted 3 April, 2009;
originally announced April 2009.
-
Dynamic scaling regimes of collective decision making
Authors:
Andreas Gronlund,
Petter Holme,
Petter Minnhagen
Abstract:
We investigate a social system of agents faced with a binary choice. We assume there is a correct, or beneficial, outcome of this choice. Furthermore, we assume agents are influenced by others in making their decision, and that the agents can obtain information that may guide them towards making a correct decision. The dynamic model we propose is of nonequilibrium type, converging to a final decision. We run it on random graphs and scale-free networks. On random graphs, we find two distinct regions in terms of the "finalizing time" -- the time until all agents have finalized their decisions. On scale-free networks, on the other hand, there do not seem to be any such distinct scaling regions.
Submitted 24 August, 2007; v1 submitted 25 May, 2006;
originally announced May 2006.
-
A network-based threshold model for the spreading of fads in society and markets
Authors:
Andreas Gronlund,
Petter Holme
Abstract:
We investigate the behavior of a threshold model for the spreading of fads and similar phenomena in society. The model gives the fad dynamics and is intended to be confined to an underlying network structure. We investigate the whole parameter space of the fad dynamics on three types of network models. The dynamics we discover are rich and highly dependent on the underlying network structure. For some range of the parameter space, for all types of substrate networks, there is a great variety of sizes and life-lengths of the fads -- what one sees in real-world social and economic systems.
Submitted 6 May, 2005;
originally announced May 2005.
-
Searchability of Networks
Authors:
M. Rosvall,
A. Gronlund,
P. Minnhagen,
K. Sneppen
Abstract:
We investigate the searchability of complex systems in terms of their interconnectedness. Associating searchability with the number and size of branch points along the paths between the nodes, we find that scale-free networks are relatively difficult to search, and thus that the abundance of scale-free networks in nature and society may reflect an attempt to protect local areas in a highly interconnected network from nonrelated communication. In fact, starting from a random node, real-world networks with higher order organization like modular or hierarchical structure are even more difficult to navigate than random scale-free networks. The searchability at the node level opens the possibility for a generalized hierarchy measure that captures both the hierarchy in the usual terms of trees as in military structures, and the intrinsic hierarchical nature of topological hierarchies for scale-free networks as in the Internet.
Submitted 7 December, 2005; v1 submitted 17 May, 2005;
originally announced May 2005.
-
Modelling the dynamics of youth subcultures
Authors:
Petter Holme,
Andreas Gronlund
Abstract:
What are the dynamics behind youth subcultures such as punk, hippie, or hip-hop cultures? How do the global dynamics of these subcultures relate to the individual's search for a personal identity? We propose a simple dynamical model to address these questions and find that only a few assumptions about the individual's behaviour are necessary to reproduce known features of youth culture.
Submitted 25 April, 2005;
originally announced April 2005.
-
Networking genetic regulation and neural computation: Directed network topology and its effect on the dynamics
Authors:
Andreas Grönlund
Abstract:
Two different types of directed networks are investigated: transcriptional regulation networks and neural networks. The directed network structures are studied and shown to reflect the different processes taking place on the networks. The distribution of influence, identified as the number of downstream vertices, is used as a tool for investigating random vertex removal. In the transcriptional regulation networks we observe that only a small number of vertices have a large influence. The small influence of most vertices limits the effect of a random removal, in most cases, to only a small fraction of the vertices in the network. The neural network has a rather different topology with respect to the influence, which is large for most vertices. To further investigate the effect of vertex removal we simulate the biological processes taking place on the networks. Contrary to the presumed large effect of random vertex removal in the neural network, the high density of edges in conjunction with the dynamics used makes the change in the state of the system highly localized around the removed vertex.
Submitted 5 October, 2004; v1 submitted 11 June, 2004;
originally announced June 2004.
-
Correlations in Networks associated to Preferential Growth
Authors:
Andreas Gronlund,
Kim Sneppen,
Petter Minnhagen
Abstract:
Combinations of random and preferential growth for both on-growing and stationary networks are studied and a hierarchical topology is observed. Thus, for real-world scale-free networks that do not exhibit hierarchical features, preferential growth is probably not the main ingredient in the growth process. An example of such a real-world network is the protein-protein interaction network in yeast, which exhibits pronounced anti-hierarchical features.
Submitted 2 December, 2004; v1 submitted 27 January, 2004;
originally announced January 2004.
-
Scaling determination of the nonlinear I-V characteristics for 2D superconducting networks
Authors:
Petter Minnhagen,
Beom Jun Kim,
A. Gronlund
Abstract:
It is shown from computer simulations that the current-voltage ($I$-$V$) characteristics for the two-dimensional XY model with resistively-shunted Josephson junction dynamics and Monte Carlo dynamics obey a finite-size scaling form from which the nonlinear $I$-$V$ exponent $a$ can be determined to good precision. This determination supports the conclusion $a=z+1$, where $z$ is the dynamic critical exponent. The results are discussed in the light of the contrary conclusion reached by Tang and Chen [Phys. Rev. B {\bf 67}, 024508 (2003)] and the possibility of a breakdown of scaling suggested by Bormann [Phys. Rev. Lett. {\bf 78}, 4324 (1997)].
Submitted 18 December, 2003;
originally announced December 2003.
-
The networked seceder model: Group formation in social and economic systems
Authors:
Andreas Gronlund,
Petter Holme
Abstract:
The seceder model illustrates how the desire to be different from the average can lead to the formation of groups in a population. We turn the original, agent-based seceder model into a model of network evolution. We find that the structural characteristics of our model closely match those of empirical social networks. Statistics for the dynamics of group formation are also given. Extensions of the model to networks of companies are also discussed.
Submitted 1 December, 2003;
originally announced December 2003.