Faster and Deterministic Subtrajectory Clustering

Authors: Ivor van der Hoog, Thijs van der Horst, Tim Ophelders

Abstract: Given a trajectory $T$ and a distance $Δ$, we wish to find a set $C$ of curves of complexity at most $\ell$, such that we can cover $T$ with subcurves that each are within Fréchet distance $Δ$ to at least one curve in $C$. We call $C$ an $(\ell,Δ)$-clustering and aim to find an $(\ell,Δ)$-clustering of minimum cardinality. This problem was introduced by Akitaya et al. (2021) and shown to be NP-complete. The main focus has therefore been on bicriterial approximation algorithms, allowing for the clustering to be an $(\ell, Θ(Δ))$-clustering of roughly optimal size. We present algorithms that construct $(\ell,4Δ)$-clusterings of $\mathcal{O}(k \log n)$ size, where $k$ is the size of the optimal $(\ell, Δ)$-clustering. For the discrete Fréchet distance, we use $\mathcal{O}(n \ell \log n)$ space and $\mathcal{O}(k n^2 \log^3 n)$ deterministic worst-case time. For the continuous Fréchet distance, we use $\mathcal{O}(n^2 \log n)$ space and $\mathcal{O}(k n^3 \log^3 n)$ time. Our algorithms significantly improve upon the clustering quality (improving the approximation factor in $Δ$) and size (whenever $\ell \in Ω(\log n)$). We offer deterministic running times comparable to known expected bounds. Additionally, in the continuous setting, we give a near-linear improvement upon the space usage. When compared only to deterministic results, we offer a near-linear speedup and a near-quadratic improvement in the space usage.
When we may restrict ourselves to only considering clusters where all subtrajectories are vertex-to-vertex subcurves, we obtain even better results under the continuous Fréchet distance. Our algorithm becomes near-quadratic and uses space that is near-linear in $n \ell$.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 27 pages, 8 figures

ACM Class: F.2.2
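
As an illustration of the discrete Fréchet distance that the abstract above refers to, the following is a minimal sketch of the textbook quadratic-time dynamic program. It is not the clustering algorithm from the paper; the function name `discrete_frechet` and the representation of curves as lists of coordinate tuples are assumptions made for this example.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def discrete_frechet(P, Q):
    """Discrete Fréchet distance between two point sequences P and Q.

    Classic O(len(P) * len(Q)) dynamic program; illustrative only,
    not the method of the paper listed above.
    """
    if not P or not Q:
        raise ValueError("both curves need at least one vertex")
    n, m = len(P), len(Q)
    # dp[i][j] = discrete Fréchet distance between P[:i+1] and Q[:j+1]
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(P[i], Q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[0][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][0], d)
            else:
                # Advance on P, on Q, or on both; keep the cheapest option.
                best_prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
                dp[i][j] = max(best_prev, d)
    return dp[n - 1][m - 1]
```

For two parallel three-vertex curves at vertical offset 1, such as `[(0, 0), (1, 0), (2, 0)]` and `[(0, 1), (1, 1), (2, 1)]`, the sketch returns 1.0.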

arXiv:2401.14815

Faster Fréchet Distance Approximation through Truncated Smoothing

Authors: Thijs van der Horst, Tim Ophelders

Abstract: The Fréchet distance is a popular distance measure for curves. Computing the Fréchet distance between two polygonal curves of $n$ vertices takes roughly quadratic time, and conditional lower bounds suggest that even approximating to within a factor $3$ cannot be done in strongly-subquadratic time, even in one dimension. The current best approximation algorithms present trade-offs between approximation quality and running time. Recently, van der Horst $\textit{et al.}$ (SODA, 2023) presented an $O((n^2 / α) \log^3 n)$ time $α$-approximate algorithm for curves in arbitrary dimensions, for any $α \in [1, n]$. Our main contribution is an approximation algorithm for curves in one dimension, with a significantly faster running time of $O(n \log^3 n + (n^2 / α^3) \log^2 n \log \log n)$. Additionally, we give an algorithm for curves in arbitrary dimensions that improves upon the state-of-the-art running time by a logarithmic factor, to $O((n^2 / α) \log^2 n)$. Both of our algorithms rely on a linear-time simplification procedure that in one dimension reduces the complexity of the reachable free space to $O(n^2 / α)$ without making sacrifices in the asymptotic approximation factor.

Submitted 26 January, 2024; originally announced January 2024.

Comments: 27 pages, 11 figures

ACM Class: F.2.2
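
To make the trade-off in the abstract above concrete, here is a worked instance of the two stated bounds. The choice $α = n^{1/3}$ is picked for this illustration and is not singled out by the authors. In one dimension, $α^3 = n$, so

$$O\!\left(n \log^3 n + \frac{n^2}{α^3} \log^2 n \log\log n\right) = O\!\left(n \log^3 n + n \log^2 n \log\log n\right) = O(n \log^3 n),$$

whereas the bound for arbitrary dimensions at the same $α$ gives

$$O\!\left(\frac{n^2}{α} \log^2 n\right) = O\!\left(n^{5/3} \log^2 n\right).$$

Under these stated bounds, an $n^{1/3}$-approximation in one dimension is therefore obtainable in near-linear time.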

arXiv:2401.02897

Robust Bichromatic Classification using Two Lines

Authors: Erwin Glazenburg, Thijs van der Horst, Tom Peters, Bettina Speckmann, Frank Staals

Abstract: Given two sets $\mathit{R}$ and $\mathit{B}$ of at most $\mathit{n}$ points in the plane, we present efficient algorithms to find a two-line linear classifier that best separates the "red" points in $\mathit{R}$ from the "blue" points in $\mathit{B}$ and is robust to outliers. More precisely, we find a region $\mathit{W}_\mathit{B}$ bounded by two lines, so either a halfplane, strip, wedge, or double wedge, containing (most of) the blue points $\mathit{B}$, and few red points. Our running times vary between optimal $O(n\log n)$ and $O(n^4)$, depending on the type of region $\mathit{W}_\mathit{B}$ and whether we wish to minimize only red outliers, only blue outliers, or both.

Submitted 16 February, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: 19 pages, 11 figures. Updated to include new results

ACM Class: F.2.2
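
The objective described in the abstract above can be illustrated by scoring one candidate region. The sketch below only counts outliers for a given wedge and does not search for the best one, so it is not the algorithm from the paper. The line representation `(a, b, c)` for $ax + by + c = 0$, the convention that the wedge is the intersection of the two closed halfplanes $ax + by + c \ge 0$, and the function names are all assumptions made for this example.

```python
def side(line, p):
    """Signed value of a*x + b*y + c for point p; the sign gives the side of the line."""
    a, b, c = line
    return a * p[0] + b * p[1] + c


def count_outliers(line1, line2, red, blue):
    """Count outliers for the wedge {p : side(line1, p) >= 0 and side(line2, p) >= 0}.

    Returns (red_outliers, blue_outliers): red points inside the wedge
    and blue points outside it. Hypothetical helper, for illustration only.
    """
    def in_wedge(p):
        return side(line1, p) >= 0 and side(line2, p) >= 0

    red_outliers = sum(1 for p in red if in_wedge(p))
    blue_outliers = sum(1 for p in blue if not in_wedge(p))
    return red_outliers, blue_outliers
```

With the wedge $x \ge 0, y \ge 0$, given as `(1, 0, 0)` and `(0, 1, 0)`, blue points `[(1, 1), (2, 3), (-1, 1)]` and red points `[(-1, -1), (1, 2)]`, one red point lies inside and one blue point lies outside.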

arXiv:2304.13094

Simply Realising an Imprecise Polyline is NP-hard

Authors: Thijs van der Horst, Tim Ophelders, Bart van der Steenhoven

Abstract: We consider the problem of deciding, given a sequence of regions, if there is a choice of points, one for each region, such that the induced polyline is simple or weakly simple, meaning that it can touch but not cross itself. Specifically, we consider the case where each region is a translate of the same shape. We show that the problem is NP-hard when the shape is a unit-disk or unit-square. We argue that the problem is NP-complete when the shape is a vertical unit-segment.

Submitted 25 April, 2023; originally announced April 2023.

arXiv:2208.12721

A Subquadratic $n^ε$-approximation for the Continuous Fréchet Distance

Authors: Thijs van der Horst, Marc van Kreveld, Tim Ophelders, Bettina Speckmann

Abstract: The Fréchet distance is a commonly used similarity measure between curves. It is known how to compute the continuous Fréchet distance between two polylines with $m$ and $n$ vertices in $\mathbb{R}^d$ in $O(mn (\log \log n)^2)$ time; doing so in strongly subquadratic time is a longstanding open problem. Recent conditional lower bounds suggest that it is unlikely that a strongly subquadratic algorithm exists. Moreover, it is unlikely that we can approximate the Fréchet distance to within a factor $3$ in strongly subquadratic time, even if $d=1$. The best current results establish a tradeoff between approximation quality and running time. Specifically, Colombe and Fox (SoCG, 2021) give an $O(α)$-approximate algorithm that runs in $O((n^3 / α^2) \log n)$ time for any $α \in [\sqrt{n}, n]$, assuming $m = n$. In this paper, we improve this result with an $O(α)$-approximate algorithm that runs in $O((n + mn / α) \log^3 n)$ time for any $α \in [1, n]$, assuming $m \leq n$ and constant dimension $d$.

Submitted 26 August, 2022; originally announced August 2022.

Comments: 20 pages, 5 figures

ACM Class: F.2.2
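
The title's "subquadratic $n^ε$-approximation" can be read off the stated bound by substitution. Taking $m = n$ and $α = n^{ε}$ with $0 < ε \le 1$ (a parameter choice made here for illustration), the running time becomes

$$O\!\left(\left(n + \frac{n^2}{n^{ε}}\right) \log^3 n\right) = O\!\left(n^{2-ε} \log^3 n\right),$$

since $n^{2-ε} \ge n$ whenever $ε \le 1$. For comparison, the Colombe and Fox bound quoted in the abstract evaluates at its smallest admissible value $α = \sqrt{n}$ to

$$O\!\left(\frac{n^3}{(\sqrt{n})^2} \log n\right) = O(n^2 \log n),$$

so that earlier bound drops below quadratic only for $α$ polynomially larger than $\sqrt{n}$.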

arXiv:2205.00277

Chromatic $k$-Nearest Neighbor Queries

Authors: Thijs van der Horst, Maarten Löffler, Frank Staals

Abstract: Let $P$ be a set of $n$ colored points. We develop efficient data structures that store $P$ and can answer chromatic $k$-nearest neighbor ($k$-NN) queries. Such a query consists of a query point $q$ and a number $k$, and asks for the color that appears most frequently among the $k$ points in $P$ closest to $q$. Answering such queries efficiently is the key to obtaining fast $k$-NN classifiers. Our main aim is to obtain query times that are independent of $k$ while using near-linear space. We show that this is possible using a combination of two data structures. The first data structure allows us to compute a region containing exactly the $k$-nearest neighbors of a query point $q$, and the second data structure can then report the most frequent color in such a region. This leads to linear space data structures with query times of $O(n^{1 / 2} \log n)$ for points in $\mathbb{R}^1$, and with query times varying between $O(n^{2/3}\log^{2/3} n)$ and $O(n^{5/6} {\rm polylog} n)$, depending on the distance measure used, for points in $\mathbb{R}^2$. Since these query times are still fairly large, we also consider approximations. If we are allowed to report a color that appears at least $(1-\varepsilon)f^*$ times, where $f^*$ is the frequency of the most frequent color, we obtain a query time of $O(\log n + \log\log_{\frac{1}{1-\varepsilon}} n)$ in $\mathbb{R}^1$ and expected query times ranging between $\tilde{O}(n^{1/2}\varepsilon^{-3/2})$ and $\tilde{O}(n^{1/2}\varepsilon^{-5/2})$ in $\mathbb{R}^2$ using near-linear space (ignoring polylogarithmic factors).

Submitted 30 April, 2022; originally announced May 2022.

Comments: 37 pages, 9 figures
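
The query defined in the abstract above can be stated as a brute-force baseline. The sketch below scans all points, so its query time depends on both $n$ and $k$. It is not the data structure from the paper, whose aim is query times independent of $k$. The function name `chromatic_knn`, the `(coordinates, color)` input format, and the use of Euclidean distance are assumptions made for this example.

```python
import heapq
from collections import Counter
from math import dist  # Euclidean distance, available since Python 3.8


def chromatic_knn(points, q, k):
    """Return the most frequent color among the k points closest to q.

    points: list of (coordinates, color) pairs, coordinates as tuples.
    Brute-force baseline for illustration; ties between colors are
    broken by whichever color is encountered first among the neighbors.
    """
    if k < 1 or not points:
        raise ValueError("need k >= 1 and at least one point")
    # Select the k nearest points by distance to the query point.
    nearest = heapq.nsmallest(k, points, key=lambda pc: dist(pc[0], q))
    counts = Counter(color for _, color in nearest)
    return counts.most_common(1)[0][0]
```

For example, with points `[((0,), "red"), ((1,), "blue"), ((2,), "blue"), ((10,), "red")]`, query `(1.2,)` and `k = 3`, the three nearest points are colored blue, blue and red, so the answer is `"blue"`.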

Showing 1–6 of 6 results for author: van der Horst, T