Using iterated local alignment to aggregate GPS trajectories into a traffic flow map

Tarn Duong (Paris, France F-75000. Email: [email protected])
Abstract

Desire line maps are widely deployed for traffic flow analysis by virtue of their ease of interpretation and computation. They can be considered to be simplified traffic flow maps, whereas the computational challenges in aggregating small scale traffic flows prevent the wider dissemination of high resolution flow maps. GPS trajectories are a promising data source to solve this challenging problem. The solution begins with the alignment (or map matching) of the GPS trajectories to the road network. However, even state-of-the-art map matching APIs produce sub-optimal results with small misalignments. While these misalignments are negligible for large scale flow aggregation in desire line maps, they pose substantial obstacles for small scale flow aggregation in high resolution maps. To remove these remaining misalignments, we introduce innovative local alignment algorithms, where we infer road segments to serve as local reference segments, and proceed to align nearby road segments to them. With each local alignment iteration, the misalignments of the GPS trajectories with each other and with the road network are reduced, and so the aggregated flows converge closer to a minimal flow map. By analysing a set of empirical GPS trajectories collected in Hannover, Germany, we confirm that our minimal flow map has high levels of spatial resolution, accuracy and coverage.

Keywords: Desire lines/spider diagram, floating car data FCD, map matching, route finding

1 Introduction

One of the fundamental quantities in transport planning is a traffic flow map, i.e. a map of the traffic flow levels on the road segments in a road network (Ortúzar and Willumsen, 2011). Whilst traffic flow maps are a rich source of information about vehicle mobility patterns, they are costly in terms of time and resources to compute for any reasonably sized road network. To alleviate this cost burden, most approaches restrict the spatial coverage and resolution of the flow map. One of the most common is to place sensors at fixed locations in the road network, whose results are then visualised as a traffic count map. Another is to administer trip intent/recall questionnaires to inform large scale properties such as trajectory origins and destinations, which can be visualised as a desire line flow map/spider diagram (Tobler, 1987). Both of these are simplified flow maps, since the detailed mobility patterns outside of the sensor locations or origin/destination pairs are not known. These unknown patterns can be inferred from other data sources, such as route assignment models (Evans, 1976). While these model-assigned routes are highly detailed, the trade-off is that they are not guaranteed to correspond closely to empirical mobility patterns.

Thus an ideal data source combines the empirical information of road sensor counts or questionnaires, with the small scale details of model-assigned trajectories. This gap in the market can be filled by GPS trajectory data. Due to the prevalence of GPS-enabled devices, such as vehicle navigation guides and mobile telephones, GPS trajectory data can be acquired with low marginal cost whilst at the same time offering extensive spatial coverage and resolution of empirical mobility patterns (Herrera et al., 2010, Andrienko and Andrienko, 2013). Throughout, we employ the term ‘GPS trajectory’ data rather than Floating Car Data (FCD) or Floating Mobile Data (FMD), as our analysis is not restricted to trajectory data from cars or from mobile phones, so these terms are equivalent for our purposes. An example is the set of 1183 GPS trajectories collected from a GPS-enabled mobile phone, from December 2017 to March 2019, by a single driver in Hannover, Germany, with an overall average sampling rate of about 1 GPS point per second (Zourlidou et al., 2022). They are plotted as the green circles in Figure 1.

Figure 1: GPS trajectories in Hannover, Germany. (a) City level. (b) Neighbourhood level, zoom of black rectangle. (c) Small neighbourhood level, zoom of blue rectangle. Trajectory ID = 7 (orange circles), 315 (purple diamonds), others (green circles).

In Figure 1(a), at the city level, the GPS points (green circles) appear to be aligned to the road network. If we zoom in on the small black rectangle in the central region, then in Figure 1(b) at the neighbourhood level, we observe that the GPS points deviate from the road network. This deviation is clearer in the closer zoom in Figure 1(c). Moreover, if we focus on the orange circles (Trajectory ID = 7) and purple diamonds (ID = 315), then we observe that the vehicle location between the recorded GPS points is unknown. These maps illustrate the errors in GPS trajectories. These are broadly classified as ‘measurement errors’ where the recorded GPS coordinates are not the true locations, and ‘sampling errors’ where the information about the trajectory is lost in between recorded GPS coordinates (Ortúzar and Willumsen, 2011, Saki and Hagen, 2022).

Our goal is to produce a traffic flow map from these noisy GPS trajectories, which can be utilised at any scale, from the city/regional level to the individual road segment level. This requires us to minimise the errors in GPS trajectories. Our approach is composed of two stages. The first stage is to align the GPS trajectories to the road network, which is known as map matching (Quddus et al., 2007, Chao et al., 2020). It produces a route, which is a connected sequence of road segments in the road network, that is consistent with the GPS trajectory. Our contribution is an improvement to standard map matching by adding post hoc route finding. In common with many open source transport planning tools, we employ APIs based on the OpenStreetMap (OSM) network. While we are able to improve the overall alignment to the road network, these map matched routes inherit incompressible misalignments, ranging from several centimetres to several metres, from the OSM road network. These small misalignments prevent the accurate aggregation of traffic flows at this scale.

The second stage is to resolve these misalignments, and our contribution here is the proposed local alignment of map matched routes. In contrast to global alignment approaches, we locally infer which road segments should serve as a local reference, and then proceed to align other nearby road segments to it. To accomplish this, we introduce several novel algorithms which employ a mix of advanced statistical and geospatial methods. These include ‘node snapping’ where the nearby boundary points of road segments are combined via statistical clustering, and ‘line blending’ where road segments, which are near to each other but do not share overlapping road sub-segments, are aligned to maximise their overlapping sub-segments. Inputting these locally aligned routes into a flow aggregation API leads to a more accurate flow map. Iterating these local alignments in turn leads to a minimal flow map.

The outline of the paper is as follows. In Section 2, we describe the computation of map matched routes using off-the-shelf map matching and route finding APIs. In Section 3, we describe the local alignment of the map matched routes and their aggregation into a flow map. In Section 4, we demonstrate that the locally aligned flow map from the Hannover GPS trajectories is well-aligned to the OSM road network, and has a high level of accuracy and spatial coverage of estimated traffic flows compared to reference traffic flows. We then discuss some software implementation issues, and end with some concluding remarks.

2 Alignment of GPS trajectories with map matching and route finding

We introduce some mathematical notation to state precisely the problem of map matching. We represent the road network by a graph $\mathcal{N}=\mathcal{N}(\mathcal{E},\mathcal{V})$, where the $n_{\mathcal{E}}$ edges $\mathcal{E}=\{\bm{e}_1,\dots,\bm{e}_{n_{\mathcal{E}}}\}$ of this graph are the road segments and the $n_{\mathcal{V}}$ nodes/vertices $\mathcal{V}=\{\bm{v}_1,\dots,\bm{v}_{n_{\mathcal{V}}}\}$ indicate that two (or more) different road segments are accessible to/from each other at this node point. These nodes $\bm{v}_i$ are single GPS points. Each road segment is composed of a sequence of connected piecewise linear segments, so $\bm{e}_i=\{\bm{e}_{i,1},\dots,\bm{e}_{i,n_i}\}$ is an ordered sequence of $n_i$ GPS points $\bm{e}_{i,j}, j=1,\dots,n_i$. We refer to this as a ‘linestring’ geometry, following the Open Geospatial Consortium terminology (OGC, 2010). We set $\mathcal{N}$ to be the OSM network (https://www.openstreetmap.org), which is a freely available road network with global coverage.

We denote a single GPS trajectory $G=\{\bm{g}_1,\dots,\bm{g}_{n_G}\}$ as a temporally ordered sequence of $n_G$ GPS points $\bm{g}_i, i=1,\dots,n_G$. This is known as a ‘multipoint’ geometry (OGC, 2010). Whilst some authors require that each GPS point $\bm{g}_i$ be accompanied by its timestamp $t_i$ to be considered a GPS trajectory, this is not strictly required since the $\bm{g}_i$ are ordered according to the timestamps, even if the timestamps themselves are not recorded in the GPS trajectory. Due to measurement error, the points of a GPS trajectory $G$ are not necessarily coincident with the road network $\mathcal{N}$, and due to sampling error, there is no information about the vehicle in between the GPS points of $G$.
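To make the notion of measurement error concrete, the following minimal Python sketch computes the shortest distance from a GPS point to a road segment linestring. It assumes planar (projected) coordinates in metres rather than longitude/latitude, and the function names `point_segment_distance` and `point_linestring_distance` are illustrative only; they are not part of any API used in this paper.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest Euclidean distance from point p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def point_linestring_distance(p, linestring):
    """Shortest distance from p to a linestring given as a list of (x, y) points."""
    return min(
        point_segment_distance(p, a, b)
        for a, b in zip(linestring, linestring[1:])
    )

# Assumed toy data: an L-shaped road segment and one GPS point near it.
road = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
gps_point = (4.0, 3.0)
offset = point_linestring_distance(gps_point, road)
print(offset)
```

A non-zero offset for a point that was recorded while driving on the road is an instance of measurement error in the sense above; sampling error is not captured by this distance.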

We represent the output of a map matching algorithm $M$ from an empirical GPS trajectory $G$ as $M(G)=\{\bm{m}_1,\dots,\bm{m}_{n_M}\}$ which is an ordered, connected sequence of $n_M$ edges. The goal is that the map matched route follows closely the road network $\mathcal{N}(\mathcal{E},\mathcal{V})$. Whilst it is straightforward to ensure that all boundary points of the segments $\{\bm{m}_1,\dots,\bm{m}_{n_M}\}$ coincide with the nodes $\mathcal{V}$ of the road network graph $\mathcal{N}$, it is more challenging to ensure that the segments $\{\bm{m}_1,\dots,\bm{m}_{n_M}\}$ themselves coincide with the edges $\mathcal{E}$. In comparison to an empirical trajectory $G$, all boundary points of the edges of a map matched route $M(G)$ are aligned to the road network (reduced measurement error) and the vehicle position is estimated by the linestring connecting the boundary points of the road segment (reduced sampling error).

There is a large body of research on this difficult problem of map matching. We focus on the popular class of Hidden Markov Model (HMM) map matching algorithms. HMM methods iteratively build the map matched route by selecting the most likely next segment to connect to the current route using a probabilistic model. According to a review of map matching algorithms (Chao et al., 2020), HMM is a state-transition method. The other three classes are similarity, candidate-evolving and scoring methods. Further details of alternative map matching algorithms are found in Quddus et al. (2007) and Chao et al. (2020). We leave this discussion here since the improvements offered by our proposed methods are valid for any map matching algorithm, and concentrate on HMM map matching due to its accuracy and computational efficiency. Even if we restrict ourselves to HMM algorithms on the OSM road network, there are many off-the-shelf map matching APIs available. We focus on the Valhalla routing engine (https://valhalla.github.io/valhalla), which includes its highly recommended map matching API (Saki and Hagen, 2022).
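The idea behind HMM map matching can be illustrated with a toy Python sketch based on Viterbi decoding, a standard way to find the most likely state sequence in an HMM. This is not the Valhalla implementation: the Gaussian emission score, the crude transition rule (free to stay on an edge, a fixed penalty to move to an adjacent edge, forbidden otherwise), the parameters `sigma` and `switch_penalty`, and the three-edge network are all assumptions made for illustration, with planar coordinates.

```python
import math

def seg_dist(p, a, b):
    """Distance from point p to the straight segment a-b (planar coordinates)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def viterbi_match(points, edges, adjacent, sigma=2.0, switch_penalty=1.0):
    """Most likely edge id per GPS point under a toy HMM (log scores)."""
    ids = list(edges)

    def emit(p, e):
        # Gaussian-shaped score: points far from an edge are unlikely.
        return -0.5 * (seg_dist(p, *edges[e]) / sigma) ** 2

    def trans(e1, e2):
        if e1 == e2:
            return 0.0
        if frozenset((e1, e2)) in adjacent:
            return -switch_penalty
        return -math.inf  # edges that do not touch cannot follow each other

    score = {e: emit(points[0], e) for e in ids}
    back = []
    for p in points[1:]:
        new_score, pointers = {}, {}
        for e in ids:
            prev = max(ids, key=lambda q: score[q] + trans(q, e))
            new_score[e] = score[prev] + trans(prev, e) + emit(p, e)
            pointers[e] = prev
        score = new_score
        back.append(pointers)

    # Backtrack from the best final state.
    path = [max(ids, key=lambda e: score[e])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Assumed toy network: A and B meet at (10, 0); C is a nearby unconnected road.
edges = {
    "A": ((0.0, 0.0), (10.0, 0.0)),
    "B": ((10.0, 0.0), (10.0, 10.0)),
    "C": ((0.0, 4.0), (8.0, 4.0)),
}
adjacent = {frozenset(("A", "B"))}
gps = [(1.0, 0.5), (4.0, 2.5), (7.0, 0.5), (10.5, 3.0), (9.5, 8.0)]
matched = viterbi_match(gps, edges, adjacent)
print(matched)
```

In this toy example the second GPS point lies closer to edge C than to edge A, so a point-by-point nearest edge rule would jump to an unconnected road, whereas the sequence model keeps the route on A before turning onto B. A production matcher would use a more refined transition model than the fixed penalty assumed here.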

Figure 2 displays the $n=1147$ map matched routes by the Valhalla map matching API. We discard 36 trajectories (3.04%) from our original data set of 1183 trajectories. In Figure 2(a), the map matched routes (blue lines) overall are well-aligned to the road network. For the orange map matched route, all its segments are aligned to the road network, whereas the purple route appears to be displaced by several metres from the road centreline. The measurement and sampling errors of the map matched routes $M(G)$ are reduced in comparison to those for the empirical trajectories $G$, though these errors remain sizeable at the road segment level in Figure 2(b).

Figure 2: Map matched routes, using only Valhalla map matching API. (a) Neighbourhood level. (b) Small neighbourhood level, zoom of black rectangle. Trajectory ID = 7 (orange), 315 (purple), others (blue).

Our first contribution is to better align the edges of the map matched routes $M(G)$ to the road network edges $\mathcal{E}$. We propose a post hoc adjustment of the map matching output by an additional call to a route finding API, as outlined in Algorithm 1. The inputs of ST_ROUTE are the empirical GPS trajectory $G$, the map matching API $M$, the route finding API $R$, and the number of waypoints $\bm{n}_W$ for the route finding. Since we do not have an a priori single optimal value for the number of waypoints, we consider a range of $w$ values $\bm{n}_W=(n_{W,1},\dots,n_{W,w})$. In Step 1, we compute the initial map matched route $M(G)$ from the empirical GPS trajectory $G$ by calling the map matching API $M$. This initial map matched route $M(G)=\{\bm{m}_1,\dots,\bm{m}_{n_M}\}$ is a linestring with $n_M$ edges and $n_M+1$ points. In Steps 2–5, we loop over the $w$ candidate numbers of waypoints in $\bm{n}_W$. In Steps 3–4, we take a sample of $n_{W,i}$ waypoints, where the first waypoint $\bm{w}_1$ is the start point of $\bm{m}_1$, the $n_{W,i}$th waypoint is the end point of $\bm{m}_{n_M}$, and the intermediate waypoints $\bm{w}_2,\dots,\bm{w}_{n_{W,i}-1}$ are sampled from the start points of the edges $\bm{m}_2,\dots,\bm{m}_{n_M}$. In Step 5, we call the route finding API $R$ with the waypoints $\bm{w}_1,\dots,\bm{w}_{n_{W,i}}$. The result is the map matched route $M_i^*(G)=R(\bm{w}_1,\dots,\bm{w}_{n_{W,i}})=\{\bm{m}_1^*,\dots,\bm{m}_{n_{M^*}}^*\}$, which is a route of $n_{M^*}$ connected edges. In Step 6, we select the route with the smallest normalised dynamic time warping (DTW) distance between the routes $M_i^*(G)$ and the empirical trajectory $G$, for $i=1,\dots,w$. The DTW distance is based on the lengths of all the distortions of $M_i^*(G)$ to achieve a maximal alignment between $M_i^*(G)$ and $G$ (Sakoe and Chiba, 1978, Giorgino, 2009).
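As a rough illustration of the DTW distance used for route selection, the Python sketch below implements the classic dynamic programming recurrence between two point sequences. The step pattern (match, insertion, deletion with equal weight) and the normalisation by the sum of the sequence lengths are assumptions for illustration; the exact step pattern and normalisation used in the paper are not specified here, and planar coordinates are assumed.

```python
import math

def dtw_distance(route, trajectory, normalise=True):
    """Dynamic time warping distance between two sequences of (x, y) points."""
    n, m = len(route), len(trajectory)
    # cost[i][j] = minimal accumulated distance aligning the first i route
    # points with the first j trajectory points.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(route[i - 1], trajectory[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # route point repeated
                cost[i][j - 1],      # trajectory point repeated
                cost[i - 1][j - 1],  # both advance
            )
    total = cost[n][m]
    # Assumed normalisation: divide by the combined sequence length.
    return total / (n + m) if normalise else total

# Assumed toy data: a route on the x-axis and a trajectory offset by 1 unit.
route = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
trajectory = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(dtw_distance(route, trajectory, normalise=False))
print(dtw_distance(route, trajectory))
```

Normalising makes distances comparable between candidate routes with different numbers of points, which matters when the candidates come from different numbers of waypoints.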

Algorithm 1 ST_ROUTE – Map matched route
Input: $G$ GPS trajectory, $M$ map matching API, $R$ route finding API, $\bm{n}_W$ #waypoints
Output: $M^*(G)$ map matched route
1: Compute initial map matched route $M(G)$ from empirical GPS trajectory $G$
2: for $i := 1$ to $w$ do
3:     Set $n_W := n_{W,i}$
4:     Sample $n_W$ waypoints $\bm{w}_1,\dots,\bm{w}_{n_W}$ from $M(G)$, with $\bm{w}_1 := \operatorname{Start}(M)$, $\bm{w}_{n_W} := \operatorname{End}(M)$
5:     Compute map matched route $M^*_i(G) := R(\bm{w}_1,\dots,\bm{w}_{n_W})$
6: Select minimal route $M^*(G) := \operatorname{argmin}_{i\in\{1,\dots,w\}} \operatorname{DTW}(M^*_i(G),G)$
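The control flow of ST_ROUTE can be sketched in Python as follows. This is a hedged outline, not the implementation used in the paper: routes are simplified to lists of points, the intermediate waypoints are assumed to be evenly spaced (the sampling scheme is not specified above), and `snap_to_axis`, `connect_waypoints` and `mean_nearest_vertex_distance` are hypothetical stand-ins for the map matching API, the route finding API and the DTW distance respectively.

```python
import math

def sample_waypoints(points, n_w):
    """Pick n_w roughly evenly spaced points, always keeping first and last."""
    if n_w < 2:
        raise ValueError("need at least a start and an end waypoint")
    n = len(points)
    if n_w >= n:
        return list(points)
    idx = sorted({(k * (n - 1)) // (n_w - 1) for k in range(n_w)})
    return [points[i] for i in idx]

def st_route(trajectory, map_match, find_route, n_waypoints, distance):
    """Return the candidate route closest to the trajectory, and its distance."""
    initial = map_match(trajectory)                 # initial map matched route
    best_route, best_dist = None, math.inf
    for n_w in n_waypoints:                         # loop over waypoint counts
        waypoints = sample_waypoints(initial, n_w)  # sample the waypoints
        candidate = find_route(waypoints)           # route through waypoints
        d = distance(candidate, trajectory)
        if d < best_dist:                           # keep the minimal route
            best_route, best_dist = candidate, d
    return best_route, best_dist

# Hypothetical stand-ins so that the sketch runs without any web service.
def snap_to_axis(trajectory):
    return [(x, 0.0) for x, _ in trajectory]

def connect_waypoints(waypoints):
    return list(waypoints)

def mean_nearest_vertex_distance(route, trajectory):
    return sum(min(math.dist(g, r) for r in route) for g in trajectory) / len(trajectory)

trajectory = [(float(x), 0.3) for x in range(11)]
best_route, best_dist = st_route(
    trajectory, snap_to_axis, connect_waypoints, (3, 5, 11),
    mean_nearest_vertex_distance,
)
print(len(best_route), best_dist)
```

Passing the two APIs and the distance as function arguments keeps the selection logic independent of any particular routing engine.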

We set $R$ to be the Valhalla Odin turn-by-turn route finding API to be consistent with our choice of $M$ as the Valhalla Meili map matching API. We set the number of waypoints as $\bm{n}_W=(3,13,23,33,43,63,83)$. Figure 3 displays the results $M^*(G_1),\dots,M^*(G_{1147})$ from ST_ROUTE. For the trajectory ID = 7 (orange), $n_W=83$ gives the minimal DTW route, and for ID = 315 (purple), $n_W=23$ gives the minimal DTW route. The overall impression of the map matched routes $M^*$ in Figure 3(a) is that the misalignment has been reduced, especially in the purple line since it now aligns more closely to the road centreline. In Figure 3(b), at the level of road segments, whilst the map matched routes $M^*$ tend to be contained inside the road segments, they are not exactly coincident with each other. This is in part because the road network graph $\mathcal{N}$ itself contains small misalignments, and so they are propagated into any map matching or route finding algorithm based on it. These small misalignments lead to a lack of overlapping sub-segments, which in turn leads to inaccurate flow aggregation.

Figure 3: Map matched routes, using Valhalla map matching and route finding APIs. (a) Neighbourhood level. (b) Small neighbourhood level, zoom of black rectangle. Trajectory ID = 7 (orange), 315 (purple), others (blue).

3 Local alignment of road segments for flow map aggregation

Our goal is to resolve the remaining misalignments from the map matching/route finding in the previous section, so that we are able to accurately aggregate traffic flows on road segments. We aim to achieve this by local, internal alignment between the map matched routes. By internal alignment, we mean that we align the routes with each other, rather than to the external road network graph. Since an external reference road network is not required as an explicit input, our proposal can be deployed in more cases, e.g. when the quality of the road network graph is insufficient, or when the alignment to the road network graph is computationally intensive. By local alignment, we mean that we focus on aligning sub-segments of the routes, rather than complete routes.

Recall that we represent the road network by a graph $\mathcal{N}=\mathcal{N}(\mathcal{E},\mathcal{V})$. To this representation, we add the traffic flows on each of the network edges. We consider, without loss of generality, only the road segments with positive traffic flow $\mathcal{F}=\{(f,\bm{\ell}):\bm{\ell}\in\mathcal{N}(\mathcal{E},\mathcal{V}),f>0\}$ where $\bm{\ell}$ is a road segment composed of edges in $\mathcal{E}$ with traffic flow $f$. Furthermore, we denote $\bm{f}=(f,\bm{\ell})$ so we can write succinctly $\mathcal{F}=\{\bm{f}_1,\dots,\bm{f}_{n_{\mathcal{F}}}\}$ for the $n_{\mathcal{F}}$ road segment flows in a flow map. Our objective is to estimate these road segment flows where $\mathcal{F}$ forms a minimal network graph.

We begin by illustrating the difference in flow aggregation between the $M$ and $M^*$ map matched routes. For the routes $M$ from Figure 2(b) and the routes $M^*$ from Figure 3(b), the flow maps are given below in Figure 4(a–b) respectively. In these maps, the colour (purple to orange) and width of the road segments are proportional to the traffic flow. We observe that there are fewer, wider linestring segments in Figure 4(b) than in Figure 4(a).

Figure 4: Flow maps for map matched routes. (a) Map matched routes with Valhalla map matching only. (b) Map matched routes with Valhalla map matching and route finding. Colour (purple to orange) and width of road segments are proportional to traffic flow.

The flow aggregation in Figure 4 was carried out using the overline function in the R package stplanr, which we refer to as ST_OVERLINE_PLANR (Lovelace and Ellison, 2018). Starting with the map matched routes $\mathcal{M}^*=\{M^*(G_1),\dots,M^*(G_n)\}$, the flow map is $\mathcal{F}=\mathtt{ST\_OVERLINE\_PLANR}(\mathcal{M}^*)$. This flow aggregation involves the search for all road segments from the routes which are exactly equal to each other. These exactly equal road segments are reduced to a single common segment, and the associated traffic flow is the number of exactly equal segments. Since it relies on exactly equal road segments, small misalignments are sufficient to make the flow aggregations inaccurate.
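The sensitivity of exact-match aggregation to small misalignments can be seen in a minimal Python sketch. It is not the stplanr implementation: it simply splits each route into its consecutive point pairs and counts identical pairs, and treating a segment as equal regardless of travel direction is an assumption of this sketch. The function name `aggregate_exact` and the coordinates are illustrative.

```python
from collections import Counter

def aggregate_exact(routes):
    """Count how many routes traverse each exactly identical segment."""
    flows = Counter()
    for route in routes:
        for a, b in zip(route, route[1:]):
            # Sort the end points so that direction does not matter (assumption).
            flows[tuple(sorted((a, b)))] += 1
    return flows

# Assumed toy routes: r1 and r2 share their first segment exactly, whereas
# r3 follows r1 but is shifted by a tiny offset in the y-coordinate.
r1 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
r2 = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
r3 = [(0.0, 0.00001), (1.0, 0.00001), (2.0, 0.00001)]

flows = aggregate_exact([r1, r2, r3])
for segment, flow in sorted(flows.items()):
    print(segment, flow)
```

Although three routes use the first stretch of road, the exactly shared segment only receives a flow of 2, and the shifted route contributes two separate segments with a flow of 1 each.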

Our goal of producing a minimal flow map relies on resolving the crucial problem of how to aggregate similar, but not exactly overlapping, road segments. Many solutions have been offered, such as edge bundling (Zhou et al., 2013) and rasterisation (Wood et al., 2010, Morgan and Lovelace, 2021). Edge bundling consists of clustering trajectory linestrings and replacing all cluster members with a single representative linestring. These (and subsequent) authors conclude that it performs poorly when applied to noisy GPS trajectories at the road segment level, and remains mostly suited to coarser aggregations, such as desire lines. Rasterisation relies on converting the vectorial flow map into a raster matrix, and aggregating the flows within the same raster pixel neighbourhood. Whilst this is indeed able to improve flow aggregations, it depends highly on the raster pixel neighbourhood size, and the rasterisation of the vectorial flow map leads to a loss of spatial resolution. We propose an alternative aggregation which does not lose resolution. Due to the complexity of this aggregation, it is divided into several algorithms, so that after the application of each algorithm, we progress further towards a minimal flow map.

We develop our novel algorithms within the R statistical analysis environment, to take advantage of its integrated access to advanced statistical and geospatial analysis methods. Whilst R is not a bona fide GIS (Geographic Information System), its geospatial functionalities conform to the OGC standards (OGC, 2010) via the package sf (Pebesma, 2018), and it is a viable option for research in transport geospatial data analysis (Necula, 2015, Lovelace et al., 2019).

3.1 Node snapping with hierarchical clustering

The proposed algorithm is ST_SNAPNODE, where the boundary points of the traffic flow linestrings are snapped to each other. Since the former are also nodes of the flow map, this gives the name to the algorithm. We focus on snapping these nodes since the linestring misalignments are in part caused by the existence of nodes which are close to each other but not exactly equal.

Since we are searching for points which are close together, this task is well-suited to statistical clustering. There are many statistical clustering algorithms available, and we focus on hierarchical clustering (Gordon, 1999). A naive implementation where we consider all boundary points of all flow linestrings in a 1-pass complete linkage clustering is computationally intensive for any reasonable number of routes (Müllner, 2013). To resolve this computational bottleneck, we approximate the 1-pass complete linkage clustering by a nested 2-pass clustering. We begin with an efficient single linkage clustering of the boundary points of all linestrings using the R package fastcluster (Müllner, 2013). Since single linkage can result in chain-like clusters, we compute a subsequent complete linkage clustering to break these potential chains. In this nested approach, the complete linkage distance matrix is calculated only within each single linkage cluster, and so we are less likely to reach computational limits.
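The nested 2-pass clustering can be sketched in plain Python as follows. This is an illustrative approximation rather than the fastcluster-based implementation: the single linkage pass is written as connected components of the graph joining points within the tolerance (which is what cutting a single linkage dendrogram at that height yields), the complete linkage pass is a naive agglomeration, and the function names and the tolerance `eps` are assumptions. Both passes here are quadratic or worse, so the sketch only shows the logic, not the efficiency.

```python
import math
from itertools import combinations

def single_linkage_components(points, eps):
    """Pass 1: groups of point indices chained together by gaps <= eps."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

def complete_linkage_cut(points, members, eps):
    """Pass 2: merge clusters while the largest pairwise distance stays <= eps."""
    clusters = [[i] for i in members]

    def diameter(c1, c2):
        return max(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        (a, b), d = min(
            (((a, b), diameter(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > eps:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters

def nested_clusters(points, eps):
    """Complete linkage clusters computed only within single linkage clusters."""
    result = []
    for members in single_linkage_components(points, eps):
        result.extend(complete_linkage_cut(points, members, eps))
    return result

# Assumed toy data: a chain of four points and a separate pair.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0), (10.5, 0.0)]
print(single_linkage_components(pts, eps=1.2))
print(nested_clusters(pts, eps=1.2))
```

In the toy data, the first pass returns the four chained points as one cluster even though its end points are 3 units apart; the second pass breaks this chain into two clusters whose members are all within the tolerance of each other.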

Algorithm 2 is a description of ST_SNAPNODE. The inputs are the $n_{\mathcal{F}}$ traffic flow linestrings $\mathcal{F}=\{\bm{f}_1,\dots,\bm{f}_{n_{\mathcal{F}}}\}$ and the snap tolerance $\varepsilon_S$. In Step 1, we extract the boundary points of all flow linestrings. In Steps 2–3, we compute a single linkage clustering on all boundary points, and cut the dendrogram at height $\varepsilon_S$, resulting in $C'$ clusters. In Steps 4–7, within each of these single linkage clusters, we compute a complete linkage clustering, and cut the dendrogram at height $\varepsilon_S$. This divides each single linkage cluster into $C''$ clusters, where all cluster members are at most $\varepsilon_S$ distance from each other. In Steps 8–11, we compute the weighted spatial centroid in each of the $C''$ complete linkage clusters, weighted by the corresponding traffic flow values. In Step 12, we renumber the cluster labels from the above nested clusterings to approximate a 1-pass complete linkage clustering into $C^*$ clusters. 
In Steps 13–16, within each of these $C^*$ clusters, for all points of the corresponding flow linestrings which are closer than $\varepsilon_S$ distance to the centroid, we snap them to the centroid. In Steps 17–18, we collate and sort all the $n_{\mathcal{F}^*}$ snapped linestrings $\mathcal{F}^*=\{\bm{f}_1^*,\dots,\bm{f}_{n_{\mathcal{F}^*}}^*\}$. In comparison to the unsnapped linestrings $\mathcal{F}$, there are fewer snapped linestrings $\mathcal{F}^*$ which have higher flow values, are more interconnected, and share more exactly overlapping sub-segments.

Algorithm 2 ST_SNAPNODE – Snap node clustering of linestrings
1:Input: $\mathcal{F}$ flow linestrings, $\varepsilon_S$ snap tolerance
2:Output: $\mathcal{F}^*$ snap node clustered flow linestrings
3:Extract boundary points $B(\mathcal{F}) := \{\operatorname{Start}(\bm{f}_1), \operatorname{End}(\bm{f}_1), \dots, \operatorname{Start}(\bm{f}_{n_{\mathcal{F}}}), \operatorname{End}(\bm{f}_{n_{\mathcal{F}}})\}$
4:Compute hierarchical clustering with single linkage on $B(\mathcal{F})$
5:Cut single linkage dendrogram at height $\varepsilon_S$ to compute $C'$ cluster labels
6:for $i := 1$ to $C'$ do
7:     Extract $i$th cluster of boundary points $B_i' := \{\bm{b}_1', \dots, \bm{b}_{n_{B'}}'\}$
8:     Compute hierarchical clustering with complete linkage on $B_i'$
9:     Cut complete linkage dendrogram at height $\varepsilon_S$ to compute $C''$ cluster labels
10:     for $j := 1$ to $C''$ do
11:         Extract $j$th cluster of boundary points $B_j'' := \{\bm{b}_1'', \dots, \bm{b}_{n_{B''}}''\}$
12:         Extract corresponding flows $\{\bm{f}_1'', \dots, \bm{f}_{n_{B''}}''\}$ from $\mathcal{F}$
13:         Compute $\bm{b}_j^*$ := weighted centroid of $\{\bm{f}_1'', \dots, \bm{f}_{n_{B''}}''\}$, weights := $f_1'', \dots, f_{n_{B''}}''$
14:Renumber collated complete linkage cluster labels to unique $C^*$ labels
15:for $i := 1$ to $C^*$ do
16:     Extract corresponding flow linestrings $\mathcal{F}_i^* := \{\bm{f}_1^*, \dots, \bm{f}_{n_{B^*}}^*\}$ from $\mathcal{F}$
17:     Snap points of $\{\bm{f}_1^*, \dots, \bm{f}_{n_{B^*}}^*\}$ within $\varepsilon_S$ distance of cluster centroid $\bm{b}^*_i$ to $\bm{b}^*_i$
18:     $\mathcal{F}_i^*$ := rejoin snapped and unsnapped points into linestrings
19:Collate snapped linestrings $\{\mathcal{F}_1^*, \dots, \mathcal{F}_{C^*}^*\}$ into $\mathcal{F}^* := \{\bm{f}_1^*, \dots, \bm{f}_{n_{\mathcal{F}^*}}^*\}$
20:Sort $\mathcal{F}^*$ by descending flow and length
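
The centroid and snapping steps of Algorithm 2 can be illustrated as follows. This is a hedged sketch in pure Python rather than the R code of the paper: points are plain `(x, y)` tuples in a projected (metre) coordinate system, and the helper names `weighted_centroid` and `snap_linestring` are hypothetical. It assumes the cluster membership has already been computed.

```python
from math import dist


def weighted_centroid(points, weights):
    """Flow-weighted mean of a cluster of (x, y) boundary points."""
    total = sum(weights)
    x = sum(w * p[0] for p, w in zip(points, weights)) / total
    y = sum(w * p[1] for p, w in zip(points, weights)) / total
    return (x, y)


def snap_linestring(line, centroid, eps):
    """Move every vertex of `line` within eps of `centroid` onto it.

    Vertices farther than eps are left unchanged, so the snapped and
    unsnapped points are rejoined in their original order.
    """
    return [centroid if dist(p, centroid) <= eps else p for p in line]


# Two linestrings whose end points are 2 m apart, with flows 3 and 1.
centroid = weighted_centroid([(10.0, 0.0), (10.0, 2.0)], [3, 1])
line_a = snap_linestring([(0.0, 0.0), (10.0, 0.0)], centroid, 4.0)
line_b = snap_linestring([(0.0, 5.0), (10.0, 2.0)], centroid, 4.0)
```

After snapping, both linestrings end at the same node, which lies closer to the end point of the linestring with the higher flow.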

An illustration of Algorithm 2 is given in Figure 5. Figure 5(a) shows the flow map before node snapping, where we observe that the boundary points (black solid circles) of some of the purple linestrings are closer to each other than the snapping tolerance $\varepsilon_S = 4$ m. Figure 5(b) shows the updated flow map after node snapping and applying ST_OVERLINE_PLANR. There are fewer thin purple road segments, since by snapping their intersection nodes, they share more exact sub-segments, and these shared sub-segments are merged during the flow aggregation to result in wide orange road segments.

Figure 5: Flow maps for unsnapped and node snapped segments, with tolerance $\varepsilon_S = 4~\mathrm{m}$. (a) Unsnapped. (b) Node snapped. Colour (purple to orange) and width of road segments are proportional to traffic flow.

3.2 Node splitting to add missing intersection nodes

Node snapping is focused near the boundary points of the flow linestrings. So it does not address misalignments far from the boundary points. Moreover there remain some linestrings which do indeed intersect but whose intersection is not correctly computed by ST_OVERLINE_PLANR. The solution to both of these problems is the explicit addition of the missing intersection nodes to these linestrings.

ST_SPLITNODE is the collation of a couple of algorithms which explicitly add the nodes at the intersections in the interior of linestrings, and split these linestrings into simple linestring segments at these added nodes. This gives the name to this method. The inputs to Algorithm 3 are the linestrings $\mathcal{F}$ and the node splitting type $S$. For computational stability and efficiency, our method draws on two existing algorithms: in Steps 1–3, to_spatial_subdivision ($S =$ ‘subdivision’) in the sfnetworks package (van der Meer et al., 2023), and in Steps 4–5, geos_unary_union ($S =$ ‘unary’) in the geos package (Dunnington and Pebesma, 2023). The first option tends to find fewer intersection nodes, but this can be an advantage for our proposed line blending (to be introduced in the next subsection) since many small linestrings with similar flow values do not provide a clear prioritisation of this blending. It is also similar to the intersection computation performed by ST_OVERLINE_PLANR. The second option finds more missing intersection nodes, and leads to more comprehensive flow aggregation. We require both types of node splitting to compute a minimal flow map.

Algorithm 3 ST_SPLITNODE – Split nodes at interior intersections of linestrings
1:Input: $\mathcal{F}$ flow linestrings, $S$ node splitting type
2:Output: $\mathcal{F}^*$ node split flow linestrings
3:Initialise local network $\mathcal{N}^*$ with linestrings $\mathcal{F}$
4:if $S ==$ ‘subdivision’ then
5:     $\mathcal{F}^* :=$ sfnetworks::to_spatial_subdivision$(\mathcal{N}^*)$
6:else if $S ==$ ‘unary’ then
7:     $\mathcal{F}^* :=$ geos::geos_unary_union$(\mathcal{N}^*)$
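
To make the geometric operation behind node splitting concrete, the sketch below splits two straight segments at their interior crossing point. It is a simplified pure-Python illustration, not the sfnetworks or geos implementation: it handles only a single pair of two-point segments, ignores collinear overlaps, and the function names `segment_intersection` and `split_at_crossing` are hypothetical.

```python
def segment_intersection(p, q, r, s):
    """Return the crossing point of segments pq and rs, or None.

    Parallel and collinear segments return None in this sketch.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, r, s
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:
        return None
    # Parameters of the crossing along pq (t) and along rs (u).
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    return None


def split_at_crossing(seg_a, seg_b):
    """Split two straight segments at their crossing point, if any."""
    x = segment_intersection(*seg_a, *seg_b)
    if x is None:
        return [seg_a, seg_b]
    pieces = []
    for start, end in (seg_a, seg_b):
        for piece in ((start, x), (x, end)):
            if piece[0] != piece[1]:  # skip zero-length pieces
                pieces.append(piece)
    return pieces


# A horizontal and a vertical segment crossing at (2, 0).
pieces = split_at_crossing(((0.0, 0.0), (4.0, 0.0)), ((2.0, -2.0), (2.0, 2.0)))
```

The two input segments become four shorter segments which all share the added intersection node, so that subsequent flow aggregation can treat each piece separately.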

Figure 6(a) shows the flow map without any node splitting or node snapping. This map has missing intersection nodes, and intersection nodes which are close to each other. Figure 6(b) shows the flow map after node splitting ($S =$ ‘unary’) to add the missing intersection nodes, and then node snapping (with snap tolerance $\varepsilon_S = 4$ m). The result is that there are fewer road segments with higher flow values. Due to the combined action of ST_SPLITNODE and ST_SNAPNODE, Figure 6(b) is an improvement over Figures 5(a–b) and 6(a).

Figure 6: Flow maps for unsplit/unsnapped and node split/node snapped segments, with tolerance $\varepsilon_S = 4~\mathrm{m}$. (a) Unsplit and unsnapped. (b) Node split and node snapped. Colour (purple to orange) and width of road segments are proportional to traffic flow.

3.3 Line blending to align similar linestrings with local reference

So far we have focused on improving the alignment of linestrings induced by resolving inconsistencies at their intersections. We now focus on aligning linestrings more generally. For this, we require a comparison relationship to establish an order of alignment of nearby linestrings. For two linestrings $\bm{f}_1$ and $\bm{f}_2$ from a flow map $\mathcal{F}$, we define that $\bm{f}_2$ is a candidate to be aligned to the reference $\bm{f}_1$ at threshold $\varepsilon \geq 0$ if

$$\bm{f}_2 \subseteq \texttt{ST\_BUFFER}(\bm{f}_1, \varepsilon), \quad f_2 \leq f_1. \tag{1}$$

This relation is insensitive to any local complexities in $\bm{f}_2$ as long as they are all contained within the buffer zone around $\bm{f}_1$. The buffer zone we employ has flat edges, e.g. for the R package sf, this corresponds to ST_BUFFER(endCapStyle="FLAT"), so the buffer zone ends at the boundary points of $\bm{f}_1$. This avoids erroneously considering neighbouring segments of $\bm{f}_1$, which are connected to $\bm{f}_1$ as a part of a longer sequence of road segments, to be candidates to be aligned to $\bm{f}_1$. The condition on the flow values means that we place a higher priority on linestrings with higher flows. Since we can define a local reference linestring, a global road network graph is no longer required to align the candidate linestrings.
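
The candidate relation in Equation (1) can be checked directly in a simplified setting. The sketch below, in pure Python, assumes that the reference $\bm{f}_1$ is a single straight segment of non-zero length, so that its flat-capped buffer is a rectangle; because a rectangle is convex, a candidate linestring lies inside it exactly when all of its vertices do. The names `in_flat_buffer` and `is_candidate` are hypothetical, and a general polyline reference would require a proper geometry library.

```python
from math import hypot


def in_flat_buffer(point, a, b, eps):
    """True if `point` lies in the flat-capped buffer (a rectangle of
    half-width eps) around the straight segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = hypot(dx, dy)
    px, py = point[0] - a[0], point[1] - a[1]
    along = (px * dx + py * dy) / length      # position along the segment
    across = abs(px * dy - py * dx) / length  # perpendicular offset
    return 0 <= along <= length and across <= eps


def is_candidate(f2, flow2, f1, flow1, eps):
    """Equation (1) for a straight reference f1 = (a, b): every vertex
    of f2 lies in the flat buffer of f1, and flow2 <= flow1."""
    a, b = f1
    return flow2 <= flow1 and all(in_flat_buffer(p, a, b, eps) for p in f2)


reference = ((0.0, 0.0), (100.0, 0.0))
print(is_candidate([(0.0, 0.0), (50.0, 3.0)], 2, reference, 7, 4.0))
print(is_candidate([(0.0, 0.0), (104.0, 0.0)], 2, reference, 7, 4.0))
```

The second linestring extends past the end of the reference, so it is rejected by the flat end caps even though it stays within 4 m of the reference end point; a round-capped buffer would have accepted it.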

Let $\bm{f}_{\rm ref}$ be a reference linestring from $\mathcal{F}$. The set of $m = n_{\mathcal{F}_{\rm cand}}$ linestrings from $\mathcal{F} \backslash \{\bm{f}_{\rm ref}\}$ which satisfy Equation (1) are the candidate linestrings $\mathcal{F}_{\rm cand} = \{\bm{f}_{{\rm cand},1}, \dots, \bm{f}_{{\rm cand},n_{\mathcal{F}_{\rm cand}}}\}$, with the convention that if $n_{\mathcal{F}_{\rm cand}} = 0$ then $\mathcal{F}_{\rm cand}$ is the empty set. We call our approach ‘line blending’, since we will blend the candidate linestrings onto the reference linestring. The inputs in Algorithm 4 are the reference linestring $\bm{f}_{\rm ref}$, the set of $m$ candidate linestrings $\mathcal{F}_{\rm cand}$, and the blend tolerance $\varepsilon$. The outputs are the modified reference and $m$ modified candidate flow linestrings, all with added interior points for accurate calculation of exactly equal linestring segments. 
In Step 1, we initialise a local network graph $\mathcal{N}^*$ with the reference linestring. In Steps 2–3, we extract all points of the candidate linestrings, and use the network_blend function in the sfnetworks package (which we denote as ST_NETWORK_BLEND) to blend efficiently these points into the reference linestring (van der Meer et al., 2023). This network blending requires a blend threshold, which we set to be $\varepsilon$. ST_NETWORK_BLEND projects the candidate linestrings onto the reference linestring, and explicitly adds them to the network, thereby creating new edges in $\mathcal{N}^*$. The result is a local network graph with more, shorter edges and with nodes at the projected candidate points, and whose union is the reference linestring. In Steps 4–5, we extract the $m$ blended candidate linestrings with these added interior points by applying the shortest path search between the start and end point of each blended candidate linestring along the network graph $\mathcal{N}^*$. Step 6 is the equivalent for the blended reference linestring. Step 7 involves collating the blended candidate linestrings.

Algorithm 4 ST_LINEBLEND – Blend candidate linestrings onto reference linestring
1:Input: $\bm{f}_{\rm ref}$ reference, $\mathcal{F}_{\rm cand}$ candidate flow linestrings, $\varepsilon$ blend tolerance
2:Output: $(\bm{f}^*_{\rm ref}, \mathcal{F}^*_{\rm cand})$ blended reference and $m$ candidate flow linestrings
3:Initialise local network graph $\mathcal{N}^*$ with $\bm{f}_{\rm ref}$
4:Extract all points $G_{\rm cand}$ of candidate linestrings and flow values $f_{\rm cand}$ from $\mathcal{F}_{\rm cand}$
5:Update network by blending candidate points $\mathcal{N}^* := \texttt{ST\_NETWORK\_BLEND}(\mathcal{N}^*, G_{\rm cand}, \varepsilon)$
6:for $i := 1$ to $n_{\mathcal{F}_{\rm cand}}$ do
7:     $\bm{f}^*_{{\rm cand},i}$ := shortest path from $\operatorname{Start}(\bm{f}_{{\rm cand},i})$ to $\operatorname{End}(\bm{f}_{{\rm cand},i})$ along network $\mathcal{N}^*$
8:$\bm{f}^*_{\rm ref}$ := shortest path from $\operatorname{Start}(\bm{f}_{\rm ref})$ to $\operatorname{End}(\bm{f}_{\rm ref})$ along network $\mathcal{N}^*$
9:Collate $\mathcal{F}^*_{\rm cand} := \{\bm{f}_{{\rm cand},1}^*, \dots, \bm{f}_{{\rm cand},n_{\mathcal{F}_{\rm cand}}}^*\}$
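
The projection at the heart of line blending can be sketched without a network library. The code below is a pure-Python illustration that mimics the idea of blending one candidate point into a reference polyline; it is not the sfnetworks implementation, and the names `project_onto_segment` and `blend_point`, as well as the coordinates in the example, are hypothetical. The reference is assumed to be a list of `(x, y)` tuples with no repeated consecutive vertices.

```python
from math import dist


def project_onto_segment(p, a, b):
    """Nearest point to p on the segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp so the result stays on the segment
    return (a[0] + t * dx, a[1] + t * dy)


def blend_point(reference, p, eps):
    """Insert the projection of p into the polyline `reference` if p is
    within eps of it. Returns (new_reference, projection or None)."""
    best = None
    for i in range(len(reference) - 1):
        q = project_onto_segment(p, reference[i], reference[i + 1])
        d = dist(p, q)
        if d <= eps and (best is None or d < best[0]):
            best = (d, i, q)
    if best is None:
        return list(reference), None  # p is too far: nothing to blend
    _, i, q = best
    if q in (reference[i], reference[i + 1]):
        return list(reference), q  # projection is already a vertex
    return reference[:i + 1] + [q] + reference[i + 1:], q


# Reference A-B-C and a candidate A-D whose end point D is 3 m off the line.
A, B, C, D = (0.0, 0.0), (50.0, 0.0), (100.0, 0.0), (80.0, 3.0)
new_ref, d_proj = blend_point([A, B, C], D, 4.0)
# The blended candidate follows the reference from A up to the projection.
blended_cand = new_ref[:new_ref.index(d_proj) + 1]
```

In this example the blended reference gains one interior vertex at the projection of `D`, and the blended candidate is the sub-path of the reference from `A` to that vertex, so the two linestrings now share vertices exactly.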

The result from ST_LINEBLEND is the modified reference linestring and the projected candidate linestrings with added interior points. These added interior points resolve a limitation of the flow aggregation of ST_OVERLINE_PLANR (Morgan and Lovelace, 2021). If there are many candidate linestrings, then this may lead to many sub-segments in the projected linestrings, each with their own flow values. So we assign the weighted mean flow, weighted by the sub-segment length, to all sub-segments. This single flow value takes into account the contribution of each candidate linestring to the flow along the reference linestring. Moreover, we take the rounded value of this weighted mean flow to expedite the aggregation computations and to reduce visual clutter of the flow map.
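
As a small worked example of the length-weighted mean flow, the following pure-Python sketch uses hypothetical flows and lengths; the function name `blended_flow` is an assumption. Note that Python's built-in `round` rounds exact ties to the nearest even integer, whereas the text does not specify a tie-breaking rule.

```python
def blended_flow(flows, lengths):
    """Length-weighted mean of sub-segment flows, rounded to an integer."""
    total_length = sum(lengths)
    mean = sum(f * l for f, l in zip(flows, lengths)) / total_length
    return round(mean)


# Two sub-segments: flow 9 over 90 m and flow 7 over 10 m.
# Weighted mean = (9 * 90 + 7 * 10) / 100 = 8.8, which rounds to 9.
print(blended_flow([9, 7], [90.0, 10.0]))
```

The longer sub-segment dominates the mean, so the single flow value assigned to all sub-segments stays close to the flow carried over most of the length of the linestring.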

In Figure 7(a), we illustrate this line blending with the linestrings $ABC$ with flow $f_{ABC} = 7$ in blue, and $AD$ with flow $f_{AD} = 2$ in orange. As these two linestrings do not share a common sub-segment, flow aggregation does not modify them. Since $AD \subset \texttt{ST\_BUFFER}(ABC, 4~\mathrm{m})$ (the pale blue rectangle), and $f_{AD} < f_{ABC}$, then $ABC$ is the reference $\bm{f}_{\rm ref}$ and $AD$ is the candidate linestring $\bm{f}_{\rm cand}$, according to Equation (1). Line blending involves projecting $AD$ to $ABC$, so $D$ is projected to $D'$ (which lies exactly on the reference linestring), and $B$ is added explicitly to this projected linestring. We also add $D'$ to the reference linestring. 
The reference linestring becomes $\bm{f}^*_{\rm ref} = (7, ABD'C)$, and the candidate $\bm{f}^*_{\rm cand} = (2, ABD')$. Now $\bm{f}^*_{\rm ref}, \bm{f}^*_{\rm cand}$ share the sub-segment $ABD'$ exactly, and ST_OVERLINE_PLANR gives the aggregated flows as $(9, ABD')$ and $(7, D'C)$, since only the reference linestring covers $D'C$. We take the rounded value of the weighted mean of these two flow linestrings. The result of ST_LINEBLEND with ST_OVERLINE_PLANR is a single linestring $(9, ABD'C)$, as shown in Figure 7(b).

Figure 7: Line blending removes small misalignments and leads to correct flow aggregation. (a) Before line blending and flow aggregation. Reference linestring $(7, ABC)$ in blue, candidate linestring $(2, AD)$ in orange. (b) After blending candidate into reference linestring, with tolerance $\varepsilon = 4$ m, and flow aggregation. Blended linestring is $(9, ABD'C)$.

More complex situations arise when other linestrings touch the candidate linestring, but are not candidates themselves for blending. Since line blending projects the candidates to the reference linestring, then we have to also project these other linestrings to avoid leaving a gap in the updated flow map. This procedure ST_SNAP_CAND_TOUCH is outlined in Algorithm 5. The inputs are the reference linestring $\bm{f}_{\rm ref}$, the $n_{\mathcal{F}_{\rm cand}}$ candidate linestrings $\mathcal{F}_{\rm cand}$, the $n_{\mathcal{F}_{\rm ct}}$ candidate-touching linestrings $\mathcal{F}_{\rm ct}$, and the snap tolerance $\varepsilon_S$. In Steps 1–8, we iterate over each candidate-touching linestring. In Steps 2–3, we compute the intersection points between the candidate-touching linestring and the boundary points of the candidate linestrings, and the respective distances. In Steps 4–7, if this intersection point is within $\varepsilon_S$ distance to the boundary points of $\bm{f}_{\rm ref}$, then we snap the candidate-touching linestring to the closest boundary point. 
In Step 8, if the intersection point is not within εSsubscript𝜀𝑆\varepsilon_{S}italic_ε start_POSTSUBSCRIPT italic_S end_POSTSUBSCRIPT distance, then we snap it to the nearest point on 𝒇refsubscript𝒇ref{\bm{f}}_{\rm ref}bold_italic_f start_POSTSUBSCRIPT roman_ref end_POSTSUBSCRIPT. These snap** operations are similar to those in Steps 15–16 in ST_SNAPNODE in Algorithm 2, and ensure that we maintain connectivity between ctsubscriptct\mathcal{F}_{\rm ct}caligraphic_F start_POSTSUBSCRIPT roman_ct end_POSTSUBSCRIPT and 𝒇refsubscript𝒇ref{\bm{f}}_{\rm ref}bold_italic_f start_POSTSUBSCRIPT roman_ref end_POSTSUBSCRIPT. In Step 9, we collate these snapped linestrings.

Algorithm 5 ST_SNAP_CAND_TOUCH – Snap candidate-touching linestrings onto reference
1: Input: $\bm{f}_{\rm ref}$ reference, $\mathcal{F}_{\rm cand}$ candidate, $\mathcal{F}_{\rm ct}$ candidate-touching flow linestrings, $\varepsilon_S$ snap tolerance
2: Output: $\mathcal{F}^*_{\rm ct}$ snapped candidate-touching flow linestrings
3: for $i := 1$ to $n_{\mathcal{F}_{\rm ct}}$ do
4:     $\bm{g}_{{\rm ct},i}^* := \bm{f}_{{\rm ct},i} \cap \{\operatorname{Start}(\mathcal{F}_{\rm cand}), \operatorname{End}(\mathcal{F}_{\rm cand})\}$
5:     $d_{i,\operatorname{Start}}^* := {\tt ST\_DIST}(\bm{g}_{{\rm ct},i}^*, \operatorname{Start}(\bm{f}_{\rm ref}))$; $d_{i,\operatorname{End}}^* := {\tt ST\_DIST}(\bm{g}_{{\rm ct},i}^*, \operatorname{End}(\bm{f}_{\rm ref}))$
6:     if ($d_{i,\operatorname{Start}}^* \leq \varepsilon_S$ and $d_{i,\operatorname{Start}}^* \leq d_{i,\operatorname{End}}^*$) then
7:         $\bm{f}^*_{{\rm ct},i} :=$ snap $\bm{f}_{{\rm ct},i}$ to $\operatorname{Start}(\bm{f}_{\rm ref})$
8:     else if ($d_{i,\operatorname{End}}^* \leq \varepsilon_S$ and $d_{i,\operatorname{End}}^* \leq d_{i,\operatorname{Start}}^*$) then
9:         $\bm{f}^*_{{\rm ct},i} :=$ snap $\bm{f}_{{\rm ct},i}$ to $\operatorname{End}(\bm{f}_{\rm ref})$
10:     else $\bm{f}^*_{{\rm ct},i} :=$ snap $\bm{f}_{{\rm ct},i}$ to nearest point of $\bm{f}_{\rm ref}$
11: Collate $\mathcal{F}^*_{\rm ct} := \{\bm{f}^*_{{\rm ct},1}, \dots, \bm{f}^*_{{\rm ct},n_{\mathcal{F}_{\rm ct}}}\}$
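The snapping rule of Algorithm 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the results: linestrings are plain lists of (x, y) tuples in a projected coordinate system, flow values are omitted, "snap" is assumed to mean replacing the touching end point by the target point, only end points of the candidate-touching linestring are tested for contact, and all function names are hypothetical.

```python
import math


def _dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def _nearest_on_segment(p, a, b):
    """Closest point to p on the segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # clamp the projection to the segment
    return (a[0] + t * dx, a[1] + t * dy)


def _nearest_on_line(p, line):
    """Closest point to p on a linestring (list of points)."""
    points = [_nearest_on_segment(p, a, b) for a, b in zip(line, line[1:])]
    return min(points, key=lambda q: _dist(p, q))


def snap_cand_touch(ref, cands, touching, eps_s):
    """Snap candidate-touching linestrings onto the reference linestring."""
    # Boundary (start/end) points of all candidate linestrings.
    boundary = {pt for c in cands for pt in (c[0], c[-1])}
    snapped = []
    for line in touching:
        # Intersection point g: an end point shared with a candidate boundary.
        shared = [pt for pt in (line[0], line[-1]) if pt in boundary]
        if not shared:
            snapped.append(list(line))  # nothing to snap (assumed behaviour)
            continue
        g = shared[0]
        d_start, d_end = _dist(g, ref[0]), _dist(g, ref[-1])
        if d_start <= eps_s and d_start <= d_end:
            target = ref[0]      # snap to Start(f_ref)
        elif d_end <= eps_s and d_end <= d_start:
            target = ref[-1]     # snap to End(f_ref)
        else:
            target = _nearest_on_line(g, ref)  # nearest point of f_ref
        snapped.append([target if pt == g else pt for pt in line])
    return snapped
```

With hypothetical coordinates A = (0, 0), B = (10, 0), C = (20, 0), D = (18, 3) and E = (25, 10), the call `snap_cand_touch([A, B, C], [[A, D]], [[D, E]], 4.0)` moves D to C, since D is about 3.6 m from the end point C and much further from A.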

Figure 8(a) is an illustration of ST_SNAP_CAND_TOUCH with the reference $ABC$ with flow $f_{ABC} = 7$ (blue), the candidate linestring $AD$ with flow $f_{AD} = 2$ (orange), and the blending buffer zone with blend tolerance $\varepsilon = 4$ m (light blue). The candidate-touching linestring $\bm{f}_{\rm ct} = (5, DE)$ (purple) touches the candidate $AD$ at $D$. The orange line is entirely within the light blue buffer zone around the blue reference line, whereas the purple line extends outside of the buffer zone and so is not a candidate for blending to the reference linestring. If we apply ST_LINEBLEND to blend the candidate linestring, then $D$ is projected to $D'$ to lie on the reference linestring, i.e. $\bm{f}_{\rm cand}^* = (2, ABD')$. If we apply ST_SNAP_CAND_TOUCH to the candidate-touching linestring, then as the intersection point $D$ is less than $\varepsilon_S = 4$ m from the boundary points of $\bm{f}_{\rm ref}$, it is projected to the boundary point $C$, i.e. $\bm{f}^*_{\rm ct} = (5, CE)$. Thus line blending, candidate-touching snapping and flow aggregation ensure that $\bm{f}^*_{\rm ct} = (5, CE)$ remains connected to $\bm{f}_{\rm ref}^* = (9, ABD'C)$.

Figure 8: Snap candidate-touching linestrings to reference linestring after line blending, with blend tolerance $\varepsilon = 4$ m, snap tolerance $\varepsilon_S = 4$ m. (a) Before line blending and snapping. Reference linestring $(7, ABC)$ in blue, candidate linestring $(2, AD)$ in orange, and candidate-touching linestring $(5, DE)$ in purple. (b) After line blending and snapping. Reference linestring becomes $(9, ABD'C)$, and candidate-touching linestring $(5, CE)$.

So far we have only considered line blending where the reference linestring has a single flow. Due to the noisiness of the empirical GPS trajectories, this is insufficient to reach a minimal flow map. So we allow the reference to be a sequence of $k$ connected edges, each with its own flow. If we treat this edge sequence momentarily as a single linestring, with flow equal to the mean flow of the $k$ edges weighted by their lengths, then we can apply Equation (1) to search for potential candidate linestrings. Due to computational limitations, we retain the requirement that candidates be single edges with single flow values. The generalisation to $k$-edge reference linestrings allows us to blend candidate linestrings which exceed the buffer zones of 1-edge reference linestrings.
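The length-weighted mean flow assigned to a $k$-edge sequence can be illustrated with a short Python sketch. It assumes that each edge is supplied as a (flow, coordinates) pair in a projected coordinate system with metre units; the function names are hypothetical and are not part of the algorithms described here.

```python
import math


def path_length(coords):
    """Length of a linestring given as a list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(coords, coords[1:]))


def weighted_mean_flow(edges):
    """Mean flow of a sequence of edges, weighted by edge length.

    edges: list of (flow, coords) pairs, one per connected edge.
    """
    lengths = [path_length(coords) for _, coords in edges]
    total = sum(lengths)
    if total == 0:
        raise ValueError("edge sequence has zero total length")
    return sum(flow * length for (flow, _), length in zip(edges, lengths)) / total


# Two connected edges: flow 7 over 10 m, then flow 4 over 20 m.
two_edges = [
    (7, [(0, 0), (10, 0)]),
    (4, [(10, 0), (30, 0)]),
]
print(weighted_mean_flow(two_edges))  # (7*10 + 4*20) / 30 = 5.0
```

The longer edge dominates the mean, so a short high-flow edge does not by itself promote the whole sequence in the search for candidates.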

We have described the situation for blending candidate and candidate-touching linestrings to a single reference linestring (possibly composed of $k$ edges). The next step is to determine the priority for line blending for a set of reference linestrings. We require that a reference linestring cannot be a candidate linestring to another reference linestring, and that a candidate linestring is a candidate for one reference linestring only. These requirements ensure that if two linestrings satisfy $\bm{f}_2 \subseteq {\tt ST\_BUFFER}(\bm{f}_1, \varepsilon)$ and $\bm{f}_1 \subseteq {\tt ST\_BUFFER}(\bm{f}_2, \varepsilon)$, then we select only one of them. The flow condition $f_2 \leq f_1$ usually distinguishes the reference linestring, except where $f_1 = f_2$. In the situation of equal flow values, we designate the longer linestring to be the reference.

The inputs to ST_LINEBLEND_PRIORITY in Algorithm 6 are the flow linestrings $\mathcal{F}$, the number of connected edges for the reference linestrings $k$, and the blend tolerance $\varepsilon$. The output is a set of non-overlapping reference linestrings $\mathcal{F}_{\rm ref}$, a set of unique candidate linestrings $\mathcal{F}_{\rm cand}$ and a set of (potentially repeated) candidate-touching linestrings $\mathcal{F}_{\rm ct}$. In Steps 1–2, we set up a network graph from the linestrings $\mathcal{F}$, and then extract $\mathcal{K}$, all simple paths (i.e. all paths composed of connected, unique edges) with $k$ edges. In Steps 3–4, we compute the weighted mean of the $k$ flow values, weighted by the length of individual edges, and assign this to the entire $k$-edge path. The function ${\rm Edges}(\cdot)$ extracts the edges from a $k$-edge path. In Step 5, we sort the $k$-edge paths in descending order of their flow values and length. In Steps 6–8, we construct a maximal set of non-overlapping $k$-edge paths $\mathcal{K}^*$. We initialise $\mathcal{K}^*$ to contain the first path. We iterate through $\mathcal{K}$ and, if the current path in $\mathcal{K}$ does not overlap the current $\mathcal{K}^*$, then we add it to $\mathcal{K}^*$. For $k = 1$, we bypass Steps 2–8. In Steps 9–18, we step through the $\bm{k}_i^* \in \mathcal{K}^*$, starting from the linestring with the highest flow and length. In Steps 11–13, if the current flow linestring $\bm{k}_i^*$ has no incident edges in the candidate linestrings $\mathcal{F}_{\rm cand}$, then we set it to be a reference linestring. In Step 13, we search for candidate linestrings for $\bm{k}_i^*$, from those linestrings which are not already reference or candidate linestrings, according to Equation (1). In Steps 14–17, if there is at least one candidate linestring, then we update $\mathcal{F}_{\rm ref}$, $\mathcal{F}_{\rm cand}$ and $\mathcal{F}_{\rm ct}$. In Step 18, we extract the candidate $\mathcal{F}_{\rm cand}$ and candidate-touching $\mathcal{F}_{\rm ct}$ linestring sets, since the reference linestring set is already computed as $\mathcal{F}_{\rm ref}$.

Algorithm 6 ST_LINEBLEND_PRIORITY – Compute line blending priority
1: Input: $\mathcal{F}$ flow linestrings, $k$ #edges, $\varepsilon$ blend tolerance
2: Output: $(\mathcal{F}_{\rm ref}, \mathcal{F}_{\rm cand}, \mathcal{F}_{\rm ct})$ reference, candidate, candidate-touching linestrings
3: Initialise network graph $\mathcal{N}^*$ from linestrings $\mathcal{F}$
4: Extract all simple paths of length $k$ edges $\mathcal{K} := \{\bm{k}_1, \dots, \bm{k}_{n_{\mathcal{K}}}\}$ from $\mathcal{N}^*$
5: for $i := 1$ to $n_{\mathcal{K}}$ do
6:     $f_i$ := weighted mean of $k$ flows, weight := len(Edges($\bm{k}_i$))
7: Sort $\mathcal{K}$ by descending flow and combined length
8: Initialise $\mathcal{K}^* := \{\bm{k}_1\}$
9: for $i := 2$ to $n_{\mathcal{K}}$ do
10:     if ($\bm{k}_i \cap \mathcal{K}^* = \{\}$) then $\mathcal{K}^* := \mathcal{K}^* \cup \{\bm{k}_i\}$
11: Initialise $\mathcal{Q} := \mathcal{F}_{\rm ref} := \mathcal{F}_{\rm cand} := \mathcal{F}_{\rm ct} := \{\}$
12: for $i := 1$ to $n_{\mathcal{K}^*}$ do
13:     if (${\rm Edges}(\bm{k}_i^*) \notin \mathcal{F}_{\rm cand}$) then
14:         Set reference linestring $\bm{k}_{\rm ref} := \bm{k}_i^*$
15:         Search candidate linestrings $\mathcal{F}_{\rm cand}^* := \{\bm{f} \in \mathcal{F} \backslash ({\rm Edges}(\mathcal{F}_{\rm ref}) \cup \{\bm{k}_{\rm ref}\} \cup \mathcal{F}_{\rm cand}) : \bm{f} \subseteq {\tt ST\_BUFFER}(\bm{k}_{\rm ref}, \varepsilon), f \leq f_{\rm ref}\}$
16:         if $\mathcal{F}_{\rm cand}^* \neq \{\}$ then
17:              Update $\mathcal{F}_{\rm ref} := \mathcal{F}_{\rm ref} \cup \{\bm{k}_{\rm ref}\}$, $\mathcal{F}_{\rm cand} := \mathcal{F}_{\rm cand} \cup \mathcal{F}_{\rm cand}^*$
18:              Search touches $\mathcal{F}_{\rm ct} := \{\bm{f} \in \mathcal{F} \backslash \{{\rm Edges}(\mathcal{F}_{\rm ref}) \cup \mathcal{F}_{\rm cand}\} : {\tt ST\_TOUCHES}(\mathcal{F}_{\rm cand}^*, \bm{f})\}$
19:              Update $\mathcal{Q} := \mathcal{Q} \cup \{(\{\bm{k}_{\rm ref}\}, \mathcal{F}_{\rm cand}^*, \mathcal{F}_{\rm ct}^*)\}$
20: Extract $\mathcal{F}_{\rm cand} := \{\mathcal{F}^*_{{\rm cand},1}, \dots, \mathcal{F}^*_{{\rm cand},n_{\mathcal{Q}}}\}$, $\mathcal{F}_{\rm ct} := \{\mathcal{F}^*_{{\rm ct},1}, \dots, \mathcal{F}^*_{{\rm ct},n_{\mathcal{Q}}}\}$ from $\mathcal{Q}$
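The sorting and greedy selection of non-overlapping $k$-edge paths in Algorithm 6 can be sketched as follows. This is an illustration only: each path is assumed to be a dictionary holding hypothetical edge identifiers, a precomputed weighted mean flow and a combined length, and two paths are assumed to overlap when they share at least one edge.

```python
def select_nonoverlapping(paths):
    """Greedily keep k-edge paths that share no edge with those already kept.

    paths: list of dicts with keys 'edges' (tuple of edge ids),
    'flow' (weighted mean flow) and 'length' (combined length).
    """
    # Highest flow first; ties are broken in favour of the longer path.
    ordered = sorted(paths, key=lambda p: (-p["flow"], -p["length"]))
    used_edges = set()
    selected = []
    for path in ordered:
        if used_edges.isdisjoint(path["edges"]):
            selected.append(path)
            used_edges.update(path["edges"])
    return selected


paths = [
    {"edges": ("e1", "e2"), "flow": 5.0, "length": 30.0},
    {"edges": ("e2", "e3"), "flow": 6.0, "length": 25.0},
    {"edges": ("e4", "e5"), "flow": 5.0, "length": 40.0},
]
for path in select_nonoverlapping(paths):
    print(path["edges"])
```

Here the path over e2 and e3 is kept first because it has the highest flow, the path over e4 and e5 follows, and the path over e1 and e2 is dropped because it shares e2 with a path that is already selected. The greedy pass yields a set to which no further path can be added, though not necessarily the largest possible such set.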

The local alignment of road segments, in Algorithm 7, is established by combining Algorithms 2–6. Its inputs are the flow linestrings $\mathcal{F}$, the split node type $S$, the number of connected edges for the reference linestrings $k$, the blend tolerance $\varepsilon$ and the snap tolerance $\varepsilon_S$. In Step 1, we node split the flow linestrings $\mathcal{F}$. In Step 2, we node snap the flow linestrings. In Step 3, we set up the priority for the line blending, and in Step 4 we store all the linestrings that will not be modified by the line blending in $\mathcal{F}^c$. In Steps 5–8, we iterate over each reference linestring. In Step 6, we update the reference and candidate flow linestrings by blending the candidate linestrings. In Step 7, we apply ST_OVERLINE_PLANR within each of the $k$ edges of the updated reference linestrings. In Step 8, we compute the weighted mean flow, weighted by the length of the corresponding sub-segments, and assign it as the flow value to all sub-segments. The result from Steps 6–8 is a modified $k$-edge reference linestring with $k$ aggregated flows. In Step 9, we snap the candidate-touching linestrings to the reference linestring to maintain connectivity. In Steps 10–11, we collate the modified linestrings from Steps 5–9 with the unmodified linestrings $\mathcal{F}^c$ from Step 4, and sort them in descending order of flow and length.

Algorithm 7 ST_OVERLINE_LINEBLEND – Locally align road segment flows using line blending
1: Input: $\mathcal{F}$ flow linestrings, $S$ split node, $k$ #edges, $\varepsilon$ blend tolerance, $\varepsilon_S$ snap tolerance
2: Output: $\mathcal{F}^*$ aggregated aligned flow linestrings
3: $\mathcal{F} := {\tt ST\_SPLITNODE}(\mathcal{F}, S)$
4: $\mathcal{F} := {\tt ST\_SNAPNODE}(\mathcal{F}, \varepsilon_S)$
5: $(\mathcal{F}_{\rm ref}, \mathcal{F}_{\rm cand}, \mathcal{F}_{\rm ct}) := {\tt ST\_LINEBLEND\_PRIORITY}(\mathcal{F}, k, \varepsilon)$
6: $\mathcal{F}^c := \mathcal{F} \backslash ({\rm Edges}(\mathcal{F}_{\rm ref}) \cup \mathcal{F}_{\rm cand} \cup \mathcal{F}_{\rm ct})$
7: for $i := 1$ to $n_{\mathcal{F}_{\rm ref}}$ do
8:     $(\bm{f}_{{\rm ref},i}^*, \mathcal{F}_{{\rm cand},i}^*) := {\tt ST\_LINEBLEND}(\bm{f}_{{\rm ref},i}, \mathcal{F}_{{\rm cand},i}, \varepsilon)$
9:     $\bm{f}_{{\rm ref},i}^* := {\tt ST\_OVERLINE\_PLANR}(\bm{f}_{{\rm ref},i}^* \cup \mathcal{F}_{{\rm cand},i}^*)$
10:     $f^*_{{\rm ref},i}$ := weighted mean of $\bm{f}^*_{{\rm ref},i}$, weight $:= \mathrm{len}({\rm Edges}(\bm{f}_{{\rm ref},i}^*))$
11:     $\mathcal{F}_{{\rm ct},i}^* := {\tt ST\_SNAP\_CAND\_TOUCH}(\bm{f}_{{\rm ref},i}^*, \mathcal{F}_{{\rm cand},i}, \mathcal{F}_{{\rm ct},i}, \varepsilon_S)$
12: $\mathcal{F}^* := \mathcal{F}_{\rm ref}^* \cup \mathcal{F}_{\rm ct}^* \cup \mathcal{F}^c$
13: Sort $\mathcal{F}^*$ in descending order of flow and length

We outline the overall workflow ST_OVERLINE to compute a minimal flow map, by combining pre- and post-processing with the iteration of local alignment. For Algorithm 8, the inputs are the map matched routes $\mathcal{M}$, the split node type $S$, the blend tolerance $\varepsilon$, the simplify tolerance $\varepsilon_D$, the snap tolerance $\varepsilon_S$, the number of connected edges in reference linestrings $\bm{k}$, and the maximum number of iterations $j_{\max}$. In Step 1, we apply ST_OVERLINE_PLANR to the map matched routes to produce an initial flow map $\mathcal{F}$. In Step 2, we apply ST_SPLITNODE to this flow map, because applying ST_SPLITNODE on a flow map is more robust than on the map matched routes. In Step 3, we employ the standard ST_SIMPLIFY with simplify tolerance $\varepsilon_D$. These simplified linestrings are modified so that all modified segments are at most $\varepsilon_D$ distance from the unmodified segments (Ramer, 1972; Douglas and Peucker, 1973). These simplified linestrings usually lead to more overlapping segments, which assist the flow aggregation in Step 4. Steps 5–10 are the iteration of ST_OVERLINE_LINEBLEND for the $k$-edge reference linestrings. For each $k$ in $\bm{k}$, we iterate until the maximum number of iterations $j_{\max}$ is reached or two consecutive flow maps are identical. Within each iteration, we search for the $k$-edge reference linestrings, blend the candidate linestrings, snap the candidate-touching linestrings, and compute the flow aggregation. Steps 11–14 perform some housekeeping in ST_OVERLINE_PRUNE, where we remove pseudo nodes and replace their incident edges with a concatenated edge carrying the weighted mean flow, and also remove some loops and leaf edges with low flow values.

Algorithm 8 ST_OVERLINE – Compute locally aligned flow map from map matched routes
Input: $\mathcal{M}$ map matched routes, $S$ split node, $\varepsilon_D$ simplify tolerance, ${\bm{k}}$ #edges, $j_{\max}$ max #iterations, $\varepsilon$ blend tolerance, $\varepsilon_S$ snap tolerance
Output: $\mathcal{F}$ aligned road segment flows
1: $\mathcal{F} := {\tt ST\_OVERLINE\_PLANR}(\mathcal{M})$
2: $\mathcal{F} := {\tt ST\_SPLITNODE}(\mathcal{F}, S)$
3: $\mathcal{F} := {\tt ST\_SIMPLIFY}(\mathcal{F}, \varepsilon_D)$
4: $\mathcal{F} := {\tt ST\_OVERLINE\_PLANR}(\mathcal{F})$
5: for $k$ in ${\bm{k}}$ do
6:     $\mathcal{F}_{\rm prev} := \{\}$; $j := 0$
7:     while $j < j_{\max}$ and $\mathcal{F}_{\rm prev} \neq \mathcal{F}$ do
8:         $\mathcal{F}_{\rm prev} := \mathcal{F}$; $j := j + 1$
9:         $\mathcal{F} := {\tt ST\_OVERLINE\_LINEBLEND}(\mathcal{F}, k, \varepsilon, \varepsilon_S)$
10:         $\mathcal{F} := {\tt ST\_OVERLINE\_PLANR}(\mathcal{F})$
11: $\mathcal{F}_{\rm prev} := \{\}$; $j := 0$
12: while $j < j_{\max}$ and $\mathcal{F}_{\rm prev} \neq \mathcal{F}$ do
13:     $\mathcal{F}_{\rm prev} := \mathcal{F}$; $j := j + 1$
14:     $\mathcal{F} := {\tt ST\_OVERLINE\_PRUNE}(\mathcal{F})$
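
The control flow of Algorithm 8 can be sketched in Python as follows. This is a minimal sketch and not the author's R implementation: the names `iterate_until_stable` and `st_overline` are hypothetical, and the callables `planr`, `splitnode`, `simplify`, `lineblend` and `prune` are stand-ins for ST_OVERLINE_PLANR, ST_SPLITNODE, ST_SIMPLIFY, ST_OVERLINE_LINEBLEND and ST_OVERLINE_PRUNE, whose geometric logic is not reproduced here. The tolerances are assumed to be bound into these callables.

```python
def iterate_until_stable(step, flow_map, j_max):
    """Apply `step` until two consecutive flow maps are equal or j_max is reached.

    `step` must return a new object rather than mutate its argument,
    otherwise the equality test below would always succeed.
    """
    prev = None  # plays the role of F_prev := {} (differs from any flow map)
    j = 0
    while j < j_max and prev != flow_map:
        prev = flow_map
        j += 1
        flow_map = step(flow_map)
    return flow_map


def st_overline(routes, planr, splitnode, simplify, lineblend, prune, ks, j_max):
    """Pre-processing, iterated line blending for each k, then iterated pruning."""
    flow_map = planr(routes)        # initial flow aggregation
    flow_map = splitnode(flow_map)  # node splitting
    flow_map = simplify(flow_map)   # line simplification
    flow_map = planr(flow_map)      # flow aggregation of simplified lines
    for k in ks:
        # One local alignment iteration: blend, then re-aggregate flows.
        flow_map = iterate_until_stable(
            lambda fm: planr(lineblend(fm, k)), flow_map, j_max
        )
    return iterate_until_stable(prune, flow_map, j_max)
```

Representing the two `while` loops by one helper makes the stopping rule explicit: each stage ends either at a fixed point of its step function or after `j_max` applications.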

4 Results

In this section we compute a minimal flow map for the Hannover GPS trajectories. From the complete set of 1183 trajectories, we keep the 1177 trajectories with length greater than 100 m. We input these 1177 trajectories into ST_ROUTE with $M$ as the map matching API and $R$ as the route finding API from the Valhalla routing engine, and ${\bm{n}}_W = 3, 13, 23, 33, 43, 63, 83$ waypoints. We employ the dockerised version 3.4.0 of the Valhalla routing engine (GIS OPS, 2023). Of these input trajectories, 1147 yield a sufficiently high quality match, where the Hausdorff distance $d_{\operatorname{Haus}}(M^{*}(G), G) < 100$ m and the length ratio $\operatorname{len}(M^{*}(G))/\operatorname{len}(G) < 1.1$. We continue the analysis with these 1147 map matched routes.
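
The match quality filter above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, coordinates are assumed to be in a projected coordinate system in metres, and the Hausdorff distance is approximated by its discrete version on the linestring vertices, which may differ from the computation used by the author.

```python
import math


def line_length(line):
    """Length of a polyline given as a list of (x, y) vertices in metres."""
    return sum(math.dist(p, q) for p, q in zip(line, line[1:]))


def discrete_hausdorff(a, b):
    """Hausdorff distance restricted to the vertices of polylines a and b."""
    def directed(src, dst):
        # Largest distance from a vertex of src to its nearest vertex in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)
    return max(directed(a, b), directed(b, a))


def is_good_match(route, traj, max_haus=100.0, max_ratio=1.1):
    """Keep a map matched route if it is close to, and not much longer than, traj."""
    return (discrete_hausdorff(route, traj) < max_haus
            and line_length(route) / line_length(traj) < max_ratio)


traj = [(0, 0), (100, 0), (200, 0)]
close_route = [(0, 5), (100, 5), (200, 5)]     # 5 m offset, same length
detour_route = [(0, 0), (100, 150), (200, 0)]  # 150 m detour
print(is_good_match(close_route, traj), is_good_match(detour_route, traj))
```

The two thresholds play different roles: the Hausdorff bound rejects routes that stray far from the trajectory at any point, whereas the length ratio rejects routes with spurious detours or loops that remain nearby.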

For all iterations, we set $j_{\max} = 20$. For the line blending, we begin with 1 iteration of ST_OVERLINE (Steps 1–4) with simplify tolerance $\varepsilon_D = 1~\mathrm{m}$. We follow with ST_OVERLINE (Steps 5–14) with blend and snap tolerances $\varepsilon = \varepsilon_S = 4~\mathrm{m}$, #edges ${\bm{k}} = 1, 2$, and node split type $S =$ ‘subdivision’. These simplify, blend and snap tolerances were chosen heuristically as a trade-off: they are large enough to account for the noisiness of the GPS trajectories and of the map matching/route finding APIs, but not so large as to obscure separate road segments within regions with a dense road network. We set ${\bm{k}} = 1, 2$ since the search for connected $k$-edges with $k > 2$ is too computationally intensive for our setup due to the large number of road segments (13 495). We set $S =$ ‘subdivision’ node splitting, as it provides a more stable line blending priority at this early stage. We follow with 1 iteration of ST_OVERLINE (Steps 5–14), with the same tuning parameter choices, except with $S =$ ‘unary’. ‘Unary’ node splitting is usually applied after an iteration of ‘subdivision’ node splitting since the former can now add any missing intersection nodes without adversely affecting the line blending priority. We end with 2 iterations of ST_OVERLINE (Steps 5–14), with $\varepsilon = \varepsilon_S = 5~\mathrm{m}$, ${\bm{k}} = 1, 2, 3, 4$, $S =$ ‘unary’, since the larger values for $\varepsilon, \varepsilon_S$ capture any line blending missed at 4 m, and the connected 3- and 4-edge searches become computationally feasible with the lower number of road segments. These 5 iterations of ST_OVERLINE converge to a minimal flow map with 1 413 road segments, i.e. an 89.5% reduction of the 13 495 segments in the initial flow map.
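
The staged parameter choices above can be written down as a small schedule, sketched here in Python. The dictionary layout and the names `SCHEDULE` and `reduction` are our own illustration rather than part of the author's software; the segment counts are those reported for the layers flowmap0 to flowmap4 in Table 1.

```python
# One entry per ST_OVERLINE run, in the order described in the text.
SCHEDULE = [
    {"steps": "1-4", "eps_D": 1, "S": "subdivision", "n_segments": 13495},
    {"steps": "5-14", "k": (1, 2), "eps": 4, "eps_S": 4, "S": "subdivision", "n_segments": 2437},
    {"steps": "5-14", "k": (1, 2), "eps": 4, "eps_S": 4, "S": "unary", "n_segments": 1953},
    {"steps": "5-14", "k": (1, 2, 3, 4), "eps": 5, "eps_S": 5, "S": "unary", "n_segments": 1867},
    {"steps": "5-14", "k": (1, 2, 3, 4), "eps": 5, "eps_S": 5, "S": "unary", "n_segments": 1413},
]


def reduction(n_before, n_after):
    """Fraction of road segments removed between two flow maps."""
    return 1.0 - n_after / n_before


# Overall reduction from the initial to the minimal flow map.
total = reduction(SCHEDULE[0]["n_segments"], SCHEDULE[-1]["n_segments"])
print(f"{100 * total:.1f}% fewer segments")

# Reduction achieved by each run relative to the previous one.
for before, after in zip(SCHEDULE, SCHEDULE[1:]):
    step = reduction(before["n_segments"], after["n_segments"])
    print(after["S"], after["k"], f"{100 * step:.1f}%")
```

Listing the per-run reductions makes it visible that most of the aggregation happens in the first line blending run, with the later runs refining the result.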

Figure 9 illustrates the results of these iterations. In Figure 9(a) are the GPS trajectories (green circles) and their map matched routes resulting from ST_ROUTE (blue lines). Whilst the map matched routes no longer completely obscure the road network, they remain misaligned to each other and to the road network, so it is not possible to accurately estimate the traffic flow map directly from them. In Figure 9(b) is a flow map resulting from ST_OVERLINE (Steps 1–4) with $S =$ ‘subdivision’. This is similar to the flow map that would be obtained by following Morgan and Lovelace (2021) without rasterisation. Since each linestring is labelled by its flow value, the crowding of the labels indicates that this is unlikely to be a minimal flow map. In Figure 9(c), we complete the line blending iteration ST_OVERLINE (Steps 5–14) with $S =$ ‘subdivision’. The crowding of the flow value labels is reduced, so we are progressing towards a minimal flow map. In Figure 9(d), we carry out a further 3 line blending iterations ST_OVERLINE (Steps 5–14) with $S =$ ‘unary’ to arrive at a minimal flow map.

Figure 9: Iterations to compute locally aligned flow map. (a) Empirical GPS trajectories (green circles), map matched routes from ST_ROUTE (blue lines). (b) Initial flow map from ST_OVERLINE (1–4), $S =$ ‘subdivision’. (c) Intermediate flow map from 1 iteration of ST_OVERLINE (5–14), $S =$ ‘subdivision’. (d) Minimal flow map from 3 further iterations of ST_OVERLINE (5–14), $S =$ ‘unary’. Label is traffic flow. Colour (purple to orange) of road segments is proportional to traffic flow.

4.1 Validation

A visual inspection of the minimal flow map in Figure 9(d) reveals good alignment to the OSM road network in general, though there remain some data artefacts elsewhere. For example, in Figure 10(a), the green GPS trajectories give no indication of the blue wonky route output by ST_ROUTE, and in Figure 10(c), the GPS trajectories do not involve a loop, though the route from ST_ROUTE contains one. Since our local alignment does not use an external road network to correct these wonky routes or extraneous loops in the map matched routes, they are propagated to the flow maps in Figure 10(b, d). Overall, these types of data artefacts are small and infrequent, and are associated with road segments with low flow.

Figure 10: Data artefacts in locally aligned road segment flows. (a–b) Wonky road segments. (c–d) Extraneous road segments. (a, c) GPS trajectories (green circles), map matched routes (blue lines). (b, d) Colour (purple to orange) of road segments is proportional to traffic flow.

For a more quantitative validation of the accuracy of our proposed flow map, we require a gold standard reference flow map. The Hannover GPS trajectories were designed to serve as a reference data set for learning turning rules at road junctions, rather than as a reference flow map (Zourlidou et al., 2022). So we have to compute a proxy reference flow map. For this, we first compute the line transects $\mathcal{T}_i = \{{\bm{t}}_{i,1}, \dots, {\bm{t}}_{i,n_{\mathcal{T}_i}}\} = {\tt ST\_TRANSECT}({\bm{f}}_i, \varepsilon_T, \delta_T)$ for a road linestring ${\bm{f}}_i \in \mathcal{F}$, where a line transect is a line segment (of length $2\varepsilon_T$ m) orthogonal to ${\bm{f}}_i$. If $\operatorname{len}({\bm{f}}_i) > \delta_T$ m, then these transects are placed at every $\delta_T$ m along ${\bm{f}}_i$; if $\operatorname{len}({\bm{f}}_i) \leq \delta_T$ m, then a single transect is subtended from the centroid of ${\bm{f}}_i$. The empirical proxy flow at a line transect is the number of intersection points between the map matched routes in $\mathcal{M}$ and that transect, $f_{i,j}^{\mathrm{emp}} = \#\{{\tt ST\_INTERSECTION}(\mathcal{M}, {\bm{t}}_{i,j}) : {\bm{t}}_{i,j} \in \mathcal{T}_i\}, j = 1, \dots, n_{\mathcal{T}_i}$, and the mean empirical proxy flow over all line transects of ${\bm{f}}_i$ is $\bar{f}_i^{\mathrm{emp}} = \mathrm{mean}\{f_{i,j}^{\mathrm{emp}}, j = 1, \dots, n_{\mathcal{T}_i}\}$. Our error measure for ${\bm{f}}_i$ is the absolute difference $\operatorname{Err}_{i,j} = |f_i - \bar{f}_i^{\mathrm{emp}}|$ for all line transects ${\bm{t}}_{i,j}, j = 1, \dots, n_{\mathcal{T}_i}$ which intersect ${\bm{f}}_i$. We also compute the relative error $\operatorname{RErr}_{i,j} = \operatorname{Err}_{i,j}/f_i$, which is well-defined since $f_i \geq 1$ for all ${\bm{f}}_i \in \mathcal{F}$.
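
The transect-based error measures can be sketched in Python as follows. This is a simplified sketch, not the author's ST_TRANSECT implementation: the function names are hypothetical, coordinates are assumed to be in metres, the single transect of a short linestring is placed at its midpoint rather than its centroid, and only proper crossings are counted, so a route that merely touches a transect end point is ignored.

```python
import math


def _length(line):
    return sum(math.dist(p, q) for p, q in zip(line, line[1:]))


def _point_and_direction(line, d):
    """Point at distance d along the polyline and the unit direction there."""
    last = None
    for p, q in zip(line, line[1:]):
        seg = math.dist(p, q)
        if seg == 0:
            continue
        u = ((q[0] - p[0]) / seg, (q[1] - p[1]) / seg)
        if d <= seg:
            return (p[0] + d * u[0], p[1] + d * u[1]), u
        d -= seg
        last = (q, u)
    if last is None:
        raise ValueError("degenerate linestring")
    return last  # d overshoots slightly: fall back to the end point


def st_transect(line, eps, delta):
    """Orthogonal transects of length 2 * eps, at every delta along the line."""
    total = _length(line)
    if total > delta:
        dists = [delta * (j + 1) for j in range(int(total // delta))]
    else:
        dists = [total / 2]  # assumption: midpoint instead of centroid
    out = []
    for d in dists:
        (x, y), (ux, uy) = _point_and_direction(line, d)
        nx, ny = -uy, ux  # unit normal to the local direction
        out.append(((x - eps * nx, y - eps * ny), (x + eps * nx, y + eps * ny)))
    return out


def _orient(a, b, c):
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _crosses(p1, p2, q1, q2):
    """True if segments p1p2 and q1q2 cross at a single interior point."""
    return (_orient(q1, q2, p1) * _orient(q1, q2, p2) < 0
            and _orient(p1, p2, q1) * _orient(p1, p2, q2) < 0)


def proxy_flow(routes, transect):
    """Number of crossings between the routes and one transect."""
    a, b = transect
    return sum(_crosses(p, q, a, b)
               for route in routes for p, q in zip(route, route[1:]))


def flow_errors(flow, line, routes, eps=5.0, delta=50.0):
    """Absolute and relative error of an estimated flow against the proxy flow."""
    emp = [proxy_flow(routes, t) for t in st_transect(line, eps, delta)]
    err = abs(flow - sum(emp) / len(emp))
    return err, err / flow
```

For example, for a 120 m straight segment with estimated flow 2, crossed by one route along its whole length and by a second route that stops after 75 m, the two transects at 50 m and 100 m count 2 and 1 crossings, so `flow_errors` returns an absolute error of 0.5 and a relative error of 0.25.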

With the tuning parameters $\varepsilon_T = 5$ m and $\delta_T = 50$ m, we compute 7 380 line transects. The colour (purple to orange) of the line transects is proportional to the absolute error. In Figure 11(a), the vast majority of the error values are low (purple), with only a few high error values (orange) in or near the solid black rectangle. The zoom of the black rectangle is given in Figure 11(b), along with the road segment flows in blue. The road segment ${\bm{f}}_{11}$ (thick blue segment), with estimated flow $f_{11} = 238$ and empirical proxy flow $\bar{f}_{11}^{\mathrm{emp}} = 254$, has a high absolute error, i.e. $\operatorname{Err}_{11} = 16, \operatorname{RErr}_{11} = 0.07$. The road segment ${\bm{f}}_{331}$, with $f_{331} = 25, \bar{f}_{331}^{\mathrm{emp}} = 12$, has a higher relative error, i.e. $\operatorname{Err}_{331} = 13, \operatorname{RErr}_{331} = 0.52$. These errors arise because these road segments, at an earlier iteration of the flow map, comprised shorter sub-segments with higher flows, due to the presence of more than 100 GPS trajectories passing near the intersection of ${\bm{f}}_{11}$ and ${\bm{f}}_{331}$. Since the nodes of these sub-segments are pseudo nodes, they were removed, and the concatenated sub-segments were assigned a single weighted mean flow, which is 25 for ${\bm{f}}_{331}$.
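
To illustrate how concatenating sub-segments at pseudo nodes can produce such discrepancies, the following Python sketch computes a weighted mean flow. It assumes that the weights are the sub-segment lengths, which is our assumption for this illustration, and the function name and the input numbers are hypothetical rather than taken from the Hannover data.

```python
def concatenated_flow(sub_segments):
    """Length-weighted mean flow of sub-segments given as (length_m, flow) pairs."""
    total_length = sum(length for length, _ in sub_segments)
    return sum(length * flow for length, flow in sub_segments) / total_length


# Hypothetical example: a short, busy sub-segment next to an intersection,
# followed by a long sub-segment with a much lower flow.
sub_segments = [(10, 120), (190, 20)]
print(concatenated_flow(sub_segments))
```

In this made-up example, the concatenated edge carries one flow value for its whole length, so a transect on the long sub-segment would record a proxy flow below the assigned value and a transect on the short sub-segment would record one well above it.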

Figure 11: Validation of line transects of locally aligned flow map. (a) Neighbourhood level. (b) Zoom of black rectangle. Line transects computed with tolerance $\varepsilon_T = 5$ m, at every $\delta_T = 50$ m. Label is absolute error (Err). Colour (purple to orange) of line transects is proportional to absolute error. Width of road segments (blue) is proportional to traffic flow.

To supplement the visual examination of these flow estimation errors, Figure 12 shows the bivariate (Err, RErr) histogram plot of the errors. The vast majority of the line transects (6518 or 88.3%), in the orange hexagonal bin, have zero absolute and relative error. Only 39 (0.53%) have $\operatorname{Err} > 4$, 364 (4.9%) have $\operatorname{RErr} > 0.1$, and 16 (0.21%) have both $\operatorname{Err} > 4$ and $\operatorname{RErr} > 0.1$, which comprise most of the purple bins.

Figure 12: Histogram plot of absolute and relative errors of line transects of locally aligned flow map. Label in hexagonal bin is number of line transects. Vertical dotted line is Err = 4, horizontal dotted line is RErr = 0.1.

This demonstrates that our flow map is mostly accurate, as it usually controls the estimation error. By accuracy, we mean how close the estimated flows are to the empirical proxy flows. However, accuracy is insufficient on its own: in the extreme situation of a flow map with a single road segment with zero error, the accuracy is the highest possible, but we have no estimated flows outside of this single segment. So we also require the spatial coverage of the flow map to be high. We cannot resolve this question of spatial coverage unambiguously since we do not have a gold standard flow map, though we can verify that all $n = 1147$ map matched routes intersect at least one line transect from the estimated flow map. Whilst this calculation does not exclude that some regions of some map matched routes are without nearby line transects, we can further verify with a visual inspection that these regions tend to be small in area and/or comprise a low number of routes. So we claim that our minimal flow map has high levels of accuracy and spatial coverage.

To conclude the validation of our flow maps, the flow map at the city level is illustrated in Figure 13(a). The orange segments with high flows are apparent, whereas these high flows were not apparent from the scatter plot of the GPS trajectories in Figure 1(a). The desire line map (or spider diagram) is in Figure 13(b) and represents the traffic flows between the origin/destination hub nodes (black solid circles). These hub nodes are the centroids from a hierarchical clustering, similar to that in ST_SNAPNODE, of the boundary points of the GPS trajectories with the cutoff at 5 000 m. The straight lines indicate the GPS trajectories whose origin and destination are associated with different hub nodes, and the circles indicate the trajectories whose origin and destination are associated with the same hub node. The desire line map is in effect a low resolution map of straight-line flows between the hub nodes only, whereas the high resolution map shows the traffic flows on all road segments.
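
The construction of the desire lines can be sketched in Python as follows. This is an illustrative sketch, not the author's ST_DESIRELINE implementation: the function names are hypothetical, complete linkage is our assumption for the hierarchical clustering, desire lines are treated as undirected, and the naive clustering loop is only suitable for small examples.

```python
import math
from collections import Counter


def cluster_points(points, cutoff):
    """Naive complete-linkage agglomerative clustering cut at `cutoff`."""
    clusters = [[p] for p in points]

    def linkage(a, b):
        return max(math.dist(p, q) for p in a for q in b)

    while len(clusters) > 1:
        d, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > cutoff:
            break
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def desire_lines(trajectories, cutoff=5000.0):
    """Hub nodes and flows between hub pairs from trajectory end points."""
    ends = [t[0] for t in trajectories] + [t[-1] for t in trajectories]
    clusters = cluster_points(ends, cutoff)
    hub_of = {p: h for h, cluster in enumerate(clusters) for p in cluster}
    hubs = [centroid(cluster) for cluster in clusters]
    # A pair (h, h) is a trajectory starting and ending at the same hub.
    flows = Counter(tuple(sorted((hub_of[t[0]], hub_of[t[-1]])))
                    for t in trajectories)
    return hubs, flows


trajectories = [
    [(0, 0), (10000, 0)],
    [(100, 0), (10100, 0)],
    [(0, 100), (200, 0)],
]
hubs, flows = desire_lines(trajectories)
print(len(hubs), dict(flows))
```

In this toy example, the six end points form two hubs, with two trajectories travelling between the hubs (a straight desire line) and one trajectory staying within a single hub (a circle).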

Figure 13: High and low resolution flow maps. (a) Locally aligned flow map. (b) Desire line map. Colour (purple to orange) and width of road segments are proportional to traffic flow.

4.2 Software

These analysis algorithms have been developed in R since the complexity of Algorithms 1–8 requires a mix of advanced geospatial and statistical methods. As an interpreted language, R can have slower execution times. There are two main computational bottlenecks. The first consists of the map matching/route finding in Algorithm 1. The Valhalla routing engine APIs are available as a web-based service (e.g. https://valhalla1.openstreetmap.de) or as a local dockerised image (e.g. https://github.com/gis-ops/docker-valhalla). We use the latter as it allows for faster computation, since these local API requests are not sent to a remote web-based server and can be parallelised on a stand-alone machine. We conduct a small study of the execution times based on 10 replicates of ST_ROUTE on 10 randomly selected GPS trajectories on an Intel i5 quad-core 3.10 GHz machine running Ubuntu 22.04 and R 4.4.0. Executing ST_ROUTE with a local Valhalla API is around 7.1 times faster than with the web-based API, and a parallelisation (with 3 workers) is around 1.8 times faster than a serial computation. This is less than 3 because the dockerised image is not optimised for simultaneous API calls. Combining these together, a parallelised local API achieves around a 12.6-fold speed improvement in comparison to a serial web-based API.
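
The parallelisation of the routing requests over 3 workers can be sketched in Python as follows. This is a sketch under assumptions, not the author's R code: `route_all` is a hypothetical name, a thread pool is our choice because requests to a local server are mostly waiting on I/O, and `st_route` is a placeholder callable standing in for whatever function sends one trajectory to the routing engine. No network call is made in this example.

```python
from concurrent.futures import ThreadPoolExecutor


def route_all(trajectories, st_route, workers=3):
    """Apply `st_route` to each trajectory using a pool of worker threads.

    The results are returned in the same order as the input trajectories.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(st_route, trajectories))


def fake_st_route(trajectory):
    # Placeholder: a real implementation would query the routing engine here.
    return list(reversed(trajectory))


routes = route_all([[1, 2, 3], [4, 5], [6]], fake_st_route, workers=3)
print(routes)
```

Keeping the routing function as an argument separates the parallel scheduling from the request logic, so the same wrapper could be timed against a serial loop to reproduce a comparison of the kind reported above.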

The second bottleneck is the line blending in Algorithm 7. Based on the execution times of 10 replicates of ST_OVERLINE on a subset of 433 GPS trajectories, a 3-worker parallelisation is around 1.8 times faster than a serial computation. This is less than 3 because only the repeated application of line blending (Algorithm 4) is parallelised, whilst the line blending priority (Algorithm 6) remains a serial computation. These speed factors are intended to be illustrative, since execution times involving remote web server APIs and parallelisation are difficult to predict on different internet connections and machines. We tentatively claim that a local Valhalla API reduces the execution time by an order of magnitude, whereas parallelisation reduces it almost linearly by the number of workers.
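
One way to read the factor of 1.8 is through Amdahl's law. The following worked equation is our own illustration, under the idealised assumption that a fraction $p$ of the serial run time parallelises perfectly over $w$ workers and that the remainder, such as the line blending priority, stays serial.

```latex
\[
  S(w) = \frac{1}{(1 - p) + p/w},
  \qquad
  S(3) = 1.8
  \;\Longrightarrow\;
  p = \frac{1 - 1/1.8}{1 - 1/3} = \frac{0.444}{0.667} \approx 0.67 .
\]
```

Under this assumption, about two thirds of the serial run time of the line blending would be spent in the parallelised part.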

We anticipate releasing an add-on package on CRAN (https://cran.r-project.org), which is the main R package repository. Since our R add-on package is under development, in the meantime we provide a geopackage and QGIS project with the input GPS trajectories, map matched routes, iterated flow maps and desire lines, as listed in Table 1. The interested reader is able to interactively explore in QGIS the added value of our proposed high resolution minimal flow map flowmap4, in comparison to the input trajectories traj, the map matched routes route, the desire line flow map flowmap_desire, and the flow map computed according to a leading alternative without rasterisation (Morgan and Lovelace, 2021), which is similar to flowmap0.

Layer | Description | $n$
traj | Empirical GPS trajectories | 1 147
route | Map matched routes ST_ROUTE | 1 147
flowmap0 | Flow map ST_OVERLINE(1–4), $\varepsilon_D = 1$, $S =$ ‘subdivision’ | 13 495
flowmap1 | Flow map ST_OVERLINE(5–14), ${\bm{k}} = 1, 2$, $\varepsilon = \varepsilon_S = 4$, $S =$ ‘subdivision’ | 2 437
flowmap2 | Flow map ST_OVERLINE(5–14), ${\bm{k}} = 1, 2$, $\varepsilon = \varepsilon_S = 4$, $S =$ ‘unary’ | 1 953
flowmap3 | Flow map ST_OVERLINE(5–14), ${\bm{k}} = 1, 2, 3, 4$, $\varepsilon = \varepsilon_S = 5$, $S =$ ‘unary’ | 1 867
flowmap4 | Flow map ST_OVERLINE(5–14), ${\bm{k}} = 1, 2, 3, 4$, $\varepsilon = \varepsilon_S = 5$, $S =$ ‘unary’ | 1 413
flowmap_desire | Desire lines ST_DESIRELINE, $\varepsilon = 5000$ | 41
Table 1: Geospatial layers in geopackage. The first column is the layer name, the second is the description, and the third is the number of geospatial features $n$.
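
Since a geopackage is an SQLite database, its layers can be listed without any geospatial library. The following Python sketch queries the `gpkg_contents` table defined by the GeoPackage standard; the function name `list_layers` is our own, and the file path passed to it is left to the reader since the file name of the geopackage is not stated here.

```python
import sqlite3
from contextlib import closing


def list_layers(gpkg_path):
    """Return (table_name, data_type) pairs from the GeoPackage contents table."""
    with closing(sqlite3.connect(gpkg_path)) as con:
        query = ("SELECT table_name, data_type FROM gpkg_contents "
                 "ORDER BY table_name")
        return con.execute(query).fetchall()
```

Applied to the geopackage described in Table 1, a call of this kind would be expected to list the layers traj, route, flowmap0 to flowmap4 and flowmap_desire, which can then be opened in QGIS or any other software that reads geopackages.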

5 Conclusion

We have introduced novel analysis algorithms to compute a flow map from empirical GPS trajectories. Our starting point is to focus on aligning segments of the map matched routes rather than the complete routes. We define a spatial relation to set up local reference road segments, which allows us to align other nearby road segments to this local reference segment. This local alignment is the key innovation to computing a minimal flow map that is aligned to the underlying road network. We presented solid evidence for the high level of spatial resolution, accuracy and coverage for our proposed minimal flow map. Since it accurately shows the traffic flow on all road segments at all scales, it provides increased added value in comparison to the empirical GPS trajectories, to the low resolution desire lines map, and to existing high resolution flow map methodologies.

References

  • Andrienko and Andrienko (2013) Andrienko, N. and G. Andrienko (2013). Visual analytics of movement: An overview of methods, tools and procedures. Information Visualization 12, 3–24.
  • Chao et al. (2020) Chao, P., Y. Xu, W. Hua, and X. Zhou (2020). A survey on map-matching algorithms. In R. Borovica-Gajic, J. Qi, and W. Wang (Eds.), Databases Theory and Applications, Volume 12008, pp.  121–133. Springer International Publishing.
  • Douglas and Peucker (1973) Douglas, D. H. and T. K. Peucker (1973). Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica (The Canadian Cartographer) 10, 112–122.
  • Dunnington and Pebesma (2023) Dunnington, D. and E. Pebesma (2023). geos: Open Source Geometry Engine (’GEOS’) R API. R package version 0.2.4. https://github.com/paleolimbot/geos.
  • Evans (1976) Evans, S. P. (1976). Derivation and analysis of some models for combining trip distribution and assignment. Transportation Research 10, 37–57.
  • Giorgino (2009) Giorgino, T. (2009). Computing and visualizing dynamic time warping alignments in R: The dtw package. Journal of Statistical Software 31(7), 1–24.
  • GIS OPS (2023) GIS OPS (2023). Valhalla routing engine version 3.4.0 [Docker]. https://github.com/gis-ops/docker-valhalla.
  • Gordon (1999) Gordon, A. D. (1999). Classification (2nd ed.). London: Chapman and Hall/CRC.
  • Herrera et al. (2010) Herrera, J. C., D. B. Work, R. Herring, X. Ban, Q. Jacobson, and A. M. Bayen (2010). Evaluation of traffic data obtained via GPS-enabled mobile phones: The Mobile Century field experiment. Transportation Research Part C: Emerging Technologies 18, 568–583.
  • Lovelace and Ellison (2018) Lovelace, R. and R. Ellison (2018). stplanr: A package for transport planning. The R Journal 10, 7–23.
  • Lovelace et al. (2019) Lovelace, R., J. Nowosad, and J. Muenchow (2019). Geocomputation with R. Chapman and Hall/CRC.
  • Morgan and Lovelace (2021) Morgan, M. and R. Lovelace (2021). Travel flow aggregation: Nationally scalable methods for interactive and online visualisation of transport behaviour at the road network level. Environment and Planning B: Urban Analytics and City Science 48, 1684–1696.
  • Müllner (2013) Müllner, D. (2013). fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software 53(9), 1–18.
  • Necula (2015) Necula, E. (2015). Analyzing traffic patterns on street segments based on GPS data using R. Transportation Research Procedia 10, 276–285.
  • OGC (2010) OGC (2010). OpenGIS implementation standard for geographic information - Simple feature access - Part 1: Common architecture. Version 1.2.1.
  • Ortúzar and Willumsen (2011) Ortúzar, J. D. and L. G. Willumsen (2011). Modelling Transport (4th ed.). Hoboken: Wiley.
  • Pebesma (2018) Pebesma, E. (2018). Simple features for R: Standardized support for spatial vector data. The R Journal 10, 439–446.
  • Quddus et al. (2007) Quddus, M. A., W. Y. Ochieng, and R. B. Noland (2007). Current map-matching algorithms for transport applications: State-of-the art and future research directions. Transportation Research Part C: Emerging Technologies 15, 312–328.
  • Ramer (1972) Ramer, U. (1972). An iterative procedure for the polygonal approximation of plane curves. Computer Graphics and Image Processing 1, 244–256.
  • Saki and Hagen (2022) Saki, S. and T. Hagen (2022). A practical guide to an open-source map-matching approach for Big GPS Data. SN Computer Science 3(5), 415.
  • Sakoe and Chiba (1978) Sakoe, H. and S. Chiba (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 26, 43–49.
  • Tobler (1987) Tobler, W. R. (1987). Experiments in migration mapping by computer. The American Cartographer 14, 155–163.
  • van der Meer et al. (2023) van der Meer, L., L. Abad, A. Gilardi, and R. Lovelace (2023). sfnetworks: Tidy Geospatial Networks. R package version 0.6.4. https://luukvdmeer.github.io/sfnetworks.
  • Wood et al. (2010) Wood, J., J. Dykes, and A. Slingsby (2010). Visualisation of origins, destinations and flows with OD maps. The Cartographic Journal 47, 117–129.
  • Zhou et al. (2013) Zhou, H., P. Xu, X. Yuan, and H. Qu (2013). Edge bundling in information visualization. Tsinghua Science and Technology 18, 145–156.
  • Zourlidou et al. (2022) Zourlidou, S., J. Golze, and M. Sester (2022). Dataset: GPS trajectory dataset of the region of Hannover, Germany. https://doi.org/10.25835/9bidqxvl.