License: arXiv.org perpetual non-exclusive license
arXiv:2403.00276v1 [cs.LG] 01 Mar 2024

Graph Construction with Flexible Nodes for Traffic Demand Prediction

**yan Hou, Shan Liu, Ya Zhang (School of Automation, Southeast University, Nanjing, China) and Haotong Qin (Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland)
(2024)
Abstract.

Graph neural networks (GNNs) have been widely applied to traffic demand prediction, where transportation modes can be divided into the station-based mode and the free-floating traffic mode. Existing research on traffic graph construction relies primarily on map matching to build graphs from the road network. However, the complex and inhomogeneous data distribution in free-floating traffic demand forecasting makes road-network matching inflexible. To tackle these challenges, this paper introduces a novel graph construction method tailored to the free-floating traffic mode. We propose a density-based clustering algorithm (HDPC-L) that determines flexible node positions in the graph, overcomes the computational bottlenecks of traditional clustering algorithms, and handles large-scale datasets effectively. Furthermore, we extract valuable information from ridership data to initialize the edge weights of GNNs. Comprehensive experiments on two real-world datasets, the Shenzhen bike-sharing dataset and the Haikou ride-hailing dataset, show that the method significantly improves model performance. On average, our models improve accuracy by around 25% and 19.5% on the two datasets. The method also significantly improves computational efficiency, reducing training time by approximately 12% and 32.5% on the two datasets. We make our code available at https://github.com/hou**yan/HDPC-L-ODInit.

Keywords: Graph Construction, Graph Neural Network, Free-Floating Traffic Mode, Traffic Demand Prediction
Copyright: ACM licensed. Journal year: 2024. Conference: August 25–29, 2024, Barcelona, Spain. CCS concepts: Information systems → Spatial-temporal systems.

1. INTRODUCTION

In recent years, a variety of new transportation modes have emerged; they make residents' lives easier but also pose great challenges to transportation planning, especially in increasingly complex urban road networks. For instance, the rapid expansion of free-floating bike-sharing systems (FFBS) in China has led to a widely reported oversupply problem (Tian et al., 2024). To improve efficiency, operators need effective rebalancing methods and intelligent management strategies; for large-scale complex systems in particular, spatial and temporal zoning management strategies become crucial. However, these new transportation systems suffer from significant spatial and temporal imbalances due to various factors. Accurate prediction of service demand at different locations and times is therefore necessary to improve traveler satisfaction and reduce operational costs (Tian et al., 2024).

Previous studies (Du et al., 2021; Yao et al., 2018; Lv et al., 2018; Zhou et al., 2018; Zonoozi et al., 2018) have used convolutional neural networks (CNNs) to predict free-floating traffic demand, but this approach has a fundamental limitation: CNNs are not suited to non-Euclidean spatial data such as transportation networks. In image data, the proximity of pixels often corresponds to semantic similarity, whereas in the spatial distribution of traffic demand, distant regions may be more relevant than nearby ones because of complex topological relationships. CNNs struggle to capture such relationships.

Graph neural networks (GNNs) offer a valuable approach for capturing non-Euclidean spatial relationships within transportation networks. Consequently, previous research has explored various methods for constructing graphs from transportation networks, which fall into two groups: manual determination and automatic determination through machine learning algorithms. In manual determination, some researchers have matched origins and destinations directly to the road network, using midpoints or intersections of roads as nodes (De Fabritiis et al., 2008; Wang et al., 2019). Other studies have adopted a grid-based approach, partitioning the space into grids with each grid cell serving as a node (Yuan et al., 2018; Huang et al., 2022; Zhang et al., 2019). However, these approaches have limitations. First, they lack flexibility: nodes can only be selected from a limited set of options, and the modeling granularity is fixed and cannot be adjusted. Second, because large-scale traffic networks are complex and intricate, different strategies must be developed for different areas to determine the nodes accurately, which reduces efficiency and limits applicability across many cases. Alternatively, machine learning methods, often clustering techniques, can determine nodes automatically (Hulot et al., 2018; Li et al., 2015). However, these algorithms are typically designed to cluster a pre-existing set of fixed nodes, which limits their flexibility, and the high computational complexity of clustering makes them difficult to apply to large-scale datasets. In short, both manual determination and clustering-based automatic determination have limitations in flexibility, efficiency, and applicability.
We aim to address these limitations by developing methods that define rational and flexible nodes, leading to accurate and informative graphs and, in turn, enabling graph neural networks to be applied to demand forecasting in the free-floating traffic mode.

[Figure 1. Order density distribution of Haikou and Shenzhen. (a) Ride-hailing order distribution of Haikou; (b) bike-sharing order distribution of Shenzhen.]

To address the aforementioned challenges, we propose HDPC-L, a hierarchical clustering algorithm based on density clustering. Traditional clustering algorithms are ill-suited to transportation demand datasets because of their imbalanced data distributions. Taking the Shenzhen bike-sharing dataset as an example (Fig. 1), the density of data points varies greatly and irregularly across regions, and ordinary methods such as K-means fail to yield satisfactory results in such scenarios. Our method overcomes this limitation by employing density-based clustering. Moreover, the hierarchical design significantly reduces the computational complexity of our approach, allowing it to scale to large datasets. Regardless of the regional scale, our algorithm constructs a reasonable graph structure. This structure is also highly flexible: the number of nodes can be adjusted freely as required, enabling modeling at various levels of granularity.

Unlike many existing models (Wu et al., 2019; Song et al., 2020; Chen et al., 2022; Fang et al., 2021; James, 2022; Jiang et al., 2023; Lan et al., 2022), which focus primarily on structural improvements through module redesign or the addition of attention or transformer modules, our approach aims to exploit valuable information embedded in the ridership data itself. For datasets containing typical origin-destination (OD) information, we use statistical methods to uncover the OD flow relationships between nodes. From these, we generate a weight matrix that serves as an initialization parameter alongside the original adjacency matrix, and then train our models. We conducted experiments with five baseline models and observed improvements across all six evaluation metrics: Accuracy, RMSE, R², Explained Variance, Edge Quantity, and Training Time per epoch.

Our main contributions can be summarized as follows:

  • To the best of our knowledge, we are the first to construct graphs with flexible nodes for predicting free-floating traffic demand, expanding the application area of GNNs.

  • We introduce HDPC-L, a novel hierarchical density clustering algorithm designed to identify rational and adaptable graph nodes. HDPC-L addresses shortcomings of existing algorithms such as high computational complexity, lack of flexibility, and unsuitability for large-scale datasets.

  • We extract origin-destination (OD) information from the dataset and use it as the basis for initializing the graph edge weights. This fusion simplifies the graph while improving its representational capability, which in turn significantly improves model performance.

  • We validate the effectiveness of this approach on two real-world datasets, achieving 24.96% and 19.46% improvements in accuracy and 12.05% and 32.40% reductions in training time, respectively.

2. RELATED WORK

Our paper is related to the following research directions:

2.1. Graph Construction

Graph neural networks (GNNs) are highly capable of capturing spatial dependencies within transportation networks, which has prompted extensive research into constructing graph structures. These studies can be broadly classified into three categories. (1) The first category uses fixed graph structures whose nodes are derived from real-world sensor locations. For instance, some researchers use the PeMS dataset for highway traffic prediction in California (Fang et al., 2021; Chen et al., 2022; Li et al., 2018; Wu et al., 2019; Jiang et al., 2023; Li and Lasenby, 2021; James, 2022; Zhao et al., 2020), while others use datasets from the Chicago and Los Angeles bike-sharing systems for stacked demand prediction (Li et al., 2022). However, such graphs are inflexible: they demand high road-network accuracy and offer limited options for node selection. (2) The second category rasterizes space and uses grid cells as graph nodes. Examples include dividing cities into square areas (Huang et al., 2022), segmenting the total area into smaller regions (Li and Axhausen, 2020; Zhang et al., 2019), or delineating small areas based on road networks (Tang et al., 2021). (3) The third category uses unsupervised clustering for graph construction. Researchers have reclustered stations in New York's bike-sharing dataset for station classification (Li et al., 2015). Additionally, Traffic Analysis Zone (TAZ) clustering methods have been developed for free-floating bike-sharing systems (FFBS), but they require spatial and temporal data analysis with high computational complexity (Tian et al., 2024; Hua et al., 2020; Lv et al., 2020). Our HDPC-L method addresses these challenges by flexibly generating rational nodes with low computational complexity. Furthermore, owing to its hierarchical design, the number of clusters can be increased by adding layers, making the modeling granularity finer.

2.2. Graph Neural Network

Graph Neural Networks (GNNs) have become important tools for representing spatial topological relationships in Intelligent Transportation Systems (ITS) because of their ability to handle non-Euclidean data. Deep learning architectures built on GNNs are widely used in traffic prediction: DCRNN (Li et al., 2018) models traffic as a diffusion process of vehicles across the road network; TGCN (Zhao et al., 2020) combines Graph Convolutional Networks (GCNs) and Gated Recurrent Units (GRUs) to capture spatial and temporal dependencies jointly; GraphWaveNet (Wu et al., 2019) combines GCNs with Temporal Convolutional Networks (TCNs) to improve predictive capability; AST-GAT (Li and Lasenby, 2021) uses multi-head graph attention blocks to capture intricate spatial dependencies; Frigate (Gupta et al., 2023) introduces a GNN tailored to scenarios with missing data, such as road sensor failures or closures; STDGRL (Xie et al., 2023) proposes a spatio-temporal dynamic graph-relational learning model for predicting traffic flow at urban subway stations; MGC-RNN (He et al., 2022) fuses heterogeneous data from multiple sources for spatio-temporal prediction; and TGC-LSTM (Cui et al., 2019) defines traffic graph convolution based on the physical network topology and improves interpretability by adding L1 and L2 terms to the model's loss function. Past research has focused mainly on improving model performance by expanding architectures and adding parameters, often neglecting the intrinsic characteristics of the data. Our approach differs by mining the origin-destination (OD) flow information inherent in traffic ridership datasets. We use this information, together with the graph adjacency matrix, to initialize the edge weights of the GNN, which substantially improves performance over the baseline models.

2.3. Free-Floating Traffic Demand Prediction

Existing research on forecasting free-floating traffic demand falls broadly into two streams: methods that construct the graph manually by aligning it with the road network or rasterizing the spatial domain, as discussed in Sections 2.1 and 2.2 (Zhao et al., 2020; Li and Lasenby, 2021; Li et al., 2018; Wu et al., 2019; Xie et al., 2023), and methods that forgo graph structures and rely on Convolutional Neural Networks (CNNs) to capture spatial dependencies within the traffic network. Among CNN-based methods, DTCNN (Du et al., 2021) introduces a dynamically shifted convolutional neural network for precise traffic demand forecasting; DMVST-Net (Yao et al., 2018) proposes a deep multi-view spatio-temporal network framework that models spatial and temporal relationships, using local CNNs to capture local spatial correlations; LC-RNN (Lv et al., 2018) incorporates a network-embedded convolutional structure to capture topology-aware features; and Zhou et al. (2018) adopt an encoder-decoder framework based on convolutional and ConvLSTM units for citywide passenger demand prediction.

3. Preliminaries

Traffic Graph. In traffic prediction problems, the traffic network is usually constructed as a graph $G(N,E)$, where $N$ denotes the node set of the graph and $E \subseteq \mathbb{R}^{N\times N}$ denotes the edge set. In our model, the nodes represent the key points of different regions. Taking the bike-sharing dataset as an example, the center of each cluster is a node that manages the shared bikes within that cluster.

Definition (Inflow and Outflow). In this paper, we define $I^{t} \in \mathbb{R}^{N\times N}$ and $O^{t} \in \mathbb{R}^{N\times N}$, where $I^{t}$ denotes the inflow among nodes at time slot $t$ and $O^{t}$ likewise denotes the outflow. We then define $\mathbb{I} = \{I^{1}, I^{2}, I^{3}, \ldots, I^{T}\} \in \mathbb{R}^{T\times N\times N}$ to represent the inflow among nodes over a period of time, where $T$ denotes the number of time slots and $N$ denotes the number of nodes in the graph; $\mathbb{O} = \{O^{1}, O^{2}, O^{3}, \ldots, O^{T}\} \in \mathbb{R}^{T\times N\times N}$ is defined analogously for the outflow.
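As a concrete reading of the definition, the NumPy sketch below accumulates per-slot OD counts into $T\times N\times N$ tensors. The source does not spell out how inflow and outflow differ when both are $N\times N$, so this is an assumption: each order is taken as a hypothetical `(slot, origin_node, dest_node)` tuple, $O^{t}_{ij}$ counts trips leaving node $i$ for node $j$ during slot $t$, and $I^{t}$ is its transpose (trips arriving at $i$ from $j$).

```python
import numpy as np


def build_flow_tensors(orders, num_slots, num_nodes):
    """Return (inflow, outflow), each of shape (T, N, N).

    orders: iterable of hypothetical (slot, origin_node, dest_node) integer tuples.
    outflow[t, i, j]: trips leaving node i for node j in slot t (assumed reading).
    inflow[t, i, j]:  trips arriving at node i from node j, i.e. outflow[t].T.
    """
    outflow = np.zeros((num_slots, num_nodes, num_nodes), dtype=np.int64)
    orders = np.asarray(orders, dtype=np.int64).reshape(-1, 3)
    t, o, d = orders[:, 0], orders[:, 1], orders[:, 2]
    np.add.at(outflow, (t, o, d), 1)              # unbuffered: repeated triples all count
    inflow = outflow.transpose(0, 2, 1).copy()
    return inflow, outflow


orders = [(0, 0, 1), (0, 0, 1), (1, 2, 0)]
inflow, outflow = build_flow_tensors(orders, num_slots=2, num_nodes=3)
print(outflow[0])
```

`np.add.at` is used instead of fancy-index `+=` because the latter silently drops repeated index triples, which are the norm when many trips share an OD pair.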

Free-Floating Traffic Mode. The term “free-floating” refers to a transportation system that lets users pick up and drop off shared vehicles at any location within a designated area, rather than only at specific stations or hubs. This mode is commonly associated with shared mobility services such as ride-hailing, bike-sharing, delivery, and car-sharing, which let users start and end their trips at any legal parking spot within a defined operational zone.

[Figure 2. Overview of our method.]

4. Methodology

4.1. Overview

Fig. 2 gives an overview of the method, which proceeds in four steps:

  • Spatial Division and Mapping: The space is divided into grids of varying granularity. The geographic location of each data point in the dataset is then mapped to the corresponding grid cell.

  • HDPC-L Hierarchical Clustering: Beginning with the coarsest-granularity grid, termed layer 1, data points are clustered with the proposed HDPC-L algorithm to obtain the first-layer clustering result, denoted layer1-cluster-order. At the next, finer granularity, termed layer 2, clustering continues; however, instead of clustering all second-layer data points together, they are first separated according to layer1-cluster-order. Data points belonging to the same first-layer cluster are then clustered with HDPC-L to obtain the second-layer result, termed layer2-cluster-order. This hierarchical process continues until the desired number of nodes is reached, which determines the graph's granularity and establishes the node set $N$.

  • Edge Weight Initialization: Based on the clustering results from Step 2, the area covered by the dataset is re-divided and each data point is assigned to its corresponding cluster. Statistical analysis then derives all origin-destination (OD) traffic relationships among the cluster centers, producing the edge weight matrix $W$ of the graph $G$.

  • Graph Construction: Using the node set $N$ from Step 2 and the edge weight matrix $W$ from Step 3, the weighted graph $G(N,E,W)$ is established.

4.2. Spatial Division and Mapping

To perform clustering effectively, the ridership data must first be represented in a spatially coherent manner. We therefore partition the entire geographic area covered by the dataset into a series of grids of different granularity. Fig. 3 illustrates the result with a schematic of the data-point density distribution in the Shenzhen bike-sharing dataset under square grids of different granularity. The choice of granularity strongly affects how well the data distribution is captured: a coarse granularity reveals global distribution patterns, whereas a fine granularity characterizes local variations within sub-regions in more detail. Using both lets us balance the representation of global and local data characteristics, which improves the robustness and accuracy of the subsequent clustering.
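The grid mapping step can be sketched as follows. This is an illustration, not the authors' code: the helper `map_to_grid` is hypothetical, it assumes plain longitude/latitude coordinates scaled linearly into a square grid (no map projection), and by default it uses the data's bounding box as the grid extent.

```python
import numpy as np


def map_to_grid(lon, lat, grid_size, bounds=None):
    """Map points to a grid_size x grid_size square grid.

    Returns (row, col, counts): per-point cell indices and a density map.
    bounds = (lon_min, lon_max, lat_min, lat_max); defaults to the data extent.
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    if bounds is None:
        bounds = (lon.min(), lon.max(), lat.min(), lat.max())
    lon_min, lon_max, lat_min, lat_max = bounds
    lon_span = max(lon_max - lon_min, 1e-12)      # guard against a degenerate extent
    lat_span = max(lat_max - lat_min, 1e-12)
    # Scale each coordinate to [0, grid_size]; clip so the max edge lands in the last cell
    col = np.clip(np.floor((lon - lon_min) / lon_span * grid_size).astype(int), 0, grid_size - 1)
    row = np.clip(np.floor((lat - lat_min) / lat_span * grid_size).astype(int), 0, grid_size - 1)
    counts = np.zeros((grid_size, grid_size), dtype=np.int64)
    np.add.at(counts, (row, col), 1)              # number of points per cell
    return row, col, counts


row, col, counts = map_to_grid([0.0, 0.5, 1.0, 0.1], [0.0, 0.5, 1.0, 0.1], grid_size=2)
print(counts)
```

Calling it with `grid_size=10` and `grid_size=60` would give density maps at the two granularities shown in Fig. 3.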

[Figure 3. Results of square segmentation at different granularities for the Shenzhen Bike-Sharing Dataset. (a) 10×10 grid; (b) 60×60 grid.]

4.3. HDPC-L Hierarchical Clustering

4.3.1. Hierarchical Clustering

Hierarchical clustering is used here to reduce the computational complexity of clustering and improve its efficiency. To represent the density distribution of origins and destinations, a fine-grained grid is often used, for example dividing the original space into a $1000\times 1000$ grid, as described in Section 4.2. However, such a division can produce a huge number of data points, resulting in long computation times or unreasonable clustering outcomes. To address this issue, we adopt a hierarchical clustering method that follows the idea of coarse-to-fine graph modeling, which substantially reduces computation and yields more reasonable results. The coarse-to-fine approach captures macroscopic spatial dependencies at coarse granularities and small changes in local spatial dependencies at fine granularities.
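A back-of-the-envelope count shows why the hierarchy helps. Density peak clustering works from the pairwise distance matrix, so a single pass over $n$ points compares $n(n-1)/2$ pairs; if every cell of a $1000\times 1000$ grid were a data point, a dense float64 distance matrix alone would take $10^{12}\times 8$ bytes, about 8 TB. The sketch below assumes, idealistically, that one coarse layer splits the points into equally sized sub-regions that are then clustered independently, and it ignores the cost of the coarse pass itself; both function names are hypothetical.

```python
def pairwise_pairs(n):
    """Unordered point pairs a single DPC pass must compare: n(n-1)/2."""
    return n * (n - 1) // 2


def hierarchical_pairs(n, n_subregions):
    """Pairs compared in the next layer if the previous layer split the n points
    into n_subregions equal parts (idealized; assumes n divisible by n_subregions)."""
    per_region = n // n_subregions
    return n_subregions * pairwise_pairs(per_region)


flat = pairwise_pairs(10_000)
split = hierarchical_pairs(10_000, 10)
print(flat, split, flat / split)
```

For example, splitting 10,000 points into 10 sub-regions cuts the pair count from 49,995,000 to 4,995,000, roughly a tenfold saving; real splits are uneven, so actual savings depend on the data.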

The HDPC-L Hierarchical Clustering panel of Figure 2 illustrates this approach. First, we perform the first level of clustering on the global area using the coarsest grid. Our proposed HDPC-L algorithm allows the desired number of clusters to be set manually or selected automatically, which yields the first-level clustering results. We then repeat this process on a finer grid, dividing the ridership data according to the previous layer's clustering results: each cluster is treated as a sub-region and clustered separately to yield the second-layer results. This continues layer by layer until the desired number of nodes is obtained. As the layers progress, the number of nodes increases and the modeling becomes more refined. Moreover, the hierarchical clustering algorithm is widely applicable and adaptable: it can be tailored to almost any large-scale dataset by adjusting the number of clustering layers, fine-tuning the clustering parameters, and employing other relevant techniques.

4.3.2. HDPC-L

Density-based clustering algorithms offer several advantages that match the challenges of modeling the free-floating traffic mode. First, they can identify clusters of diverse shapes and sizes, giving them more flexibility than other clustering methods. Second, they are robust to noise and outliers, because they focus on regions of high data density rather than on individual data points. Third, they can handle clusters with varying densities, which makes them particularly suitable for datasets with non-uniform cluster distributions. We therefore propose the HDPC-L algorithm, based on density peak clustering (DPC) (Rodriguez and Laio, 2014), to identify the graph nodes in free-floating traffic mode datasets.

The steps of HDPC-L are shown in Algorithm 1. The algorithm rests on two fundamental assumptions. First, a cluster center is surrounded by data points of lower density within its cluster, so the center is the point of highest density in its vicinity. Second, cluster centers should be as far apart from one another as possible. To satisfy these two assumptions, two quantities need to be defined: the local density $\rho_i$ and the relative distance $\delta_i$.

$$\rho_i = \sum_{j \neq i} \beta \exp\left[-\left(\frac{d_{ij}}{d_c}\right)^{2}\right] \tag{1}$$

In Eq. 1, $\beta$ represents the density of data points, $d_{ij}$ represents the Euclidean distance between data point $i$ and data point $j$, and $d_c$ represents the neighborhood truncation distance of data point $i$.
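Eq. 1 translates directly into a few lines of NumPy. This is a sketch under assumptions: the source gives $\beta$ without a subscript, so the hypothetical `weights` argument lets $\beta$ vary per point (for instance, the number of orders in a grid cell) and defaults to 1, and $d_c$ is passed in rather than chosen automatically.

```python
import numpy as np


def local_density(points, d_c, weights=None):
    """Gaussian-kernel local density rho_i (Eq. 1) and the distance matrix d_ij.

    points  : (n, 2) coordinates, e.g. grid-cell centers.
    d_c     : neighborhood truncation (cutoff) distance.
    weights : optional per-point beta_j (assumption); defaults to all ones.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    beta = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))        # Euclidean distance matrix d_ij
    kernel = beta[None, :] * np.exp(-(d / d_c) ** 2)
    np.fill_diagonal(kernel, 0.0)                # exclude j == i from the sum
    return kernel.sum(axis=1), d


rho, d = local_density([[0, 0], [1, 0], [10, 0]], d_c=1.0)
print(rho)
```

The isolated point at $(10, 0)$ gets a density close to zero, while the two neighboring points each receive about $e^{-1}$ from one another.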

$$\delta_i = \max_{j \neq i} d_{ij} \tag{2}$$
$$\delta_i = \min_{j:\,\rho_j > \rho_i} d_{ij} \tag{3}$$

Eqs. 2 and 3 define the relative distance $\delta_i$, the minimum distance between sample point $i$ and any point of higher density; the local densities of all data points must therefore be sorted before $\delta_i$ is computed. Eq. 2 gives the relative distance of the point with the highest density, and Eq. 3 gives that of the remaining data points. The point with the highest density must be a density center, so we artificially set its relative distance to the maximum value. The remaining density peaks must have both a high local density $\rho$ and a large relative distance $\delta$. We therefore define the decision variable $\gamma$:
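A minimal sketch of Eqs. 2 and 3, assuming the densities and the distance matrix are already available. Points are visited in decreasing order of density; ties in $\rho$ are broken by sort order, so a point of equal density ranked earlier counts as denser, which is a common implementation choice rather than something the source specifies. The function also records each point's nearest denser neighbor, which standard DPC (Rodriguez and Laio, 2014) uses to assign non-center points to clusters.

```python
import numpy as np


def relative_distance(rho, d):
    """Relative distance delta_i (Eqs. 2-3) and each point's nearest denser neighbor.

    rho : (n,) local densities; d : (n, n) pairwise distance matrix.
    """
    rho = np.asarray(rho, dtype=float)
    d = np.asarray(d, dtype=float)
    n = len(rho)
    order = np.argsort(-rho, kind="stable")     # indices by decreasing density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    top = order[0]
    delta[top] = d[top].max()                   # Eq. 2: densest point gets the max distance
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]                   # all points ranked denser than i
        k = higher[np.argmin(d[i, higher])]
        delta[i] = d[i, k]                      # Eq. 3: distance to nearest denser point
        nearest_higher[i] = k
    return delta, nearest_higher


# Three points on a line at x = 0, 1, 5, with densities decreasing left to right
d = [[0, 1, 5], [1, 0, 4], [5, 4, 0]]
delta, parent = relative_distance([3.0, 2.0, 1.0], d)
print(delta, parent)
```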

$$\gamma = \rho_i \times \delta_i^{L} \tag{4}$$

In Eq. 4, $L$ denotes the index of the current clustering layer. We define Eq. 4 this way because, as hierarchical clustering deepens, the sub-regions become smaller and the influence of $\delta$ on $\gamma$ weakens, which produces many spurious cluster centers very close to each other in densely populated regions. To offset this weakening, we use $\delta^{L}$ as the relative-distance criterion, which greatly alleviates the decay of $\delta$'s influence as the layers deepen.
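The ranking in Eq. 4 is straightforward to sketch. Here the number of centers per sub-region is passed in explicitly (mirroring the manual option mentioned in Section 4.3.1; automatic selection is not shown), and `layer` is the 1-based layer index $L$.

```python
import numpy as np


def select_centers(rho, delta, layer, n_centers):
    """Indices of the n_centers largest gamma = rho * delta**layer (Eq. 4), plus gamma."""
    rho = np.asarray(rho, dtype=float)
    delta = np.asarray(delta, dtype=float)
    gamma = rho * delta ** layer                # deeper layers amplify differences in delta
    ranked = np.argsort(-gamma, kind="stable")  # descending gamma
    return ranked[:n_centers], gamma


for layer in (1, 2):
    centers, gamma = select_centers([10, 8, 2], [3, 1, 2.5], layer, n_centers=2)
    print(layer, centers, gamma)
```

With $\rho = (10, 8, 2)$ and $\delta = (3, 1, 2.5)$, layer 1 keeps points 0 and 1 ($\gamma = 30, 8, 5$), whereas layer 2 keeps points 0 and 2 ($\gamma = 90, 8, 12.5$): the dense point sitting close to an existing center is demoted, which is the effect the exponent is meant to produce. Because $\gamma_a/\gamma_b = (\rho_a/\rho_b)(\delta_a/\delta_b)^{L}$, the ranking does not depend on the unit in which distances are measured.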

Input: sample dataset $D$, number of clustering layers $L$.
Output: cluster-center locations $X, Y$; the code $N$ of the cluster to which each data point belongs.
foreach layer in $L$ do
       if not the first layer then
             Get the clustering results of the previous layers; divide $D$ into multiple subsets based on these results.
       foreach subset in $D$ do
             Calculate the distance matrix $d_{ij}$ from the subset data;
             Determine the neighborhood truncation distance $d_c$;
             Calculate the local density $\rho_i$ and the relative distance $\delta_i$;
             Select cluster centers based on $\gamma$;
             Assign the data points that are not cluster centers to clusters.

Algorithm 1. Steps of the HDPC-L clustering algorithm.
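The two loops of Algorithm 1 can be sketched end to end in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' released implementation: the cutoff $d_c$ is taken as a low quantile of the pairwise distances (a common DPC heuristic), the number of centers per sub-region is fixed per layer through `clusters_per_layer` (a hypothetical parameter standing in for manual selection), non-centers inherit the label of their nearest denser neighbor as in the original DPC, and, for brevity, every layer re-clusters the same points instead of re-mapping them onto a finer grid as Section 4.1 describes. The dense distance matrix makes each pass quadratic in the subset size, which is exactly the cost the hierarchy is meant to contain.

```python
import numpy as np


def dpc(points, n_centers, layer, weights=None, dc_quantile=0.02):
    """One density-peak pass over a subset (inner loop of Algorithm 1)."""
    n = len(points)
    if n <= n_centers:                          # too few points: each is its own cluster
        return np.arange(n), np.arange(n)
    diff = points[:, None, :] - points[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))       # distance matrix d_ij
    # Assumed heuristic: d_c = a low quantile of the pairwise distances
    d_c = max(float(np.quantile(d[np.triu_indices(n, 1)], dc_quantile)), 1e-12)
    beta = np.ones(n) if weights is None else weights
    kernel = beta[None, :] * np.exp(-(d / d_c) ** 2)
    np.fill_diagonal(kernel, 0.0)
    rho = kernel.sum(axis=1)                    # Eq. 1
    order = np.argsort(-rho, kind="stable")     # decreasing density
    delta = np.zeros(n)
    parent = np.full(n, -1)
    delta[order[0]] = d[order[0]].max()         # Eq. 2
    for r in range(1, n):
        i, higher = order[r], order[:r]
        j = higher[np.argmin(d[i, higher])]
        delta[i], parent[i] = d[i, j], j        # Eq. 3
    gamma = rho * delta ** layer                # Eq. 4
    gamma[order[0]] = np.inf                    # densest point is always kept as a center
    centers = np.argsort(-gamma, kind="stable")[:n_centers]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    for i in order:                             # others inherit their denser neighbor's label
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


def hdpc_l(points, clusters_per_layer, weights=None):
    """Hierarchical driver (outer loop of Algorithm 1).

    clusters_per_layer: hypothetical list, e.g. [2, 2] = 2 clusters globally,
    then 2 sub-clusters inside each of them. Returns final labels and center indices.
    """
    points = np.asarray(points, dtype=float)
    w_all = None if weights is None else np.asarray(weights, dtype=float)
    labels = np.zeros(len(points), dtype=int)   # before layer 1: a single region
    centers = np.arange(0)
    for layer, k in enumerate(clusters_per_layer, start=1):
        new_labels = np.empty_like(labels)
        layer_centers, next_id = [], 0
        for region in np.unique(labels):        # re-cluster each previous cluster
            idx = np.flatnonzero(labels == region)
            w = None if w_all is None else w_all[idx]
            sub_labels, sub_centers = dpc(points[idx], k, layer, w)
            new_labels[idx] = sub_labels + next_id
            layer_centers.extend(idx[sub_centers].tolist())
            next_id += len(sub_centers)
        labels, centers = new_labels, np.array(layer_centers)
    return labels, centers


# Four tight groups of three points: two near x = 0..2 and two near x = 20..22
pts = []
for cx in (0.0, 2.0, 20.0, 22.0):
    pts += [(cx, 0.0), (cx + 0.25, 0.0), (cx, 0.25)]
labels, centers = hdpc_l(pts, [2, 2])
print(labels, centers)
```

In this toy run, layer 1 separates the left pair of groups from the right pair, and layer 2 splits each pair into its two groups. On real data, `hdpc_l(points, [8, 4])` would yield up to 32 nodes: 8 global clusters, each split into 4.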

4.4. Graph Neural Network Enhancement

4.4.1. Spatial Dependence Modeling

Characterizing spatial dependence plays a crucial role in traffic demand forecasting. CNNs are not suitable for non-Euclidean structured data such as traffic networks, so more powerful tools are needed to characterize these complex spatial dependencies. GNNs provide a promising solution, and most current research in traffic prediction uses GCNs to capture spatial dependencies. Given the adjacency matrix $A$ and the feature matrix $X$, a GCN expresses how features are transferred from one layer to the next, as shown in Eq. 5. Here $H^{(l)}$ denotes the features of layer $l$, with $H^{(0)} = X$ at the input layer; $\widetilde{A} = A + I$ is obtained by adding the identity matrix $I$ to the adjacency matrix $A$; $\widetilde{D}$ is the degree matrix of $\widetilde{A}$; $\sigma$ denotes a nonlinear activation function; and $W^{(l)}$ is a learnable parameter matrix. Stacking multiple GCN layers according to Eq. 5 allows the model to capture intricate spatial dependencies within the transportation network.

$$H^{(l+1)} = \sigma\left(\widetilde{D}^{-\frac{1}{2}} \widetilde{A} \widetilde{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \tag{5}$$
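Eq. 5 can be checked numerically with a dense NumPy implementation. This is a sketch for small graphs only; ReLU stands in for $\sigma$ as an assumption, and $A$ may be either a binary adjacency matrix or a weighted one such as an OD-based matrix.

```python
import numpy as np


def gcn_layer(A, H, W):
    """One GCN layer, Eq. 5: relu(D~^{-1/2} A~ D~^{-1/2} H W)."""
    A = np.asarray(A, dtype=float)
    H = np.asarray(H, dtype=float)
    W = np.asarray(W, dtype=float)
    A_tilde = A + np.eye(A.shape[0])            # A~ = A + I (self-loops)
    deg = A_tilde.sum(axis=1)                   # diagonal of D~
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]  # symmetric normalization
    return np.maximum(A_hat @ H @ W, 0.0)       # sigma = ReLU (assumed)


A = [[0, 1], [1, 0]]
X = [[1.0], [3.0]]
H1 = gcn_layer(A, X, [[1.0]])                   # each node averages itself and its neighbor
print(H1)
```

Stacking layers is simply repeated application, for example `gcn_layer(A, gcn_layer(A, X, W0), W1)` for hypothetical weight matrices `W0` and `W1`.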
Figure 4. Changes in the adjacency matrix before and after weighting by OD flow relationships on the Haikou Ride-Hailing Dataset: (a) original adjacency matrix; (b) weighted adjacency matrix (the diagonal line represents the nodes themselves).

4.4.2. Edge Weight Initialization

The approach for the station-based mode differs from that for the free-floating traffic mode. In the station-based mode, the adjacency matrix can be obtained directly from the road network and used to construct the GCN model. In the free-floating traffic mode, however, as explained in Section 4.3, the nodes $N$ of the graph $G(N,E)$ do not directly correspond to specific locations in the road network. It is therefore necessary to extract information from the ridership data that can serve as prior knowledge for constructing the adjacency matrix of the graph. Unlike other types of traffic datasets, free-floating traffic mode datasets typically contain ridership information, and this raw information, particularly the OD flow, provides valuable insight for graph construction. To build the graph, we begin with spatio-temporal matching, which assigns each order to one of the clusters established in Section 4.3 based on its geographic location. The origins and destinations of the orders then reveal which clusters exchange OD traffic, and connecting every pair of clusters with such an exchange yields the adjacency matrix $A$ of the graph. However, unlike the relatively simple structure of real-world road networks, traffic in the free-floating mode is complex, flexible, and diverse. As a result, the graph constructed in this way is highly intricate, with significantly more edges than a station-based graph with a similar number of nodes.
To address this problem and simplify the graph structure in a reasonable way, we compute the OD flow between all nodes within each time slot, obtaining the OD feature matrices $\mathbb{O}$ and $\mathbb{I}$ as defined in Definition 1.
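The spatio-temporal matching step described above can be sketched in NumPy as follows. The sketch assumes that each order's origin and destination are assigned to the nearest cluster centre and that orders have already been binned into integer time slots; `od_counts` and `exchange_adjacency` are hypothetical helpers, and treating the adjacency as directed and without self-loops is our assumption, not a detail given in the text.

```python
import numpy as np


def nearest_cluster(coords: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the nearest cluster centre for each (x, y) location (assumed rule)."""
    d2 = ((coords[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def od_counts(origins, dests, slots, centers, num_slots):
    """Per-slot OD counts: counts[t, p, q] = orders from cluster p to cluster q in slot t."""
    o = nearest_cluster(np.asarray(origins, dtype=float), centers)
    d = nearest_cluster(np.asarray(dests, dtype=float), centers)
    n = len(centers)
    counts = np.zeros((num_slots, n, n), dtype=np.int64)
    np.add.at(counts, (np.asarray(slots), o, d), 1)   # accumulates repeated indices
    return counts


def exchange_adjacency(counts: np.ndarray) -> np.ndarray:
    """Binary A: 1 where at least one order goes from cluster p to a different cluster q."""
    a = (counts.sum(axis=0) > 0).astype(np.int8)
    np.fill_diagonal(a, 0)    # self-loops are added later by the GCN (A + I)
    return a


centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
origins = [[0.5, 0.2], [9.8, 0.1], [0.1, 9.7], [0.3, 0.4]]
dests = [[9.9, 0.3], [0.2, 10.1], [0.4, 0.2], [10.2, -0.1]]
counts = od_counts(origins, dests, slots=[0, 0, 1, 1], centers=centers, num_slots=2)
A = exchange_adjacency(counts)
```

With real data almost every pair of busy clusters exchanges at least one trip over the whole period, which is why a binary adjacency built this way becomes so dense.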

$$\begin{cases}\widetilde{A}_{i}=\mathrm{Aggr}\left(\mathbb{I}\right)\mid\mid A\\ \widetilde{A}_{o}=\mathrm{Aggr}\left(\mathbb{O}\right)\mid\mid A\end{cases} \tag{6}$$

Eq. 6 yields the weighted adjacency matrices $\widetilde{A}_{i}$ and $\widetilde{A}_{o}$. In this equation, $\mathrm{Aggr}(\cdot)$ denotes the aggregation function applied to an OD flow feature matrix. In this paper, the feature matrices of all time slots are summed and then normalized to obtain the degree of correlation between nodes, and these correlations are then filtered with a preset threshold to produce the final weight matrix. The symbol $\mid\mid$ denotes combining the aggregated weight matrix with the original adjacency matrix $A$; note that different models may implement this combination differently. Fig. 4 compares the correlations between nodes before and after weighting: the adjacency structure is greatly simplified and the remaining edges carry different weights, allowing each node to focus on the nodes with which it is most strongly correlated.
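A minimal sketch of the aggregation and combination in Eq. 6 follows. The text specifies summing over time slots, normalizing, and thresholding, but not the exact normalization or combination rule; here we assume division by the maximum total flow, a hypothetical threshold of 0.2, and element-wise masking by $A$ as one possible realization of the $\mid\mid$ operator.

```python
import numpy as np


def weighted_adjacency(od: np.ndarray, adj: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Sketch of Eq. 6 for one OD feature matrix (O or I).

    od:  (T, N, N) OD flows per time slot
    adj: (N, N) binary adjacency A built from OD exchanges
    """
    total = od.sum(axis=0).astype(float)             # step 1: sum over time slots
    peak = total.max()
    weights = total / peak if peak > 0 else total    # step 2: scale to [0, 1] (assumed)
    weights[weights < threshold] = 0.0               # step 3: drop weakly correlated pairs
    return weights * adj                             # assumed "||": keep weights on edges of A


od = np.zeros((2, 3, 3))
od[0, 0, 1], od[1, 0, 1] = 8, 2    # strong flow 0 -> 1
od[0, 1, 2] = 1                    # weak flow 1 -> 2
od[1, 2, 0] = 5                    # moderate flow 2 -> 0
adj = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]])
A_o = weighted_adjacency(od, adj)
```

Here the weak 1 → 2 link falls below the threshold and is removed, leaving two edges with different weights (1.0 and 0.5), which mirrors the simplification illustrated in Fig. 4.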

Figure 5. Comparison of clustering results of different methods on the Shenzhen Bike-Sharing Dataset: (a) K-means-72; (b) Shenzhen bike-sharing density distribution; (c) HDPC-L-72; (d) HDPC-L-1120.

5. Numerical Experiments

5.1. Experimental Setup

5.1.1. Construction of the Datasets

To validate the effectiveness of our approach, we conducted four sets of experiments on two real-world datasets. The first is the Shenzhen bike-sharing dataset, which consists of 59.3 million orders spanning May 17th to June 27th, 2021, a total of 42 days. After obtaining the clustering results and establishing the nodes, we divided the time span into 15-minute intervals and assigned the orders in each interval to their corresponding nodes, producing inflow and outflow datasets for each station. The second is the Haikou ride-hailing dataset, which contains 12.4 million orders from May 1st, 2017 to October 1st, 2017, covering a total of 184 days. It is processed in the same way, except that the time slot is 30 minutes.
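The binning of orders into fixed-length slots can be sketched as follows. We assume that each order adds one unit of outflow to its origin node in the slot of its start time and one unit of inflow to its destination node in the slot of its end time, with times given in minutes from the start of the first slot; `flow_series` is a hypothetical helper. For scale, the 42-day Shenzhen span at 15 minutes per slot gives 42 × 96 = 4032 slots.

```python
import numpy as np


def flow_series(start_min, end_min, origin_node, dest_node,
                num_slots, num_nodes, slot_minutes=15):
    """Return (outflow, inflow), each of shape (num_slots, num_nodes).

    Times are minutes since the start of the first slot; orders whose slot falls
    outside [0, num_slots) are ignored.
    """
    start_slot = np.asarray(start_min) // slot_minutes
    end_slot = np.asarray(end_min) // slot_minutes
    origin_node = np.asarray(origin_node)
    dest_node = np.asarray(dest_node)

    outflow = np.zeros((num_slots, num_nodes), dtype=np.int64)
    inflow = np.zeros((num_slots, num_nodes), dtype=np.int64)
    keep_out = (start_slot >= 0) & (start_slot < num_slots)
    keep_in = (end_slot >= 0) & (end_slot < num_slots)
    # np.add.at accumulates correctly when several orders hit the same cell.
    np.add.at(outflow, (start_slot[keep_out].astype(int), origin_node[keep_out]), 1)
    np.add.at(inflow, (end_slot[keep_in].astype(int), dest_node[keep_in]), 1)
    return outflow, inflow


out, inn = flow_series(start_min=[0, 14, 15, 40], end_min=[10, 20, 31, 70],
                       origin_node=[0, 1, 0, 1], dest_node=[1, 0, 1, 0],
                       num_slots=4, num_nodes=2)
```

For the Haikou dataset the same call would use `slot_minutes=30`.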

5.1.2. Evaluation Metrics

We quantify the performance of our model using six metrics:

(1) Edge Quantity: The total number of edges of the graph built based on the dataset.

(2) Training Time for each Epoch: The time in seconds for the model to train an epoch on the dataset.

(3) Root Mean Squared Error (RMSE):

$$\mathrm{RMSE}=\sqrt{\frac{1}{MN}\sum_{j=1}^{M}\sum_{i=1}^{N}\left(y_{i}^{j}-\widehat{y}_{i}^{j}\right)^{2}} \tag{7}$$

(4) Accuracy:

$$\mathrm{Accuracy}=1-\frac{1}{MN}\sum_{j=1}^{M}\sum_{i=1}^{N}\frac{y_{i}^{j}-\widehat{y}_{i}^{j}}{y_{i}^{j}} \tag{8}$$

(5) Coefficient of Determination ($R^{2}$):

$$R^{2}=1-\frac{\sum_{j=1}^{M}\sum_{i=1}^{N}\left(y_{i}^{j}-\widehat{y}_{i}^{j}\right)^{2}}{\sum_{j=1}^{M}\sum_{i=1}^{N}\left(y_{i}^{j}-\overline{Y}\right)^{2}} \tag{9}$$

(6) Explained Variance (E-V):

$$E\text{-}V=1-\frac{\mathrm{Var}\left(Y-\widehat{Y}\right)}{\mathrm{Var}\left(Y\right)} \tag{10}$$

where $y_{i}^{j}$ and $\widehat{y}_{i}^{j}$ denote the ground-truth and predicted values of the $j$-th time sample at the $i$-th node; $M$ is the number of time slots; $N$ is the number of nodes; $Y$ and $\widehat{Y}$ denote the sets of $y_{i}^{j}$ and $\widehat{y}_{i}^{j}$, respectively; and $\overline{Y}$ is the mean of $Y$.

Specifically, RMSE and Accuracy measure the prediction error and the prediction precision, respectively. $R^{2}$ and E-V measure how much of the variance in the ground truth the model explains, i.e., its ability to represent the ground-truth data: the larger the value, the better the model.
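The RMSE, $R^{2}$, and E-V definitions in Eqs. 7, 9, and 10 translate directly into NumPy, as in the sketch below, where `y` and `y_hat` hold ground-truth and predicted values over all nodes and time slots. Accuracy (Eq. 8) is omitted because it divides by $y_{i}^{j}$ and the text does not say how slots with zero ground-truth demand are handled. Reading $\mathrm{Var}$ as the population variance (`np.var` with its default `ddof=0`) is our assumption.

```python
import numpy as np


def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Eq. 7: root of the mean squared error over all M x N entries."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))


def r2(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Eq. 9: 1 - SSE / total sum of squares around the global mean of Y."""
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2))


def explained_variance(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Eq. 10: 1 - Var(Y - Y_hat) / Var(Y)."""
    return float(1.0 - np.var(y - y_hat) / np.var(y))


y = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2 time slots x 2 nodes
y_hat = y + 1.0                          # predictions with a constant bias of +1
scores = (rmse(y, y_hat), r2(y, y_hat), explained_variance(y, y_hat))
```

A constant bias of +1 gives RMSE = 1 and $R^{2}$ = 0.2 but E-V = 1: E-V ignores a systematic offset, whereas $R^{2}$ penalizes it.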

5.1.3. Task Setting

We split both the Shenzhen dockless bike-sharing and the Haikou ride-hailing datasets into training and testing sets in an 8:2 ratio, and uniformly use 16 historical time slots to predict 4 future time slots. To ensure a fair comparison, we kept the hyperparameters of each baseline model identical before and after enhancement, although hyperparameters may differ between baseline models. Any observed improvement can therefore be attributed to our modifications of the baseline model rather than to differences in hyperparameter settings.
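The sample construction can be sketched as below, with 16 input slots and 4 target slots per sample and an 8:2 split. That the split is chronological and applied to the series before windowing (so no test slot leaks into a training sample) is our assumption; `make_windows` and `split_and_window` are hypothetical helpers.

```python
import numpy as np


def make_windows(series: np.ndarray, n_in: int = 16, n_out: int = 4):
    """Slice a (T, N) series into inputs (S, n_in, N) and targets (S, n_out, N)."""
    num = series.shape[0] - n_in - n_out + 1
    if num < 1:
        raise ValueError("series is shorter than n_in + n_out")
    x = np.stack([series[s:s + n_in] for s in range(num)])
    y = np.stack([series[s + n_in:s + n_in + n_out] for s in range(num)])
    return x, y


def split_and_window(series: np.ndarray, train_ratio: float = 0.8,
                     n_in: int = 16, n_out: int = 4):
    """Chronological split first, then windowing within each part."""
    cut = int(series.shape[0] * train_ratio)
    return (make_windows(series[:cut], n_in, n_out),
            make_windows(series[cut:], n_in, n_out))


series = np.arange(100 * 3).reshape(100, 3)   # 100 time slots x 3 nodes
(x_train, y_train), (x_test, y_test) = split_and_window(series)
```

Splitting before windowing discards the n_in + n_out - 1 = 19 samples that would straddle the boundary, a small price for a clean separation between training and testing data.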

5.1.4. Baseline Models

We validate our approach on the following baseline models:

  • GCN (Kipf and Welling, 2016): A type of GNN that aggregates information from neighboring nodes to update each node's representation.

  • TGCN (Zhao et al., 2020): It combines GCN, which learns complex topologies to capture spatial dependencies, with GRU, which learns dynamic changes in traffic data to capture temporal dependencies.

  • A3TGCN (Bai et al., 2021): It builds on the TGCN structure by introducing an attention mechanism that adjusts the importance of different time points and combines global temporal information to improve prediction accuracy.

  • STGCN (Yu et al., 2018): It is built from convolutional structures, using GCN to capture spatial features and TCN to capture temporal features.

  • GraphWaveNet (Wu et al., 2019): It develops a novel adaptive dependency matrix and learns it through node embeddings.

Figure 6. Partial schematic of the nodes identified by our approach in Shenzhen.

Figure 7. Accuracy during training for several models: (a) outflow of Haikou Ride-Hailing; (b) outflow of Shenzhen Bike-Sharing.
Table 1. Average area (km²) of clusters in each layer after HDPC-L hierarchical clustering in Shenzhen and Haikou.
Dataset   Layer 1   Layer 2   Layer 3   Layer 4
Shenzhen  143.52    23.92     6.06      1.54
Haikou    390.60    97.65     32.55     10.85
Table 2. Comparison of the performance of baseline models using our improved methodology across two real-world datasets (Time/Epoch in seconds).
Dataset Model Outflow Inflow
Time/Epoch Accuracy RMSE R² E-V Time/Epoch Accuracy RMSE R² E-V
Haikou Ride-Hailing A3TGCN 82.40 4.97% 0.050 0.080 0.083 84.94 5.45% 0.053 0.076 0.077
A3TGCN(ours) 57.35 47.27% 0.027 0.698 0.701 55.87 45.29% 0.031 0.677 0.680
GCN 0.85 1.58% 0.051 0.006 0.006 0.87 4.74% 0.054 0.034 0.034
GCN(ours) 0.85 74.51% 0.012 0.926 0.927 0.85 76.40% 0.012 0.937 0.938
GraphWaveNet 136.10 70.35% 0.016 0.906 0.906 137.70 70.85% 0.017 0.909 0.909
GraphWaveNet(ours) 104.90 70.55% 0.015 0.907 0.907 101.70 70.97% 0.016 0.910 0.910
TGCN 14.76 73.61% 0.013 0.918 0.920 14.19 75.78% 0.013 0.931 0.931
TGCN(ours) 14.46 80.57% 0.010 0.955 0.955 14.26 81.34% 0.010 0.959 0.960
STGCN 7.53 66.73% 0.015 0.842 0.843 8.00 68.24% 0.016 0.872 0.874
STGCN(ours) 7.65 73.29% 0.013 0.910 0.911 7.62 71.71% 0.015 0.906 0.912
Shenzhen Bike-Sharing A3TGCN 117.20 11.10% 0.016 0.170 0.172 117.30 11.53% 0.018 0.173 0.176
A3TGCN(ours) 25.12 37.09% 0.011 0.568 0.572 26.35 34.07% 0.013 0.528 0.531
GCN 0.31 6.832% 0.017 0.081 0.082 0.28 7.27% 0.019 0.107 0.109
GCN(ours) 0.38 52.00% 0.008 0.740 0.760 0.27 52.27% 0.009 0.740 0.759
GraphWaveNet 1538.19 51.00% 0.008 0.713 0.716 786.31 50.64% 0.0094 0.709 0.716
GraphWaveNet(ours) 61.69 50.61% 0.008 0.708 0.712 62.45 50.56% 0.0096 0.704 0.707
TGCN 5.80 28.39% 0.010 0.379 0.377 5.80 30.89% 0.011 0.389 0.382
TGCN(ours) 5.89 52.00% 0.007 0.706 0.707 5.76 51.64% 0.008 0.711 0.717
STGCN 10.18 54.07% 0.008 0.759 0.761 10.18 57.30% 0.009 0.792 0.793
STGCN(ours) 10.25 62.14% 0.007 0.829 0.827 10.17 61.20% 0.008 0.823 0.824

5.2. Experimental Results

5.2.1. Graph Construction

We used the HDPC-L method to obtain the graph nodes and selected K-means for comparison. Figure 5 shows the results obtained by HDPC-L and K-means when clustering 72 nodes. Density-based clustering proves more suitable than conventional clustering: the nodes obtained by K-means (Figure 5(a)) are distributed relatively uniformly, whereas the nodes obtained by HDPC-L (Figure 5(c)) concentrate in the higher-density regions shown in Figure 5(b). This behavior aligns more closely with the requirements of real-world free-floating traffic management systems, making it a more reasonable approach. Initially, we intended to compare results with and without hierarchical clustering. However, during the experiments we found that direct density-based clustering required at least 865 GB of memory, which exceeded our device's capacity, so obtaining 1120 nodes through direct density-based clustering was infeasible. This highlights the impracticality of clustering large-scale datasets purely on density. Instead, we adopted a hierarchical clustering approach, which proved to be a viable solution. Figure 5(d) shows the final 1120 nodes obtained through HDPC-L on the Shenzhen bike-sharing dataset, illustrating the crucial role of hierarchical clustering in large-scale clustering problems. Fig. 6 shows part of the distribution of our final graph nodes in Shenzhen, where each black circle represents a site and the colored area represents the region under that site's jurisdiction. We performed four layers of clustering on both datasets; the average area of the clusters in each layer is shown in Table 1.
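HDPC-L itself is described in Section 4.3 and is not reproduced here. As a hedged illustration of why direct density-based clustering runs out of memory and how a hierarchy helps, the sketch below implements the classic density-peaks clustering of Rodriguez and Laio (2014) with a cut-off kernel, which materializes an n × n distance matrix, plus a hypothetical multi-level wrapper, `hierarchical_density_peaks`, that re-clusters each parent cluster separately. The per-level cluster counts `ks` and cut-off distances `dcs` are illustrative parameters.

```python
import numpy as np


def density_peaks(points: np.ndarray, k: int, dc: float) -> np.ndarray:
    """Classic density-peaks clustering (cut-off kernel) into k clusters.

    Builds the full n x n distance matrix, so memory grows as O(n^2).
    """
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    rho = (dist < dc).sum(axis=1) - 1             # local density, excluding the point itself
    order = np.argsort(-rho, kind="stable")       # decreasing density; ties broken by index
    delta = np.zeros(n)
    parent = np.full(n, -1)
    for rank in range(1, n):
        i = order[rank]
        denser = order[:rank]
        j = denser[np.argmin(dist[i, denser])]    # nearest point of higher density
        delta[i], parent[i] = dist[i, j], j
    gamma = rho * delta
    gamma[order[0]] = np.inf                      # the densest point is always a centre
    centers = np.argsort(-gamma, kind="stable")[:k]
    labels = np.full(n, -1)
    labels[centers] = np.arange(k)
    for i in order:                               # follow parents in decreasing density
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


def hierarchical_density_peaks(points, ks, dcs):
    """Hypothetical multi-level wrapper: re-cluster every parent cluster separately."""
    labels = np.zeros(len(points), dtype=int)
    for k, dc in zip(ks, dcs):
        new_labels = np.empty_like(labels)
        next_id = 0
        for c in np.unique(labels):
            idx = np.flatnonzero(labels == c)
            kk = min(k, len(idx))
            new_labels[idx] = density_peaks(points[idx], kk, dc) + next_id
            next_id += kk
        labels = new_labels
    return labels


rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(10.0, 0.3, (20, 2))])
flat = density_peaks(pts, k=2, dc=1.0)
nested = hierarchical_density_peaks(pts, ks=[2, 2], dcs=[1.0, 0.5])
```

A float64 distance matrix alone needs 8n² bytes, about 80 GB for n = 100,000 points, whereas the wrapper only ever builds distance matrices over the points of a single parent cluster.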

5.2.2. Baseline Models Enhancement

We initialize the edge weights of five GCN-based baseline models using the OD traffic relationships and compare them with six metrics on the Shenzhen bike-sharing and Haikou ride-hailing datasets. The results in Table 2 show that all models are enhanced in different respects. (1) The number of edges in the four graphs is significantly reduced (Haikou: $11150 \Rightarrow 2647_{outflow}, 3138_{inflow}$; Shenzhen: $110974 \Rightarrow 13711_{outflow}, 14995_{inflow}$), greatly simplifying the graph structure. (2) The A3TGCN and GCN models are greatly enhanced, with dramatic improvements across the metrics: in terms of accuracy, A3TGCN improves by 40% on Haikou and 25% on Shenzhen, and GCN by 66% on Haikou and 45% on Shenzhen. These two models mainly capture spatial dependency, so their enhancement shows that our method significantly strengthens the spatial representation of the graphs. (3) For TGCN, accuracy improves by 22% on Shenzhen and 6% on Haikou, and for STGCN by 6% and 5%, respectively. These gains are smaller than those of A3TGCN and GCN, mainly because these models also capture temporal dependence.
For TGCN and STGCN, the enhancement is much greater on the Shenzhen bike-sharing dataset than on the Haikou ride-hailing dataset. In our graphs, the Haikou ride-hailing dataset has 180 nodes, whereas the Shenzhen bike-sharing dataset has 1120 nodes, suggesting that our method is particularly effective for optimizing large graph structures. In addition, as shown in Fig. 7, our method speeds up the convergence of GNNs and makes training smoother than with the original models. (4) The improvement for GraphWaveNet is not significant. One possible reason is that its adaptive graph structure captures spatial dependencies and compensates for the shortcomings of the original graph structure. However, the original GraphWaveNet trains very slowly, and our method reduces its training time by approximately 30% on the Haikou ride-hailing dataset and approximately 90% on the Shenzhen bike-sharing dataset while maintaining a similar level of prediction accuracy.
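The edge reductions in item (1) and the average accuracy gains quoted in the Conclusions can be recomputed from Table 2 and the edge counts above. The sketch assumes that "average accuracy increase" means the percentage-point gain averaged over the five baselines and both flow directions; under that reading it reproduces the 24.96 (Haikou) and 19.46 (Shenzhen) figures. The dictionary names are illustrative.

```python
# Accuracy (%) from Table 2 as (baseline, ours) pairs: outflow first, then inflow.
TABLE2_ACCURACY = {
    "Haikou": {
        "A3TGCN": [(4.97, 47.27), (5.45, 45.29)],
        "GCN": [(1.58, 74.51), (4.74, 76.40)],
        "GraphWaveNet": [(70.35, 70.55), (70.85, 70.97)],
        "TGCN": [(73.61, 80.57), (75.78, 81.34)],
        "STGCN": [(66.73, 73.29), (68.24, 71.71)],
    },
    "Shenzhen": {
        "A3TGCN": [(11.10, 37.09), (11.53, 34.07)],
        "GCN": [(6.832, 52.00), (7.27, 52.27)],
        "GraphWaveNet": [(51.00, 50.61), (50.64, 50.56)],
        "TGCN": [(28.39, 52.00), (30.89, 51.64)],
        "STGCN": [(54.07, 62.14), (57.30, 61.20)],
    },
}
# Edge counts from item (1): (before, after for outflow, after for inflow).
EDGE_COUNTS = {"Haikou": (11150, 2647, 3138), "Shenzhen": (110974, 13711, 14995)}


def mean_accuracy_gain(dataset: str) -> float:
    """Percentage-point gain averaged over all models and both flow directions."""
    gains = [ours - base
             for pairs in TABLE2_ACCURACY[dataset].values()
             for base, ours in pairs]
    return sum(gains) / len(gains)


def edge_reduction(dataset: str) -> tuple:
    """Fraction of edges removed for the outflow and inflow graphs."""
    before, after_out, after_in = EDGE_COUNTS[dataset]
    return 1 - after_out / before, 1 - after_in / before


for name in TABLE2_ACCURACY:
    red_out, red_in = edge_reduction(name)
    print(f"{name}: mean accuracy gain {mean_accuracy_gain(name):.2f} points; "
          f"edges removed {red_out:.1%} (outflow), {red_in:.1%} (inflow)")
```

This prints mean gains of 24.96 and 19.46 points, with 76.3% and 71.9% of edges removed on Haikou and 87.6% and 86.5% on Shenzhen (outflow and inflow, respectively).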

6. Conclusions

This paper presents a novel graph construction approach for traffic demand forecasting in the free-floating mode. We propose HDPC-L, a density-based hierarchical clustering method that significantly improves computational efficiency and yields a more coherent graph structure. Adopting a data-driven perspective, we extract Origin-Destination (OD) traffic information from the original dataset to initialize edge weights and simplify the graph structure. Our method achieves substantial improvements, with an average accuracy increase of 24.96% on the Haikou dataset and 19.46% on the Shenzhen dataset. It also improves computational efficiency, reducing training time by 12.05% and 32.40% on the two datasets, respectively.

We also acknowledge that this study has several limitations and that many future research directions remain. We will explore how to optimize the graph structures we build in conjunction with other data, and we look forward to extending the approach to broader fields and building a corresponding Python library.

7. Acknowledgments

This study was supported by the National Key R&D Program of China [Grant 2021ZD0112700]. We would like to thank the Didi Chuxing GAIA Initiative for providing the ride-hailing order data and the Shenzhen Municipal Government for providing the bike-sharing order data.

References

  • Bai et al. (2021) Jiandong Bai, Jiawei Zhu, Yujiao Song, Ling Zhao, Zhixiang Hou, Ronghua Du, and Haifeng Li. 2021. A3t-gcn: Attention temporal graph convolutional network for traffic forecasting. ISPRS International Journal of Geo-Information 10, 7 (2021), 485.
  • Chen et al. (2022) Changlu Chen, Yanbin Liu, Ling Chen, and Chengqi Zhang. 2022. Bidirectional spatial-temporal adaptive transformer for Urban traffic flow forecasting. IEEE Transactions on Neural Networks and Learning Systems (2022).
  • Cui et al. (2019) Zhiyong Cui, Kristian Henrickson, Ruimin Ke, and Yinhai Wang. 2019. Traffic graph convolutional recurrent neural network: A deep learning framework for network-scale traffic learning and forecasting. IEEE Transactions on Intelligent Transportation Systems 21, 11 (2019), 4883–4894.
  • De Fabritiis et al. (2008) Corrado De Fabritiis, Roberto Ragona, and Gaetano Valenti. 2008. Traffic estimation and prediction based on real time floating car data. In 2008 11th international IEEE conference on intelligent transportation systems. IEEE, 197–203.
  • Du et al. (2021) Bowen Du, Xiao Hu, Leilei Sun, Junming Liu, Yanan Qiao, and Weifeng Lv. 2021. Traffic Demand Prediction Based on Dynamic Transition Convolutional Neural Network. IEEE Transactions on Intelligent Transportation Systems 22, 2 (2021), 1237–1247.
  • Fang et al. (2021) Mengyuan Fang, Luliang Tang, Xue Yang, Yang Chen, Chaokui Li, and Qingquan Li. 2021. FTPG: A fine-grained traffic prediction method with graph attention network using big trace data. IEEE Transactions on Intelligent Transportation Systems 23, 6 (2021), 5163–5175.
  • Gupta et al. (2023) Mridul Gupta, Hariprasad Kodamana, and Sayan Ranu. 2023. Frigate: Frugal Spatio-temporal Forecasting on Road Networks. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’23). Association for Computing Machinery, 649–660.
  • He et al. (2022) Yuxin He, Lishuai Li, Xinting Zhu, and Kwok Leung Tsui. 2022. Multi-Graph Convolutional-Recurrent Neural Network (MGC-RNN) for Short-Term Forecasting of Transit Passenger Flow. IEEE Transactions on Intelligent Transportation Systems 23, 10 (2022), 18155–18174.
  • Hua et al. (2020) Mingzhuang Hua, Xuewu Chen, Shujie Zheng, Long Cheng, and **gxu Chen. 2020. Estimating the parking demand of free-floating bike sharing: A journey-data-based study of Nan**g, China. Journal of Cleaner Production 244 (2020), 118764.
  • Huang et al. (2022) Feihu Huang, Peiyu Yi, **ce Wang, Mengshi Li, Jian Peng, and Xi Xiong. 2022. A dynamical spatial-temporal graph neural network for traffic demand prediction. Information Sciences 594 (2022), 286–304.
  • Hulot et al. (2018) Pierre Hulot, Daniel Aloise, and Sanjay Dominik Jena. 2018. Towards Station-Level Demand Prediction for Effective Rebalancing in Bike-Sharing Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’18). Association for Computing Machinery, 378–386.
  • James (2022) JQ James. 2022. Graph construction for traffic prediction: A data-driven approach. IEEE Transactions on Intelligent Transportation Systems 23, 9 (2022), 15015–15027.
  • Jiang et al. (2023) Jiawei Jiang, Chengkai Han, Wayne Xin Zhao, and **gyuan Wang. 2023. PDFormer: Propagation Delay-Aware Dynamic Long-Range Transformer for Traffic Flow Prediction. Proceedings of the AAAI Conference on Artificial Intelligence 37, 4 (Jun. 2023), 4365–4373.
  • Kipf and Welling (2016) Thomas N Kipf and Max Welling. 2016. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations.
  • Lan et al. (2022) Shiyong Lan, Yitong Ma, Weikang Huang, Wenwu Wang, Hongyu Yang, and Pyang Li. 2022. Dstagnn: Dynamic spatial-temporal aware graph neural network for traffic flow forecasting. In International conference on machine learning. PMLR, 11906–11917.
  • Li and Axhausen (2020) Aoyong Li and Kay W Axhausen. 2020. Short-term traffic demand prediction using graph convolutional neural networks. AGILE: GIScience Series 1 (2020), 12.
  • Li and Lasenby (2021) Duo Li and Joan Lasenby. 2021. Spatiotemporal attention-based graph convolution network for segment-level traffic prediction. IEEE Transactions on Intelligent Transportation Systems 23, 7 (2021), 8337–8345.
  • Li et al. (2022) Guanyao Li, Xiaofeng Wang, Gunarto Sindoro Njoo, Shuhan Zhong, S-H Gary Chan, Chih-Chieh Hung, and Wen-Chih Peng. 2022. A data-driven spatial-temporal graph neural network for docked bike prediction. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 713–726.
  • Li et al. (2018) Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In International Conference on Learning Representations.
  • Li et al. (2015) Yexin Li, Yu Zheng, Huichu Zhang, and Lei Chen. 2015. Traffic prediction in a bike-sharing system. In Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems. 1–10.
  • Lv et al. (2020) Chang Lv, Chaoyong Zhang, Kunlei Lian, Ya** Ren, and Leilei Meng. 2020. A hybrid algorithm for the static bike-sharing re-positioning problem based on an effective clustering strategy. Transportation Research Part B: Methodological 140, C (2020), 1–21.
  • Lv et al. (2018) Zhongjian Lv, Jiajie Xu, Kai Zheng, Hongzhi Yin, Pengpeng Zhao, and Xiaofang Zhou. 2018. Lc-rnn: A deep learning model for traffic speed prediction. In IJCAI, Vol. 2018. 27th.
  • Rodriguez and Laio (2014) Alex Rodriguez and Alessandro Laio. 2014. Clustering by fast search and find of density peaks. Science 344, 6191 (2014), 1492–1496.
  • Song et al. (2020) Chao Song, Youfang Lin, Shengnan Guo, and Huaiyu Wan. 2020. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34. 914–921.
  • Tang et al. (2021) **jun Tang, Jian Liang, Fang Liu, **g**g Hao, and Yinhai Wang. 2021. Multi-community passenger demand prediction at region level based on spatio-temporal graph convolutional network. Transportation Research Part C: Emerging Technologies 124 (2021).
  • Tian et al. (2024) Zihao Tian, **g Zhou, Lixin Tian, and David Z.W. Wang. 2024. Dynamic spatio-temporal interactive clustering strategy for free-floating bike-sharing. Transportation Research Part B: Methodological 179 (2024), 102872.
  • Wang et al. (2019) Junhua Wang, Tianyang Luo, and Ting Fu. 2019. Crash prediction based on traffic platoon characteristics using floating car trajectory data and the machine learning approach. Accident Analysis & Prevention 133 (2019), 105320.
  • Wu et al. (2019) Zonghan Wu, Shirui Pan, Guodong Long, **g Jiang, and Chengqi Zhang. 2019. Graph wavenet for deep spatial-temporal graph modeling. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI’19). AAAI Press, 1907–1913.
  • Xie et al. (2023) Peng Xie, Minbo Ma, Tianrui Li, Shenggong Ji, Shengdong Du, Zeng Yu, and Junbo Zhang. 2023. Spatio-Temporal Dynamic Graph Relation Learning for Urban Metro Flow Prediction. IEEE Transactions on Knowledge and Data Engineering (2023).
  • Yao et al. (2018) Huaxiu Yao, Fei Wu, **tao Ke, ** Ye, and Zhenhui Li. 2018. Deep multi-view spatial-temporal network for taxi demand prediction. In Proceedings of the AAAI conference on artificial intelligence, Vol. 32.
  • Yu et al. (2018) Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2018. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI’18). AAAI Press, 3634–3640.
  • Yuan et al. (2018) Zhuoning Yuan, Xun Zhou, and Tianbao Yang. 2018. Hetero-ConvLSTM: A Deep Learning Approach to Traffic Accident Prediction on Heterogeneous Spatio-Temporal Data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (London, United Kingdom) (KDD ’18). Association for Computing Machinery, New York, NY, USA, 984–992.
  • Zhang et al. (2019) Kunpeng Zhang, Zijian Liu, and Liang Zheng. 2019. Short-term prediction of passenger demand in multi-zone level: Temporal convolutional neural network with multi-task learning. IEEE transactions on intelligent transportation systems 21, 4 (2019), 1480–1490.
  • Zhao et al. (2020) Ling Zhao, Yujiao Song, Chao Zhang, Yu Liu, Pu Wang, Tao Lin, Min Deng, and Haifeng Li. 2020. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Transactions on Intelligent Transportation Systems 21, 9 (2020), 3848–3858.
  • Zhou et al. (2018) Xian Zhou, Yanyan Shen, Yanmin Zhu, and Linpeng Huang. 2018. Predicting multi-step citywide passenger demands using attention-based neural networks. In Proceedings of the Eleventh ACM international conference on web search and data mining. 736–744.
  • Zonoozi et al. (2018) Ali Zonoozi, Jung-jae Kim, Xiao-Li Li, and Gao Cong. 2018. Periodic-CRN: A Convolutional Recurrent Model for Crowd Density Prediction with Recurring Periodic Patterns. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization.