Graph Construction with Flexible Nodes for Traffic Demand Prediction

**yan Hou, Shan Liu, Ya Zhang (School of Automation, Southeast University, Nan**g, China) and Haotong Qin (Department of Information Technology and Electrical Engineering, ETH Zürich, Zürich, Switzerland)

(2024)

Abstract.

Graph neural networks (GNNs) have been widely applied in traffic demand prediction, and transportation modes can be divided into station-based mode and free-floating traffic mode. Existing research in traffic graph construction primarily relies on map matching to construct graphs based on the road network. However, the complexity and inhomogeneity of data distribution in free-floating traffic demand forecasting make road network matching inflexible. To tackle these challenges, this paper introduces a novel graph construction method tailored to free-floating traffic mode. We propose a novel density-based clustering algorithm (HDPC-L) to determine the flexible positioning of nodes in the graph, overcoming the computational bottlenecks of traditional clustering algorithms and enabling effective handling of large-scale datasets. Furthermore, we extract valuable information from ridership data to initialize the edge weights of GNNs. Comprehensive experiments on two real-world datasets, the Shenzhen bike-sharing dataset and the Haikou ride-hailing dataset, show that the method significantly improves the performance of the model. On average, our models show an improvement in accuracy of around 25% and 19.5% on the two datasets. Additionally, it significantly enhances computational efficiency, reducing training time by approximately 12% and 32.5% on the two datasets. We make our code available at https://github.com/hou**yan/HDPC-L-ODInit.

Keywords: Graph Construction, Graph Neural Network, Free-Floating Traffic Mode, Traffic Demand Prediction

Conference: August 25–29, 2024; Barcelona, Spain. CCS Concepts: Information systems → Spatial-temporal systems.

1. INTRODUCTION

In recent years, a variety of new modes of transportation have emerged, facilitating the lives of residents while also bringing great challenges to transportation planning, especially in the increasingly complex urban road network. For instance, the rapid expansion of free-floating bike sharing systems (FFBS) in China has led to a widely reported problem of oversupply(Tian et al., 2024). To enhance efficiency, operators require effective rebalancing methods and intelligent management strategies. Particularly when dealing with large-scale complex systems, the implementation of spatial and time-based zoning management strategies becomes crucial. However, these new transportation systems suffer from significant spatial and temporal imbalances due to various factors. Thus, accurate prediction of service demand at different locations and times is necessary to enhance traveler satisfaction and reduce operational costs(Tian et al., 2024).

Previous studies(Du et al., 2021; Yao et al., 2018; Lv et al., 2018; Zhou et al., 2018; Zonoozi et al., 2018) have attempted to use convolutional neural networks (CNNs) for predicting free-floating traffic demand, but this approach is fundamentally flawed because CNNs are not suitable for handling non-Euclidean spatial data, such as transportation networks. For instance, in image data, the proximity of pixels often corresponds to semantic similarity, whereas in the spatial distribution of traffic demand, distant regions may hold greater relevance than nearby regions due to a complex topological relationship. This relationship poses a challenge for CNNs as they struggle to capture such nuances.

Graph neural networks (GNNs) offer a valuable approach for capturing non-Euclidean spatial relationships within transportation networks. Consequently, previous research has explored various methodologies for constructing graphs based on transportation networks, which can generally be categorized into two groups: manual determination and automatic determination through machine learning algorithms. In manual determination, some researchers have directly matched origins and destinations with the road network, using midpoints or intersections of roads as nodes (De Fabritiis et al., 2008; Wang et al., 2019). Other studies have adopted a grid-based approach, partitioning the space into grids with each grid serving as a node(Yuan et al., 2018; Huang et al., 2022; Zhang et al., 2019). However, these algorithms have limitations. First of all, they lack flexibility because they only allow nodes to be selected from a limited number of options, and the granularity at which they are modeled is fixed and cannot be flexibly adjusted. Secondly, due to the complex and intricate nature of large-scale traffic networks, different strategies must be developed for different areas to accurately determine the nodes, reducing efficiency and restricting applicability to a large number of cases. Alternatively, machine learning methods can be employed for automatic node determination, often utilizing clustering techniques(Hulot et al., 2018; Li et al., 2015). However, these algorithms are typically designed to cluster and group a pre-existing set of fixed nodes, limiting their flexibility. Additionally, the high computational complexity of clustering algorithms makes them challenging to apply to large-scale datasets. In conclusion, while manual determination and clustering-based automatic determination have been explored, they each possess certain limitations in terms of flexibility, efficiency, and applicability. 
We aim to address these limitations and develop innovative methods for defining rational and flexible nodes that lead to the construction of accurate and insightful graphs, which in turn will lead to the application of graph neural networks to free-floating traffic mode demand forecasting.

Figure: (a) ride-hailing order distribution of Haikou.

To address the aforementioned challenges, we propose HDPC-L, a hierarchical clustering algorithm based on density clustering. Traditional clustering algorithms are not well-suited for datasets associated with transportation demand problems due to imbalanced data distributions. Taking the Shenzhen bike-sharing dataset as an example, in Fig. 1, the density of data points in different regions exhibits significant disparity and disorder. Ordinary methods like K-means fail to yield satisfactory results in such scenarios. In contrast, our method overcomes this limitation by employing density-based clustering techniques. Moreover, the concept of hierarchical clustering significantly reduces the computational complexity of our approach while extending its performance capabilities. Regardless of the regional scale, our algorithm enables the construction of a reasonable graph structure. This graph structure is also highly flexible, allowing us to freely adjust the number of nodes as required and model at various levels of granularity.

Unlike many existing models (Wu et al., 2019; Song et al., 2020; Chen et al., 2022; Fang et al., 2021; James, 2022; Jiang et al., 2023; Lan et al., 2022), which primarily focus on structural improvements through module redesign or the addition of attention or transformer modules, our approach aims to explore and harness valuable information embedded in the ridership data itself. For datasets containing typical origin-destination (OD) information, we employ statistical methods to uncover the OD flow relationships between nodes. Based on this, we generate a weight matrix that serves as an initialization parameter alongside the original adjacency matrix, and subsequently train our models. We conducted experiments using five baseline models and observed improvements across all six evaluation metrics: Accuracy, RMSE, $R^{2}$, Explained Variance, Edge Quantity and Training Time for each Epoch.

Our main contributions can be summarized as follows:

• To the best of our knowledge, we are the first to construct graphs with flexible nodes for predicting free-floating traffic demand, expanding the application area of GNNs.

• We introduce HDPC-L, a novel hierarchical density clustering algorithm designed to identify rational and adaptable graph nodes. HDPC-L addresses the shortcomings encountered by existing algorithms such as high computational complexity, lack of flexibility, and unsuitability for large-scale datasets.

• To the best of our knowledge, we are the first to extract origin-destination (OD) information from the dataset and use it as the basis for initializing the graph edge weights. This fusion simplifies the graph while improving its representational capabilities, which in turn significantly improves the performance of the model.

• We validate the effectiveness of this approach on two real-world datasets, with 24.96% and 19.46% improvement in accuracy and 12.05% and 32.40% reduction in training time, respectively.

2. RELATED WORK

Our paper is related to the following research directions:

2.1. Graph Construction

Graph neural networks (GNNs) possess remarkable capabilities in capturing spatial dependencies within transportation networks, prompting extensive research into constructing graph structures. These studies can be broadly classified into three categories. (1) The first category involves fixed graph structures where nodes are derived from real-world sensor locations. For instance, some researchers leverage the PeMS dataset for highway traffic prediction in California (Fang et al., 2021; Chen et al., 2022; Li et al., 2018; Wu et al., 2019; Jiang et al., 2023; Li and Lasenby, 2021; James, 2022; Zhao et al., 2020), while others utilize datasets from Chicago and Los Angeles bike-sharing systems for stacked demand prediction (Li et al., 2022). However, the construction of such graphs is inflexible: it demands high road network accuracy and offers limited options for node selection. (2) The second category involves rasterizing space and using grid points as graph nodes. Examples include dividing cities into square areas (Huang et al., 2022), segmenting total areas into smaller regions (Li and Axhausen, 2020; Zhang et al., 2019), or delineating small areas based on road networks (Tang et al., 2021). (3) The third category employs unsupervised clustering for graph construction. Researchers have reclustered stations in New York's bike-sharing dataset for station classification (Li et al., 2015). Additionally, methods like Traffic Analysis Zone (TAZ) clustering for free-floating bike-sharing systems (FFBS) have been developed, requiring spatial and temporal data analysis with high computational complexity (Tian et al., 2024; Hua et al., 2020; Lv et al., 2020). Our HDPC-L method addresses these challenges by flexibly generating rational nodes with low computational complexity. Furthermore, owing to its hierarchical clustering design, the number of clusters can be increased as far as needed by adding layers, which refines the modeling granularity.

2.2. Graph Neural Network

Graph Neural Networks (GNNs) have become important tools for representing spatial topological relationships in Intelligent Transportation Systems (ITS) due to their expertise in handling non-Euclidean data. Deep learning architectures leveraging GNNs find extensive application in traffic prediction tasks: DCRNN (Li et al., 2018) conceptualizes each moment within the traffic system as a diffusion process of vehicles across the road network; TGCN (Zhao et al., 2020) integrates Graph Convolutional Networks (GCNs) and Gated Recurrent Units (GRUs) to capture both spatial and temporal dependencies concurrently; GraphWaveNet (Wu et al., 2019) seamlessly amalgamates GCNs and Temporal Convolutional Networks (TCNs) for enhanced predictive capabilities; AST-GAT (Li and Lasenby, 2021) employs multi-head graph attention blocks to capture intricate spatial dependencies; Frigate (Gupta et al., 2023) introduces a novel GNN tailored for scenarios involving missing data, such as road sensor failures or closures; STDGRL (Xie et al., 2023) pioneers a spatio-temporal dynamic graph-relational learning model for predicting traffic flow in urban subway stations; MGC-RNN (He et al., 2022) explores spatio-temporal prediction tasks by fusing heterogeneous data from multiple sources; TGC-LSTM (Cui et al., 2019) defines traffic graph convolution based on physical network topology, augmenting interpretability by integrating L1 and L2 parameters into the model’s loss function. Past research efforts predominantly focused on enhancing model performance through architectural expansions and parameter increments, often neglecting the intrinsic characteristics of the data itself. Our approach diverges by mining Origin-Destination (OD) flow information inherent in traffic ridership datasets. We leverage this information to initialize the edge weights of graph neural networks using the graph adjacency matrix, resulting in a substantial performance enhancement over the baseline model.

2.3. Free-Floating Traffic Demand Prediction

Existing research on forecasting free-floating traffic demand can be broadly categorized into two main streams: those manually constructing the graph structure by aligning it with the road network or rasterizing the spatial domain, as mentioned in Section 2.2 (Zhao et al., 2020; Li and Lasenby, 2021; Li et al., 2018; Wu et al., 2019; Xie et al., 2023), and those that eschew graph structures and rely on Convolutional Neural Networks (CNNs) to encapsulate spatial dependencies within the traffic network. Among the methods leveraging CNNs to capture spatial dependencies: DTCNN (Du et al., 2021) introduces a dynamically shifted convolutional neural network tailored for precise traffic demand forecasting; DMVST-Net (Yao et al., 2018) proposes a deep multi-view spatio-temporal network framework to model spatial and temporal relationships, utilizing local CNNs to capture local spatial correlations; LC-RNN (Lv et al., 2018) incorporates a network-embedded convolutional structure aimed at capturing topology-aware features; and some researchers have adopted an encoder-decoder framework based on convolutional and ConvLSTM units for citywide passenger demand prediction (Zhou et al., 2018).

3. Preliminaries

Traffic Graph. In traffic prediction problems, the traffic network is usually modeled as a graph $G(N,E)$, where $N$ denotes the node set of the graph and $E\in\mathbb{R}^{N\times N}$ denotes the edge set. In this model, the graph nodes represent the key points of different regions. Taking the bike-sharing dataset as an example, the center of each cluster is a node that manages the shared bikes within the cluster.

Definition 1 (Inflow and Outflow). In this paper, we define $I^{t}\in\mathbb{R}^{N\times N}$ and $O^{t}\in\mathbb{R}^{N\times N}$, where $I^{t}$ denotes the inflow among nodes at time slot $t$ and $O^{t}$ denotes the corresponding outflow. We then define $\mathbb{I}=\{I^{1},I^{2},I^{3},\ldots,I^{T}\}\in\mathbb{R}^{T\times N\times N}$ to represent the inflow among nodes over a period of time, where $T$ denotes the number of time slots and $N$ denotes the number of nodes of the graph; $\mathbb{O}=\{O^{1},O^{2},O^{3},\ldots,O^{T}\}\in\mathbb{R}^{T\times N\times N}$ is defined analogously for the outflow.
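As a concrete illustration of the inflow and outflow tensors, the sketch below counts trips per time slot and node pair. The record layout `(slot, origin, dest)` and the index convention (rows of the outflow matrix are origins, rows of the inflow matrix are destinations) are assumptions made for this example; the text does not fix them.

```python
def build_flow_tensors(trips, num_slots, num_nodes):
    """Return (inflow, outflow), each a T x N x N nested list of trip counts.

    Assumed convention (not stated in the paper):
      outflow[t][i][j] = trips in slot t that leave node i for node j
      inflow[t][i][j]  = trips in slot t that arrive at node i from node j
    """
    def zeros():
        return [[[0] * num_nodes for _ in range(num_nodes)]
                for _ in range(num_slots)]

    inflow, outflow = zeros(), zeros()
    for slot, origin, dest in trips:
        outflow[slot][origin][dest] += 1
        inflow[slot][dest][origin] += 1
    return inflow, outflow


# Hypothetical toy data: (time slot, origin node, destination node).
trips = [(0, 0, 1), (0, 0, 1), (0, 2, 0), (1, 1, 2)]
inflow, outflow = build_flow_tensors(trips, num_slots=2, num_nodes=3)
```

With this convention each inflow matrix is the transpose of the outflow matrix of the same slot; a real pipeline might instead bin origins and destinations by their own timestamps.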

Free-Floating Traffic Mode. The term "free-floating" refers to a transportation system that enables users to pick up and drop off shared vehicles at any location within a designated area, rather than being restricted to specific stations or hubs. This mode is commonly associated with various shared mobility services such as ride-hailing, bike-sharing, delivery, or car-sharing, offering users the freedom and flexibility to initiate and conclude their trips at any legal parking spot within a defined operational zone.

4. Methodology

4.1. Overview

The method’s overview depicted in Fig. 2 is as follows:

• Spatial Division and Mapping: The space is segmented into grids with varying granularities. The geographic location of each data point in the dataset is then mapped to the corresponding grid cell.

• HDPC-L Hierarchical Clustering: Beginning with the coarsest granularity grid, termed $layer1$, data points are clustered using the proposed HDPC-L algorithm to derive the clustering result of the first layer, denoted as $layer1-cluster-order$. At the next finer granularity, termed $layer2$, the data points are not clustered all together; instead, they are split according to $layer1-cluster-order$, and the data points belonging to the same first-layer cluster are clustered with HDPC-L to obtain the second-layer result, termed $layer2-cluster-order$. This hierarchical process continues until the desired number of nodes is reached, which determines the graph's fineness and establishes the set of nodes $N$.

• Edge Weight Initialization: Based on the clustering results from Step 2, the area covered by the dataset is redivided, and each data point is assigned to its corresponding cluster. Statistical analysis is then conducted to derive all Origin-Destination (OD) traffic relationships among the cluster centers. This process generates the edge weight matrix $W$ of the graph $G$.

• Graph Construction: Using the node set $N$ and the edge weight matrix $W$ obtained from Step 2 and Step 3, respectively, the weighted graph structure $G(N,E,W)$ is established.

4.2. Spatial Division and Mapping

To accomplish the clustering task effectively, it is imperative to represent the ridership data within the dataset in a spatially coherent manner. In this regard, we partition the entire geographical area covered by the dataset into a series of grids, each characterized by varying levels of granularity. The outcome of this spatial division is illustrated in Fig. 3, which provides a schematic depiction of the data point density distribution within the Shenzhen bike-sharing dataset under square grid divisions of diverse granularity. Evidently, the choice of granularity profoundly influences our ability to capture the nuances of the data distribution. With larger granularity, we can discern the global distribution patterns, whereas smaller granularity facilitates a more refined characterization of local variations within sub-regions. This approach enables us to effectively balance the representation of both global and local data characteristics, thereby enhancing the robustness and accuracy of subsequent clustering analyses.
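The multi-granularity mapping described above can be sketched as follows. The bounding box, the sample coordinates and the function names are hypothetical, and the square equal-angle grid is a simplification chosen for brevity rather than the authors' exact gridding.

```python
from collections import Counter


def grid_cell(lon, lat, bounds, cells_per_side):
    """Map a coordinate to the (row, col) of a square grid over `bounds`."""
    min_lon, min_lat, max_lon, max_lat = bounds
    col = int((lon - min_lon) / (max_lon - min_lon) * cells_per_side)
    row = int((lat - min_lat) / (max_lat - min_lat) * cells_per_side)
    # Points on the upper boundary fall into the last cell.
    return min(row, cells_per_side - 1), min(col, cells_per_side - 1)


def density_grids(points, bounds, granularities):
    """Count points per cell for each granularity (coarse to fine)."""
    return {
        g: Counter(grid_cell(lon, lat, bounds, g) for lon, lat in points)
        for g in granularities
    }


# Hypothetical bounding box and (lon, lat) points, not taken from the datasets.
bounds = (0.0, 0.0, 1.0, 1.0)
points = [(0.12, 0.13), (0.17, 0.14), (0.93, 0.95), (1.0, 1.0)]
grids = density_grids(points, bounds, granularities=[2, 10])
```

The coarse grid (`2`) exposes the global split of the points into two regions, while the fine grid (`10`) localizes them more precisely, which mirrors the global versus local trade-off discussed in this section.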

4.3. HDPC-L Hierarchical Clustering

4.3.1. Hierarchical Clustering

Hierarchical clustering is an approach used to reduce the computational complexity of clustering algorithms and improve their efficiency. In order to represent the density distribution of origins and destinations, a fine-grained grid is often utilized, such as dividing the original space into a $1000\times 1000$ grid, as illustrated in Section 4.2. However, this division can lead to a huge number of data points, resulting in long computation times or unreasonable clustering outcomes. To address this issue, we adopt a hierarchical clustering method that follows the concept of coarse-to-fine graph modeling. This approach enhances the clustering process by significantly reducing computation requirements and generating more reasonable results. The coarse-to-fine graph approach captures macroscopic spatial dependencies at larger granularities and captures small changes in local spatial dependencies at smaller granularities.

Referring to the schematic in Figure 2, we can gain a clear understanding of the hierarchical clustering approach, with a particular focus on the HDPC-L Hierarchical Clustering section. Firstly, we perform the initial level of clustering on the global area using the largest granularity grid. Our proposed HDPC-L algorithm enables us to manually set or automatically select the desired number of clusters. Consequently, we obtain the first level of clustering results. Subsequently, we repeat this process on a grid with smaller granularity, dividing the ridership data based on the clustering results from the previous layer. Each class is treated as a sub-region and undergoes its own clustering operation to yield the clustering results of the second layer. This process continues, with each subsequent layer delving deeper into the clustering process, until we obtain the desired number of nodes. As the layers progress, the number of nodes increases, leading to a higher level of modeling refinement. Moreover, the hierarchical clustering algorithm is widely applicable and highly adaptable. It can be tailored to suit almost any large-scale dataset by adjusting the number of clustering layers, fine-tuning clustering algorithm parameters, and employing other relevant techniques.

4.3.2. HDPC-L

Density-based clustering algorithms offer several advantages that align with the challenges encountered when modeling the free-floating traffic mode problem. Firstly, these algorithms identify clusters of diverse shapes and sizes, providing a level of flexibility that surpasses other clustering methods. Additionally, they demonstrate robustness in the presence of noise and outliers, as they prioritize the identification of regions with high data density rather than individual data points. Moreover, density-based clustering algorithms are adept at handling clusters with varying densities, making them particularly suitable for datasets with non-uniform cluster distribution. As a result, we propose the HDPC-L algorithm, which is based on density peak clustering (DPC) (Rodriguez and Laio, 2014), to effectively identify the graph nodes in the free-floating traffic mode dataset.

The steps for HDPC-L are shown in Algorithm 1. The HDPC-L algorithm operates on two fundamental assumptions. Firstly, it posits that the center of a class cluster is encompassed by lower-density data points within the cluster, with the center representing the point of highest density in its vicinity. Secondly, it emphasizes maximizing the distance between the centers of class clusters. These assumptions underpin the approach taken by the HDPC-L algorithm in tackling the complexities of the free-floating traffic mode datasets. To satisfy these two assumptions, the following two concepts need to be defined: local density $\rho_{i}$ and relative distance $\delta_{i}$.

$$\rho_{i}=\sum\limits_{i\neq j}\beta\exp\left[-\left(\frac{d_{ij}}{d_{c}}\right)^{2}\right]\tag{1}$$

In Eq. 1, $\beta$ represents the density of data points, $d_{ij}$ represents the Euclidean distance between data point $i$ and data point $j$ , and $d_{c}$ represents the neighborhood truncation distance of data point $i$ .

$$\delta_{i}=\max_{i\neq j}{d_{ij}}\tag{2}$$

$$\delta_{i}=\min_{j:\rho_{j}>\rho_{i}}{d_{ij}}\tag{3}$$

Eqs. 2 and 3 are the two formulas for the relative distance $\delta_{i}$, which refers to the minimum distance between the sample point $i$ and the other points with higher density; the local densities of all data points therefore need to be sorted before $\delta_{i}$ is calculated. Eq. 2 is the relative distance of the sample with the highest density and Eq. 3 is the relative distance of the remaining data points. The data point with the highest density must be a density center, and we artificially set its relative distance to the maximum value. The remaining density peaks need to satisfy both a higher local density $\rho$ and a larger relative distance $\delta$. Thus, we can define the decision variable $\gamma$:

$$\gamma=\rho_{i}\times{\delta_{i}}^{L}\tag{4}$$

In Eq. 4, $L$ denotes the index of the current clustering layer. We define Eq. 4 in this way because, as the hierarchical clustering deepens, the sub-regions become smaller and the influence of $\delta$ on $\gamma$ weakens, which produces many invalid cluster centers lying very close to each other in densely populated regions. To offset this weakening, we use $\delta^{L}$ as the criterion for relative distance, which greatly alleviates the decay of $\delta$'s influence as the layers deepen.
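To make Eqs. 1 to 4 concrete, the following minimal sketch computes $\rho_{i}$, $\delta_{i}$ and $\gamma$ for a small point set in plain Python. The function name, the default `beta=1.0` and the toy coordinates are assumptions for illustration; the quadratic-time distance matrix is only practical for small inputs.

```python
import math


def density_peak_scores(points, d_c, layer, beta=1.0):
    """Compute rho (Eq. 1), delta (Eqs. 2-3) and gamma (Eq. 4) per point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Eq. 1: Gaussian-kernel local density.
    rho = [
        sum(beta * math.exp(-(dist[i][j] / d_c) ** 2)
            for j in range(n) if j != i)
        for i in range(n)
    ]

    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        if higher:
            delta.append(min(higher))   # Eq. 3: nearest denser point
        else:
            delta.append(max(dist[i]))  # Eq. 2: no denser point exists
    # Eq. 4: the exponent grows with the layer index.
    gamma = [rho[i] * delta[i] ** layer for i in range(n)]
    return rho, delta, gamma


# Hypothetical toy data: a dense group near the origin and a sparser one far away.
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
          (5, 5), (5.1, 5), (5, 5.1)]
rho, delta, gamma = density_peak_scores(points, d_c=0.5, layer=1)
centers = sorted(range(len(points)), key=lambda i: -gamma[i])[:2]
```

In this toy example the two largest $\gamma$ values belong to the middle point of the dense group and to the densest point of the distant group, so one center is picked per group.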

Algorithm 1. Steps of the HDPC-L clustering algorithm.

Input: Sample dataset $D$, number of clustering layers $L$
Output: Cluster center locations $X,Y$; coding $N$ of the cluster to which each data point belongs

foreach layer in $L$ do
    if not the first layer then
        Get the results of the previous layers of clustering;
        Divide $D$ into multiple subsets based on the clustering results of the previous layers.
    foreach subset in $D$ do
        Calculate the distance matrix $d_{ij}$ using the subset data;
        Determine the neighborhood truncation distance $d_{c}$;
        Calculate the local density $\rho_{i}$ and the relative distance $\delta_{i}$;
        Select clustering centers based on $\gamma$;
        Categorize data points that are not cluster centers.
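The steps of Algorithm 1 can be sketched in a few dozen lines of plain Python. This is an illustrative approximation under stated assumptions, not the authors' implementation: it clusters raw points rather than grid cells, takes the number of clusters and $d_{c}$ per layer as given, breaks density ties by sort order, and assigns each non-center point to the cluster of its nearest denser point, as in standard density peak clustering. All function and parameter names are hypothetical.

```python
import math


def dpc_labels(points, k, d_c, layer):
    """Cluster `points` into at most k groups with density-peak scoring."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # densest first
    delta, parent = [0.0] * n, [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])              # Eq. 2
        else:
            # Nearest point among those ranked denser than i.
            j = min(order[:rank], key=lambda m: dist[i][m])
            delta[i], parent[i] = dist[i][j], j  # Eq. 3
    gamma = [rho[i] * delta[i] ** layer for i in range(n)]  # Eq. 4
    # Stable sort over `order` keeps the densest point first among ties.
    centers = sorted(order, key=lambda i: -gamma[i])[:min(k, n)]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:  # parents are always labelled before their children
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


def hierarchical_labels(points, clusters_per_layer, d_c_per_layer):
    """Run dpc_labels layer by layer, splitting every cluster of the
    previous layer independently (coarse to fine)."""
    labels = [0] * len(points)  # one region before the first layer
    for layer, (k, d_c) in enumerate(
            zip(clusters_per_layer, d_c_per_layer), start=1):
        new_labels, next_id = [0] * len(points), 0
        for region in sorted(set(labels)):
            idx = [i for i, lab in enumerate(labels) if lab == region]
            sub = dpc_labels([points[i] for i in idx], k, d_c, layer)
            for i, s in zip(idx, sub):
                new_labels[i] = next_id + s
            next_id += max(sub) + 1
        labels = new_labels
    return labels


# Hypothetical toy data: four tight groups arranged as two distant pairs.
points = [(0, 0), (0.1, 0), (0, 0.1), (2, 0), (2.1, 0), (2, 0.1),
          (10, 10), (10.1, 10), (10, 10.1), (12, 10), (12.1, 10), (12, 10.1)]
labels = hierarchical_labels(points, clusters_per_layer=[2, 2],
                             d_c_per_layer=[3.0, 0.5])
```

Because each layer only builds distance matrices inside the clusters of the previous layer, the largest matrix shrinks as the hierarchy deepens, which is the source of the computational saving described in Section 4.3.1.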

4.4. Graph Neural Network Enhancement

4.4.1. Spatial Dependence Modeling

The characterization of spatial dependence plays a crucial role in traffic demand forecasting. CNNs are not suitable for handling non-Euclidean structured data like traffic networks, so more powerful tools are needed to characterize these complex spatial dependencies. GNNs provide a promising solution for representing such graph structures, and most current research in traffic prediction leverages GCNs to capture spatial dependencies. Given the known adjacency matrix $A$ and feature matrix $X$, a GCN expresses the transfer relationship between the features of consecutive layers as in Eq. 5. In this equation, $H^{(l)}$ represents the features of layer $l$, with $H^{(0)}=X$ for the input layer; $\widetilde{A}=A+I$ is the adjacency matrix with the identity matrix $I$ added; $\widetilde{D}$ is the degree matrix derived from $\widetilde{A}$; $\sigma$ denotes a nonlinear activation function; and $W^{(l)}$ is the learnable parameter matrix of layer $l$. By stacking multiple GCN layers according to Eq. 5, we can capture the intricate spatial dependencies within the transportation network.

$$H^{(l+1)}=\sigma\left(\widetilde{D}^{-\frac{1}{2}}\widetilde{A}\widetilde{D}^{-\frac{1}{2}}H^{(l)}W^{(l)}\right)\tag{5}$$
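A single propagation step of Eq. 5 can be written out explicitly as below. The sketch uses nested lists instead of a tensor library to stay dependency-free, and it assumes ReLU for the activation $\sigma$, which the text leaves unspecified; in practice a framework such as PyTorch would be used.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weight):
    """One propagation step of Eq. 5 with ReLU as the activation."""
    n = len(adj)
    # A~ = A + I (add self-loops).
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # Diagonal of D~, the degree matrix of A~.
    deg = [sum(row) for row in a_tilde]
    # Symmetric normalisation D~^{-1/2} A~ D~^{-1/2}.
    a_hat = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
             for i in range(n)]
    out = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Toy graph: two connected nodes with one feature each, identity weight.
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0], [3.0]]
weight = [[1.0]]
hidden = gcn_layer(adj, features, weight)
```

For this two-node graph every entry of the normalised matrix is 0.5, so both nodes end up with the average of the two input features.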

4.4.2. Edge Weight Initialization

The approach for station-based mode differs from that of the free-floating traffic mode. In the case of the station-based mode, the adjacency matrix can be directly obtained from the road network, which can be used to construct the GCN model. However, in the case of the free-floating traffic mode, as explained in Section 4.3, the nodes $N$ in the graph $G(N,E)$ do not directly correspond to specific locations within the road network. Consequently, it becomes necessary to extract prior knowledge from the ridership data in order to construct the adjacency matrix of the graph. Free-floating traffic mode datasets possess distinct characteristics compared to other types of traffic datasets, typically containing ridership information. This raw information provides valuable insights for graph construction, particularly regarding the OD flow. To build the graph, we begin with spatio-temporal matching, which assigns each order to the clusters established in Section 4.3 based on its geographic location. The origins and destinations of the orders then determine which clusters have OD traffic exchanges between them, and by connecting the nodes with exchanges we obtain the adjacency matrix $A$ of the graph. However, unlike the relatively simple structure of real-world road networks, traffic in the free-floating mode is characterized by complexity, flexibility, and diversity. As a result, the graph constructed using this approach becomes highly intricate, with a significantly larger number of edges than a station-based graph with a similar number of nodes. To solve this problem and simplify the structure of the graph in a reasonable way, we compute the OD flow between all nodes within each time slot, obtaining the OD feature matrices $\mathbb{O}$ and $\mathbb{I}$ as defined in Definition 1.

$$\begin{cases}\widetilde{A}_{i}=\mathrm{Aggr}\left(\mathbb{I}\right)\mid\mid A\\ \widetilde{A}_{o}=\mathrm{Aggr}\left(\mathbb{O}\right)\mid\mid A\end{cases}\tag{6}$$

By utilizing Eq. 6, we can obtain the weighted adjacency matrix. In this equation, $Aggr(\cdot)$ denotes the aggregation function applied to the OD flow feature matrix. In this paper, the feature matrices of all time slots are summed and then normalized to determine the degree of correlation between the nodes, after which the correlations are filtered according to a set threshold to determine the final weight matrix. The symbol $\mid\mid$ represents the process of combining the aggregated weight matrix with the original adjacency matrix $A$. It is important to note that different models may handle this combination in different ways. The comparison of the correlation between nodes is shown in Fig. 4, where it can be clearly seen that the adjacencies have been greatly simplified and carry different weights, allowing each node in the model to focus on the nodes with which it is most correlated.
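One possible reading of the aggregation in Eq. 6 is sketched below. Two choices are assumptions made for this example because the text leaves them open: normalisation divides by the largest summed flow, and the combination with $A$ is an element-wise mask. The function name and toy flows are hypothetical.

```python
def init_edge_weights(flows, adjacency, threshold):
    """Aggregate a T x N x N flow tensor into an N x N weight matrix.

    Assumptions (not fixed by the paper): max-normalisation of the summed
    flows, and an element-wise mask as the combination with the adjacency.
    """
    n = len(adjacency)
    # Sum the flow matrices of all time slots.
    total = [[sum(slot[i][j] for slot in flows) for j in range(n)]
             for i in range(n)]
    peak = max(max(row) for row in total) or 1  # avoid division by zero
    weights = [[total[i][j] / peak for j in range(n)] for i in range(n)]
    # Keep only sufficiently correlated pairs that are adjacent in A.
    return [[w if w >= threshold and adjacency[i][j] else 0.0
             for j, w in enumerate(row)]
            for i, row in enumerate(weights)]


# Hypothetical flows for two time slots over three nodes.
flows = [
    [[0, 4, 1], [2, 0, 0], [0, 0, 0]],
    [[0, 6, 0], [3, 0, 1], [0, 0, 0]],
]
adjacency = [[0, 1, 1], [1, 0, 1], [0, 0, 0]]  # any OD exchange observed
edge_weights = init_edge_weights(flows, adjacency, threshold=0.2)
edges_before = sum(map(sum, adjacency))
edges_after = sum(1 for row in edge_weights for w in row if w > 0)
```

In the toy example the threshold removes the two weakly used connections, halving the edge count while the remaining edges carry weights proportional to their total flow.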

5. Numerical Experiments

5.1. Experimental Setup

5.1.1. Construction of the Datasets

To validate the effectiveness of our approach, we conducted four sets of experiments on two real-world datasets. The first is the Shenzhen bike-sharing dataset, which consists of 59.3 million orders and spans May 17th to June 27th, 2021, a total of 42 days. After obtaining the clustering results and establishing the nodes, we divided the time span into 15-minute intervals and assigned the orders in each interval to the corresponding nodes, which yields inflow and outflow datasets for each node. The second is the Haikou ride-hailing dataset, containing 12.4 million orders. It spans May 1st, 2017 to October 31st, 2017, a total of 184 days. It is processed in the same way, except that the time slot is 30 minutes.
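The division into fixed-length time slots amounts to an integer division of the elapsed time, as in this small sketch. The helper name is hypothetical, and the start date is the first day of the Shenzhen data mentioned above.

```python
from datetime import datetime


def slot_index(timestamp, start, slot_minutes):
    """Index of the time slot containing `timestamp`, counted from `start`."""
    return int((timestamp - start).total_seconds() // (slot_minutes * 60))


start = datetime(2021, 5, 17)
first = slot_index(datetime(2021, 5, 17, 0, 14, 59), start, 15)
second = slot_index(datetime(2021, 5, 17, 0, 15, 0), start, 15)
next_day = slot_index(datetime(2021, 5, 18), start, 15)
```

With 15-minute slots a day contains 96 slots, so 42 days give 4032 slots; with 30-minute slots a day contains 48.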

5.1.2. Evaluation Metrics

We quantify the performance of our model using six metrics:

(1) Edge Quantity: The total number of edges of the graph built based on the dataset.

(2) Training Time for each Epoch: The time in seconds for the model to train an epoch on the dataset.

(3) Root Mean Squared Error (RMSE):

$$RMSE=\sqrt{\frac{1}{MN}\sum\limits_{j=1}^{M}\sum\limits_{i=1}^{N}{\left(y_{i}^{j}-\widehat{y}_{i}^{j}\right)}^{2}}\tag{7}$$

(4) Accuracy:

$$Accuracy=1-\frac{1}{MN}\sum\limits_{j=1}^{M}\sum\limits_{i=1}^{N}{\frac{y_{i}^{j}-\widehat{y}_{i}^{j}}{y_{i}^{j}}}\tag{8}$$

(5) Coefficient of Determination ($R^{2}$):

$$R^{2}=1-\frac{\sum_{j=1}^{M}\sum_{i=1}^{N}{\left(y_{i}^{j}-\widehat{y}_{i}^{j}\right)}^{2}}{\sum_{j=1}^{M}\sum_{i=1}^{N}{\left(y_{i}^{j}-\overline{Y}\right)}^{2}}\tag{9}$$

(6) Explained Variance ($E-V$):

$$E-V=1-\frac{Var\left(Y-\widehat{Y}\right)}{Var\left(Y\right)}\tag{10}$$

where $y_{i}^{j}$ and $\widehat{y}_{i}^{j}$ represent the ground truth and the predicted value of the $j$-th time sample at the $i$-th node. $M$ is the number of time slots; $N$ is the number of nodes; $Y$ and $\widehat{Y}$ represent the sets of $y_{i}^{j}$ and $\widehat{y}_{i}^{j}$, respectively, and $\overline{Y}$ is the average of $Y$.

Specifically, $RMSE$ and $Accuracy$ measure the prediction error and prediction precision, respectively. $R^{2}$ and $E-V$ measure how well the predictions account for the variation in the ground-truth data: the larger the value, the better the model.
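The four error metrics can be computed directly from their definitions, as in the dependency-free sketch below. The accuracy function follows Eq. 8 literally, so it assumes every ground-truth entry is non-zero; the helper names and the toy values are illustrative only.

```python
import math


def flatten(matrix):
    """Turn an M x N nested list into a flat list of M*N values."""
    return [v for row in matrix for v in row]


def mean(values):
    return sum(values) / len(values)


def variance(values):
    mu = mean(values)
    return mean([(v - mu) ** 2 for v in values])


def rmse(y, y_hat):                      # Eq. 7
    t, p = flatten(y), flatten(y_hat)
    return math.sqrt(mean([(a - b) ** 2 for a, b in zip(t, p)]))


def accuracy(y, y_hat):                  # Eq. 8, exactly as written
    t, p = flatten(y), flatten(y_hat)
    # Divides by the ground truth, so every entry of y must be non-zero.
    return 1 - mean([(a - b) / a for a, b in zip(t, p)])


def r_squared(y, y_hat):                 # Eq. 9
    t, p = flatten(y), flatten(y_hat)
    mu = mean(t)
    residual = sum((a - b) ** 2 for a, b in zip(t, p))
    return 1 - residual / sum((a - mu) ** 2 for a in t)


def explained_variance(y, y_hat):        # Eq. 10
    t, p = flatten(y), flatten(y_hat)
    return 1 - variance([a - b for a, b in zip(t, p)]) / variance(t)


# Toy example with M = 2 time slots and N = 2 nodes.
y = [[2.0, 4.0], [6.0, 8.0]]
y_hat = [[1.0, 5.0], [6.0, 8.0]]
scores = (rmse(y, y_hat), accuracy(y, y_hat),
          r_squared(y, y_hat), explained_variance(y, y_hat))
```

Here $R^{2}$ and $E-V$ coincide because the residuals have zero mean; they differ whenever the predictions are biased.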

5.1.3. Task Setting

We split the Shenzhen dockless bike-sharing and Haikou ride-hailing datasets into training and test sets at a ratio of 8:2, and uniformly use 16 historical time slots to predict 4 future time slots. To ensure a fair comparison, each baseline model uses the same hyperparameters before and after our enhancement, although the hyperparameters may differ between baseline models. Any observed improvement can therefore be attributed to our modifications rather than to differences in hyperparameter settings.
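The task setting can be illustrated with a short sketch. The helper names `chronological_split` and `make_windows` are hypothetical, and two details are assumptions on our part: that the 8:2 split is chronological, and that windows are built after the split so that no sample straddles the boundary.

```python
def chronological_split(series, train_ratio=0.8):
    """Split a time-ordered sequence into training and test parts."""
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]


def make_windows(series, n_hist=16, n_pred=4):
    """Build (history, target) pairs with a sliding window of stride 1."""
    samples = []
    for i in range(len(series) - n_hist - n_pred + 1):
        history = series[i:i + n_hist]
        target = series[i + n_hist:i + n_hist + n_pred]
        samples.append((history, target))
    return samples


# Toy series of 100 time slots (each entry would be a vector over nodes)
series = list(range(100))
train, test = chronological_split(series)
train_samples = make_windows(train)
test_samples = make_windows(test)
```

Each sample pairs 16 consecutive slots with the 4 slots that follow them, so a part of length $T$ yields $T-19$ samples.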

5.1.4. Baseline Models

We validate our approach on the following baseline models:

• GCN (Kipf and Welling, 2016): A type of GNN that aggregates information from neighboring nodes to update each node’s representation.

• TGCN (Zhao et al., 2020): It combines a GCN, which learns complex topologies to capture spatial dependencies, with a GRU, which learns dynamic changes in traffic data to capture temporal dependencies.

• A3TGCN (Bai et al., 2021): It builds on the TGCN structure and introduces an attention mechanism that adjusts the importance of different time points and combines global temporal information to improve prediction accuracy.

• STGCN (Yu et al., 2018): It is built from convolutions: a GCN captures spatial features and a TCN captures temporal features.

• GraphWaveNet (Wu et al., 2019): It develops a novel adaptive dependency matrix and learns it through node embedding.
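All five baselines rely on a graph convolution that aggregates neighbor information through the adjacency matrix, which is where the constructed graph and its edge weights enter the model. As a minimal sketch, the propagation rule of Kipf and Welling (2016), $H' = \mathrm{ReLU}(\widehat{D}^{-1/2}(A+I)\widehat{D}^{-1/2}HW)$, can be written in plain Python as below. The helper names are ours, and the sketch is for illustration only; it is not the implementation used in the experiments.

```python
import math


def normalize_adjacency(adj):
    """Add self-loops and apply symmetric degree normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # weighted degree including self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Dense matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """One graph-convolution layer followed by ReLU."""
    propagated = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


# Two connected nodes with one feature each and an identity weight
hidden = gcn_layer([[0.0, 1.0], [1.0, 0.0]], [[1.0], [3.0]], [[1.0]])
```

Because `adj` may hold real-valued weights rather than 0/1 entries, the same layer accepts an adjacency matrix whose weights were initialized from data.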

Table 1. Average area of clusters in each layer after HDPC-L hierarchical clustering in Shenzhen and Haikou.

	$Layer1$	$Layer2$	$Layer3$	$Layer4$
Shenzhen	$143.52km^{2}$	$23.92km^{2}$	$6.06km^{2}$	$1.54km^{2}$
Haikou	$390.60km^{2}$	$97.65km^{2}$	$32.55km^{2}$	$10.85km^{2}$

Table 2. Comparison of the performance of baseline models using our improved methodology across 2 real-world datasets.

		Outflow					Inflow
Dataset	Model	Time/Epoch (s)	Accuracy	RMSE	$R^{2}$	$E-V$	Time/Epoch (s)	Accuracy	RMSE	$R^{2}$	$E-V$
Haikou Ride-Hailing	A3TGCN	82.40	4.97%	0.050	0.080	0.083	84.94	5.45%	0.053	0.076	0.077
	A3TGCN(ours)	57.35	47.27%	0.027	0.698	0.701	55.87	45.29%	0.031	0.677	0.680
	GCN	0.85	1.58%	0.051	0.006	0.006	0.87	4.74%	0.054	0.034	0.034
	GCN(ours)	0.85	74.51%	0.012	0.926	0.927	0.85	76.40%	0.012	0.937	0.938
	GraphWaveNet	136.10	70.35%	0.016	0.906	0.906	137.70	70.85%	0.017	0.909	0.909
	GraphWaveNet(ours)	104.90	70.55%	0.015	0.907	0.907	101.70	70.97%	0.016	0.910	0.910
	TGCN	14.76	73.61%	0.013	0.918	0.920	14.19	75.78%	0.013	0.931	0.931
	TGCN(ours)	14.46	80.57%	0.010	0.955	0.955	14.26	81.34%	0.010	0.959	0.960
	STGCN	7.53	66.73%	0.015	0.842	0.843	8.00	68.24%	0.016	0.872	0.874
	STGCN(ours)	7.65	73.29%	0.013	0.910	0.911	7.62	71.71%	0.015	0.906	0.912
Shenzhen Bike-Sharing	A3TGCN	117.20	11.10%	0.016	0.170	0.172	117.30	11.53%	0.018	0.173	0.176
	A3TGCN(ours)	25.12	37.09%	0.011	0.568	0.572	26.35	34.07%	0.013	0.528	0.531
	GCN	0.31	6.832%	0.017	0.081	0.082	0.28	7.27%	0.019	0.107	0.109
	GCN(ours)	0.38	52.00%	0.008	0.740	0.760	0.27	52.27%	0.009	0.740	0.759
	GraphWaveNet	1538.19	51.00%	0.008	0.713	0.716	786.31	50.64%	0.0094	0.709	0.716
	GraphWaveNet(ours)	61.69	50.61%	0.008	0.708	0.712	62.45	50.56%	0.0096	0.704	0.707
	TGCN	5.80	28.39%	0.010	0.379	0.377	5.80	30.89%	0.011	0.389	0.382
	TGCN(ours)	5.89	52.00%	0.007	0.706	0.707	5.76	51.64%	0.008	0.711	0.717
	STGCN	10.18	54.07%	0.008	0.759	0.761	10.18	57.30%	0.009	0.792	0.793
	STGCN(ours)	10.25	62.14%	0.007	0.829	0.827	10.17	61.20%	0.008	0.823	0.824

5.2. Experimental Results

5.2.1. Graph Construction

We used the HDPC-L method to obtain the graph nodes and selected K-means for comparison. Figure 5 shows the results of HDPC-L and K-means when clustering into 72 nodes. As seen in Figure 5(b), the nodes obtained by K-means are distributed relatively uniformly, whereas the nodes obtained by HDPC-L are concentrated in regions with higher density. This behavior matches the requirements of real-world free-floating traffic management systems more closely, which makes density-based clustering more suitable here than conventional clustering methods.

We initially intended to evaluate the impact of hierarchical clustering by comparing it with direct density-based clustering. However, direct density-based clustering required at least 865 GB of memory, which exceeded the capacity of our device, so obtaining 1120 nodes in a single pass was infeasible. This highlights the impracticality of clustering large-scale datasets solely by density, whereas the hierarchical approach proved to be a viable solution. Figure 5(d) shows the final 1120 nodes obtained through HDPC-L on the Shenzhen bike-sharing dataset.

Fig. 6 shows part of the distribution of the finalized graph nodes in the city of Shenzhen, where each black circle represents a site and each colored area represents the area under the jurisdiction of that site. We performed four layers of clustering on both datasets; the average area of the clusters in each layer is shown in Table 1.
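To illustrate why direct density-based clustering runs out of memory, the sketch below implements the two core quantities of classic density-peak clustering (Rodriguez and Laio, 2014), the local density $\rho$ and the distance $\delta$ to the nearest denser point, on top of a dense pairwise distance matrix. This is the textbook algorithm, not HDPC-L itself. The function names and the 8-bytes-per-entry assumption are ours, and we do not know how many points or which data type lie behind the memory figure reported above.

```python
import math


def dense_matrix_gib(n_points, bytes_per_value=8):
    """Memory in GiB of a dense n x n distance matrix (float64 assumed)."""
    return n_points ** 2 * bytes_per_value / 2 ** 30


def density_peaks(points, cutoff):
    """Return (rho, delta) for every point using a full distance matrix."""
    n = len(points)
    # The n x n matrix is the quadratic memory bottleneck
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho: number of other points closer than the cutoff distance
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # The densest point gets its largest distance by convention
        delta.append(min(denser) if denser else max(dist[i]))
    return rho, delta


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
rho, delta = density_peaks(points, cutoff=1.2)
```

Cluster centers are points with both high $\rho$ and high $\delta$ (the first point here), while a point with high $\delta$ but low $\rho$ (the last point) is an outlier. Since the matrix grows quadratically, clustering within each cluster of the previous layer keeps every individual matrix small, which is the intuition behind a hierarchical scheme.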

5.2.2. Baseline Models Enhancement

We initialize the edge weights of five GCN-based baseline models according to the OD traffic relationship and compare the models before and after this change using the six metrics on the Shenzhen bike-sharing dataset and the Haikou ride-hailing dataset. The results are shown in Table 2; all models improve in at least one respect.

(1) The edge quantity in all four graphs is significantly reduced ( $Haikou:11150\Rightarrow 2647_{outflow},3138_{inflow}$ ; $Shenzhen:110974\Rightarrow 13711_{outflow},14995_{inflow}$ ), and the graph structure is greatly simplified.

(2) The A3TGCN and GCN models improve dramatically across the metrics. In terms of accuracy, A3TGCN gains about 40 and 25 percentage points and GCN about 66 and 45 percentage points on the Haikou and Shenzhen datasets, respectively. These two models mainly reflect the spatial dependency, so their improvement indicates that our method significantly enhances the spatial representation of the graphs.

(3) For TGCN, accuracy improves by about 22 percentage points on the Shenzhen dataset and 6 on the Haikou dataset; for STGCN, the gains are about 6 and 5. These gains are smaller than those of A3TGCN and GCN, mainly because TGCN and STGCN model the temporal dependency as well. For TGCN, the enhancement is much greater on the Shenzhen bike-sharing dataset than on the Haikou ride-hailing dataset. In the graphs we built, the Haikou ride-hailing dataset has 180 nodes while the Shenzhen bike-sharing dataset has 1120 nodes, which suggests that our method performs particularly well when optimizing large graph structures. In addition, as shown in Fig. 7, our method speeds up the convergence of GNNs, and the training process is smoother than that of the original model.

(4) The accuracy improvement of GraphWaveNet is not significant. One possible reason is that its adaptive graph structure already captures spatial dependencies and compensates for the deficiencies of the original graph structure. However, the original GraphWaveNet trains very slowly. Our method reduces its training time by approximately 30% on the Haikou ride-hailing dataset and approximately 90% on the Shenzhen bike-sharing dataset, while maintaining a similar level of prediction accuracy.
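One plausible way to initialize edge weights from OD data is sketched below. It is an illustration under stated assumptions, not the procedure used in the experiments: the function name `od_edge_weights`, the `min_trips` pruning threshold, the removal of self-loops, and the per-origin normalization are all hypothetical choices made for the example.

```python
from collections import Counter


def od_edge_weights(trips, min_trips=1):
    """Turn OD pairs into directed, normalized edge weights.

    trips: iterable of (origin_node, destination_node).
    Returns {(origin, destination): weight}, where the outgoing weights
    of each origin sum to 1. Pairs with fewer than min_trips trips are
    dropped, which is what thins out the graph.
    """
    counts = Counter((o, d) for o, d in trips if o != d)
    kept = {edge: c for edge, c in counts.items() if c >= min_trips}
    totals = Counter()
    for (origin, _), c in kept.items():
        totals[origin] += c
    return {(o, d): c / totals[o] for (o, d), c in kept.items()}


trips = [(0, 1), (0, 1), (0, 2), (1, 0), (2, 2)]
all_edges = od_edge_weights(trips)
pruned_edges = od_edge_weights(trips, min_trips=2)
```

Only node pairs that actually exchange traffic receive an edge, so the resulting graph has far fewer edges than a graph that connects every pair of nodes.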

6. Conclusions

This paper presents a novel approach to constructing graphs for free-floating traffic demand forecasting. We propose HDPC-L, a density-based hierarchical clustering method that significantly enhances computational efficiency and fosters a more coherent graph structure. Taking a data-driven perspective, we extract Origin-Destination (OD) traffic information from the original dataset, which enables us to initialize edge weights and simplify the graph structure. Averaged over the baseline models, our method increases accuracy by 24.96% and 19.46% and reduces training time by 12.05% and 32.40% on the Haikou and Shenzhen datasets, respectively.
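The average accuracy figures can be reproduced from Table 2 under one reading, which is our inference and is not stated in the text: each figure is the mean absolute gain in accuracy (in percentage points) over the ten model and flow-direction pairs of a dataset. The values below are copied from the Accuracy columns of Table 2.

```python
# model: (baseline outflow, ours outflow, baseline inflow, ours inflow), in %
haikou = {
    "A3TGCN": (4.97, 47.27, 5.45, 45.29),
    "GCN": (1.58, 74.51, 4.74, 76.40),
    "GraphWaveNet": (70.35, 70.55, 70.85, 70.97),
    "TGCN": (73.61, 80.57, 75.78, 81.34),
    "STGCN": (66.73, 73.29, 68.24, 71.71),
}
shenzhen = {
    "A3TGCN": (11.10, 37.09, 11.53, 34.07),
    "GCN": (6.832, 52.00, 7.27, 52.27),
    "GraphWaveNet": (51.00, 50.61, 50.64, 50.56),
    "TGCN": (28.39, 52.00, 30.89, 51.64),
    "STGCN": (54.07, 62.14, 57.30, 61.20),
}


def mean_gain(table):
    """Mean accuracy gain in percentage points over models and directions."""
    gains = []
    for base_out, ours_out, base_in, ours_in in table.values():
        gains += [ours_out - base_out, ours_in - base_in]
    return sum(gains) / len(gains)


haikou_gain = round(mean_gain(haikou), 2)
shenzhen_gain = round(mean_gain(shenzhen), 2)
```

Under this reading the two means come out at 24.96 for Haikou and 19.46 for Shenzhen, so the gains are absolute differences in accuracy rather than relative changes.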

This study has several limitations, which point to possible directions for future research. We will explore how to optimize the graph structures we build in conjunction with other data, and we look forward to extending the approach to broader fields and building a corresponding Python library.

7. Acknowledgments

This study was supported by the National Key R&D Program of China [Grant 2021ZD0112700]. We would like to thank the Didi Chuxing GAIA Initiative for providing the ride-hailing order data and the Shenzhen Municipal Government for providing the bike-sharing order data.

References

Bai et al. (2021) Jiandong Bai, Jiawei Zhu, Yujiao Song, Ling Zhao, Zhixiang Hou, Ronghua Du, and Haifeng Li. 2021. A3t-gcn: Attention temporal graph convolutional network for traffic forecasting. ISPRS International Journal of Geo-Information 10, 7 (2021), 485.
Chen et al. (2022) Changlu Chen, Yanbin Liu, Ling Chen, and Chengqi Zhang. 2022. Bidirectional spatial-temporal adaptive transformer for Urban traffic flow forecasting. IEEE Transactions on Neural Networks and Learning Systems (2022).
Cui et al. (2019) Zhiyong Cui, Kristian Henrickson, Ruimin Ke, and Yinhai Wang. 2019. Traffic graph convolutional recurrent neural network: A deep learning framework for network-scale traffic learning and forecasting. IEEE Transactions on Intelligent Transportation Systems 21, 11 (2019), 4883–4894.
De Fabritiis et al. (2008) Corrado De Fabritiis, Roberto Ragona, and Gaetano Valenti. 2008. Traffic estimation and prediction based on real time floating car data. In 2008 11th international IEEE conference on intelligent transportation systems. IEEE, 197–203.
Du et al. (2021) Bowen Du, Xiao Hu, Leilei Sun, Junming Liu, Yanan Qiao, and Weifeng Lv. 2021. Traffic Demand Prediction Based on Dynamic Transition Convolutional Neural Network. IEEE Transactions on Intelligent Transportation Systems 22, 2 (2021), 1237–1247.
Fang et al. (2021) Mengyuan Fang, Luliang Tang, Xue Yang, Yang Chen, Chaokui Li, and Qingquan Li. 2021. FTPG: A fine-grained traffic prediction method with graph attention network using big trace data. IEEE Transactions on Intelligent Transportation Systems 23, 6 (2021), 5163–5175.
Gupta et al. (2023) Mridul Gupta, Hariprasad Kodamana, and Sayan Ranu. 2023. Frigate: Frugal Spatio-temporal Forecasting on Road Networks. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’23). Association for Computing Machinery, 649–660.
He et al. (2022) Yuxin He, Lishuai Li, Xinting Zhu, and Kwok Leung Tsui. 2022. Multi-Graph Convolutional-Recurrent Neural Network (MGC-RNN) for Short-Term Forecasting of Transit Passenger Flow. IEEE Transactions on Intelligent Transportation Systems 23, 10 (2022), 18155–18174.
Hua et al. (2020) Mingzhuang Hua, Xuewu Chen, Shujie Zheng, Long Cheng, and **gxu Chen. 2020. Estimating the parking demand of free-floating bike sharing: A journey-data-based study of Nan**g, China. Journal of Cleaner Production 244 (2020), 118764.
Huang et al. (2022) Feihu Huang, Peiyu Yi, **ce Wang, Mengshi Li, Jian Peng, and Xi Xiong. 2022. A dynamical spatial-temporal graph neural network for traffic demand prediction. Information Sciences 594 (2022), 286–304.
Hulot et al. (2018) Pierre Hulot, Daniel Aloise, and Sanjay Dominik Jena. 2018. Towards Station-Level Demand Prediction for Effective Rebalancing in Bike-Sharing Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD ’18). Association for Computing Machinery, 378–386.
James (2022) JQ James. 2022. Graph construction for traffic prediction: A data-driven approach. IEEE Transactions on Intelligent Transportation Systems 23, 9 (2022), 15015–15027.
Jiang et al. (2023) Jiawei Jiang, Chengkai Han, Wayne Xin Zhao, and **gyuan Wang. 2023. PDFormer: Propagation Delay-Aware Dynamic Long-Range Transformer for Traffic Flow Prediction. Proceedings of the AAAI Conference on Artificial Intelligence 37, 4 (Jun. 2023), 4365–4373.
Kipf and Welling (2016) Thomas N Kipf and Max Welling. 2016. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations.
Lan et al. (2022) Shiyong Lan, Yitong Ma, Weikang Huang, Wenwu Wang, Hongyu Yang, and Pyang Li. 2022. Dstagnn: Dynamic spatial-temporal aware graph neural network for traffic flow forecasting. In International conference on machine learning. PMLR, 11906–11917.
Li and Axhausen (2020) Aoyong Li and Kay W Axhausen. 2020. Short-term traffic demand prediction using graph convolutional neural networks. AGILE: GIScience Series 1 (2020), 12.
Li and Lasenby (2021) Duo Li and Joan Lasenby. 2021. Spatiotemporal attention-based graph convolution network for segment-level traffic prediction. IEEE Transactions on Intelligent Transportation Systems 23, 7 (2021), 8337–8345.
Li et al. (2022) Guanyao Li, Xiaofeng Wang, Gunarto Sindoro Njoo, Shuhan Zhong, S-H Gary Chan, Chih-Chieh Hung, and Wen-Chih Peng. 2022. A data-driven spatial-temporal graph neural network for docked bike prediction. In 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 713–726.
Li et al. (2018) Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In International Conference on Learning Representations.
Li et al. (2015) Yexin Li, Yu Zheng, Huichu Zhang, and Lei Chen. 2015. Traffic prediction in a bike-sharing system. In Proceedings of the 23rd SIGSPATIAL international conference on advances in geographic information systems. 1–10.
Lv et al. (2020) Chang Lv, Chaoyong Zhang, Kunlei Lian, Ya** Ren, and Leilei Meng. 2020. A hybrid algorithm for the static bike-sharing re-positioning problem based on an effective clustering strategy. Transportation Research Part B: Methodological 140, C (2020), 1–21.
Lv et al. (2018) Zhongjian Lv, Jiajie Xu, Kai Zheng, Hongzhi Yin, Pengpeng Zhao, and Xiaofang Zhou. 2018. Lc-rnn: A deep learning model for traffic speed prediction. In IJCAI, Vol. 2018. 27th.
Rodriguez and Laio (2014) Alex Rodriguez and Alessandro Laio. 2014. Clustering by fast search and find of density peaks. Science 344, 6191 (2014), 1492–1496.
Song et al. (2020) Chao Song, Youfang Lin, Shengnan Guo, and Huaiyu Wan. 2020. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In Proceedings of the AAAI conference on artificial intelligence, Vol. 34. 914–921.
Tang et al. (2021) **jun Tang, Jian Liang, Fang Liu, **g**g Hao, and Yinhai Wang. 2021. Multi-community passenger demand prediction at region level based on spatio-temporal graph convolutional network. Transportation Research Part C: Emerging Technologies 124 (2021).
Tian et al. (2024) Zihao Tian, **g Zhou, Lixin Tian, and David Z.W. Wang. 2024. Dynamic spatio-temporal interactive clustering strategy for free-floating bike-sharing. Transportation Research Part B: Methodological 179 (2024), 102872.
Wang et al. (2019) Junhua Wang, Tianyang Luo, and Ting Fu. 2019. Crash prediction based on traffic platoon characteristics using floating car trajectory data and the machine learning approach. Accident Analysis & Prevention 133 (2019), 105320.
Wu et al. (2019) Zonghan Wu, Shirui Pan, Guodong Long, **g Jiang, and Chengqi Zhang. 2019. Graph wavenet for deep spatial-temporal graph modeling. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI’19). AAAI Press, 1907–1913.
Xie et al. (2023) Peng Xie, Minbo Ma, Tianrui Li, Shenggong Ji, Shengdong Du, Zeng Yu, and Junbo Zhang. 2023. Spatio-Temporal Dynamic Graph Relation Learning for Urban Metro Flow Prediction. IEEE Transactions on Knowledge and Data Engineering (2023).
Yao et al. (2018) Huaxiu Yao, Fei Wu, **tao Ke, ** Ye, and Zhenhui Li. 2018. Deep multi-view spatial-temporal network for taxi demand prediction. In Proceedings of the AAAI conference on artificial intelligence, Vol. 32.
Yu et al. (2018) Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2018. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI’18). AAAI Press, 3634–3640.
Yuan et al. (2018) Zhuoning Yuan, Xun Zhou, and Tianbao Yang. 2018. Hetero-ConvLSTM: A Deep Learning Approach to Traffic Accident Prediction on Heterogeneous Spatio-Temporal Data. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (London, United Kingdom) (KDD ’18). Association for Computing Machinery, New York, NY, USA, 984–992.
Zhang et al. (2019) Kunpeng Zhang, Zijian Liu, and Liang Zheng. 2019. Short-term prediction of passenger demand in multi-zone level: Temporal convolutional neural network with multi-task learning. IEEE transactions on intelligent transportation systems 21, 4 (2019), 1480–1490.
Zhao et al. (2020) Ling Zhao, Yujiao Song, Chao Zhang, Yu Liu, Pu Wang, Tao Lin, Min Deng, and Haifeng Li. 2020. T-GCN: A Temporal Graph Convolutional Network for Traffic Prediction. IEEE Transactions on Intelligent Transportation Systems 21, 9 (2020), 3848–3858.
Zhou et al. (2018) Xian Zhou, Yanyan Shen, Yanmin Zhu, and Linpeng Huang. 2018. Predicting multi-step citywide passenger demands using attention-based neural networks. In Proceedings of the Eleventh ACM international conference on web search and data mining. 736–744.
Zonoozi et al. (2018) Ali Zonoozi, Jung-jae Kim, Xiao-Li Li, and Gao Cong. 2018. Periodic-CRN: A Convolutional Recurrent Model for Crowd Density Prediction with Recurring Periodic Patterns. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization.