Unveiling Delay Effects in Traffic Forecasting: A Perspective from Spatial-Temporal Delay Differential Equations

Qingqing Long (Computer Network Information Center, Chinese Academy of Sciences, China), Zheng Fang (Peking University, China), Chen Fang (Institute of Zoology, Chinese Academy of Sciences, China), Chen Chong (Terminus Group, China), Pengfei Wang (Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China), and Yuanchun Zhou (Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China)

(2024)

Abstract.

Traffic flow forecasting is a fundamental research problem for transportation planning and management, and serves as a canonical example of spatial-temporal prediction. In recent years, Graph Neural Networks (GNNs) and Recurrent Neural Networks (RNNs) have achieved great success in capturing spatial-temporal correlations for traffic flow forecasting. Yet two important issues have not been well addressed: 1) Message passing in GNNs is immediate, while in reality the spatial message interactions among neighboring nodes can be delayed. A change of traffic flow at one node takes several minutes, i.e., a time delay, to influence its connected neighbors. 2) Traffic conditions change continuously, and the required prediction frequency may vary with the scenario. Most existing discretized models require retraining for each prediction horizon, which restricts their applicability. To tackle these issues, we propose a neural Spatial-Temporal Delay Differential Equation model, namely STDDE. It incorporates both delay effects and continuity into a unified delay differential equation framework, which explicitly models the time delay in spatial information propagation. Furthermore, theoretical proofs are provided to show its stability. We then design a learnable traffic-graph time-delay estimator, which utilizes the continuity of the hidden states to enable gradient backpropagation with respect to the delays. Finally, we propose a continuous output module that allows us to accurately predict traffic flow at various frequencies, providing more flexibility and adaptability to different scenarios. Extensive experiments show the superiority of the proposed STDDE along with competitive computational efficiency. Moreover, both quantitative and qualitative experiments are conducted to validate the concept of a delay-aware module, and a flexibility validation shows the effectiveness of the continuous output module.

deep graph learning, differential equation, traffic network, traffic flow prediction, continuous systems

Published in: Proceedings of the ACM Web Conference 2024 (WWW ’24), May 13–17, 2024, Singapore, Singapore. Copyright: rights retained. DOI: 10.1145/3589334.3645688. ISBN: 979-8-4007-0171-9/24/05.

CCS concepts: Information systems → Spatial-temporal systems; Mathematics of computing → Continuous functions; Applied computing → Transportation.

1. Introduction

Traffic forecasting is a fundamental research problem of Intelligent Transportation Systems (ITS) (Nagy and Simon, 2018; Yu et al., 2018; Ran and Boyce, 2012; Fang et al., 2022), which affects a variety of smart city applications (Song et al., 2020a; Ju et al., 2024), such as trip planning (Choi et al., 2021; Fang et al., 2020), accident prediction (Islam et al., 2022; Ju et al., 2023), and urban management (Chavhan and Venkataram, 2020; Yuan et al., 2021). Traffic flow forecasting aims to predict future traffic flow based on historical data and the underlying traffic network.

Traffic flow forecasting is a challenging task due to its inherent spatial-temporal dependencies. Benefiting from the flourishing of deep learning, a large number of deep models have been proposed for traffic forecasting. In the temporal dimension, RNN-based models and their variants (Schuster and Paliwal, 1997) are the mainstream choice, and temporal convolution networks (Lu et al., 2020) have also attracted much attention due to their superior computational efficiency. In the spatial dimension, since most traffic networks are non-Euclidean rather than grid-partitioned, GNN-based methods (Yu et al., 2018; Li et al., 2018; Long et al., 2020) outperform CNN-based ones (Zhang et al., 2019) and have become predominant owing to their strong ability to handle graph-structured data. Many works combine a spatial module and a temporal module to achieve significant improvements, among which STGCN (Yu et al., 2018), DCRNN (Li et al., 2018), and DSTAGNN (Lan et al., 2022) are representative.

Figure 1. (a) General GNN propagation process.

Nevertheless, previous works exhibit the following shortcomings:

(1) The delays in the graph signal propagation process are overlooked. When an incident occurs at specific nodes, its influence takes several minutes (i.e., a time delay) to propagate to the neighboring nodes. However, this delay effect is largely neglected by existing spatial-temporal traffic forecasting methods. Time series models (Fu et al., 2016; Benidis et al., 2022), such as RNN and GRU, are capable of modeling scenarios in which the delay remains consistent across all nodes and timestamps. In practice, the opposite holds: time delays vary significantly across nodes and timestamps, as illustrated in Fig. 1(c), which shows the time-delay distribution among neighbors in the PeMS03 dataset. Therefore, a separate module is required to characterize and model these variations. As shown in Fig. 1(a) and 1(b), general GNNs propagate a sudden change indiscriminately along the adjacency relation, so the predicted change arrives ahead of the ground truth and the prediction is sub-optimal. It is therefore important to incorporate delay effects into spatial-temporal traffic forecasting.

(2) The inherent continuity of traffic systems is not well explored. Existing methods mainly utilize RNNs (Schuster and Paliwal, 1997) or TCNs (Lu et al., 2020), which accept discrete observations as input, to capture the temporal dependencies. These methods are limited in flexibility and applicability. Specifically, for the same traffic system, the required prediction horizon and resolution may vary across application scenarios, and the model needs to be retrained for each specific demand. Also, traffic data is notable for its inherent sparsity (Liu et al., 2020; Zhou et al., 2020). This sparsity arises from the limited availability of traffic sensors, particularly in extensive road networks. For example, the sensors deployed within the traffic network may sample at a 10-minute interval, whereas during the prediction phase we want finer-grained forecasts to enable rapid responses to events, such as travel time planning (Aryandoust et al., 2019) or traffic emergency management (Zhou et al., 2020).

To tackle the above-mentioned issues, we propose a neural Spatial-Temporal Delay Differential Equation model (STDDE). In contrast to existing methods, STDDE addresses both challenges within a single paradigm for spatial-temporal traffic analysis. It explicitly captures and leverages delayed spatial interactions among neighboring nodes. Furthermore, it models the evolution of spatial-temporal signals from a continuous perspective, departing from traditional recurrent approaches. Specifically, the delay values can be acquired through pre-processing or a learnable estimator. For each node, we then combine its own historical hidden states with the delayed hidden states of its neighbors to integrate spatial and temporal information through a Delay-aware Differential Equation (DDE). We theoretically prove that the proposed delay differential equation is asymptotically stable. We conduct experiments on six widely used real-world traffic datasets. The results demonstrate that our model outperforms state-of-the-art models while maintaining competitive computational efficiency. Quantitative and qualitative experiments are conducted to validate the effectiveness of the delay-aware module. Additionally, the flexibility validation confirms the effectiveness of the continuous output module.

The main contributions of this work are summarized as follows:

• We propose a Spatial-Temporal Delay Differential Equation model, namely STDDE, which incorporates both delay effects and continuity into a unified delay differential equation framework and explicitly models the time delay in spatial information propagation.

• We design a learnable traffic-graph time-delay estimator, which utilizes the continuity of the hidden states to enable gradient backpropagation with respect to the delays.

• We propose a continuous output module that allows us to accurately predict traffic flow at various frequencies, providing more flexibility across different scenarios.

• We conduct experiments on six widely used datasets; the results show that our model outperforms state-of-the-art methods and exhibits competitive computational efficiency.

2. Related Work

2.1. Traffic Flow Forecasting

A large body of research has been conducted on traffic flow forecasting in recent years. Traffic flow forecasting can be viewed as a spatial-temporal forecasting task that leverages spatial-temporal data collected by various sensors to predict future traffic conditions. Deep learning methods have come to dominate this task due to their superior ability to model complex spatial-temporal correlations. Models combining graph neural networks (GNN) (Kipf and Welling, 2016; Long et al., 2021a, b) and recurrent neural networks (RNN) (Schuster and Paliwal, 1997) are representative. Specifically, DCRNN (Li et al., 2017) views the traffic flow as a diffusion process on a directed graph and utilizes GRU to capture the temporal features. STGCN (Yu et al., 2018) utilizes graph convolution and 1D convolution to capture spatial dependencies and temporal correlations, respectively. Jiang et al. (2023) propose PDFormer. While PDFormer mentions the concept of delay, its core mechanism applies attention to the historical time series rather than explicitly using delay in the propagation process. As a result, it still cannot capture the intricate delay information among graph vertices. In general, despite their success, existing works are limited to the spatial-temporal stacking structure and ignore the delay effect, which deviates from real traffic conditions.

2.2. Neural Differential Equations

The neural ordinary differential equation (NODE) (Chen et al., 2018) was first proposed as a continuous version of residual neural networks (ResNet) (Wu et al., 2019). Due to its apparent suitability for dynamics-governed time series, NODE was soon applied to time series analysis, especially when the input data is irregularly sampled or partially observed (De Brouwer et al., 2019; Rubanova et al., 2019; Lechner and Hasani, 2020). However, the solution of a neural ODE is entirely determined by its initial condition, which means that later-arriving data cannot influence the trajectory. This is also why neural ODEs are generally used together with RNN modules to handle incoming data. The neural controlled differential equation (NCDE) (Kidger et al., 2020) solves this problem by constructing a continuous path from the discrete input data and adjusting the evolution trajectory according to this continuous control signal. Another parallel and relevant work is the Neural Delay Differential Equation (NDDE) (Zhu et al., 2021). As emphasized in (Dupont et al., 2019), the flow of a NODE cannot represent systems with time-delay effects. NDDE fills this gap.

A few pioneering works have addressed traffic forecasting within a neural differential equation framework. STGODE (Fang et al., 2021) first utilizes the neural ODE to transform graph convolution into a continuous version to acquire a larger spatial-temporal receptive field. STG-NCDE (Choi et al., 2021) adopts the neural CDE to deal with irregularly sampled time series. Despite their success, none of these works takes the delay effect into consideration. In this paper, we first extend the NDDE to multi-variable conditions for spatial-temporal modeling and combine it with NCDE to construct a continuous traffic signal evolution.

3. Preliminary

3.1. Problem Definition

In this paper, we focus on the long-term traffic flow forecasting problem. The traffic network is represented as a graph $\mathcal{G}=(V,E,A)$ , where $V$ is the set of $N$ traffic nodes, $E$ is the set of edges, and $A\in\mathbb{R}^{N\times N}$ is an adjacency matrix representing the connectivity of $N$ nodes. The traffic flow is represented as a flow matrix $X\in\mathbb{R}^{T\times N}$ , and $X_{tn}$ denotes the traffic flow of node $n$ at time $t$ . The goal of traffic flow forecasting is to learn a mapping function $f$ to predict the future $T^{\prime}$ steps traffic flow given the historical $T$ steps information, which can be formulated as follows,

(1)

\left[X_{t-T+1,:},X_{t-T+2,:},\cdots,X_{t,:};\mathcal{G}\right]\underset{\text{train}}{\overset{f}{\longrightarrow}}\left[X_{t+1,:},X_{t+2,:},\cdots,X_{t+T^{\prime},:}\right].

Moreover, the model trained at some fixed grain may need to generate a differently-grained prediction to satisfy the complicated real-world needs, which is formulated as follows,

(2)

\left[X_{t-T+1,:},X_{t-T+2,:},\cdots,X_{t,:};\mathcal{G}\right]\underset{\text{infer}}{\overset{f}{\longrightarrow}}\left[X_{t+dt_{1},:},X_{t+dt_{2},:},\cdots,X_{t+dt_{n},:}\right],

where $dt_{1},dt_{2},\cdots,dt_{n}$ are arbitrary positive numbers.

3.2. Neural Differential Equations

3.2.1. Neural ODE (NODE)

The residual connection structure can be viewed as a discrete manner of Neural Ordinary Differential Equation (NODE). The update of representation $h$ is a special case of the following equation,

(3)

h_{t+\Delta t}=h_{t}+\Delta t\cdot f(h_{t},\theta_{t}),

with $\Delta t=1$ . Through letting $\Delta t\rightarrow 0$ and unifying $\theta_{t}$ into $\theta$ for parameter efficiency, we get the continuous version,

(4)

h(T)=h(0)+\int_{0}^{T}f\left(h(t),t,\theta\right)\mathrm{d}t,

where $h(0)$ is acquired from an input transformation. As the derivative in the ODE is parameterized with a neural network, the above version is named Neural ODE. To achieve memory efficiency, the adjoint sensitivity method is adopted in the backward process (Chen et al., 2018), which computes the gradients through another ODE rather than step-by-step backpropagation.
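To make the link between Eq. (3) and Eq. (4) concrete, the following minimal sketch integrates a toy scalar ODE with fixed-step Euler updates. The linear-decay dynamics `f` and the helper name `euler_integrate` are illustrative assumptions rather than part of STDDE; in a Neural ODE a learned network would take the place of `f`, and an adaptive solver would usually replace the fixed-step loop.

```python
import math

def f(h, t, theta):
    # Toy dynamics standing in for the neural network (assumption): dh/dt = -theta * h.
    return -theta * h

def euler_integrate(h0, t_end, n_steps, theta):
    """Apply the update h <- h + dt * f(h, t, theta) of Eq. (3) n_steps times."""
    dt = t_end / n_steps
    h, t = h0, 0.0
    for _ in range(n_steps):
        h = h + dt * f(h, t, theta)
        t += dt
    return h

# A single step with dt = 1 is exactly a residual block: h1 = h0 + f(h0).
coarse = euler_integrate(1.0, 1.0, 1, theta=0.5)
# Many small steps approach the continuous solution h(1) = exp(-0.5) of Eq. (4).
fine = euler_integrate(1.0, 1.0, 1000, theta=0.5)
exact = math.exp(-0.5)
```

Shrinking the step size moves the discrete residual update toward the integral form, which is the sense in which a residual network is a discretization of an ODE.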

3.2.2. Neural CDE (NCDE)

To establish connections with input data, NCDE is proposed and formulated as follows,

(5)

h(T)=h(0)+\int_{0}^{T}f(h(t),t,\theta)\mathrm{d}X_{t},

where the integral is a Riemann–Stieltjes integral (Mozyrska et al., 2009), and $X_{t}$ can be viewed as a signal controller in driving the equation evolution process. As Fig. 2 shows, neural CDE eliminates the discontinuity at the data arriving point and renders the whole process continuous in the hidden manifold space.

4. Model: STDDE

Fig. 3 shows the overall framework of our proposed STDDE. It consists of two components. The first component incorporates both delay effects and continuity into a unified delay differential equation framework, which explicitly models the time delay in spatial information propagation. As Fig. 3 shows, the hidden state of each node evolves in a delay-aware manner, i.e., the hidden state of $v_{i}$ at $t$ is influenced by the state of $v_{j}$ at $t-\tau_{ij}$ , where $\tau_{ij}$ is the delay from $v_{j}$ to $v_{i}$ . The second component is the continuous output module, which allows us to accurately predict traffic flow at various frequencies. More details are given in the following sections.

4.1. Spatial-temporal Delay-aware Neural Differential Equations

4.1.1. Neural DDE (NDDE)

We first introduce the framework of delay-aware neural differential equations. NDDE introduces the delay effect to improve the precision of signal modeling, in which the evolution process is related to its history,

(6)

h(t)=\left\{\begin{aligned} &\phi(t),&t\leq 0,\\ &h(0)+\int_{0}^{t}f(h(s),h(s-\tau),s,\theta)\mathrm{d}s,&t>0,\end{aligned}\right.

where $\tau$ is the delay value, and $\phi(t)$ is the history function. The introduction of the delay $\tau$ extends the representation ability of neural ODE and enables modeling a more complex evolution process.
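As an illustration of how the history function and the delayed state enter Eq. (6), the sketch below solves a toy scalar DDE with a fixed-step Euler scheme. The solver name `solve_dde`, the toy dynamics $\mathrm{d}h/\mathrm{d}t=-h(t-\tau)$, and the requirement that $\tau$ be a multiple of the step size are all assumptions made for this example; it is not the solver used in the paper.

```python
def solve_dde(f, phi, tau, t_end, dt):
    """Fixed-step Euler solver for dh/dt = f(h(t), h(t - tau)), with h(t) = phi(t) for t <= 0."""
    lag = round(tau / dt)          # delay in grid steps (assumes tau is a multiple of dt)
    n_steps = round(t_end / dt)
    # Pre-fill the trajectory with the history function on the grid over [-tau, 0].
    traj = [phi(-(lag - k) * dt) for k in range(lag + 1)]
    for _ in range(n_steps):
        h_now = traj[-1]
        h_delayed = traj[-1 - lag]  # state tau time units in the past
        traj.append(h_now + dt * f(h_now, h_delayed))
    return traj[lag:]              # states at t = 0, dt, 2*dt, ..., t_end

# Toy example: dh/dt = -h(t - 1) with constant history phi(t) = 1.
# The exact solution gives h(1) = 0 and h(2) = -0.5.
states = solve_dde(lambda h, h_tau: -h_tau, lambda t: 1.0, tau=1.0, t_end=2.0, dt=0.01)
h_at_1, h_at_2 = states[100], states[200]
```

Unlike an ODE, the solver needs a whole segment of history before it can take its first step, which is why the initial condition of a DDE is a function rather than a point.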

In this paper, we take the GRU (Fu et al., 2016) as an example to elaborate on the specific derivation of STDDE. Specifically, let $h_{t},z_{t},g_{t}$ denote the hidden state, update gate, and update vector respectively, the GRU is defined as follows,

(7)

\begin{aligned} z_{t}&=\sigma(W_{z}h_{t-1}+U_{z}g_{t}+b_{z}),\\ h_{t}&=z_{t}\odot h_{t-1}+(1-z_{t})\odot g_{t},\end{aligned}

where $\sigma$ is the sigmoid activation function, $W_{z},U_{z}$ and $b_{z}$ are parameters, and $\odot$ denotes the element-wise product. By subtracting $h_{t-1}$ from both sides of the update equation for $h_{t}$ , we have

(8)

\Delta h=h_{t}-h_{t-1}=(1-z_{t})\odot(g_{t}-h_{t-1}).

This naturally leads to the following ODE,

(9)

\frac{\mathrm{d}h(t)}{\mathrm{d}t}=(1-z(t))\odot(g(t)-h(t)).
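The step from Eq. (7) to Eq. (9) can be checked numerically: one GRU update equals one Euler step of size 1 on the ODE. The sketch below uses scalar states and arbitrary parameter values, both of which are assumptions chosen only for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(h_prev, g, w_z, u_z, b_z):
    """Discrete update of Eq. (7) for a scalar hidden state."""
    z = sigmoid(w_z * h_prev + u_z * g + b_z)
    return z * h_prev + (1.0 - z) * g

def ode_rhs(h, g, w_z, u_z, b_z):
    """Right-hand side of Eq. (9): (1 - z) * (g - h)."""
    z = sigmoid(w_z * h + u_z * g + b_z)
    return (1.0 - z) * (g - h)

h_prev, g = 0.3, 0.9
params = dict(w_z=0.5, u_z=-0.2, b_z=0.1)   # arbitrary illustrative values
h_gru = gru_step(h_prev, g, **params)
h_euler = h_prev + 1.0 * ode_rhs(h_prev, g, **params)  # Euler step with dt = 1
```

Both routes give the same value up to floating-point rounding, and the new state always lies between the previous state and the update vector because the gate is in (0, 1).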

Unlike an ODE, a DDE requires a continuous history function, rather than a single point, to serve as the initial state. A common practice is to set the history function to be constant in time and approximate it with a multi-layer perceptron applied to the input data,

(10)

\phi(t)=\text{constant}=\text{MLP}(x),\ \ \ t\leq 0

where $x$ is the input data.

4.1.2. Incorporating Spatio-temporal Delayed Correlations into NDDE

We extend differential equations to the spatial-temporal modeling of the traffic domain. To incorporate spatial-temporal correlations, we utilize graph neural networks to extract spatial features and view them as update vectors, that is,

(11)

g_{i}(t)=c\sum_{j\in\mathcal{N}(i)}\alpha_{ij}f(h_{j}(t-\tau_{ij})),

where $g_{i}(t)$ is the update vector of node $i$ at time $t$ , $\alpha_{ij}$ and $\tau_{ij}$ are the edge weight and delay value between nodes $v_{i}$ and $v_{j}$ respectively, $\mathcal{N}(i)$ denotes the neighbors of node $v_{i}$ , $f$ is a linear transformation, and $c$ is a constant that controls the ratio of spatial information. In this formulation, we extend the DDE to multi-variable conditions so that a specific historical hidden state is chosen for each node update. Specifically, to update the representation of node $v_{i}$ at time $t$ , we incorporate information from its neighbor $v_{j}$ at time $t-\tau_{ij}$ , which accounts for information propagating with a time delay of $\tau_{ij}$ . The graph convolution operation is implemented with the DGL package (Wang et al., 2019) in this work, and its complexity is proportional to the number of edges.
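The delayed aggregation of Eq. (11) can be sketched in a few lines of plain Python. This is a simplified illustration, not the DGL implementation: hidden states are scalars stored on a uniform time grid, delays are integer numbers of grid steps, $f$ defaults to the identity, and look-ups before the start of the stored trajectory are clamped to the first entry (which matches a constant history function). The function name and data layout are hypothetical.

```python
def delayed_update_vector(i, step, history, neighbors, alpha, tau_steps,
                          c=1.0, f=lambda h: h):
    """g_i(t) = c * sum_j alpha_ij * f(h_j(t - tau_ij)) on a uniform time grid.

    history[j][k] is the hidden state of node j at grid step k;
    tau_steps[(i, j)] is the delay from node j to node i in grid steps.
    """
    total = 0.0
    for j in neighbors[i]:
        past = max(step - tau_steps[(i, j)], 0)  # clamp to the start of the stored history
        total += alpha[(i, j)] * f(history[j][past])
    return c * total

history = {0: [0.0, 0.0, 0.0, 0.0],
           1: [1.0, 2.0, 3.0, 4.0],
           2: [10.0, 20.0, 30.0, 40.0]}
neighbors = {0: [1, 2]}
alpha = {(0, 1): 0.5, (0, 2): 0.5}

# Node 1 reaches node 0 after one step, node 2 after three steps.
g_delay = delayed_update_vector(0, 3, history, neighbors, alpha,
                                tau_steps={(0, 1): 1, (0, 2): 3})
# Setting every delay to zero recovers ordinary (immediate) message passing.
g_no_delay = delayed_update_vector(0, 3, history, neighbors, alpha,
                                   tau_steps={(0, 1): 0, (0, 2): 0})
```

The loop visits each edge once per evaluation, in line with a cost proportional to the number of edges.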

4.2. Traffic-Graph Time-Delay Estimator

As shown in Fig. 1, there exist propagation delays in real-world traffic conditions, in contrast to the general GNN propagation process. For instance, when a traffic incident occurs in a particular area, it may take several minutes to affect traffic conditions in adjacent regions. To describe the time delay more accurately, we design two delay estimators that capture the propagation delay between connected nodes.

The direct approach estimates delays among neighboring nodes by maximizing the cross-correlation (MCC) (Azaria and Hertz, 1984) as a pre-processing step. Specifically, given two time series, $x_{i}$ and $x_{j}$ , where we assume that $x_{j}$ is influenced by $x_{i}$ , we initially smooth them through interpolation, resulting in $\tilde{x}_{i}$ and $\tilde{x}_{j}$ . Subsequently, we determine the delay, denoted as $\tau_{ij}$ , by identifying the peak of their cross-correlation after shifting, as expressed by the below equation,

(12)

\tau_{ij}=\arg\max_{k}\text{corr}(\tilde{x}_{i}^{\rightarrow{k}},\tilde{x}_{j}),

where $\tilde{x}_{i}^{\rightarrow{k}}$ denotes performing a $k$ -step shift to $\tilde{x}_{i}$ , and $\text{corr}$ is the Pearson correlation function in this paper. We estimate all the delay values in advance through pre-processing based on historical data.
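A minimal sketch of the maximum cross-correlation estimator in Eq. (12) is given below. It omits the interpolation-based smoothing step, searches only non-negative integer shifts up to a user-chosen `max_shift`, and computes the correlation on the overlapping part of the two series; these choices, the function names, and the synthetic data are assumptions for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def estimate_delay(x_i, x_j, max_shift):
    """Return the shift k in [0, max_shift] that maximizes corr(x_i shifted by k, x_j)."""
    best_k, best_corr = 0, -2.0
    for k in range(max_shift + 1):
        # Shifting x_i forward by k aligns x_i[t] with x_j[t + k].
        corr = pearson(x_i[: len(x_i) - k], x_j[k:])
        if corr > best_corr:
            best_k, best_corr = k, corr
    return best_k

# Synthetic pair: x_j repeats x_i three steps later.
x_i = [0, 0, 1, 4, 2, 0, 0, 1, 0, 0, 3, 1, 0, 0, 0, 0]
x_j = [0, 0, 0] + x_i[:-3]
tau = estimate_delay(x_i, x_j, max_shift=6)
```

On real sensor data the peak is less sharp than in this synthetic case, which is one reason the series are smoothed before the correlation is computed.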

The second approach involves modeling time delay as a learnable pattern. The time delay implicitly reflects external factors associated with the traffic network, including road length, road capacity, and more. Furthermore, the delay itself exhibits inherent variability, for example, longer delays often occur during morning and evening rush hours. In this approach, we assign two learnable delay parameters to each edge: one for peak hours and another for non-peak hours.

Note that the delay value $\tau$ serves as an index that selects a historical state in Eq. (11). Generally, $\tau$ is considered non-learnable in this context because the model cannot compute the gradient with respect to $\tau$ , which is theoretically $\frac{\mathrm{d}h(t-\tau)}{\mathrm{d}\tau}$ . However, thanks to the continuous modeling approach, we can indeed obtain this gradient. As demonstrated in Eq. (9), the derivative of $h$ with respect to $t$ is well-defined. Consequently, we can incorporate the gradient with respect to $\tau$ into the neural network by explicitly defining the backward computation of $h$ with respect to $\tau$ .
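One way to read this argument is through the chain rule: since $h$ is differentiable in time, $\frac{\mathrm{d}h(t-\tau)}{\mathrm{d}\tau}=-\frac{\mathrm{d}h}{\mathrm{d}t}\big|_{t-\tau}$ . The sketch below checks this identity with a finite difference on a toy trajectory. The sine trajectory and the function names are assumptions for illustration; in STDDE the time derivative would be supplied by the right-hand side of the differential equation rather than a closed form.

```python
import math

def h(t):
    # Toy continuous hidden trajectory (assumption); STDDE would obtain it from the solver.
    return math.sin(t)

def dh_dt(t):
    # Time derivative of the toy trajectory.
    return math.cos(t)

def grad_wrt_tau(t, tau):
    """Chain rule: d/dtau h(t - tau) = -dh/dt evaluated at t - tau."""
    return -dh_dt(t - tau)

t, tau, eps = 2.0, 0.7, 1e-6
analytic = grad_wrt_tau(t, tau)
# Central finite difference of tau -> h(t - tau) as an independent check.
numeric = (h(t - (tau + eps)) - h(t - (tau - eps))) / (2 * eps)
```

With a discrete model the delay would be an integer index with no such derivative, which is why the continuous formulation is what makes $\tau$ trainable.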

4.3. State Evolution Controller

One key challenge with DDE is that, once the network parameters are fixed, the dynamic evolution becomes entirely self-contained and does not integrate incoming inputs, leading to the loss of valuable information. To address this issue, we introduce a control signal inspired by Neural CDE, offering a solution to this problem.

Following Neural CDE, we generate a continuous representation from the raw inputs through the natural cubic spline method, which ensures that the resulting path is twice continuously differentiable,

(13)

X(t)=\Phi\left(\{x^{0},t_{0}\},\{x^{1},t_{1}\},\cdots,\{x^{n},t_{n}\}\right),

where $X(t)$ is a continuous representation, $\Phi$ denotes the natural cubic spline function, and $x^{i}$ denotes the input at time $t_{i}$ . Thus we have

(14)

\frac{\mathrm{d}h_{i}(t)}{\mathrm{d}t}=(1-z_{i}(t))\odot(g_{i}(t)-h_{i}(t))\tilde{f}\left(\frac{\mathrm{d}X_{i}(t)}{\mathrm{d}t}\right),

where $\tilde{f}$ is a transformation function to match dimensions, and $X_{i}$ is the continuous representation of node $v_{i}$ . The derivative of $X$ , denoted as $\frac{\mathrm{d}X_{i}(t)}{\mathrm{d}t}$ , signifies the trend or fluctuation in traffic flow, constantly influencing the direction of dynamic evolution.

We formulate the complete update process of hidden states as follows,

(15)

\begin{aligned} g_{i}(t)&=c\sum_{j\in\mathcal{N}(i)}\alpha_{ij}f\left(h_{j}(t-\tau_{ij})\right),\\ z_{i}(t)&=\sigma(W_{z}h_{i}(t)+U_{z}g_{i}(t)+b_{z}),\\ \frac{\mathrm{d}h_{i}(t)}{\mathrm{d}t}&=(1-z_{i}(t))\odot(g_{i}(t)-h_{i}(t))\tilde{f}\left(\frac{\mathrm{d}X_{i}(t)}{\mathrm{d}t}\right),\\ h_{i}(t)&=\left\{\begin{aligned} &\phi_{i}(t),&t\leq 0,\\ &h_{i}(0)+\int_{0}^{t}\frac{\mathrm{d}h_{i}(t)}{\mathrm{d}t}\mathrm{d}t,&t>0.\end{aligned}\right.\end{aligned}

4.4. Continuous Output Module

We employ another STDDE to generate the final outputs. In this approach, we consider the last stage of the hidden flow in the input process as the history function for the output process. This strategy offers two key advantages: Firstly, the hidden states remain continuous within the manifold space, ensuring unity between the input and output processes. Secondly, unlike traditional output layers that provide predictions with a fixed horizon, we can accurately predict traffic flow at various frequencies. It provides more flexibility and adaptability to different scenarios.

(16)

\begin{aligned} \frac{\mathrm{d}h(t)}{\mathrm{d}t}&=(1-z(t))\odot(g(t)-h(t)),\\ h_{i}(t^{\prime})&=h_{i}(T)+\int_{T}^{t^{\prime}}\frac{\mathrm{d}h_{i}(t)}{\mathrm{d}t}\mathrm{d}t,\\ y_{i}(t^{\prime})&=f\left(h_{i}(t^{\prime})\right),\\ Y_{i}&=\left[y_{i}(t_{T+1}),y_{i}(t_{T+2}),\cdots,y_{i}(t_{T+T^{\prime}})\right],\end{aligned}

where $f$ is a mapping function to get final outputs from hidden states, and $t^{\prime}$ is any target output point. This output approach better highlights the continuity of STDDE and fully capitalizes on its capabilities. With this model, we have the flexibility to generate predictions at any time, rather than being limited to a specific point. The continuous output module is well-suited for scenarios involving sparse traffic sensor data, especially when a higher level of precision is required during inference than in the training phase.
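The idea of reading out predictions at arbitrary times can be sketched as follows: the hidden state is integrated forward from $T$ and the readout is applied whenever a requested time is reached. The scalar decay dynamics, the identity readout, the Euler scheme, and the name `predict_at` are assumptions for this toy example; the actual module integrates the delay-aware dynamics of Eq. (16).

```python
import math

def predict_at(h_T, T, query_times, rhs, readout, max_dt=0.01):
    """Integrate dh/dt = rhs(h, t) from time T and return readout(h(t')) for each query t' >= T."""
    outputs, h, t = [], h_T, T
    for t_q in sorted(query_times):
        n = max(1, math.ceil((t_q - t) / max_dt))  # number of Euler sub-steps to reach t_q
        dt = (t_q - t) / n
        for _ in range(n):
            h = h + dt * rhs(h, t)
            t += dt
        t = t_q                                     # remove accumulated rounding drift
        outputs.append(readout(h))
    return outputs

rhs = lambda h, t: -h          # toy dynamics (assumption)
readout = lambda h: h          # identity output map (assumption)

# The same dynamics can be queried on a coarse or a fine output grid without retraining.
coarse = predict_at(1.0, 0.0, [1.0, 2.0], rhs, readout)
fine = predict_at(1.0, 0.0, [0.5, 1.0, 1.5, 2.0], rhs, readout)
```

Only the list of query times changes between the two calls, which mirrors how the output resolution can differ between training and inference.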

Finally, as our objective function in the context of traffic flow forecasting, we employ the widely-used Huber loss, which is known for its robustness in handling outliers compared to the squared error loss.

(17)

\mathcal{L}(Y,\hat{Y})=\left\{\begin{aligned} &\frac{1}{2}(Y-\hat{Y})^{2},&&|Y-\hat{Y}|\leq\delta,\\ &\delta|Y-\hat{Y}|-\frac{1}{2}\delta^{2},&&\text{otherwise},\end{aligned}\right.

where $\hat{Y}$ is the ground truth, and $\delta$ is a hyperparameter which controls the sensitivity to outliers.
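For reference, Eq. (17) can be written directly as a small function. The default $\delta=1$ below is an assumption, since the value used in the experiments is not stated here, and the averaging helper `mean_huber` is likewise illustrative.

```python
def huber(y, y_hat, delta=1.0):
    """Huber loss of Eq. (17) for a single pair of scalars."""
    err = abs(y - y_hat)
    if err <= delta:
        return 0.5 * err ** 2                 # quadratic branch for small errors
    return delta * err - 0.5 * delta ** 2     # linear branch for large errors

def mean_huber(ys, y_hats, delta=1.0):
    """Average the element-wise loss over a batch of values."""
    return sum(huber(a, b, delta) for a, b in zip(ys, y_hats)) / len(ys)

small = huber(1.0, 1.5)    # inside the quadratic region
large = huber(0.0, 3.0)    # inside the linear region
batch = mean_huber([1.0, 0.0], [1.5, 3.0])
```

The two branches agree at an error of exactly $\delta$ , so the loss is continuous, and beyond that point it grows linearly rather than quadratically, which limits the influence of outliers.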

The time complexity analysis of STDDE is presented in the Appendix.

4.5. Why It Works?

4.5.1. Connection to Existing Works

STDDE includes both delay effects and continuity into a unified delay differential equation framework, which explicitly models the time delay in spatial information propagation. A prior related study on delay-aware traffic forecasting is PDFormer (Jiang et al., 2023). While PDFormer mentions the concept of time delay, its core mechanism involves utilizing attention with the historical time series, rather than explicitly utilizing delay in the propagation process. It implies that it still cannot capture the delay information in both spatial and temporal views.

We then analyze the generality of STDDE from two perspectives: 1) In the temporal dimension, GRU and its variants (Fu et al., 2016) can be considered special cases of STDDE in which the integration of $\frac{\mathrm{d}h_{i}(t)}{\mathrm{d}t}$ is discretized. 2) In the spatial dimension, when all the time delays $\tau_{ij}$ are set to zero, STDDE degenerates to general GNNs (Ju et al., 2023; Song et al., 2020b).

4.5.2. Stability

Stability is a critical property for a DDE. Here we provide a theoretical analysis of our proposed DDE.

Definition 1.

A delay differential equation is linked to a characteristic equation. If the real parts of all characteristic roots of the associated equation are negative, the delay differential equation is considered asymptotically stable.

Theorem 1.

The proposed DDE is asymptotically stable when the balance constant $c\leq 1/K$ .

Proof.

Please see the Appendix for more details about the definition and the proof. ∎

5. Experiments

5.1. Datasets

We verify the performance of STDDE on six real-world traffic datasets, namely PeMSD7 (M), PeMSD7 (L), PeMS03, PeMS04, PeMS07, and PeMS08. The data are collected by the Caltrans Performance Measurement System in real time every 30 seconds (Chen et al., 2001) and aggregated into 5-minute intervals, which gives 288 time steps per day. More details of the datasets are listed in Table 1. We standardize the input by removing the mean and scaling to unit variance.

Datasets	#Sensors	#Edges	Time Steps
PeMSD7 (M)	228	1132	12672
PeMSD7 (L)	1026	10150	12672
PeMS03	358	547	26208
PeMS04	307	340	16992
PeMS07	883	866	28224
PeMS08	170	295	17856

Table 1. The summary of datasets used in our paper.

5.2. Baselines

We select the following representative baselines as our competitors, and more details can be found in the Appendix:

• Spatio-Temporal Graph Convolution Models, including STGCN (Yu et al., 2018), STSGCN (Song et al., 2020a), DCRNN (Li et al., 2018), AGCRN (Bai et al., 2020), ASTGCN (Guo et al., 2019), DSTAGNN (Lan et al., 2022), and FOGS (Rao et al., 2022). STGCN utilizes graph convolution and 1D convolution to capture spatial-temporal correlations. STSGCN utilizes multiple localized spatial-temporal subgraph modules to capture the spatial-temporal correlations directly. DCRNN integrates graph convolution into an encoder-decoder gated recurrent unit. AGCRN captures node-specific spatial and temporal correlations automatically without a pre-defined graph. ASTGCN utilizes spatial and temporal attention mechanisms to model their dynamics. DSTAGNN constructs a dynamic graph instead of relying on a pre-defined static one. FOGS utilizes first-order gradients rather than specific flows, which effectively circumvents issues associated with fitting irregularly-shaped distributions.

• Spatial-Temporal Graph Ordinary Differential Equation Models, including STGODE (Fang et al., 2021) and STG-NCDE (Choi et al., 2021). STGODE proposes an ordinary differential equation-based continuous GNN to capture long-range spatial-temporal dependencies. STG-NCDE designs two NCDEs to capture temporal and spatial properties, respectively.

• Delay-aware Traffic Models, which include only one related work, PDFormer (Jiang et al., 2023). Its transformer-based mechanism applies attention to the historical time series.

5.3. Experimental Settings

We split all datasets into training, validation, and test sets with a ratio of 6:2:2. One hour of historical data is used to predict traffic conditions in the next 60 minutes, i.e., $T=12$ and $T^{\prime}=12$ . All experiments are conducted on the same Linux server and GPU. The dimension of hidden states is set to 64. We train our model using the Adam optimizer with a learning rate of 0.001, a batch size of 32, and 200 training epochs. Mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE) are used to measure the performance. For baselines, we use their officially reported results if accessible. If not, we run their code with the recommended configurations.
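The three evaluation metrics can be computed as in the sketch below. Skipping near-zero ground-truth values in MAPE is a common convention in traffic benchmarks but is an assumption here, as are the function names and the example numbers.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred, eps=1e-8):
    """MAPE in percent; entries with |y_true| <= eps are skipped (assumed masking rule)."""
    terms = [abs((a - b) / a) for a, b in zip(y_true, y_pred) if abs(a) > eps]
    return 100.0 * sum(terms) / len(terms)

# Hypothetical flows for three sensor readings.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
scores = (mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

Because MAPE divides by the ground truth, readings with very low flow can dominate it, whereas MAE and RMSE are driven by errors at high-flow sensors.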

Dataset	Metric	STGCN	DCRNN	ASTGCN(r)	STSGCN	STGODE	AGCRN	STG-NCDE	DSTAGNN	FOGS	PDFormer	STDDE
PeMSD7(M)	RMSE	7.55	7.18	6.87	5.93	5.66	5.54	5.39	5.54	5.54	5.60	5.19
PeMSD7(M)	MAE	4.01	3.83	3.61	3.01	2.97	2.79	2.68	2.78	2.76	2.81	2.56
PeMSD7(M)	MAPE (%)	9.67	9.81	8.84	7.55	7.36	7.02	6.76	6.93	6.83	7.06	6.61
PeMSD7(L)	RMSE	8.28	8.33	7.64	6.88	5.98	5.92	5.76	5.98	6.04	5.90	5.63
PeMSD7(L)	MAE	4.84	4.33	4.09	3.61	3.22	2.99	2.87	2.98	2.96	2.92	2.77
PeMSD7(L)	MAPE (%)	11.76	11.41	10.25	9.13	7.94	7.59	7.31	7.50	7.48	7.54	7.26
PeMS03	RMSE	30.42	30.31	29.56	29.21	27.84	28.25	27.09	27.39	24.85	25.96	24.52
PeMS03	MAE	17.55	17.99	17.34	17.48	16.50	15.98	15.57	15.62	15.06	14.95	15.03
PeMS03	MAPE (%)	17.43	18.34	17.21	16.78	16.69	15.23	15.06	14.74	15.03	15.58	14.69
PeMS04	RMSE	36.01	37.65	35.22	33.65	32.82	32.26	31.09	31.71	31.29	29.96	29.86
PeMS04	MAE	22.66	24.63	22.94	21.19	20.84	19.83	19.21	19.38	19.44	18.31	18.11
PeMS04	MAPE (%)	14.34	17.01	16.43	13.90	13.77	12.97	12.76	12.77	12.81	12.07	12.07
PeMS07	RMSE	39.34	38.61	37.87	39.03	37.54	36.55	33.84	34.88	34.09	32.80	32.59
PeMS07	MAE	25.33	25.22	24.01	24.26	22.99	22.37	20.53	21.62	20.79	19.78	19.47
PeMS07	MAPE (%)	11.21	11.82	10.73	10.21	10.14	9.12	8.80	9.24	8.75	8.54	8.49
PeMS08	RMSE	27.88	27.83	26.22	26.80	25.97	25.22	24.81	25.08	25.36	24.61	24.31
PeMS08	MAE	18.11	17.46	16.64	17.13	16.81	15.95	15.45	15.85	16.10	15.66	15.12
PeMS08	MAPE (%)	11.34	11.39	10.6	10.96	10.62	10.09	9.92	9.93	9.85	9.61	9.74

Table 2. Performance comparison of baselines and the proposed STDDE on six widely used real-world traffic datasets.

5.4. Conceptual Experiments

We first provide conceptual experiments to validate our motivation and to evaluate the effectiveness of the proposed delay-aware differential equation and continuous output module.

5.4.1. Quantitative and Qualitative Validation of Time-delay

For quantitative validation, we design a variant of our model, STDDE-no-delay, which sets all delays to zero. We compute the average delay of each node in PeMS04, sort the nodes by this value in ascending order, and compare STDDE with STDDE-no-delay on the first $15\%$ (shortest delays) and the last $15\%$ (longest delays) of nodes. Table 3 shows the results. In terms of RMSE and MAE, the errors on nodes with larger average delays are much higher than those on nodes with small delays, which indicates the difficulty of modeling long-range correlations (MAPE is not a stable metric here because it is sensitive to small ground-truth values). STDDE achieves a larger RMSE and MAE improvement on the long-delay nodes owing to its ability to model delay effects.
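The node grouping used in this comparison can be sketched as follows. This is an illustrative NumPy snippet under stated assumptions: the helper names `split_by_delay` and `group_mae`, the array layout (samples by nodes), and the toy data are hypothetical and not taken from the released implementation.

```python
import numpy as np

def split_by_delay(avg_delay, frac=0.15):
    """Return indices of the shortest-delay and longest-delay nodes."""
    order = np.argsort(avg_delay)  # ascending average delay
    k = max(1, int(round(frac * len(avg_delay))))
    return order[:k], order[-k:]

def group_mae(y_true, y_pred, nodes):
    """MAE restricted to a subset of nodes; arrays have shape (samples, nodes)."""
    return float(np.mean(np.abs(y_pred[:, nodes] - y_true[:, nodes])))

# Toy setup: 20 nodes, node 19 has the smallest average delay.
avg_delay = np.arange(20, dtype=float)[::-1].copy()
short_nodes, long_nodes = split_by_delay(avg_delay)

# Toy predictions whose absolute error equals the node index.
y_true = np.zeros((2, 20))
y_pred = np.tile(np.arange(20.0), (2, 1))
print(group_mae(y_true, y_pred, short_nodes), group_mae(y_true, y_pred, long_nodes))
```

The same index sets can be reused for every metric and for both model variants, so that the two groups are compared on identical nodes.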

For qualitative validation, we carry out a case study on a real-world dataset to evaluate how well STDDE captures time delay in traffic flow forecasting. We select two connected neighboring nodes, 196 and 198, from the PeMS04 dataset and visualize their predictions. As shown in Fig 4, the predictions of STDDE are remarkably closer to the ground truth than those of STDDE-no-delay. In addition, there is a sharp rise in node 196’s traffic waveform in Fig 4 (a), and the result in (b) shows that STDDE-no-delay responds to it inaccurately while STDDE does not. This further shows that STDDE is able to capture and utilize time-delay information in traffic flow forecasting.

Data	Metric	STDDE-no-delay	STDDE	Gain
First $15\%$	RMSE	16.97	16.86	0.65%
	MAE	11.54	11.47	0.61%
	MAPE (%)	19.71	18.37	6.80%
Last $15\%$	RMSE	37.72	34.59	8.30%
	MAE	25.06	23.24	7.26%
	MAPE (%)	14.63	13.96	4.65%

Table 3. Performance on nodes with short (first $15\%$) and long (last $15\%$) average delays.

5.4.2. Flexibility Validation of the Continuous-output Module

To test the flexibility of our model in real-world scenarios, we introduce a more challenging setting. We still use 60 minutes of historical data to predict the traffic flow in the next 60 minutes. However, during training we set the time interval to 10/15/20 minutes, so the number of input steps is 6/4/3, and during inference we change the time interval to 5 minutes. This configuration tests the model’s adaptability to a finer prediction frequency than the one seen in training. For STDDE, owing to its continuity, we only need to increase the number of chosen states in the output module from 6/4/3 to 12. For baseline models, we first obtain their predictions and then apply linear interpolation to obtain a more fine-grained output. The results are presented in Figure 5. In summary, the performance of all models degrades as the input interval increases, and STDDE performs significantly better than the baseline models. Compared with linear interpolation, the STDDE output module models the inherent continuity and generates more accurate predictions.
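The linear-interpolation step applied to the baselines can be sketched as below. This is one plausible reading of that procedure, not the authors' script: the helper name `upsample_linear` is hypothetical, and anchoring the interpolation at the last observed value at time 0 (so that the first 5-minute step is defined) is an assumption of this example.

```python
import numpy as np

def upsample_linear(coarse_pred, coarse_step, fine_step, last_obs):
    """Linearly interpolate coarse predictions onto a finer time grid.

    coarse_pred: (steps, nodes) predictions at coarse_step, 2*coarse_step, ... minutes.
    last_obs:    (nodes,) last observed value at t = 0, used as the left anchor
                 (an assumption of this sketch).
    """
    n_steps = coarse_pred.shape[0]
    horizon = coarse_step * n_steps
    t_coarse = np.concatenate([[0.0], np.arange(1, n_steps + 1) * coarse_step])
    t_fine = np.arange(fine_step, horizon + fine_step, fine_step, dtype=float)
    values = np.vstack([last_obs[None, :], coarse_pred])
    # np.interp is one-dimensional, so interpolate each node separately.
    cols = [np.interp(t_fine, t_coarse, values[:, n]) for n in range(values.shape[1])]
    return np.stack(cols, axis=1)

# Toy example: one node, 10-minute predictions refined to 5-minute resolution.
coarse_pred = np.arange(10.0, 70.0, 10.0).reshape(-1, 1)  # 6 coarse steps
last_obs = np.array([0.0])
fine_pred = upsample_linear(coarse_pred, coarse_step=10, fine_step=5, last_obs=last_obs)
print(fine_pred.shape)
```

Interpolation of this kind can only place the intermediate values on straight segments between coarse predictions, whereas a continuous hidden state can be read out directly at the finer time points.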

5.5. Overall Performances and Analysis

Table 2 shows the results of the proposed STDDE model and competitive baselines on traffic flow forecasting tasks on six widely used real-world datasets. We draw the following conclusions:

• Our model achieves the best performance on all metrics for most datasets, which demonstrates the effectiveness of our spatial-temporal delay modeling for traffic flow forecasting.
• Continuous spatial-temporal neural networks, i.e., STGODE, STG-NCDE, and STDDE, generally perform better than traditional GNN-based ones, such as the widely used AGCRN, STGCN, DCRNN, and DSTAGNN. This shows that continuous modeling is an effective direction for spatial-temporal traffic flow forecasting and deserves more attention.
• The delay-aware methods, STDDE and PDFormer, generally perform better than the continuous spatial-temporal methods without delay modeling, i.e., STGODE and STG-NCDE, which indicates that capturing and utilizing historical delay-related information is necessary and of great significance.
• STDDE outperforms PDFormer on most metrics, which shows the effectiveness of explicit spatial-temporal delay-aware differential equations and continuous modeling.

Model	# Parameters	PeMSD7 (M) Train	PeMSD7 (M) Infer	PeMSD7 (L) Train	PeMSD7 (L) Infer
STGODE	328,646	131	13	1107	146
FOGS	1,674,188	50	3	531	42
DSTAGNN	2,784,988	168	43	1222	209
PDFormer	531,165	120	11	1292	138
STDDE	175,830	82	9	734	84

Table 4. Comparison of # parameters and running time in one epoch. (Unit: seconds)

5.6. Model Analysis

5.6.1. Ablation Studies

To verify the effectiveness of the different modules of STDDE, we conduct ablation experiments on the PeMS04 dataset and compare STDDE with the following variants.

• v1 (STDDE-no-delay): We ignore the delay effect, so the model degenerates to an ODE model, to verify whether capturing the time-delay signal contributes to performance.
• v2 (STDDE-zero-history): We set the history function to zero to verify the necessity of learnable history states.
• v3 (STDDE-fixed-delay): We use the pre-processed delay values as fixed inputs of STDDE instead of learning them, to verify the benefit of learnable delays.

The results are presented in Fig 6. STDDE-no-delay shows a significant performance gap relative to STDDE, which demonstrates the necessity of utilizing time delay. STDDE-fixed-delay also performs worse than STDDE, which shows the benefit of learnable delay values. In addition, STDDE performs better than STDDE-zero-history, because the historical states of a DDE are critical to the update of the subsequent future states.

5.6.2. Model Efficiency Analysis

We conduct model efficiency analysis on STDDE and several representative baselines, i.e., STGODE, DSTAGNN, FOGS, and PDFormer, on the PeMSD7 (M) and PeMSD7 (L) datasets. Table 4 reports the number of parameters and the average training and inference time per epoch. STDDE has the fewest parameters and achieves competitive computational efficiency in both the training and inference phases. On the largest dataset, PeMSD7 (L), compared with the best-performing baseline PDFormer, STDDE reduces the training and inference time by about 43% (734 s vs. 1292 s) and 39% (84 s vs. 138 s), respectively.
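The relative time reductions on PeMSD7 (L) follow directly from the per-epoch times in Table 4. The short check below uses only the values reported there; the helper name `reduction_pct` is introduced for this illustration.

```python
def reduction_pct(baseline, ours):
    """Relative reduction of `ours` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline

# Per-epoch times in seconds on PeMSD7 (L), copied from Table 4.
pdformer = {"train": 1292, "infer": 138}
stdde = {"train": 734, "infer": 84}

train_red = reduction_pct(pdformer["train"], stdde["train"])
infer_red = reduction_pct(pdformer["infer"], stdde["infer"])
print(f"train: {train_red:.1f}%  infer: {infer_red:.1f}%")
```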

5.6.3. Parameter Analysis

We analyze the dimension of hidden states in STDDE, which influences the complexity of the state space. Fig 7 shows the results on the PeMS04 dataset. The performance improves as the hidden dimension increases and is best when the dimension is 128. To balance effectiveness and efficiency, we set the dimension to 64 in our model.

6. Conclusion

In this paper, we propose STDDE, which incorporates both delay effects and continuity into a unified delay differential equation framework. It explicitly models the time delay in spatial information propagation. To obtain a more accurate depiction of the time delay, we design a traffic-graph time-delay estimator. In addition, we propose a continuous output module that allows us to accurately predict traffic flow at various frequencies, which provides more flexibility and adaptability to different scenarios. Experimental results show the effectiveness and efficiency of STDDE.

Acknowledgments

This research was supported by the Natural Science Foundation of China under Grant No. 61836013 and grants from the Strategic Priority Research Program of the Chinese Academy of Sciences XDB38030300.

References

Aryandoust et al. (2019) Arsam Aryandoust, Oscar van Vliet, and Anthony Patt. 2019. City-scale car traffic and parking density maps from Uber Movement travel time data. Scientific data 6, 1 (2019), 158.
Azaria and Hertz (1984) Mordechai Azaria and David Hertz. 1984. Time delay estimation by generalized cross correlation methods. IEEE Transactions on Acoustics, Speech, and Signal Processing 32, 2 (1984), 280–285.
Bai et al. (2020) Lei Bai, Lina Yao, Can Li, Xianzhi Wang, and Can Wang. 2020. Adaptive graph convolutional recurrent network for traffic forecasting. arXiv preprint arXiv:2007.02842 (2020).
Benidis et al. (2022) Konstantinos Benidis, Syama Sundar Rangapuram, Valentin Flunkert, Yuyang Wang, Danielle Maddix, Caner Turkmen, Jan Gasthaus, Michael Bohlke-Schneider, David Salinas, Lorenzo Stella, et al. 2022. Deep learning for time series forecasting: Tutorial and literature survey. Comput. Surveys 55, 6 (2022), 1–36.
Chavhan and Venkataram (2020) Suresh Chavhan and Pallapa Venkataram. 2020. Prediction based traffic management in a metropolitan area. Journal of traffic and transportation engineering (English edition) 7, 4 (2020), 447–466.
Chen et al. (2001) Chao Chen, Karl Petty, Alexander Skabardonis, Pravin Varaiya, and Zhanfeng Jia. 2001. Freeway performance measurement system: mining loop detector data. Transportation Research Record 1748, 1 (2001), 96–102.
Chen et al. (2018) Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. 2018. Neural ordinary differential equations. arXiv preprint arXiv:1806.07366 (2018).
Choi et al. (2021) Jeongwhan Choi, Hwangyong Choi, Jeehyun Hwang, and Noseong Park. 2021. Graph Neural Controlled Differential Equations for Traffic Forecasting. arXiv preprint arXiv:2112.03558 (2021).
De Brouwer et al. (2019) Edward De Brouwer, Jaak Simm, Adam Arany, and Yves Moreau. 2019. Gru-ode-bayes: Continuous modeling of sporadically-observed time series. Advances in neural information processing systems 32 (2019).
Dupont et al. (2019) Emilien Dupont, Arnaud Doucet, and Yee Whye Teh. 2019. Augmented neural odes. Advances in Neural Information Processing Systems 32 (2019).
Fang et al. (2020) Xiaomin Fang, Jizhou Huang, Fan Wang, Lingke Zeng, Haijin Liang, and Haifeng Wang. 2020. Constgat: Contextual spatial-temporal graph attention network for travel time estimation at baidu maps. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2697–2705.
Fang et al. (2021) Zheng Fang, Qingqing Long, Guojie Song, and Kunqing Xie. 2021. Spatial-temporal graph ode networks for traffic flow forecasting. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 364–373.
Fang et al. (2022) Zheng Fang, Lingjun Xu, Guojie Song, Qingqing Long, and Yingxue Zhang. 2022. Polarized graph neural networks. In Proceedings of the ACM Web Conference 2022. 1404–1413.
Fu et al. (2016) Rui Fu, Zuo Zhang, and Li Li. 2016. Using LSTM and GRU neural network methods for traffic flow prediction. In 2016 31st Youth academic annual conference of Chinese association of automation (YAC). IEEE, 324–328.
Guo et al. (2019) Shengnan Guo, Youfang Lin, Ning Feng, Chao Song, and Huaiyu Wan. 2019. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 922–929.
Islam et al. (2022) Jaminur Islam, Jose Paolo Talusan, Shameek Bhattacharjee, Francis Tiausas, Sayyed Mohsen Vazirizade, Abhishek Dubey, Keiichi Yasumoto, and Sajal K Das. 2022. Anomaly based Incident Detection in Large Scale Smart Transportation Systems. In 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS). IEEE, 215–224.
Jiang et al. (2023) Jiawei Jiang, Chengkai Han, Wayne Xin Zhao, and Jingyuan Wang. 2023. PDFormer: Propagation Delay-aware Dynamic Long-range Transformer for Traffic Flow Prediction. arXiv preprint arXiv:2301.07945 (2023).
Ju et al. (2023) Wei Ju, Zheng Fang, Yiyang Gu, Zequn Liu, Qingqing Long, Ziyue Qiao, Yifang Qin, Jianhao Shen, Fang Sun, Zhiping Xiao, et al. 2023. A Comprehensive Survey on Deep Graph Representation Learning. arXiv preprint arXiv:2304.05055 (2023).
Ju et al. (2024) Wei Ju, Siyu Yi, Yifan Wang, Qingqing Long, Junyu Luo, Zhiping Xiao, and Ming Zhang. 2024. A Survey of Data-Efficient Graph Learning. arXiv preprint arXiv:2402.00447 (2024).
Kidger et al. (2020) Patrick Kidger, James Morrill, James Foster, and Terry Lyons. 2020. Neural controlled differential equations for irregular time series. Advances in Neural Information Processing Systems 33 (2020), 6696–6707.
Kipf and Welling (2016) Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
Lan et al. (2022) Shiyong Lan, Yitong Ma, Weikang Huang, Wenwu Wang, Hongyu Yang, and Pyang Li. 2022. Dstagnn: Dynamic spatial-temporal aware graph neural network for traffic flow forecasting. In International conference on machine learning. PMLR, 11906–11917.
Lechner and Hasani (2020) Mathias Lechner and Ramin Hasani. 2020. Learning long-term dependencies in irregularly-sampled time series. arXiv preprint arXiv:2006.04418 (2020).
Li et al. (2017) Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2017. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926 (2017).
Li et al. (2018) Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In International Conference on Learning Representations.
Liu et al. (2020) Jielun Liu, Ghim Ping Ong, and Xiqun Chen. 2020. GraphSAGE-based traffic speed forecasting for segment network with sparse data. IEEE Transactions on Intelligent Transportation Systems 23, 3 (2020), 1755–1766.
Long et al. (2020) Qingqing Long, Yilun Jin, Guojie Song, Yi Li, and Wei Lin. 2020. Graph Structural-topic Neural Network. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1065–1073.
Long et al. (2021a) Qingqing Long, Yilun Jin, Yi Wu, and Guojie Song. 2021a. Theoretically improving graph neural networks via anonymous walk graph kernels. In Proceedings of the Web Conference 2021. 1204–1214.
Long et al. (2021b) Qingqing Long, Lingjun Xu, Zheng Fang, and Guojie Song. 2021b. HGK-GNN: Heterogeneous Graph Kernel based Graph Neural Networks. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 1129–1138.
Lu et al. (2020) Bin Lu, Xiaoying Gan, Haiming Jin, Luoyi Fu, and Haisong Zhang. 2020. Spatiotemporal adaptive gated graph convolution network for urban traffic flow forecasting. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 1025–1034.
Mozyrska et al. (2009) Dorota Mozyrska, Ewa Pawluszewicz, and Delfim FM Torres. 2009. The Riemann-Stieltjes integral on time scales. arXiv preprint arXiv:0903.1224 (2009).
Nagy and Simon (2018) Attila M Nagy and Vilmos Simon. 2018. Survey on traffic prediction in smart cities. Pervasive and Mobile Computing 50 (2018), 148–163.
Ran and Boyce (2012) Bin Ran and David Boyce. 2012. Modeling dynamic transportation networks: an intelligent transportation system oriented approach. Springer Science & Business Media.
Rao et al. (2022) Xuan Rao, Hao Wang, Liang Zhang, Jing Li, Shuo Shang, and Peng Han. 2022. FOGS: First-order gradient supervision with learning-based graph for traffic flow forecasting. In Proceedings of International Joint Conference on Artificial Intelligence, IJCAI. ijcai. org.
Rubanova et al. (2019) Yulia Rubanova, Ricky TQ Chen, and David K Duvenaud. 2019. Latent ordinary differential equations for irregularly-sampled time series. Advances in neural information processing systems 32 (2019).
Schuster and Paliwal (1997) Mike Schuster and Kuldip K Paliwal. 1997. Bidirectional recurrent neural networks. IEEE transactions on Signal Processing 45, 11 (1997), 2673–2681.
Söderlind (2006) Gustaf Söderlind. 2006. The logarithmic norm. History and modern theory. BIT Numerical Mathematics 46, 3 (2006), 631–652.
Song et al. (2020a) Chao Song, Youfang Lin, Shengnan Guo, and Huaiyu Wan. 2020a. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 914–921.
Song et al. (2020b) Guojie Song, Qingqing Long, Yi Luo, Yiming Wang, and Yilun Jin. 2020b. Deep convolutional neural network based medical concept normalization. IEEE Transactions on Big Data 8, 5 (2020), 1195–1208.
Ström (1975) Torsten Ström. 1975. On logarithmic norms. SIAM J. Numer. Anal. 12, 5 (1975), 741–753.
Sun (2006) Leping Sun. 2006. Stability analysis for delay differential equations with multidelays and numerical examples. Mathematics of computation 75, 253 (2006), 151–165.
Wang et al. (2019) Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, et al. 2019. Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. (2019).
Wu et al. (2019) Zifeng Wu, Chunhua Shen, and Anton Van Den Hengel. 2019. Wider or deeper: Revisiting the resnet model for visual recognition. Pattern Recognition 90 (2019), 119–133.
Yorke (1970) James A Yorke. 1970. Asymptotic stability for one dimensional differential-delay equations. Journal of Differential equations 7, 1 (1970), 189–202.
Yu et al. (2018) Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2018. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. 3634–3640.
Yuan et al. (2021) Haitao Yuan, Guoliang Li, Zhifeng Bao, and Ling Feng. 2021. An effective joint prediction model for travel demands and traffic flows. In 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 348–359.
Zhang et al. (2019) Weibin Zhang, Yinghao Yu, Yong Qi, Feng Shu, and Yinhai Wang. 2019. Short-term traffic flow prediction based on spatio-temporal analysis and CNN deep learning. Transportmetrica A: Transport Science 15, 2 (2019), 1688–1711.
Zhou et al. (2020) Zhengyang Zhou, Yang Wang, Xike Xie, Lianliang Chen, and Chaochao Zhu. 2020. Foresee urban sparse traffic accidents: A spatiotemporal multi-granularity perspective. IEEE Transactions on Knowledge and Data Engineering 34, 8 (2020), 3786–3799.
Zhu et al. (2021) Qunxi Zhu, Yao Guo, and Wei Lin. 2021. Neural delay differential equations. arXiv preprint arXiv:2102.10801 (2021).

Appendix A Appendix

A.1. Time Complexity Analysis

The time complexity of STDDE is mainly associated with the ODE module, and the complexity of this module depends on the solving algorithm. In this paper, we employ the Euler solver, so the number of aggregation steps is inversely proportional to the step length $\eta$. With $T$ time slots, the aggregation needs to be calculated $\frac{T}{\eta}$ times. As the complexity of one neighbor aggregation is $O(E)$, where $E$ is the number of edges, the overall complexity is $O(\frac{TE}{\eta})$.
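To make the step count concrete, the following is a minimal sketch of an explicit Euler solver for a delay differential equation with one constant delay. It is not the STDDE solver: the function `euler_dde`, the restriction to a single delay that is a multiple of $\eta$, and the toy right-hand side are assumptions made for illustration. The counter shows that the right-hand side, whose cost would be $O(E)$ for a graph aggregation, is evaluated $T/\eta$ times.

```python
import numpy as np

def euler_dde(f, history, tau, T, eta):
    """Integrate dh/dt = f(h(t), h(t - tau)) on [0, T] with explicit Euler.

    history(t) returns h(t) for t <= 0. For simplicity this sketch assumes
    tau is a multiple of eta, so delayed states fall on the time grid.
    """
    n_steps = int(round(T / eta))
    lag = int(round(tau / eta))
    # states[k] holds h at time (k - lag) * eta, so states[lag] is h(0).
    states = [np.asarray(history((k - lag) * eta), dtype=float) for k in range(lag + 1)]
    n_evals = 0
    for _ in range(n_steps):
        h_now = states[-1]
        h_delayed = states[-1 - lag]  # state tau time units in the past
        states.append(h_now + eta * f(h_now, h_delayed))
        n_evals += 1  # one right-hand-side evaluation per step
    return np.stack(states[lag:]), n_evals

# Toy linear DDE: dh/dt = -h(t) + 0.5 * h(t - 1), constant history equal to 1.
rhs = lambda h, h_delayed: -h + 0.5 * h_delayed
traj, n_evals = euler_dde(rhs, lambda t: np.ones(3), tau=1.0, T=12.0, eta=0.5)
print(traj.shape, n_evals)
```

Halving $\eta$ doubles the number of evaluations, which is the accuracy and cost trade-off expressed by the $O(\frac{TE}{\eta})$ bound.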

A.2. Theoretical Analysis

Here we discuss the stability of the proposed DDE. We begin by introducing the basic definitions and lemmas.

Definition 2.

The logarithmic norm $\mu_{p}$ of a square matrix $A$ is defined as

(18)	$\displaystyle\mu_{p}(A)=\lim_{\delta\to 0^{+}}\frac{\|I+\delta A\|_{p}-1}{\delta},$

where $I$ is the identity matrix of the same dimension as $A$, and $\|\cdot\|_{p}$ denotes an induced matrix norm.

The logarithmic norm is widely used in differential equation analysis (Ström, 1975; Söderlind, 2006), and plays an important role in our analysis.

Lemma 1.

(Ström, 1975) Let $A=(a_{ij})\in\mathbb{R}^{n\times n}$, then

(19)	$\displaystyle\mu_{1}(A)$	$\displaystyle=\max_{j}\left[a_{jj}+\sum_{i,i\neq j}|a_{ij}|\right],$
(20)	$\displaystyle\mu_{2}(A)$	$\displaystyle=\frac{1}{2}\max_{i}\left[\lambda_{i}(A+A^{T})\right],$
(21)	$\displaystyle\mu_{\infty}(A)$	$\displaystyle=\max_{i}\left[a_{ii}+\sum_{j,j\neq i}|a_{ij}|\right].$

From Lemma 1, we see that $\mu_{p}(A)$ is easy to calculate for $p=1,\infty$ and can be estimated for $p=2$, which is convenient for our analysis.
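As a small numerical illustration of the closed forms in Lemma 1, the logarithmic norms of a real square matrix can be computed as follows. The function names and the example matrix are chosen for this sketch only.

```python
import numpy as np

def log_norm_1(A):
    """mu_1(A): max over columns of a_jj plus the off-diagonal absolute column sum."""
    A = np.asarray(A, dtype=float)
    off = np.abs(A).sum(axis=0) - np.abs(np.diag(A))
    return float(np.max(np.diag(A) + off))

def log_norm_2(A):
    """mu_2(A): half of the largest eigenvalue of the symmetric matrix A + A^T."""
    A = np.asarray(A, dtype=float)
    return float(0.5 * np.max(np.linalg.eigvalsh(A + A.T)))

def log_norm_inf(A):
    """mu_inf(A): max over rows of a_ii plus the off-diagonal absolute row sum."""
    A = np.asarray(A, dtype=float)
    off = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
    return float(np.max(np.diag(A) + off))

A = np.array([[-3.0, 1.0],
              [2.0, -4.0]])
print(log_norm_1(A), log_norm_2(A), log_norm_inf(A))
print(log_norm_inf(-np.eye(3)))  # the value used for A_0 = -I in the stability proof
```

Unlike a matrix norm, the logarithmic norm can be negative, which is what makes a condition of the form $\mu(A_{0})+\sum_{k}\|A_{k}\|\leq 0$ attainable.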

Definition 3.

A linear multi-delay differential equation is defined as

(22)	$\displaystyle\frac{\mathrm{d}h(t)}{\mathrm{d}t}=A_{0}h(t)+\sum_{k=1}^{K}A_{k}h(t_{\tau_{k}}),$

where $h(t)=(h_{1}(t),\cdots,h_{n}(t))^{T}$ is a vector, $A_{0},A_{k}\in\mathbb{R}^{n\times n}$ are constant matrices, and $h(t_{\tau_{k}})=(h_{1}(t-\tau_{k1}),\cdots,h_{n}(t-\tau_{kn}))^{T}$.

Taking the simplest two-variable delay differential equations as an example,

(23)	$\displaystyle\left\{\begin{aligned} \frac{\mathrm{d}u(t)}{\mathrm{d}t}&=a_{1}u(t)+b_{1}v(t-\tau_{2}),\\ \frac{\mathrm{d}v(t)}{\mathrm{d}t}&=a_{2}v(t)+b_{2}u(t-\tau_{1}),\end{aligned}\right.$

by letting

(24)	$\displaystyle h(t)=(u(t),v(t))^{T},\quad h(t_{\tau})=(u(t-\tau_{1}),v(t-\tau_{2}))^{T},$

(25)	$\displaystyle A=\begin{pmatrix}a_{1}&0\\ 0&a_{2}\end{pmatrix},\quad B=\begin{pmatrix}0&b_{1}\\ b_{2}&0\end{pmatrix},$

we have

(26)	$\displaystyle\frac{\mathrm{d}h(t)}{\mathrm{d}t}=Ah(t)+Bh(t_{\tau}).$

Theorem 1.

The proposed DDE is bounded between two linear multi-delay systems.

Proof.

Without considering the control signal in Eqn. (9), we have

(27)	$\displaystyle 0\leq\frac{\mathrm{d}h_{i}(t)}{\mathrm{d}t}\leq g_{i}(t)-h_{i}(t).$

The lower bound is trivial, and we reformulate the upper bound as

(28)	$\displaystyle g_{i}(t)-h_{i}(t)=-h_{i}(t)+c\sum_{j\in\mathcal{N}_{i}}\alpha_{ij}h_{j}(t-\tau_{ij}).$

Further, letting $H(t)=(h_{1}(t),h_{2}(t),\cdots,h_{n}(t))^{T}$, we have

(29)	$\displaystyle\frac{\mathrm{d}H(t)}{\mathrm{d}t}\leq-IH(t)+\sum_{k=1}^{K}A_{k}\cdot H(t_{\tau_{k}}),$

where $A_{k}$ is a matrix in which the $i$-th row contains at most one non-zero element $c\alpha_{ij}$, and $H(t_{\tau_{k}})=(h_{1}(t_{\tau_{k1}}),\cdots,h_{n}(t_{\tau_{kn}}))^{T}$ regroups the delayed signals. $K$ denotes the maximum node degree, as each edge appears only once in the matrices. ∎

Definition 4.

The characteristic equation associated with the differential equation (22) is defined as

(30)	$\displaystyle\det\left[zI-A_{0}-\sum_{k=1}^{K}A_{k}\exp(-zT_{k})\right]=0,$

where $\exp(-zT_{k})=\text{diag}(e^{-z\tau_{k1}},e^{-z\tau_{k2}},\cdots,e^{-z\tau_{kn}})$.

Lemma 2.

(Sun, 2006) If the real parts of all characteristic roots of equation (30) are less than zero, then the system (22) is asymptotically stable.

Lemma 3.

(Sun, 2006) If the condition

(31)	$\displaystyle\mu(A_{0})+\sum_{k=1}^{K}\|A_{k}\|\leq 0$

holds, then the condition of Lemma 2 is satisfied, and the system is asymptotically stable.

Theorem 2.

The proposed DDE is asymptotically stable when the balance constant $c\leq 1/K$.

Proof.

We prove the result by considering the lower and upper bounds given in Theorem 1. The lower bound clearly satisfies the condition in Lemma 3, so we focus on the upper bound. We prove the result for the norm $p=\infty$; the result can be generalized to other norms easily. According to Theorem 1 and Lemma 1, we have

(32)		$\displaystyle\mu_{\infty}(A_{0})$	$\displaystyle=\mu_{\infty}(-I)=-1,$
(33)		$\displaystyle\|A_{k}\|_{\infty}$	$\displaystyle=\max_{i}\sum_{j=1}^{n}|(A_{k})_{ij}|=\max_{i}c\alpha_{ij(k)}.$

The second equation holds because there is at most one non-zero element in each row of $A_{k}$ , and $c\alpha_{ij(k)}$ denotes the element in row $i$ . As $\alpha_{ij}$ is the normalized weight for graph convolution, we have

(34)	$\displaystyle\|A_{k}\|_{\infty}=\max_{i}c\alpha_{ij(k)}\leq c,$

and when $c\leq 1/K$,

(35)	$\displaystyle\mu(A_{0})+\sum_{k=1}^{K}\|A_{k}\|\leq-1+K\cdot c\leq 0,$

so the condition (31) holds. Thus the lower and upper bounds of the DDE are both asymptotically stable (Yorke, 1970). ∎
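The condition used in the proof can be checked numerically on a small example. The sketch below is illustrative only: the four-node graph, the row normalization of $\alpha$ (one common choice of normalized graph-convolution weights), and the helper names `build_delay_matrices` and `stability_margin` are assumptions of this example and are not taken from the model implementation.

```python
import numpy as np

def build_delay_matrices(alpha, c):
    """Split c * alpha into K matrices with at most one non-zero entry per row."""
    n = alpha.shape[0]
    K = int(max(np.count_nonzero(alpha[i]) for i in range(n)))
    mats = [np.zeros((n, n)) for _ in range(K)]
    for i in range(n):
        for k, j in enumerate(np.flatnonzero(alpha[i])):
            mats[k][i, j] = c * alpha[i, j]  # k-th neighbour of node i
    return mats

def stability_margin(mats):
    """mu_inf(A_0) + sum_k ||A_k||_inf with A_0 = -I; non-positive satisfies (31)."""
    mu_inf_a0 = -1.0  # mu_inf(-I) = -1
    return mu_inf_a0 + sum(np.abs(A).sum(axis=1).max() for A in mats)

# Hypothetical undirected graph with 4 nodes and maximum degree 3.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)
alpha = adj / adj.sum(axis=1, keepdims=True)  # row-normalized weights (assumption)
K = int(adj.sum(axis=1).max())
mats = build_delay_matrices(alpha, c=1.0 / K)
margin = stability_margin(mats)
print(K, margin)
```

Since every $\alpha_{ij}\leq 1$, each $\|A_{k}\|_{\infty}$ is at most $c$, so choosing $c=1/K$ keeps the margin non-positive regardless of the learned delays.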