License: CC BY 4.0
arXiv:2402.01231v2 [cs.LG] 26 Feb 2024

Unveiling Delay Effects in Traffic Forecasting: A Perspective from Spatial-Temporal Delay Differential Equations

Qingqing Long (Computer Network Information Center, Chinese Academy of Sciences, China), Zheng Fang (Peking University, China), Chen Fang (Institute of Zoology, Chinese Academy of Sciences, China), Chen Chong (Terminus Group, China), Pengfei Wang (Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China), and Yuanchun Zhou (Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, China)
(2024)
Abstract.

Traffic flow forecasting is a fundamental research problem in transportation planning and management, and serves as a canonical example of spatial-temporal prediction. In recent years, Graph Neural Networks (GNNs) and Recurrent Neural Networks (RNNs) have achieved great success in capturing spatial-temporal correlations for traffic flow forecasting. Yet two non-negligible issues have not been well solved: 1) Message passing in GNNs is immediate, while in reality the spatial message interactions among neighboring nodes can be delayed. A change of traffic flow at one node takes several minutes, i.e., a time delay, to influence its connected neighbors. 2) Traffic conditions change continuously, and the prediction frequency for traffic flow forecasting may vary with the requirements of a specific scenario. Most existing discretized models require retraining for each prediction horizon, which restricts their applicability. To tackle these issues, we propose a neural Spatial-Temporal Delay Differential Equation model, namely STDDE. It incorporates both delay effects and continuity into a unified delay differential equation framework, which explicitly models the time delay in spatial information propagation. Furthermore, theoretical proofs are provided to show its stability. We then design a learnable traffic-graph time-delay estimator, which utilizes the continuity of the hidden states to make the backward gradient computation possible. Finally, we propose a continuous output module that allows us to accurately predict traffic flow at various frequencies, providing more flexibility and adaptability to different scenarios. Extensive experiments show the superiority of the proposed STDDE along with competitive computational efficiency. Moreover, both quantitative and qualitative experiments are conducted to validate the concept of a delay-aware module, and the flexibility validation shows the effectiveness of the continuous output module.

deep graph learning, differential equation, traffic network, traffic flow prediction, continuous systems
journalyear: 2024
copyright: rightsretained
conference: Proceedings of the ACM Web Conference 2024; May 13–17, 2024; Singapore, Singapore
booktitle: Proceedings of the ACM Web Conference 2024 (WWW ’24), May 13–17, 2024, Singapore, Singapore
doi: 10.1145/3589334.3645688
isbn: 979-8-4007-0171-9/24/05
ccs: Information systems Spatial-temporal systems
ccs: Mathematics of computing Continuous functions
ccs: Applied computing Transportation

1. Introduction

Traffic forecasting is a fundamental research problem of Intelligent Transportation Systems (ITS) (Nagy and Simon, 2018; Yu et al., 2018; Ran and Boyce, 2012; Fang et al., 2022), which affects a variety of smart city applications (Song et al., 2020a; Ju et al., 2024), such as trip planning (Choi et al., 2021; Fang et al., 2020), accident prediction (Islam et al., 2022; Ju et al., 2023), and urban management (Chavhan and Venkataram, 2020; Yuan et al., 2021). Traffic flow forecasting aims to predict the future traffic flow based on historical data and the underlying traffic network.

Traffic flow forecasting is a challenging task due to the inherent spatial-temporal dependencies. Benefiting from the flourishing of deep learning, a large number of deep models have been proposed for traffic forecasting. In the temporal dimension, RNN-based models and their variants (Schuster and Paliwal, 1997) are the mainstream, and temporal convolution networks (Lu et al., 2020) have also attracted much attention due to their superior computational efficiency. In the spatial dimension, considering that most traffic networks are non-Euclidean rather than grid-partitioned, GNN-based methods (Yu et al., 2018; Li et al., 2018; Long et al., 2020) outperform CNN-based ones (Zhang et al., 2019) and have become predominant owing to their strong ability to handle graph-structured data. Many works combine a spatial module with a temporal module to achieve significant improvements, among which STGCN (Yu et al., 2018), DCRNN (Li et al., 2018), and DSTAGN (Lan et al., 2022) are representative.

Figure 1. (a) General GNN propagation process. (b) Realistic propagation process. (c) Distribution of time-delay in traffic dataset PEMS04. Panels (a) and (b) compare spatial-temporal signal propagation in general GNNs with realistic conditions: node 2 and node 4 receive the same update information simultaneously in graph propagation, while they do not in the realistic scene. Panel (c) shows the distribution of delay values in the real-world traffic network, which are computed based on the max-cross-correlation method (Azaria and Hertz, 1984).

Nevertheless, previous works exhibit the following shortcomings:

(1) The delays in the graph signal propagation process are overlooked. When an incident occurs at specific nodes, its influence takes several minutes (i.e., a time delay) to propagate to the neighboring nodes. However, this delay effect is largely neglected in existing spatial-temporal traffic forecasting models. Time series models (Fu et al., 2016; Benidis et al., 2022), such as RNN and GRU, are capable of modeling scenarios in which the delay remains consistent across all nodes and timestamps. In practice, however, time delays vary significantly across nodes and timestamps, as illustrated in Fig. 1(c), which shows the time-delay distribution among neighbors in the PEMS03 dataset. Therefore, a separate module is required to characterize and model these variations. As shown in Fig. 1(a) and 1(b), general GNNs propagate a sudden change indiscriminately along the adjacency relation, so the predicted change arrives ahead of the ground truth and the prediction is sub-optimal. It is thus necessary to account for delay effects in spatial-temporal traffic forecasting.

(2) The inherent continuity of traffic systems is not well explored. Existing methods mainly utilize RNNs (Schuster and Paliwal, 1997) or TCNs (Lu et al., 2020), which accept discrete observations as input, to capture the temporal dependencies. These methods are limited in flexibility and applicability. Specifically, for the same traffic system, the required prediction horizon and resolution may vary across application scenarios, and the model needs to be retrained for each specific demand. In addition, traffic data is notable for its inherent sparsity (Liu et al., 2020; Zhou et al., 2020), which arises from the limited availability of traffic sensors, particularly in extensive road networks. For example, the sensors deployed within the traffic network may sample at a 10-minute interval, whereas during the prediction phase we aspire to finer-grained forecasts that enable rapid responses to events, such as travel time planning (Aryandoust et al., 2019) or traffic emergency management (Zhou et al., 2020).

To tackle the above-mentioned issues, we propose a neural Spatial-Temporal Delay Differential Equation model (STDDE). In contrast to existing methods, STDDE presents a new paradigm for spatial-temporal traffic analysis that addresses both challenges. STDDE explicitly captures and leverages delayed spatial interactions among neighboring nodes. Furthermore, it models the evolution of spatial-temporal signals from a continuous perspective, departing from traditional recurrent approaches. Specifically, the delay values can be acquired through pre-processing or a learnable estimator. For each node, we then combine selected historical hidden states of the node itself and of its neighbors to integrate spatial and temporal information through a Delay-aware Differential Equation (DDE). We theoretically prove that the proposed delay differential equation is asymptotically stable. We conduct experiments on six widely used real-world traffic datasets. The results demonstrate that our model outperforms state-of-the-art models while maintaining competitive computational efficiency. Quantitative and qualitative experiments validate the effectiveness of the delay-aware module, and the flexibility validation confirms the effectiveness of the continuous output module.

The main contributions of this work are summarized as follows:

  • We propose a Spatial-Temporal Delay Differential Equation model, namely STDDE, which incorporates both delay effects and continuity into a unified delay differential equation framework and explicitly models the time delay in spatial information propagation.

  • We design a learnable traffic-graph time-delay estimator, which utilizes the continuity of the hidden states to make the backward gradient computation possible.

  • We propose a continuous output module that allows us to accurately predict traffic flow at various frequencies, providing more flexibility for different scenarios.

  • We conduct experiments on six widely used datasets. The results show that our model outperforms state-of-the-art methods and exhibits competitive computational efficiency.

2. Related Work

2.1. Traffic Flow Forecasting

A large body of research has been conducted on traffic flow forecasting in recent years. Traffic flow forecasting can be viewed as a spatial-temporal forecasting task that leverages spatial-temporal data collected by various sensors to predict future traffic conditions. Deep learning methods have come to dominate the field owing to their superior ability to model complex spatial-temporal correlations. Models combining graph neural networks (GNN) (Kipf and Welling, 2016; Long et al., 2021a, b) and recurrent neural networks (RNN) (Schuster and Paliwal, 1997) are representative. Specifically, DCRNN (Li et al., 2017) views the traffic flow as a diffusion process on a directed graph and utilizes GRU to capture the temporal features. STGCN (Yu et al., 2018) utilizes graph convolution and 1D convolution to capture spatial dependencies and temporal correlations, respectively. Jiang et al. (2023) propose PDFormer. While PDFormer mentions the concept of delay, its core mechanism applies attention to the historical time series rather than explicitly using delays in the propagation process, so it still cannot capture the intricate delay information among graph vertices. In general, despite their success, existing works are limited to the spatial-temporal stacking structure and ignore the delay effect, which deviates from real traffic conditions.

2.2. Neural Differential Equations

The neural ordinary differential equation (NODE) (Chen et al., 2018) was first proposed as a continuous version of residual neural networks (ResNet) (Wu et al., 2019). Owing to its natural fit for dynamics-governed time series, NODE was soon adopted in time series analysis, especially when the input data is irregularly sampled or partially observed (De Brouwer et al., 2019; Rubanova et al., 2019; Lechner and Hasani, 2020). However, the solution of a neural ODE is entirely determined by its initial condition, which means that later-arriving data cannot influence the trajectory. This is also why neural ODEs are generally applied together with RNN modules to deal with incoming data. The neural controlled differential equation (NCDE) (Kidger et al., 2020) solves this problem by constructing a continuous path from the discrete input data and adjusting the evolution trajectory according to this continuous control signal. Another parallel and relevant work is the Neural Delay Differential Equation (NDDE) (Zhu et al., 2021). As emphasized in (Dupont et al., 2019), the flow of NODE cannot represent systems with time-delay effects. The emergence of NDDE fills this gap.

A few pioneering works have applied the neural differential equation framework to traffic forecasting. STGODE (Fang et al., 2021) first utilizes the neural ODE to transform graph convolution into a continuous version and thereby acquire a larger spatial-temporal receptive field. STG-NCDE (Choi et al., 2021) adopts the neural CDE to deal with irregularly sampled time series. Despite the success they have achieved, none of these works takes the delay effect into consideration. In this paper, we first extend the NDDE to multi-variable conditions for spatial-temporal modeling and combine it with NCDE to construct a continuous evolution of the traffic signal.

3. Preliminary

3.1. Problem Definition

In this paper, we focus on the long-term traffic flow forecasting problem. The traffic network is represented as a graph $\mathcal{G}=(V,E,A)$, where $V$ is the set of $N$ traffic nodes, $E$ is the set of edges, and $A\in\mathbb{R}^{N\times N}$ is an adjacency matrix representing the connectivity of the $N$ nodes. The traffic flow is represented as a flow matrix $X\in\mathbb{R}^{T\times N}$, and $X_{tn}$ denotes the traffic flow of node $n$ at time $t$. The goal of traffic flow forecasting is to learn a mapping function $f$ that predicts the traffic flow of the future $T^{\prime}$ steps given the information of the historical $T$ steps, which can be formulated as follows,

$$\left[X_{t-T+1,:},X_{t-T+2,:},\cdots,X_{t,:};\mathcal{G}\right]\underset{train}{\overset{f}{\longrightarrow}}\left[X_{t+1,:},X_{t+2,:},\cdots,X_{t+T^{\prime},:}\right]. \tag{1}$$

Moreover, a model trained at one fixed temporal granularity may need to produce predictions at a different granularity to satisfy varied real-world needs, which is formulated as follows,

$$\left[X_{t-T+1,:},X_{t-T+2,:},\cdots,X_{t,:};\mathcal{G}\right]\underset{infer}{\overset{f}{\longrightarrow}}\left[X_{t+dt_{1},:},X_{t+dt_{2},:},\cdots,X_{t+dt_{n},:}\right]. \tag{2}$$

where $dt_{1},dt_{2},\cdots,dt_{n}$ are arbitrary positive numbers.
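To make the training formulation in Eq. (1) concrete, the following minimal sketch slices a flow matrix into (history, target) pairs. It is an illustration under our own assumptions, not the preprocessing used in this work: the helper name `make_windows`, the nested-list representation, and the toy data are all hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows are time steps, columns are nodes


def make_windows(flow: Matrix, T: int, T_prime: int) -> List[Tuple[Matrix, Matrix]]:
    """Slice a (time x node) flow matrix into (history, target) training pairs."""
    samples = []
    # t is the 0-based index of the last observed step in each window
    for t in range(T - 1, len(flow) - T_prime):
        history = flow[t - T + 1 : t + 1]        # X_{t-T+1,:} ... X_{t,:}
        target = flow[t + 1 : t + 1 + T_prime]   # X_{t+1,:} ... X_{t+T',:}
        samples.append((history, target))
    return samples


# Hypothetical toy flow matrix: 6 time steps, 2 nodes
flow = [[float(10 * step + node) for node in range(2)] for step in range(6)]
pairs = make_windows(flow, T=3, T_prime=2)
```

A discretized model is tied to the target grid chosen here, whereas the inference setting of Eq. (2) asks for outputs at arbitrary offsets from the same history window.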

3.2. Neural Differential Equations

3.2.1. Neural ODE (NODE)

The residual connection structure can be viewed as a discretization of the Neural Ordinary Differential Equation (NODE). The residual update of the representation $h$ is a special case of the following equation,

$$h_{t+\Delta t}=h_{t}+\Delta t\cdot f(h_{t},\theta_{t}), \tag{3}$$

with $\Delta t=1$. Through letting $\Delta t\rightarrow 0$ and unifying $\theta_{t}$ into $\theta$ for parameter efficiency, we get the continuous version,

$$h(T)=h(0)+\int_{0}^{T}f\left(h(t),t,\theta\right)\mathrm{d}t, \tag{4}$$

where $h(0)$ is acquired from an input transformation. As the derivative in the ODE is parameterized with a neural network, the above version is named Neural ODE. To achieve memory efficiency, the adjoint sensitivity method is adopted in the backward process (Chen et al., 2018), which computes the gradients through another ODE rather than step-by-step backpropagation.
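The relation between the residual update of Eq. (3) and the integral of Eq. (4) can be illustrated with a fixed-step Euler solver. The sketch below is only a toy under stated assumptions: the scalar function `decay` stands in for the learned network $f$, and the helper `euler_integrate` is hypothetical; practical Neural ODE implementations typically rely on adaptive solvers and the adjoint method.

```python
import math
from typing import Callable


def euler_integrate(f: Callable[[float, float], float], h0: float, T: float, steps: int) -> float:
    """Approximate h(T) = h(0) + int_0^T f(h(t), t) dt with fixed-step Euler updates."""
    dt = T / steps
    h, t = h0, 0.0
    for _ in range(steps):
        h = h + dt * f(h, t)  # Eq. (3): h_{t+dt} = h_t + dt * f(h_t)
        t += dt
    return h


def decay(h: float, t: float) -> float:
    """Toy stand-in for the learned network f; here dh/dt = -h."""
    return -h


one_residual_block = euler_integrate(decay, h0=1.0, T=1.0, steps=1)  # dt = 1
fine_grained = euler_integrate(decay, h0=1.0, T=1.0, steps=1000)     # dt close to 0
exact = math.exp(-1.0)                                               # closed-form h(1)
```

With a single step the update coincides with one residual block, while refining the step size moves the result toward the solution of the continuous equation.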

3.2.2. Neural CDE (NCDE)

To establish connections with input data, NCDE is proposed and formulated as follows,

$$h(T)=h(0)+\int_{0}^{T}f(h(t),t,\theta)\mathrm{d}X_{t}, \tag{5}$$

where the integral is a Riemann–Stieltjes integral (Mozyrska et al., 2009), and $X_{t}$ can be viewed as a control signal that drives the evolution of the equation. As Fig. 2 shows, neural CDE eliminates the discontinuity at the data arrival points and renders the whole process continuous in the hidden manifold space.
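As a brief worked step (a standard property of the Riemann–Stieltjes integral, stated here under the assumption that the path $X_{t}$ is continuously differentiable in $t$), the integral against $\mathrm{d}X_{t}$ in Eq. (5) reduces to an ordinary integral in $t$:

$$h(T)=h(0)+\int_{0}^{T}f(h(t),t,\theta)\,\frac{\mathrm{d}X_{t}}{\mathrm{d}t}\,\mathrm{d}t.$$

In this form the equation can be handled by the same solvers as Eq. (4), with the derivative of the input path entering the vector field at every time $t$.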

Figure 2. The workflow comparison of original discrete time-series processing and CDE processing scheme. (a) ODE-RNN. (b) CDE.

4. Model: STDDE

Fig. 3 shows the overall framework of our proposed STDDE, which consists of two components. The first component incorporates both delay effects and continuity into a unified delay differential equation framework, which explicitly models the time delay in spatial information propagation. As Fig. 3 shows, the hidden state of a node and the flows evolve in a delay-aware manner, i.e., the hidden state of $v_{i}$ at time $t$ is influenced by the state of $v_{j}$ at time $t-\tau_{ij}$, where $\tau_{ij}$ is the delay from $v_{j}$ to $v_{i}$. The second component is the continuous output module, which allows us to accurately predict traffic flow at various frequencies. More details are given in the following sections.

Figure 3. Overview of STDDE. It consists of two components. The first component incorporates both delay effects and continuity into a unified delay differential equation framework, which explicitly models the time delay in spatial information propagation. The second component is the continuous output module, which allows us to accurately predict traffic flow at various frequencies.

4.1. Spatial-temporal Delay-aware Neural Differential Equations

4.1.1. Neural DDE (NDDE)

We first introduce the framework of delay-aware neural differential equations. NDDE introduces the delay effect to improve the precision of signal modeling, in which the evolution process is related to its history,

$$h(t)=\left\{\begin{aligned} &\phi(t), & t\leq 0,\\ &h(0)+\int_{0}^{T}f(h(t),h(t-\tau),t,\theta)\,\mathrm{d}t, & t>0.\end{aligned}\right. \tag{6}$$

where $\tau$ is the delay value, and $\phi(t)$ is the history function. The introduction of the delay $\tau$ extends the representation ability of neural ODE and enables modeling a more complex evolution process.
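To illustrate how a delayed state enters the integration in Eq. (6), the following sketch solves a scalar DDE with a fixed-step Euler scheme and a stored trajectory. It is a toy under our own assumptions, not the solver used in this work: the function `solve_dde_euler`, the test dynamics $\mathrm{d}h/\mathrm{d}t=-h(t-\tau)$, and the constant history $\phi(t)=1$ are hypothetical choices, and the delay is assumed to be an integer multiple of the step size.

```python
from typing import Callable, List


def solve_dde_euler(
    f: Callable[[float, float], float],
    phi: Callable[[float], float],
    tau: float,
    T: float,
    dt: float,
) -> List[float]:
    """Euler scheme for dh/dt = f(h(t), h(t - tau)), with h(t) = phi(t) for t <= 0.

    Returns the trajectory sampled at t = 0, dt, 2*dt, ..., T.
    Assumes tau and T are integer multiples of dt, so delayed states lie on the grid.
    """
    lag = round(tau / dt)      # delay expressed in grid steps
    n_steps = round(T / dt)
    traj = [phi(0.0)]
    for k in range(n_steps):
        # Delayed state: history function before t = 0, stored trajectory afterwards
        delayed = phi((k - lag) * dt) if k - lag <= 0 else traj[k - lag]
        traj.append(traj[k] + dt * f(traj[k], delayed))
    return traj


# dh/dt = -h(t - 1) with constant history phi(t) = 1
traj = solve_dde_euler(
    f=lambda h, h_delayed: -h_delayed,
    phi=lambda t: 1.0,
    tau=1.0,
    T=2.0,
    dt=0.01,
)
```

For this toy problem the exact solution is $h(t)=1-t$ on $[0,1]$ and $h(2)=-0.5$, so the stored trajectory can be checked against closed-form values. Unlike an ODE solver, the scheme must keep past states available, which is why a history function is needed for $t\leq 0$.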

In this paper, we take the GRU (Fu et al., 2016) as an example to elaborate on the derivation of STDDE. Specifically, letting $h_{t},z_{t},g_{t}$ denote the hidden state, update gate, and update vector respectively, the GRU is defined as follows,

$$\begin{aligned} z_{t}&=\sigma(W_{z}h_{t-1}+U_{z}g_{t}+b_{z}),\\ h_{t}&=z_{t}\odot h_{t-1}+(1-z_{t})\odot g_{t},\end{aligned} \tag{7}$$

where $\sigma$ is the sigmoid activation function, $W_{z},U_{z}$ and $b_{z}$ are parameters, and $\odot$ denotes the element-wise product. By subtracting $h_{t-1}$ from this update equation, we have

$$\Delta h=h_{t}-h_{t-1}=(1-z_{t})\odot(g_{t}-h_{t-1}). \tag{8}$$

This naturally leads to the following ODE,

$$\frac{\mathrm{d}h(t)}{\mathrm{d}t}=(1-z(t))\odot(g(t)-h(t)). \tag{9}$$
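For completeness, the intermediate algebra between the GRU update in Eq. (7) and the difference in Eq. (8) can be written out as

$$\begin{aligned} h_{t}-h_{t-1}&=z_{t}\odot h_{t-1}+(1-z_{t})\odot g_{t}-h_{t-1}\\ &=(1-z_{t})\odot g_{t}-(1-z_{t})\odot h_{t-1}\\ &=(1-z_{t})\odot(g_{t}-h_{t-1}). \end{aligned}$$

Reading this difference as an Euler step of size $\Delta t=1$, in the same way as Eq. (3), and letting $\Delta t\rightarrow 0$ gives the continuous form of Eq. (9).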

Different from ODE, DDE requires a continuous history function, rather than a single point, to serve as the initial state. A common practice is to set the history function as a time-constant one and approximate it with a multi-layer perceptron according to the input data,

$$\phi(t)=\text{constant}=\text{MLP}(x),\quad t\leq 0 \tag{10}$$

where $x$ is the input data.

4.1.2. Incorporating Spatio-temporal Delayed Correlations into NDDE

We extend differential equations to the spatial-temporal modeling of the traffic domain. To incorporate spatial-temporal correlations, we utilize graph neural networks to extract spatial features and view them as update vectors, that is,

$$g_{i}(t)=c\sum_{j\in\mathcal{N}(i)}\alpha_{ij}f(h_{j}(t-\tau_{ij})), \tag{11}$$

where $g_{i}(t)$ is the update vector of node $i$ at time $t$, $\alpha_{ij}$ and $\tau_{ij}$ are the edge weight and delay value between nodes $v_{i}$ and $v_{j}$ respectively, $\mathcal{N}(i)$ denotes the neighbors of node $v_{i}$, $f$ is a linear transformation, and $c$ is a constant that controls the ratio of spatial information. In this formulation, we extend DDE to the multi-variable setting, so that each node update selects specific historical hidden states. Specifically, to update the representation of node $v_{i}$ at time $t$, we incorporate information from its neighbor $v_{j}$ at time $t-\tau_{ij}$, which accounts for information propagating with a time delay of $\tau_{ij}$. The graph convolution operation is implemented with the DGL package (Wang et al., 2019) in this work, and its complexity is proportional to the number of edges.
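The delayed aggregation of Eq. (11) can be sketched in a few lines. The example below is a simplified illustration, not the DGL-based implementation mentioned above: hidden states are scalars, delays are integer numbers of grid steps, $f$ is a scalar linear map, and the names `delayed_update_vector`, `history`, and `edges` are hypothetical.

```python
from typing import Dict, List, Tuple


def delayed_update_vector(
    history: List[List[float]],                      # history[k][j]: state of node j at step k
    edges: Dict[int, List[Tuple[int, float, int]]],  # i -> [(j, alpha_ij, tau_ij in steps)]
    step: int,
    c: float = 1.0,
    weight: float = 1.0,
    bias: float = 0.0,
) -> List[float]:
    """Compute g_i(t) = c * sum_j alpha_ij * f(h_j(t - tau_ij)) for every node i."""
    n_nodes = len(history[0])
    g = [0.0] * n_nodes
    for i in range(n_nodes):
        for j, alpha, tau in edges.get(i, []):
            # Clamp to step 0, i.e. a constant history before the first stored state
            past = history[max(step - tau, 0)][j]
            g[i] += alpha * (weight * past + bias)   # f is a scalar linear map here
        g[i] *= c
    return g


# Node 0 changes from 0 to 1 at step 2; nodes 1 and 2 listen with delays of 1 and 2 steps
history = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
edges = {1: [(0, 1.0, 1)], 2: [(0, 1.0, 2)]}
g_step2 = delayed_update_vector(history, edges, step=2)
g_step3 = delayed_update_vector(history, edges, step=3)
g_step4 = delayed_update_vector(history, edges, step=4)
```

In this toy run the change at node 0 reaches node 1 one step later and node 2 two steps later, whereas delay-free message passing would deliver it to both neighbors at the same step. The double loop visits each edge once, which matches the stated complexity proportional to the number of edges.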

4.2. Traffic-Graph Time-Delay Estimator

As shown in Fig. 1, there exist propagation delays in real-world traffic conditions, in contrast to the general GNN propagation process. For instance, when a traffic incident occurs in a particular area, it may take several minutes to affect traffic conditions in adjacent regions. To depict the time delay more accurately, we design two delay estimators that capture the propagation delay between connected nodes.

The direct approach estimates delays among neighboring nodes by maximizing the cross-correlation (MCC) (Azaria and Hertz, 1984) as a pre-processing step. Specifically, given two time series, $x_{i}$ and $x_{j}$, where we assume that $x_{j}$ is influenced by $x_{i}$, we initially smooth them through interpolation, resulting in $\tilde{x}_{i}$ and $\tilde{x}_{j}$. Subsequently, we determine the delay, denoted as $\tau_{ij}$, by identifying the peak of their cross-correlation after shifting, as expressed by the equation below,

$$\tau_{ij}=\arg\max_{k}\text{corr}(\tilde{x}_{i}^{\rightarrow{k}},\tilde{x}_{j}), \tag{12}$$

where $\tilde{x}_{i}^{\rightarrow{k}}$ denotes performing a $k$-step shift to $\tilde{x}_{i}$, and $\text{corr}$ is the Pearson correlation function in this paper. We estimate all the delay values in advance through pre-processing based on historical data.
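The pre-processing estimator of Eq. (12) amounts to a search over integer shifts. The sketch below illustrates this under our own assumptions: the smoothing and interpolation step is omitted, shifts are restricted to a hypothetical range `[0, max_shift]`, and the helper names `pearson` and `estimate_delay` as well as the sinusoidal toy series are not from the original work.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def estimate_delay(x_i: Sequence[float], x_j: Sequence[float], max_shift: int) -> int:
    """Return the shift k in [0, max_shift] maximizing corr(x_i shifted by k, x_j)."""
    best_k, best_corr = 0, -math.inf
    for k in range(max_shift + 1):
        # Shifting x_i forward by k aligns x_i[t] with x_j[t + k]
        corr = pearson(x_i[: len(x_i) - k], x_j[k:])
        if corr > best_corr:
            best_k, best_corr = k, corr
    return best_k


# Toy series: the downstream sensor repeats the upstream signal 4 steps later
upstream = [math.sin(0.3 * t) for t in range(200)]
downstream = [upstream[max(t - 4, 0)] for t in range(200)]
tau_hat = estimate_delay(upstream, downstream, max_shift=10)
```

On this toy pair the search recovers the 4-step lag built into `downstream`. On real sensor data the correlation peak is usually less sharp, which is one motivation for smoothing the series first.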

The second approach involves modeling time delay as a learnable pattern. The time delay implicitly reflects external factors associated with the traffic network, including road length, road capacity, and more. Furthermore, the delay itself exhibits inherent variability, for example, longer delays often occur during morning and evening rush hours. In this approach, we assign two learnable delay parameters to each edge: one for peak hours and another for non-peak hours.

Please note that the delay value $\tau$ serves as an indicator to select a historical state in equation (11). Generally, $\tau$ is considered non-learnable in this context because the model cannot compute the gradient of $\tau$, which is theoretically $\frac{\mathrm{d}h(t-\tau)}{\mathrm{d}\tau}$. However, thanks to the continuous modeling approach, we can indeed obtain this gradient. As demonstrated in equation (9), the derivative of $h$ with respect to $t$ is well-defined. Consequently, we can incorporate the gradient of $\tau$ in the neural network by explicitly defining the backward computation of $h$ with respect to $\tau$.
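One way to make this argument explicit is the chain rule. As a sketch, assuming that the hidden state follows the dynamics of equation (9) at the delayed time $s=t-\tau$ with $s>0$, the local derivative of the selected state with respect to the delay is

$$\frac{\mathrm{d}h(t-\tau)}{\mathrm{d}\tau}=\left.\frac{\mathrm{d}h(s)}{\mathrm{d}s}\right|_{s=t-\tau}\cdot\frac{\mathrm{d}(t-\tau)}{\mathrm{d}\tau}=-\left.(1-z(s))\odot(g(s)-h(s))\right|_{s=t-\tau}.$$

Under this reading, the backward pass for $\tau$ reuses the right-hand side of the differential equation evaluated at the delayed time, with a sign flip. For a constant history function ($s\leq 0$) the derivative is zero.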

4.3. State Evolution Controller

One key challenge with a DDE is that, once the network parameters are fixed, the dynamic evolution becomes entirely self-contained and does not integrate incoming inputs, so valuable information is lost. To address this issue, we introduce a control signal inspired by Neural CDE.

Following Neural CDE, we generate a continuous representation from the raw inputs through the natural cubic spline method, which ensures that the representation is twice continuously differentiable,

(13) $X(t)=\Phi\left(\{x^{0},t_{0}\},\{x^{1},t_{1}\},\cdots,\{x^{n},t_{n}\}\right),$

where $X(t)$ is a continuous representation, $\Phi$ denotes the natural cubic spline function, and $x^{i}$ denotes the input at time $t_{i}$. Thus we have

(14) $\frac{\mathrm{d}h_{i}(t)}{\mathrm{d}t}=(1-z_{i}(t))\odot(g_{i}(t)-h_{i}(t))\,\tilde{f}\left(\frac{\mathrm{d}X_{i}(t)}{\mathrm{d}t}\right),$

where $\tilde{f}$ is a transformation function to match dimensions, and $X_{i}$ is the continuous representation of node $v_{i}$. The derivative of $X$, denoted as $\frac{\mathrm{d}X_{i}(t)}{\mathrm{d}t}$, signifies the trend or fluctuation in traffic flow, constantly influencing the direction of dynamic evolution.
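To make the control signal concrete, the sketch below builds a natural cubic spline $X(t)$ through scalar observations and returns its derivative $\mathrm{d}X/\mathrm{d}t$. It is a plain-Python illustration under stated assumptions (scalar inputs, strictly increasing knots, a hand-written tridiagonal solve); the function name `natural_cubic_spline` and the toy flow values are hypothetical, and a practical implementation would more likely rely on an existing spline library.

```python
from bisect import bisect_right

def natural_cubic_spline(ts, ys):
    """Return (X, dX): a natural cubic spline through (ts, ys) and its time derivative."""
    n = len(ts) - 1                      # number of intervals
    h = [ts[i + 1] - ts[i] for i in range(n)]
    m = [0.0] * (n + 1)                  # second derivatives at knots; natural ends stay 0

    if n >= 2:
        # Tridiagonal system for the interior second derivatives m[1..n-1].
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        for r in range(1, n - 1):        # forward elimination (Thomas algorithm)
            w = h[r] / diag[r - 1]
            diag[r] -= w * h[r]
            rhs[r] -= w * rhs[r - 1]
        m[n - 1] = rhs[-1] / diag[-1]    # back substitution
        for r in range(n - 3, -1, -1):
            m[r + 1] = (rhs[r] - h[r + 1] * m[r + 2]) / diag[r]

    def locate(t):
        # Index of the interval [ts[i], ts[i+1]] containing t, clamped to the ends.
        return min(max(bisect_right(ts, t) - 1, 0), n - 1)

    def X(t):
        i = locate(t)
        a, b = ts[i + 1] - t, t - ts[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b)

    def dX(t):
        i = locate(t)
        a, b = ts[i + 1] - t, t - ts[i]
        return ((m[i + 1] * b ** 2 - m[i] * a ** 2) / (2.0 * h[i])
                + (ys[i + 1] - ys[i]) / h[i]
                - (m[i + 1] - m[i]) * h[i] / 6.0)

    return X, dX

# Toy traffic-flow readings at 5-minute knots (values are made up for illustration).
knots = [0.0, 5.0, 10.0, 15.0, 20.0]
flows = [100.0, 120.0, 90.0, 110.0, 105.0]
X, dX = natural_cubic_spline(knots, flows)
trend_at_7_min = dX(7.0)                 # the quantity fed to f~ in Eq. (14)
```

The spline passes through every observation, and its derivative is continuous across knots, which is what allows $\mathrm{d}X_{i}(t)/\mathrm{d}t$ to act as a smooth driving term.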

We formulate the complete update process of hidden states as follows,

(15) $\begin{aligned}
g_{i}(t) &= c\sum_{j\in\mathcal{N}(i)}\alpha_{ij}f\left(h_{j}(t-\tau_{ij})\right),\\
z_{i}(t) &= \sigma(W_{z}h_{i}(t)+U_{z}g_{i}(t)+b_{z}),\\
\frac{\mathrm{d}h_{i}(t)}{\mathrm{d}t} &= (1-z_{i}(t))\odot(g_{i}(t)-h_{i}(t))\,\tilde{f}\left(\frac{\mathrm{d}X_{i}(t)}{\mathrm{d}t}\right),\\
h_{i}(t) &= \begin{cases}\phi_{i}(t), & t\leq 0\\ h_{i}(0)+\int_{0}^{t}\frac{\mathrm{d}h_{i}(t)}{\mathrm{d}t}\mathrm{d}t, & t>0.\end{cases}
\end{aligned}$
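The update process in Equation (15) can be sketched with a fixed-step forward-Euler integrator. This is a simplified illustration, not the solver used in the paper: hidden states are scalars, $f$ is taken to be $\tanh$, the gate weights are scalars, the callable `control` stands in for $\tilde{f}(\mathrm{d}X_{i}/\mathrm{d}t)$, and delayed states are read from the stored trajectory at the nearest grid point. All names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_dde(neighbors, alpha, tau, phi, control, c, w_z, u_z, b_z, dt, n_steps):
    """Forward-Euler integration of Eq. (15) with scalar hidden states.

    neighbors[i] lists the neighbours j of node i; alpha[(i, j)] and tau[(i, j)]
    are edge weights and delays; phi(i, t) is the history function for t <= 0;
    control(i, t) stands in for f~(dX_i/dt).
    """
    n_nodes = len(neighbors)
    traj = [[phi(i, 0.0) for i in range(n_nodes)]]   # traj[k][i] approximates h_i(k * dt)

    def lookup(j, t):
        # Delayed state: history function before t = 0, stored trajectory afterwards.
        if t <= 0.0:
            return phi(j, t)
        k = min(int(round(t / dt)), len(traj) - 1)
        return traj[k][j]

    for step in range(n_steps):
        t = step * dt
        h_now = traj[-1]
        h_next = []
        for i in range(n_nodes):
            g = c * sum(alpha[(i, j)] * math.tanh(lookup(j, t - tau[(i, j)]))
                        for j in neighbors[i])
            z = sigmoid(w_z * h_now[i] + u_z * g + b_z)
            dh = (1.0 - z) * (g - h_now[i]) * control(i, t)
            h_next.append(h_now[i] + dt * dh)
        traj.append(h_next)
    return traj

# Two mutually connected nodes with different propagation delays (toy values).
neighbors = [[1], [0]]
alpha = {(0, 1): 1.0, (1, 0): 1.0}
tau = {(0, 1): 1.0, (1, 0): 0.5}
traj = simulate_dde(
    neighbors, alpha, tau,
    phi=lambda i, t: 1.0 if i == 1 else 0.0,   # constant history per node
    control=lambda i, t: 1.0,                  # neutral control signal
    c=0.5, w_z=0.5, u_z=0.5, b_z=0.0, dt=0.1, n_steps=50,
)
```

Because node 0 reads node 1 with a delay of 1.0, its first ten steps are driven purely by the history function of node 1, which is the behaviour the piecewise definition of $h_{i}(t)$ encodes.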

4.4. Continuous Output Module

We employ another STDDE to generate the final outputs. In this approach, we consider the last stage of the hidden flow in the input process as the history function for the output process. This strategy offers two key advantages: Firstly, the hidden states remain continuous within the manifold space, ensuring unity between the input and output processes. Secondly, unlike traditional output layers that provide predictions with a fixed horizon, we can accurately predict traffic flow at various frequencies. It provides more flexibility and adaptability to different scenarios.

(16) $\begin{aligned}
\frac{\mathrm{d}h(t)}{\mathrm{d}t} &= (1-z(t))\odot(g(t)-h(t)),\\
h_{i}(t^{\prime}) &= h_{i}(T)+\int_{T}^{t^{\prime}}\frac{\mathrm{d}h_{i}(t)}{\mathrm{d}t}\mathrm{d}t,\\
y_{i}(t^{\prime}) &= f\left(h_{i}(t^{\prime})\right),\\
Y_{i} &= \left[y_{i}(t_{T+1}),y_{i}(t_{T+2}),\cdots,y_{i}(t_{T+T^{\prime}})\right],
\end{aligned}$

where $f$ is a mapping function that produces the final outputs from hidden states, and $t^{\prime}$ is any target output point. This output approach better highlights the continuity of STDDE and fully capitalizes on its capabilities. With this model, we have the flexibility to generate predictions at any time, rather than being limited to a specific point. The continuous output module is well-suited for scenarios involving sparse traffic sensor data, especially when a finer temporal resolution is required during inference than in the training phase.
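The practical consequence of Equation (16) is that the same dynamics can be queried on any time grid. The sketch below illustrates this with a forward-Euler integrator and a deliberately simple right-hand side, $\mathrm{d}h/\mathrm{d}t=g-h$ with constant $g$ (that is, $z=0$), chosen because its closed-form solution is known. The function `query_outputs`, the identity readout, and the constant `G_CONST` are assumptions for illustration only.

```python
import math

def query_outputs(h_T, T, rhs, readout, query_times, dt=1e-3):
    """Integrate dh/dt = rhs(t, h) from t = T and read out y(t') at each query time."""
    outputs = []
    t, h = T, h_T
    for t_query in sorted(query_times):
        # Advance the state up to the next requested output time.
        while t < t_query - 1e-12:
            step = min(dt, t_query - t)
            h += step * rhs(t, h)
            t += step
        outputs.append(readout(h))
    return outputs

G_CONST = 2.0                                   # toy target state (assumption)
rhs = lambda t, h: G_CONST - h                  # Eq. (16) with z = 0 and constant g
readout = lambda h: h                           # identity stands in for f

coarse_times = [i / 6 for i in range(1, 7)]     # 6 output points
fine_times = [i / 12 for i in range(1, 13)]     # 12 output points, same horizon
coarse = query_outputs(0.0, 0.0, rhs, readout, coarse_times)
fine = query_outputs(0.0, 0.0, rhs, readout, fine_times)
```

Changing the output resolution only changes `query_times`; no parameter of the dynamics is altered, which mirrors why no retraining is needed when the prediction frequency changes.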

Finally, as our objective function in the context of traffic flow forecasting, we employ the widely-used Huber loss, which is known for its robustness in handling outliers compared to the squared error loss.

(17) $\mathcal{L}(Y,\hat{Y})=\begin{cases}\frac{1}{2}(Y-\hat{Y})^{2}, & |Y-\hat{Y}|\leq\delta\\ \delta|Y-\hat{Y}|-\frac{1}{2}\delta^{2}, & \text{otherwise}\end{cases}$

where $\hat{Y}$ is the ground truth, and $\delta$ is a hyperparameter which controls the sensitivity to outliers.
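For reference, Equation (17) can be written as a few lines of code. The sketch assumes the element-wise losses are averaged over all entries, which Equation (17) itself does not specify; the function name `huber_loss` is hypothetical.

```python
def huber_loss(y, y_hat, delta=1.0):
    """Element-wise Huber loss of Eq. (17), averaged over all entries (assumed reduction)."""
    total = 0.0
    for a, b in zip(y, y_hat):
        r = abs(a - b)
        if r <= delta:
            total += 0.5 * r * r                      # quadratic branch for small residuals
        else:
            total += delta * r - 0.5 * delta * delta  # linear branch for large residuals
    return total / len(y)

small = huber_loss([0.0], [0.5], delta=1.0)   # quadratic branch
large = huber_loss([0.0], [3.0], delta=1.0)   # linear branch
```

The two branches agree at $|Y-\hat{Y}|=\delta$, so the loss is continuous, and the linear branch is what limits the influence of outliers relative to the squared error.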

The time complexity analysis of STDDE is presented in the Appendix.

4.5. Why It Works?

4.5.1. Connection to Existing Works

STDDE includes both delay effects and continuity into a unified delay differential equation framework, which explicitly models the time delay in spatial information propagation. A prior related study on delay-aware traffic forecasting is PDFormer (Jiang et al., 2023). While PDFormer mentions the concept of time delay, its core mechanism involves utilizing attention with the historical time series, rather than explicitly utilizing delay in the propagation process. This implies that PDFormer still cannot capture the delay information from both the spatial and temporal views.

We then analyze the generality of STDDE from two perspectives: 1) In the temporal dimension, GRU and its variants (Fu et al., 2016) can be considered special cases of STDDE when the integration $\int\frac{\mathrm{d}h_{i}(t)}{\mathrm{d}t}$ is discretized. 2) In the spatial dimension, when all the time delays $\tau_{ij}$ are set to zero, STDDE degenerates to general GNNs (Ju et al., 2023; Song et al., 2020b).
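The temporal special case can be checked directly. With the control term omitted, one forward-Euler step of $\mathrm{d}h/\mathrm{d}t=(1-z)\odot(g-h)$ with unit step size gives $h+(1-z)(g-h)=z\,h+(1-z)\,g$, which is a GRU-style gated update in the convention where $z$ weights the previous state. The sketch below verifies this for scalars; the function names are hypothetical.

```python
def euler_step(h, z, g, dt):
    """One forward-Euler step of dh/dt = (1 - z) * (g - h)."""
    return h + dt * (1.0 - z) * (g - h)

def gru_style_update(h, z, g):
    """Discrete gated update: keep a fraction z of the old state, take 1 - z of the candidate."""
    return z * h + (1.0 - z) * g

h_prev, z_gate, g_cand = 0.3, 0.7, -0.2
continuous_limit = euler_step(h_prev, z_gate, g_cand, dt=1.0)
discrete = gru_style_update(h_prev, z_gate, g_cand)
```

Smaller step sizes move the state only part of the way toward the candidate, which is the sense in which the continuous model refines the discrete recurrence.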

4.5.2. Stability

Stability is a critical property for a DDE. Here we provide a theoretical analysis of our proposed DDE.

Definition 1.

A delay differential equation is linked to a characteristic equation. If the real parts of all characteristic roots of the associated equation are negative, the delay differential equation is considered asymptotically stable.

Theorem 1.

The proposed DDE is asymptotically stable when the balance constant $c\leq 1/K$.

Proof.

Please see the Appendix for more details about the definition and the proof. ∎

5. Experiments

5.1. Datasets

We verify the performance of STDDE on six real-world traffic datasets, namely PeMSD7 (M), PeMSD7 (L), PeMS03, PeMS04, PeMS07, and PeMS08. The data are collected by the Caltrans Performance Measurement System in real time every 30 seconds (Chen et al., 2001) and aggregated into 5-minute intervals, which yields 288 time steps per day. More details of the datasets are listed in Table 1. We standardize the input by removing the mean and scaling to unit variance.

Datasets #Sensors #Edges Time Steps
PeMSD7 (M) 228 1132 12672
PeMSD7 (L) 1026 10150 12672
PeMS03 358 547 26208
PeMS04 307 340 16992
PeMS07 883 866 28224
PeMS08 170 295 17856
Table 1. The summary of datasets used in our paper.
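The pre-processing described above can be sketched as follows. The sketch assumes that the mean and standard deviation are computed with the population formula from whatever values are passed in (for example, the training portion); the helper names are hypothetical. It also derives the number of days each dataset covers from the time-step counts in Table 1.

```python
from math import sqrt

def fit_standardizer(values):
    """Return (mean, std) using the population standard deviation (assumed convention)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, sqrt(var)

def standardize(values, mean, std):
    """Remove the mean and scale to unit variance."""
    return [(v - mean) / std for v in values]

def destandardize(values, mean, std):
    """Map standardized values back to the original scale (used before computing metrics)."""
    return [v * std + mean for v in values]

STEPS_PER_DAY = 24 * 60 // 5      # 5-minute aggregation gives 288 steps per day

time_steps = {                    # "Time Steps" column of Table 1
    "PeMSD7 (M)": 12672, "PeMSD7 (L)": 12672, "PeMS03": 26208,
    "PeMS04": 16992, "PeMS07": 28224, "PeMS08": 17856,
}
days_covered = {name: steps // STEPS_PER_DAY for name, steps in time_steps.items()}

flow = [120.0, 135.0, 150.0, 90.0, 105.0]      # toy readings, made up for illustration
mean, std = fit_standardizer(flow)
flow_std = standardize(flow, mean, std)
```

Every entry in Table 1 is an exact multiple of 288, so each dataset covers a whole number of days.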

5.2. Baselines

We select the following representative baselines as our competitors, and more details can be found in the Appendix:

  • Spatio-Temporal Graph Convolution Models, including STGCN (Yu et al., 2018), STSGCN (Song et al., 2020a), DCRNN (Li et al., 2018), AGCRN (Bai et al., 2020), ASTGCN (Guo et al., 2019), DSTAGNN (Lan et al., 2022), and FOGS (Rao et al., 2022). STGCN utilizes graph convolution and 1D convolution to capture spatial-temporal correlations. STSGCN utilizes multiple localized spatial-temporal subgraph modules to capture the spatial-temporal correlations directly. DCRNN integrates graph convolution into an encoder-decoder gated recurrent unit. AGCRN captures node-specific spatial and temporal correlations automatically without a pre-defined graph. ASTGCN utilizes spatial and temporal attention mechanisms to model their dynamics. DSTAGNN constructs a dynamic graph instead of relying on a pre-defined static one. FOGS utilizes first-order gradients rather than specific flows, which effectively circumvents issues associated with fitting irregularly-shaped distributions.

  • Spatial-Temporal Graph Ordinary Differential Equation Models, including STGODE (Fang et al., 2021) and STG-NCDE (Choi et al., 2021). STGODE proposes an ordinary differential equation-based continuous GNN to capture long-range spatial-temporal dependencies. STG-NCDE designs two NCDEs to capture temporal and spatial properties respectively.

  • Delay-aware Traffic Models only include one related work, which is PDFormer (Jiang et al., 2023). Its transformer-based mechanism involves utilizing attention to the historical time series.

5.3. Experimental Settings

We split all datasets into training, validation, and test sets with a ratio of 6:2:2. One hour of historical data is used to predict traffic conditions in the next 60 minutes, i.e., $T=12$ and $T^{\prime}=12$. All experiments are conducted on the same Linux server and GPU. The dimension of hidden states is set to 64. We train our model using the Adam optimizer with a learning rate of 0.001. The batch size is 32 and the model is trained for 200 epochs. Mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE) are used to measure the performance. For baselines, we use their officially reported results if accessible. If not, we run their code with the recommended configurations.
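The three evaluation metrics can be written down compactly. The sketch below is illustrative only: it reports MAPE in percent and skips near-zero targets when computing it, which is a common convention assumed here rather than a detail stated in the paper. The function names and the toy values are hypothetical.

```python
from math import sqrt

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error in percent; near-zero targets are skipped (assumption)."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if abs(t) > eps]
    if not pairs:
        raise ValueError("MAPE is undefined when all targets are zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in pairs) / len(pairs)

y_true = [10.0, 20.0, 40.0]      # toy ground-truth flows
y_pred = [12.0, 18.0, 40.0]      # toy predictions
scores = (mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

Because MAPE divides by the target, small flows inflate it, which is consistent with the remark later in the paper that MAPE is less stable than MAE and RMSE.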

Dataset Metric STGCN DCRNN ASTGCN(r) STSGCN STGODE AGCRN STG-NCDE DSTAGNN FOGS PDFormer STDDE
PeMSD7(M) RMSE 7.55 7.18 6.87 5.93 5.66 5.54 5.39 5.54 5.54 5.60 5.19
PeMSD7(M) MAE 4.01 3.83 3.61 3.01 2.97 2.79 2.68 2.78 2.76 2.81 2.56
PeMSD7(M) MAPE 9.67 9.81 8.84 7.55 7.36 7.02 6.76 6.93 6.83 7.06 6.61
PeMSD7(L) RMSE 8.28 8.33 7.64 6.88 5.98 5.92 5.76 5.98 6.04 5.90 5.63
PeMSD7(L) MAE 4.84 4.33 4.09 3.61 3.22 2.99 2.87 2.98 2.96 2.92 2.77
PeMSD7(L) MAPE 11.76 11.41 10.25 9.13 7.94 7.59 7.31 7.50 7.48 7.54 7.26
PeMS03 RMSE 30.42 30.31 29.56 29.21 27.84 28.25 27.09 27.39 24.85 25.96 24.52
PeMS03 MAE 17.55 17.99 17.34 17.48 16.50 15.98 15.57 15.62 15.06 14.95 15.03
PeMS03 MAPE 17.43 18.34 17.21 16.78 16.69 15.23 15.06 14.74 15.03 15.58 14.69
PeMS04 RMSE 36.01 37.65 35.22 33.65 32.82 32.26 31.09 31.71 31.29 29.96 29.86
PeMS04 MAE 22.66 24.63 22.94 21.19 20.84 19.83 19.21 19.38 19.44 18.31 18.11
PeMS04 MAPE 14.34 17.01 16.43 13.90 13.77 12.97 12.76 12.77 12.81 12.07 12.07
PeMS07 RMSE 39.34 38.61 37.87 39.03 37.54 36.55 33.84 34.88 34.09 32.80 32.59
PeMS07 MAE 25.33 25.22 24.01 24.26 22.99 22.37 20.53 21.62 20.79 19.78 19.47
PeMS07 MAPE 11.21 11.82 10.73 10.21 10.14 9.12 8.80 9.24 8.75 8.54 8.49
PeMS08 RMSE 27.88 27.83 26.22 26.80 25.97 25.22 24.81 25.08 25.36 24.61 24.31
PeMS08 MAE 18.11 17.46 16.64 17.13 16.81 15.95 15.45 15.85 16.10 15.66 15.12
PeMS08 MAPE 11.34 11.39 10.6 10.96 10.62 10.09 9.92 9.93 9.85 9.61 9.74
Table 2. Performance comparison of baselines and the proposed STDDE on six widely used real-world traffic datasets.

5.4. Conceptual Experiments

We first provide conceptual experiments to validate our motivation and to evaluate the effectiveness of the proposed delay-aware differential equation and continuous output module.

5.4.1. Quantitative and Qualitative Validation of Time-delay

For quantitative validation, we design a variant of our model, STDDE-no-delay, which sets all the delays to zero for comparison. We compute the average delay of each node in PeMS04, sort the nodes by average delay in ascending order, and select the first 15% and the last 15% of nodes to compare the performance of STDDE and STDDE-no-delay. Table 3 shows the results. We find that the errors on nodes with larger average delays are much worse than those on nodes with small delays (MAPE is not a stable metric because it is sensitive to small values), which indicates the difficulty of dealing with long-range correlations. STDDE achieves a larger improvement on the long-delay nodes owing to its ability to model delay effects.
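The node selection and the relative gain used in this comparison can be sketched as follows. The sketch assumes that the gain is the relative error reduction with respect to STDDE-no-delay and that the group size is the floor of 15% of the node count; both are assumptions, and the function names are hypothetical.

```python
def split_by_delay(avg_delay, fraction=0.15):
    """Return (short_delay_nodes, long_delay_nodes) after sorting by ascending average delay."""
    order = sorted(range(len(avg_delay)), key=lambda i: avg_delay[i])
    k = max(1, int(len(order) * fraction))   # group size (floor, at least one node)
    return order[:k], order[-k:]

def relative_gain(baseline_error, model_error):
    """Relative error reduction in percent with respect to the baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Toy average delays for 10 nodes (made up), using a 20% fraction for readability.
toy_delays = [3.0, 0.5, 7.0, 1.0, 4.0, 9.0, 2.0, 6.0, 8.0, 5.0]
short_nodes, long_nodes = split_by_delay(toy_delays, fraction=0.2)

# RMSE of the long-delay group: STDDE-no-delay 37.72 versus STDDE 34.59.
long_delay_rmse_gain = relative_gain(37.72, 34.59)
```

With the 307 sensors of PeMS04 (Table 1), this rule would place 46 nodes in each group.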

For qualitative validation, we carry out a case study on a real-world dataset to evaluate the effectiveness of the proposed STDDE in capturing time delay in traffic flow forecasting. We select two connected neighboring nodes, 196 and 198, from the PeMS04 dataset to visualize how STDDE perceives time delay. The results are shown in Fig. 4: the predictions of STDDE are remarkably closer to the ground truth than those of STDDE-no-delay. In addition, there is a sharp rise in node 196’s traffic waveform in Fig. 4 (a), and the result in (b) shows that STDDE-no-delay responds inaccurately while STDDE does not. This further shows that STDDE is able to capture and utilize the time-delay information in traffic flow forecasting.

Data Metric STDDE-no-delay STDDE Gain
First 15% RMSE 16.97 16.86 0.65%
First 15% MAE 11.54 11.47 0.61%
First 15% MAPE 19.71 18.37 6.80%
Last 15% RMSE 37.72 34.59 8.30%
Last 15% MAE 25.06 23.24 7.26%
Last 15% MAPE 14.63 13.96 4.65%
Table 3. Performance on nodes with different delay magnitudes.
Figure 4. Comparison of prediction results between our model and STDDE-no-delay.

5.4.2. Flexibility Validation of the Continuous-output Module

To test the flexibility of our model in real-world scenarios, we introduce a more challenging setting. We still use 60 minutes of historical data to predict the traffic flow in the next 60 minutes. However, during training we set the time interval to 10/15/20 minutes, which means the number of input steps is 6/4/3, and during inference we change the time interval to 5 minutes. This configuration rigorously assesses the model’s adaptability. For STDDE, owing to its continuity, we only need to increase the number of chosen states in the output module from 6/4/3 to 12. For baseline models, we first obtain their predictions and then apply linear interpolation to obtain a more fine-grained output. The results are presented in Figure 5. In summary, performance degrades as the input interval increases, and STDDE performs significantly better than the baseline models. Compared with linear interpolation, the STDDE output module can model the inherent continuity and generate more accurate predictions.
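The interpolation step applied to the baselines can be sketched as follows. The paper does not state how the first fine-grained point (at 5 minutes) is obtained, so the sketch assumes that the last observed value at time 0 is used as an anchor; the function name `upsample_linear` and the toy values are hypothetical.

```python
def upsample_linear(coarse_times, coarse_values, fine_times, t0, v0):
    """Linearly interpolate coarse predictions onto fine_times, anchored at (t0, v0)."""
    ts = [t0] + list(coarse_times)
    vs = [v0] + list(coarse_values)
    out = []
    for t in fine_times:
        for k in range(len(ts) - 1):
            if ts[k] <= t <= ts[k + 1]:
                w = (t - ts[k]) / (ts[k + 1] - ts[k])   # position inside the segment
                out.append((1 - w) * vs[k] + w * vs[k + 1])
                break
        else:
            raise ValueError("fine time outside the coarse prediction range")
    return out

# A baseline trained at 10-minute resolution predicts 6 points for the next hour.
coarse_times = [10, 20, 30, 40, 50, 60]
coarse_values = [110.0, 120.0, 130.0, 140.0, 150.0, 160.0]   # toy predictions
fine_times = [5 * i for i in range(1, 13)]                   # 12 points at 5-minute resolution
fine_values = upsample_linear(coarse_times, coarse_values, fine_times, t0=0, v0=100.0)
```

Interpolation can only place the new points on straight segments between coarse predictions, whereas a continuous output module evaluates the learned dynamics at the intermediate times.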

Figure 5. Performance comparison with input time intervals greater than inference intervals. Panels: (a) RMSE, (b) MAE, (c) MAPE.

5.5. Overall Performances and Analysis

Table 2 shows the results of the proposed STDDE model and competitive baselines on traffic flow forecasting tasks on six widely used real-world datasets. We draw the following conclusions:

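  • STDDE achieves the best or tied-best result on 16 of the 18 dataset-metric pairs in Table 2 (the exceptions are MAE on PeMS03 and MAPE on PeMS08, where PDFormer is slightly better), which suggests the effectiveness of our spatial-temporal delay-aware traffic flow forecasting.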

  • Continuous spatial-temporal neural networks, i.e., STGODE, STG-NCDE, and STDDE, perform better than traditional GNN-based ones, such as the widely used AGCRN, STGCN, DCRNN, and DSTAGNN. This shows that continuous modeling is an effective direction for spatial-temporal traffic flow forecasting and deserves more attention.

  • The proposed STDDE and PDFormer generally perform better than other continuous spatial-temporal methods, i.e. STGODE and STG-NCDE, which indicates that capturing and utilizing historical delay-related information is necessary and of great significance.

  • STDDE gains better performance than PDFormer, which shows the effectiveness of explicit spatial-temporal delay-aware differential equations and continuous modeling.

Model #Parameters PeMSD7(M)-Train PeMSD7(M)-Infer PeMSD7(L)-Train PeMSD7(L)-Infer
STGODE 328,646 131 13 1107 146
FOGS 1,674,188 50 3 531 42
DSTAGNN 2,784,988 168 43 1222 209
PDFormer 531,165 120 11 1292 138
STDDE 175,830 82 9 734 84
Table 4. Comparison of # parameters and running time in one epoch. (Unit: seconds)

5.6. Model Analysis

5.6.1. Ablation Studies

To verify the effectiveness of different modules of STDDE, we conduct the following ablation experiments on PeMS04 dataset and compare results with its corresponding variants.

  • v1 (STDDE-no-delay): We ignore the delay effect, so the model degenerates to an ODE model, to verify whether capturing the time-delay signal contributes to performance.

  • v2 (STDDE-zero-history): We set the history function as zero to verify the necessity of learnable history states.

  • v3 (STDDE-fixed-delay): We use the pre-processed delay values as the inputs of STDDE.

Figure 6. Ablation experiments of STDDE.

The results are presented in Fig. 6. STDDE-no-delay shows a significant performance gap compared with STDDE, which demonstrates the necessity of utilizing time delay. Also, STDDE-fixed-delay performs worse than STDDE, which shows the superiority of learnable delay values. In addition, STDDE performs better than STDDE-zero-history, because the historical states of a DDE are critical to the update of the subsequent states.

5.6.2. Model Efficiency Analysis

We conduct model efficiency analysis on STDDE and several representative baselines, i.e., STGODE, DSTAGNN, FOGS, and PDFormer, on the PeMSD7 (M) and PeMSD7 (L) datasets. Table 4 reports the number of parameters and the average training and inference time per epoch. We find that STDDE achieves competitive computational efficiency in both the training and inference phases. On the largest dataset, PeMSD7 (L), compared with the best-performing baseline, PDFormer, STDDE reduces the training and inference time by about 43% and 39%, respectively.
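The reductions quoted above follow directly from the PeMSD7 (L) columns of Table 4, as the short calculation below shows. The dictionary layout and the function name are only illustrative.

```python
# (train seconds, inference seconds) per epoch on PeMSD7 (L), taken from Table 4.
pemsd7_l_time = {"PDFormer": (1292, 138), "STDDE": (734, 84)}
n_params = {"PDFormer": 531_165, "STDDE": 175_830}

def reduction_percent(baseline, ours):
    """Relative reduction with respect to the baseline, in percent."""
    return 100.0 * (baseline - ours) / baseline

train_reduction = reduction_percent(pemsd7_l_time["PDFormer"][0], pemsd7_l_time["STDDE"][0])
infer_reduction = reduction_percent(pemsd7_l_time["PDFormer"][1], pemsd7_l_time["STDDE"][1])
param_ratio = n_params["STDDE"] / n_params["PDFormer"]
```

The same table shows that STDDE uses roughly one third of the parameters of PDFormer.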

5.6.3. Parameter Analysis

We analyze the dimension of hidden states in STDDE, which influences the complexity of the state space. Fig. 7 shows the results on the PeMS04 dataset. The performance improves as the hidden dimension increases and is best when the dimension is 128. To balance effectiveness and efficiency, we set the dimension to 64 in our model.

Figure 7. STDDE results with the change of hidden size.

6. Conclusion

In this paper, we propose STDDE which includes both delay effects and continuity into a unified delay differential equation framework. It explicitly models the time delay in spatial information propagation. To gain more accurate depictions of the time delay, we design a traffic-graph time-delay estimator. In addition, we propose a continuous output module, allowing us to accurately predict traffic flow at various frequencies, which provides more flexibility and adaptability to different scenarios. Experimental results show the effectiveness and efficiency of STDDE.

Acknowledgments

This research was supported by the Natural Science Foundation of China under Grant No. 61836013 and grants from the Strategic Priority Research Program of the Chinese Academy of Sciences XDB38030300.

References

  • Aryandoust et al. (2019) Arsam Aryandoust, Oscar van Vliet, and Anthony Patt. 2019. City-scale car traffic and parking density maps from Uber Movement travel time data. Scientific data 6, 1 (2019), 158.
  • Azaria and Hertz (1984) Mordechai Azaria and David Hertz. 1984. Time delay estimation by generalized cross correlation methods. IEEE Transactions on Acoustics, Speech, and Signal Processing 32, 2 (1984), 280–285.
  • Bai et al. (2020) Lei Bai, Lina Yao, Can Li, Xianzhi Wang, and Can Wang. 2020. Adaptive graph convolutional recurrent network for traffic forecasting. arXiv preprint arXiv:2007.02842 (2020).
  • Benidis et al. (2022) Konstantinos Benidis, Syama Sundar Rangapuram, Valentin Flunkert, Yuyang Wang, Danielle Maddix, Caner Turkmen, Jan Gasthaus, Michael Bohlke-Schneider, David Salinas, Lorenzo Stella, et al. 2022. Deep learning for time series forecasting: Tutorial and literature survey. Comput. Surveys 55, 6 (2022), 1–36.
  • Chavhan and Venkataram (2020) Suresh Chavhan and Pallapa Venkataram. 2020. Prediction based traffic management in a metropolitan area. Journal of traffic and transportation engineering (English edition) 7, 4 (2020), 447–466.
  • Chen et al. (2001) Chao Chen, Karl Petty, Alexander Skabardonis, Pravin Varaiya, and Zhanfeng Jia. 2001. Freeway performance measurement system: mining loop detector data. Transportation Research Record 1748, 1 (2001), 96–102.
  • Chen et al. (2018) Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. 2018. Neural ordinary differential equations. arXiv preprint arXiv:1806.07366 (2018).
  • Choi et al. (2021) Jeongwhan Choi, Hwangyong Choi, Jeehyun Hwang, and Noseong Park. 2021. Graph Neural Controlled Differential Equations for Traffic Forecasting. arXiv preprint arXiv:2112.03558 (2021).
  • De Brouwer et al. (2019) Edward De Brouwer, Jaak Simm, Adam Arany, and Yves Moreau. 2019. Gru-ode-bayes: Continuous modeling of sporadically-observed time series. Advances in neural information processing systems 32 (2019).
  • Dupont et al. (2019) Emilien Dupont, Arnaud Doucet, and Yee Whye Teh. 2019. Augmented neural odes. Advances in Neural Information Processing Systems 32 (2019).
  • Fang et al. (2020) Xiaomin Fang, Jizhou Huang, Fan Wang, Lingke Zeng, Haijin Liang, and Haifeng Wang. 2020. Constgat: Contextual spatial-temporal graph attention network for travel time estimation at baidu maps. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2697–2705.
  • Fang et al. (2021) Zheng Fang, Qingqing Long, Guojie Song, and Kunqing Xie. 2021. Spatial-temporal graph ode networks for traffic flow forecasting. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 364–373.
  • Fang et al. (2022) Zheng Fang, Lingjun Xu, Guojie Song, Qingqing Long, and Yingxue Zhang. 2022. Polarized graph neural networks. In Proceedings of the ACM Web Conference 2022. 1404–1413.
  • Fu et al. (2016) Rui Fu, Zuo Zhang, and Li Li. 2016. Using LSTM and GRU neural network methods for traffic flow prediction. In 2016 31st Youth academic annual conference of Chinese association of automation (YAC). IEEE, 324–328.
  • Guo et al. (2019) Shengnan Guo, Youfang Lin, Ning Feng, Chao Song, and Huaiyu Wan. 2019. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 922–929.
  • Islam et al. (2022) Jaminur Islam, Jose Paolo Talusan, Shameek Bhattacharjee, Francis Tiausas, Sayyed Mohsen Vazirizade, Abhishek Dubey, Keiichi Yasumoto, and Sajal K Das. 2022. Anomaly based Incident Detection in Large Scale Smart Transportation Systems. In 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS). IEEE, 215–224.
  • Jiang et al. (2023) Jiawei Jiang, Chengkai Han, Wayne Xin Zhao, and Jingyuan Wang. 2023. PDFormer: Propagation Delay-aware Dynamic Long-range Transformer for Traffic Flow Prediction. arXiv preprint arXiv:2301.07945 (2023).
  • Ju et al. (2023) Wei Ju, Zheng Fang, Yiyang Gu, Zequn Liu, Qingqing Long, Ziyue Qiao, Yifang Qin, Jianhao Shen, Fang Sun, Zhiping Xiao, et al. 2023. A Comprehensive Survey on Deep Graph Representation Learning. arXiv preprint arXiv:2304.05055 (2023).
  • Ju et al. (2024) Wei Ju, Siyu Yi, Yifan Wang, Qingqing Long, Junyu Luo, Zhiping Xiao, and Ming Zhang. 2024. A Survey of Data-Efficient Graph Learning. arXiv preprint arXiv:2402.00447 (2024).
  • Kidger et al. (2020) Patrick Kidger, James Morrill, James Foster, and Terry Lyons. 2020. Neural controlled differential equations for irregular time series. Advances in Neural Information Processing Systems 33 (2020), 6696–6707.
  • Kipf and Welling (2016) Thomas N Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016).
  • Lan et al. (2022) Shiyong Lan, Yitong Ma, Weikang Huang, Wenwu Wang, Hongyu Yang, and Pyang Li. 2022. Dstagnn: Dynamic spatial-temporal aware graph neural network for traffic flow forecasting. In International conference on machine learning. PMLR, 11906–11917.
  • Lechner and Hasani (2020) Mathias Lechner and Ramin Hasani. 2020. Learning long-term dependencies in irregularly-sampled time series. arXiv preprint arXiv:2006.04418 (2020).
  • Li et al. (2017) Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2017. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926 (2017).
  • Li et al. (2018) Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In International Conference on Learning Representations.
  • Liu et al. (2020) Jielun Liu, Ghim Ping Ong, and Xiqun Chen. 2020. GraphSAGE-based traffic speed forecasting for segment network with sparse data. IEEE Transactions on Intelligent Transportation Systems 23, 3 (2020), 1755–1766.
  • Long et al. (2020) Qingqing Long, Yilun Jin, Guojie Song, Yi Li, and Wei Lin. 2020. Graph Structural-topic Neural Network. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 1065–1073.
  • Long et al. (2021a) Qingqing Long, Yilun Jin, Yi Wu, and Guojie Song. 2021a. Theoretically improving graph neural networks via anonymous walk graph kernels. In Proceedings of the Web Conference 2021. 1204–1214.
  • Long et al. (2021b) Qingqing Long, Lingjun Xu, Zheng Fang, and Guojie Song. 2021b. HGK-GNN: Heterogeneous Graph Kernel based Graph Neural Networks. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 1129–1138.
  • Lu et al. (2020) Bin Lu, Xiaoying Gan, Haiming Jin, Luoyi Fu, and Haisong Zhang. 2020. Spatiotemporal adaptive gated graph convolution network for urban traffic flow forecasting. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 1025–1034.
  • Mozyrska et al. (2009) Dorota Mozyrska, Ewa Pawluszewicz, and Delfim FM Torres. 2009. The Riemann-Stieltjes integral on time scales. arXiv preprint arXiv:0903.1224 (2009).
  • Nagy and Simon (2018) Attila M Nagy and Vilmos Simon. 2018. Survey on traffic prediction in smart cities. Pervasive and Mobile Computing 50 (2018), 148–163.
  • Ran and Boyce (2012) Bin Ran and David Boyce. 2012. Modeling dynamic transportation networks: an intelligent transportation system oriented approach. Springer Science & Business Media.
  • Rao et al. (2022) Xuan Rao, Hao Wang, Liang Zhang, Jing Li, Shuo Shang, and Peng Han. 2022. FOGS: First-order gradient supervision with learning-based graph for traffic flow forecasting. In Proceedings of International Joint Conference on Artificial Intelligence, IJCAI. ijcai.org.
  • Rubanova et al. (2019) Yulia Rubanova, Ricky TQ Chen, and David K Duvenaud. 2019. Latent ordinary differential equations for irregularly-sampled time series. Advances in neural information processing systems 32 (2019).
  • Schuster and Paliwal (1997) Mike Schuster and Kuldip K Paliwal. 1997. Bidirectional recurrent neural networks. IEEE transactions on Signal Processing 45, 11 (1997), 2673–2681.
  • Söderlind (2006) Gustaf Söderlind. 2006. The logarithmic norm. History and modern theory. BIT Numerical Mathematics 46, 3 (2006), 631–652.
  • Song et al. (2020a) Chao Song, Youfang Lin, Shengnan Guo, and Huaiyu Wan. 2020a. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 914–921.
  • Song et al. (2020b) Guojie Song, Qingqing Long, Yi Luo, Yiming Wang, and Yilun Jin. 2020b. Deep convolutional neural network based medical concept normalization. IEEE Transactions on Big Data 8, 5 (2020), 1195–1208.
  • Ström (1975) Torsten Ström. 1975. On logarithmic norms. SIAM J. Numer. Anal. 12, 5 (1975), 741–753.
  • Sun (2006) Leping Sun. 2006. Stability analysis for delay differential equations with multidelays and numerical examples. Mathematics of computation 75, 253 (2006), 151–165.
  • Wang et al. (2019) Minjie Wang, Lingfan Yu, Da Zheng, Quan Gan, Yu Gai, Zihao Ye, Mufei Li, Jinjing Zhou, Qi Huang, Chao Ma, et al. 2019. Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. (2019).
  • Wu et al. (2019) Zifeng Wu, Chunhua Shen, and Anton Van Den Hengel. 2019. Wider or deeper: Revisiting the resnet model for visual recognition. Pattern Recognition 90 (2019), 119–133.
  • Yorke (1970) James A Yorke. 1970. Asymptotic stability for one dimensional differential-delay equations. Journal of Differential equations 7, 1 (1970), 189–202.
  • Yu et al. (2018) Bing Yu, Haoteng Yin, and Zhanxing Zhu. 2018. Spatio-temporal graph convolutional networks: a deep learning framework for traffic forecasting. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. 3634–3640.
  • Yuan et al. (2021) Haitao Yuan, Guoliang Li, Zhifeng Bao, and Ling Feng. 2021. An effective joint prediction model for travel demands and traffic flows. In 2021 IEEE 37th International Conference on Data Engineering (ICDE). IEEE, 348–359.
  • Zhang et al. (2019) Weibin Zhang, Yinghao Yu, Yong Qi, Feng Shu, and Yinhai Wang. 2019. Short-term traffic flow prediction based on spatio-temporal analysis and CNN deep learning. Transportmetrica A: Transport Science 15, 2 (2019), 1688–1711.
  • Zhou et al. (2020) Zhengyang Zhou, Yang Wang, Xike Xie, Lianliang Chen, and Chaochao Zhu. 2020. Foresee urban sparse traffic accidents: A spatiotemporal multi-granularity perspective. IEEE Transactions on Knowledge and Data Engineering 34, 8 (2020), 3786–3799.
  • Zhu et al. (2021) Qunxi Zhu, Yao Guo, and Wei Lin. 2021. Neural delay differential equations. arXiv preprint arXiv:2102.10801 (2021).

Appendix A Appendix

A.1. Time Complexity Analysis

The time complexity of STDDE is mainly associated with the ODE module, and the complexity of the ODE depends on the solving algorithm. In this paper, we employ the Euler solver, so the complexity of aggregation is inversely proportional to the step length $\eta$. As we have $T$ time slots, the aggregation needs to be calculated $\frac{T}{\eta}$ times. As the complexity of neighbor aggregation is $O(E)$, the algorithm's complexity finally becomes $O(\frac{TE}{\eta})$.
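
The count above can be illustrated with a toy explicit-Euler loop. This is only a sketch under stated assumptions: the dynamics (each node relaxing toward the mean of its in-neighbors), the function name `euler_edge_ops`, and the edge-list format are hypothetical choices made for the example and are not the STDDE implementation. It charges one unit of work per edge per solver step, which gives $\frac{T}{\eta} \cdot E$ units in total.

```python
def euler_edge_ops(h0, edges, T, eta):
    """Integrate a toy graph ODE with explicit Euler and count edge operations.

    h0    : list of initial node states
    edges : list of (src, dst) pairs, so E = len(edges)
    T     : length of the time window
    eta   : Euler step length
    """
    h = list(h0)
    n_steps = round(T / eta)              # T / eta solver steps
    edge_ops = 0
    for _ in range(n_steps):
        agg = [0.0] * len(h)
        deg = [0] * len(h)
        for src, dst in edges:            # one neighbor aggregation costs O(E)
            agg[dst] += h[src]
            deg[dst] += 1
            edge_ops += 1
        # Toy dynamics (assumption): dh_i/dt = mean of in-neighbors - h_i.
        h = [hi + eta * ((a / d if d else hi) - hi)
             for hi, a, d in zip(h, agg, deg)]
    return h, edge_ops


edges = [(0, 1), (1, 0), (1, 2), (2, 1)]  # E = 4
_, ops_coarse = euler_edge_ops([1.0, 0.0, 2.0], edges, T=12, eta=0.5)
_, ops_fine = euler_edge_ops([1.0, 0.0, 2.0], edges, T=12, eta=0.25)
print(ops_coarse, ops_fine)               # 96 192
```

Halving the step length doubles the number of edge operations, as the $O(\frac{TE}{\eta})$ bound suggests.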

A.2. Theoretical Analysis

Here we discuss the stability of the proposed DDE. We begin by introducing the basic definitions and lemmas.

Definition 2.

The logarithmic norm $\mu$ of a square matrix $A$ is defined as

$$\mu_p(A) = \lim_{\delta \to 0^+} \frac{\|I + \delta A\|_p - 1}{\delta}, \tag{18}$$

where $I$ is the identity matrix of the same dimension as $A$, and $\|\cdot\|_p$ denotes an induced matrix norm.

The logarithmic norm is widely used in differential equation analysis (Ström, 1975; Söderlind, 2006), and plays an important role in our analysis.

Lemma 1.

(Ström, 1975) Let $A = (a_{ij}) \in \mathbb{R}^{n \times n}$, then

$$\mu_1(A) = \max_j \Big[ a_{jj} + \sum_{i, i \neq j} |a_{ij}| \Big], \tag{19}$$
$$\mu_2(A) = \frac{1}{2} \max_i \big[ \lambda_i(A + A^T) \big], \tag{20}$$
$$\mu_\infty(A) = \max_i \Big[ a_{ii} + \sum_{j, j \neq i} |a_{ij}| \Big]. \tag{21}$$

From Lemma 1, we see that $\mu_p(A)$ is easy to calculate for $p = 1, \infty$, or to estimate for $p = 2$, which is convenient for the analysis.
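
As a small numerical illustration, the closed forms for $p = 1$ and $p = \infty$ can be written directly and compared with a finite-$\delta$ approximation of the limit in the definition of the logarithmic norm. This is a sketch only: the function names and the test matrix are hypothetical, matrices are plain nested lists, and $\mu_2$ is omitted because it needs an eigenvalue routine.

```python
def mu_1(A):
    """Logarithmic 1-norm: max over columns of a_jj + sum_{i != j} |a_ij|."""
    n = len(A)
    return max(A[j][j] + sum(abs(A[i][j]) for i in range(n) if i != j)
               for j in range(n))


def mu_inf(A):
    """Logarithmic infinity-norm: max over rows of a_ii + sum_{j != i} |a_ij|."""
    n = len(A)
    return max(A[i][i] + sum(abs(A[i][j]) for j in range(n) if j != i)
               for i in range(n))


def norm_inf(A):
    """Induced infinity-norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)


def mu_inf_limit(A, delta=1e-6):
    """Finite-delta approximation of (||I + delta*A||_inf - 1) / delta."""
    n = len(A)
    M = [[(1.0 if i == j else 0.0) + delta * A[i][j] for j in range(n)]
         for i in range(n)]
    return (norm_inf(M) - 1.0) / delta


A = [[-2.0, 1.0],
     [0.5, -1.0]]
print(mu_1(A), mu_inf(A), mu_inf_limit(A))
```

For this matrix the closed forms give $\mu_1(A) = 0$ and $\mu_\infty(A) = -0.5$, and the finite-$\delta$ quotient agrees with the latter up to rounding. Unlike a norm, the logarithmic norm can be negative.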

Definition 3.

A linear multi-delay differential equation is defined as

$$\frac{\mathrm{d}h(t)}{\mathrm{d}t} = A_0 h(t) + \sum_{k=1}^{K} A_k h(t_{\tau_k}), \tag{22}$$

where $h(t) = (h_1(t), \cdots, h_N(t))$ is a vector, $A_0, A_k \in \mathbb{R}^{N \times N}$ are constant matrices, and $h(t_{\tau_k}) = (h_1(t - \tau_{k1}), \cdots, h_N(t - \tau_{kN}))$.

Taking the simplest two-variable delay differential equations as an example,

$$\left\{\begin{aligned} \frac{\mathrm{d}u(t)}{\mathrm{d}t} &= a_1 u(t) + b_1 v(t - \tau_2), \\ \frac{\mathrm{d}v(t)}{\mathrm{d}t} &= a_2 v(t) + b_2 u(t - \tau_1), \end{aligned}\right. \tag{23}$$

by letting

$$h(t) = (u(t), v(t))^T, \quad h(t_\tau) = (u(t - \tau_1), v(t - \tau_2))^T, \tag{24}$$
$$A = \begin{pmatrix} a_1 & 0 \\ 0 & a_2 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & b_1 \\ b_2 & 0 \end{pmatrix}, \tag{25}$$

we have

$$\frac{\mathrm{d}h(t)}{\mathrm{d}t} = A h(t) + B h(t_\tau). \tag{26}$$
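
To make the matrix form (26) concrete, the following sketch integrates it with explicit Euler, using the diagonal $A$ and off-diagonal $B$ of (25). Several details are assumptions made for the example: the function name, the constant history $u(t) = u_0$, $v(t) = v_0$ for $t \le 0$, the rounding of each delay to a whole number of steps, and the parameter values.

```python
def simulate_two_variable_dde(a1, a2, b1, b2, tau1, tau2,
                              u0=1.0, v0=1.0, T=20.0, eta=0.01):
    """Explicit Euler for dh/dt = A h(t) + B h(t_tau), h = (u, v)^T.

    A = diag(a1, a2), B = [[0, b1], [b2, 0]],
    h(t_tau) = (u(t - tau1), v(t - tau2))^T.
    Assumes a constant history (u0, v0) for t <= 0.
    """
    n_steps = round(T / eta)
    d1 = round(tau1 / eta)            # delay of u, in solver steps
    d2 = round(tau2 / eta)            # delay of v, in solver steps
    u, v = [u0], [v0]
    for n in range(n_steps):
        u_delayed = u[n - d1] if n >= d1 else u0
        v_delayed = v[n - d2] if n >= d2 else v0
        du = a1 * u[n] + b1 * v_delayed   # first row of A h + B h(t_tau)
        dv = a2 * v[n] + b2 * u_delayed   # second row of A h + B h(t_tau)
        u.append(u[n] + eta * du)
        v.append(v[n] + eta * dv)
    return u, v


u, v = simulate_two_variable_dde(a1=-1.0, a2=-1.0, b1=0.5, b2=0.5,
                                 tau1=0.5, tau2=1.0)
print(len(u), u[-1], v[-1])
```

With these illustrative coefficients the instantaneous decay dominates the delayed coupling, and both trajectories shrink toward zero over the window.
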

Theorem 1.

The proposed DDE is bounded between two linear multi-delay systems.

Proof.

Without considering the control signal in Eqn. (9), we have

$$0 \le \frac{\mathrm{d}h_i(t)}{\mathrm{d}t} \le g_i(t) - h_i(t). \tag{27}$$

The lower bound is immediate, and we reformulate the upper bound as

$$g_i(t) - h_i(t) = -h_i(t) + c \sum_{j \in \mathcal{N}_i} \alpha_{ij} h_j(t - \tau_{ij}). \tag{28}$$

Further, letting $H(t) = (h_1(t), h_2(t), \cdots, h_n(t))^T$, we have

$$\frac{\mathrm{d}H(t)}{\mathrm{d}t} \le -I H(t) + \sum_{k=1}^{K} A_k \cdot H(t_{\tau_k}), \tag{29}$$

where $A_k$ is a matrix in which the $i$-th row contains at most one non-zero element $c\alpha_{ij}$, and $H(t_{\tau_k}) = (h_1(t - \tau_{k1}), \cdots, h_n(t - \tau_{kn}))^T$ regroups the delayed signals. $K$ denotes the maximum node degree, as each edge appears only once in the matrices. ∎
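
The regrouping used in this proof can be sketched in a few lines. The input format and the function name below are hypothetical: `neighbors[i]` is assumed to be a list of `(j, alpha_ij)` pairs, and the $k$-th neighbor of node $i$ is placed in row $i$ of the $k$-th matrix, so that every edge is used exactly once and the number of matrices equals the maximum node degree.

```python
def split_into_delay_matrices(neighbors, c):
    """Split c * alpha into K matrices with at most one non-zero per row.

    neighbors : neighbors[i] is a list of (j, alpha_ij) pairs (assumed format)
    c         : balance constant
    Returns a list of K nested-list matrices, K = maximum node degree.
    """
    n = len(neighbors)
    K = max((len(nb) for nb in neighbors), default=0)
    mats = [[[0.0] * n for _ in range(n)] for _ in range(K)]
    for i, nb in enumerate(neighbors):
        for k, (j, alpha) in enumerate(nb):
            mats[k][i][j] = c * alpha     # k-th neighbor of i goes into A_k
    return mats


# Node 0 has two neighbors; nodes 1 and 2 have one each, so K = 2.
neighbors = [[(1, 0.5), (2, 0.5)], [(0, 1.0)], [(0, 1.0)]]
mats = split_into_delay_matrices(neighbors, c=0.5)
for A_k in mats:
    print(A_k)
```

Summing the returned matrices recovers the full weighted adjacency scaled by $c$. The order in which neighbors are assigned to slots is arbitrary here.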

Definition 4.

The characteristic equation associated with the differential equation (22) is defined as

$$\det\Big[ zI - A_0 - \sum_{k=1}^{K} A_k \exp(-zT_k) \Big] = 0, \tag{30}$$

where $\exp(-zT_k) = \text{diag}(e^{-z\tau_{k1}}, e^{-z\tau_{k2}}, \cdots, e^{-z\tau_{kN}})$.

Lemma 2.

(Sun, 2006) If the real parts of all characteristic roots of equation (30) are less than zero, then the system is asymptotically stable.

Lemma 3.

(Sun, 2006) If the condition

$$\mu(A_0) + \sum_{k=1}^{K} \|A_k\| \le 0 \tag{31}$$

holds, then the hypothesis of Lemma 2 is satisfied, and the system is asymptotically stable.
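
The condition of Lemma 3 is cheap to evaluate for $p = \infty$. The sketch below computes the left-hand side for matrices given as nested lists. The helper names and the example matrices are hypothetical. Since the lemma is stated as an implication in one direction, a positive value here is inconclusive rather than a proof of instability.

```python
def mu_inf(A):
    """Logarithmic infinity-norm: max over rows of a_ii + sum_{j != i} |a_ij|."""
    n = len(A)
    return max(A[i][i] + sum(abs(A[i][j]) for j in range(n) if j != i)
               for i in range(n))


def norm_inf(A):
    """Induced infinity-norm: maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)


def lemma3_margin(A0, delayed):
    """Left-hand side mu_inf(A0) + sum_k ||A_k||_inf of the condition."""
    return mu_inf(A0) + sum(norm_inf(A_k) for A_k in delayed)


A0 = [[-1.0, 0.0], [0.0, -1.0]]
weak = [[[0.0, 0.5], [0.5, 0.0]]]       # one delayed matrix, norm 0.5
strong = [[[0.0, 1.5], [0.5, 0.0]]]     # one delayed matrix, norm 1.5
print(lemma3_margin(A0, weak))          # -0.5, condition holds
print(lemma3_margin(A0, strong))        # 0.5, condition not met
```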

Theorem 2.

The proposed DDE is asymptotically stable when the balance constant $c \le 1/K$.

Proof.

We prove the result for both the upper bound and the lower bound. The lower bound clearly satisfies the condition in Lemma 3, so we focus on the upper bound. We prove the result for the norm $p = \infty$, and the result can be generalized to other norms easily. According to Theorem 1, we have

$$\mu_\infty(A_0) = \mu_\infty(-I) = -1, \tag{32}$$
$$\|A_k\|_\infty = \max_i \sum_{j=1}^{n} |(A_k)_{ij}| = \max_i c\,\alpha_{ij(k)}. \tag{33}$$

The second equation holds because there is at most one non-zero element in each row of $A_k$, and $c\alpha_{ij(k)}$ denotes the element in row $i$. As $\alpha_{ij}$ is the normalized weight for graph convolution, we have

$$\|A_k\|_\infty = \max_i c\,\alpha_{ij(k)} \le c, \tag{34}$$

and when $c \le 1/K$,

$$\mu(A_0) + \sum_{k=1}^{K} \|A_k\| \le -1 + K \cdot c \le 0, \tag{35}$$

the condition (31) holds. Thus the lower and upper bounds of the DDE are both asymptotically stable (Yorke, 1970). ∎
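
As an informal check of the upper-bound system with $c = 1/K$, the sketch below integrates $\frac{\mathrm{d}h_i}{\mathrm{d}t} = -h_i(t) + c \sum_{j} \alpha_{ij} h_j(t - \tau_{ij})$ with explicit Euler. Everything specific here is an assumption made for illustration: the function name, the 4-node ring with uniform weights $\alpha_{ij} = 1/2$, the delays 0.3 and 0.7, the constant history before $t = 0$, and the rounding of delays to whole steps. It is not the STDDE implementation and does not replace the proof.

```python
def simulate_upper_bound(neighbors, delays, c, h0, T=10.0, eta=0.01):
    """Euler for dh_i/dt = -h_i + c * sum_j alpha_ij * h_j(t - tau_ij).

    neighbors : neighbors[i] is a list of (j, alpha_ij) pairs (assumed format)
    delays    : delays[i][m] is the delay tau_ij of the m-th neighbor of i
    Assumes the constant history h(t) = h0 for t <= 0.
    Returns the list of states at every solver step.
    """
    n = len(h0)
    n_steps = round(T / eta)
    lag = [[round(tau / eta) for tau in row] for row in delays]
    hist = [list(h0)]
    for step in range(n_steps):
        cur = hist[step]
        nxt = []
        for i in range(n):
            g = 0.0
            for (j, alpha), d in zip(neighbors[i], lag[i]):
                past = hist[step - d][j] if step >= d else h0[j]
                g += alpha * past         # delayed neighbor aggregation
            nxt.append(cur[i] + eta * (-cur[i] + c * g))
        hist.append(nxt)
    return hist


n, K = 4, 2                               # ring graph: every node has degree 2
neighbors = [[((i - 1) % n, 0.5), ((i + 1) % n, 0.5)] for i in range(n)]
delays = [[0.3, 0.7] for _ in range(n)]
h0 = [1.0, -2.0, 0.5, 3.0]
hist = simulate_upper_bound(neighbors, delays, c=1.0 / K, h0=h0)
peak = max(abs(x) for state in hist for x in state)
final = max(abs(x) for x in hist[-1])
print(peak, final)
```

In this toy run the largest absolute state never exceeds its initial value and is much smaller at the end of the window, which is the behavior the stability result leads one to expect.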