Generalizable Implicit Neural Representation As a Universal Spatiotemporal Traffic Data Learner

Tong Nie^{a,b}, Guoyang Qin^{a}, Wei Ma^{b,∗} and Jian Sun^{a,∗}
(June 13, 2024)

  Keywords: Implicit neural representations, Traffic data learning, Spatiotemporal traffic data, Traffic dynamics, Meta-learning

1 INTRODUCTION

The unpredictable elements involved in a vehicular traffic system, such as human behavior, weather conditions, energy supply and social economics, lead to a complex and high-dimensional dynamical transportation system. To better understand this system, Spatiotemporal Traffic Data (STTD) is often collected to describe its evolution over space and time. This data includes various sources such as vehicle trajectories, sensor-based time series, and dynamic mobility flow. The primary aim of STTD learning is to develop data-centric models that accurately depict traffic dynamics and can predict complex system behaviors.

Figure 1: Representing spatiotemporal traffic data as implicit neural functions. (a) Traffic data at arbitrary spatiotemporal coordinates can be represented as a continuous function in an implicit space. (b) Coordinate-based MLPs map coordinates to traffic states. (c) With the resolution-independent property, our model can represent various spatiotemporal traffic data.

Despite its complexity, recent advances in STTD learning have found that the dynamics of the system evolve with some dominating patterns and can be captured by some low-dimensional structures. Notably, low-rankness is a widely studied pattern, and models based on it assist in reconstructing sparse data, detecting anomalies, revealing patterns, and predicting unknown system states. However, these models have two primary limitations: 1) they often require a grid-based input with fixed spatiotemporal dimensions, restricting them from accommodating varying spatial resolutions or temporal lengths; 2) the low-rank pattern modeling, fixed on one data source, may not generalize to different data sources. For instance, patterns identified in one data type, such as vehicle trajectories, may not be applicable to differently structured data, such as OD demand. These constraints mean that current STTD learning depends on data structures and sources. This limits the potential for a unified representation and emphasizes the need for a universally applicable method to link various types of STTD learning.

To address these limitations, we employ a novel technique called implicit neural representations (INRs) to learn the underlying dynamics of STTD. INRs use deep neural networks to discern patterns from continuous input (Sitzmann et al., 2020; Tancik et al., 2020). They function in a continuous space and take domain coordinates as input, predicting the corresponding quantity at queried coordinates. INRs learn patterns in implicit manifolds and fit processes that generate target data with functional representation. This differentiates them from low-rank models that depend on explicit patterns, enhancing their expressivity, and enabling them to learn dynamics implicitly. Consequently, they eliminate the need for fixed data dimensions and can adjust to traffic data of any scale or resolution, allowing us to model various STTD with a unified input. In this work, we exploit the advances of INRs and tailor them to incorporate the characteristics of STTD, resulting in a novel method that serves as a universal traffic data learner (refer to Fig. 1).

Our proof-of-concept has shown promising results through extensive testing using real-world data. The method is versatile, working across different scales, from corridor-level to network-level applications. It can also be generalized to various input dimensions, data domains, output resolutions, and network topologies. This study offers novel perspectives on STTD modeling and provides an extensive analysis of practical applications, contributing to the state-of-the-art. To our knowledge, this is the first time that INRs have been applied to STTD learning and have demonstrated effectiveness in a variety of real-world tasks. We anticipate this could form the basis for developing foundational models for STTD.

2 METHODOLOGY

To formalize a universal data learner, we let MLPs be the parameterization $\theta$. Concretely, the function representation is expressed as a continuous mapping from the input domain to the traffic state of interest: $\Phi_{\theta}(x,t):\mathcal{X}\times\mathcal{T}\mapsto\mathcal{Y}$, where $\mathcal{X}\subseteq\mathbb{R}^{N}$ is the spatial domain, $\mathcal{T}\subseteq\mathbb{R}^{+}$ is the temporal domain, and $\mathcal{Y}\subseteq\mathbb{R}$ is the output domain. $\Phi_{\theta}$ is a coordinate-based MLP (Fig. 1 (b)).

2.1 Encoding high-frequency components in function representation

High-frequency components can encode complex details about STTD. To alleviate the spectral bias of neural networks towards low-frequency patterns, we adopt two advanced techniques to enable $\Phi_{\theta}$ to learn high-frequency components. Given the spatial-temporal input coordinate $\mathbf{v}=(x,t)\in\mathbb{R}\times\mathbb{R}^{+}$, the frequency-enhanced MLP can be formulated as:

$$\mathbf{h}^{(1)}=\texttt{ReLU}(\mathbf{W}^{(0)}\gamma(\mathbf{v})+\mathbf{b}^{(0)}),\quad \mathbf{h}^{(\ell+1)}=\sin(\omega_{0}\cdot\mathbf{W}^{(\ell)}\mathbf{h}^{(\ell)}+\mathbf{b}^{(\ell)}),\quad \Phi(\mathbf{v})=\mathbf{W}^{(L)}\mathbf{h}^{(L)}+\mathbf{b}^{(L)}, \tag{1}$$

where $\mathbf{W}^{(\ell)}\in\mathbb{R}^{d_{(\ell)}\times d_{(\ell+1)}}$ and $\mathbf{b}^{(\ell)}\in\mathbb{R}^{d_{(\ell+1)}}$ are layerwise parameters, and $\Phi(\mathbf{v})\in\mathbb{R}^{d_{\text{out}}}$ is the predicted value. $\sin(\cdot)$ is the periodic activation function with frequency factor $\omega_{0}$ (Sitzmann et al., 2020). $\gamma(\mathbf{v})$ is the concatenated random Fourier features (CRF) (Tancik et al., 2020) with different Fourier basis frequencies $\mathbf{B}_{k}\in\mathbb{R}^{d/2\times c_{\text{in}}}$ sampled from the Gaussian $\mathcal{N}(0,\sigma_{k}^{2})$:

$$\gamma(\mathbf{v})=[\sin(2\pi\mathbf{B}_{1}\mathbf{v}),\cos(2\pi\mathbf{B}_{1}\mathbf{v}),\dots,\sin(2\pi\mathbf{B}_{N_{f}}\mathbf{v}),\cos(2\pi\mathbf{B}_{N_{f}}\mathbf{v})]^{\mathsf{T}}\in\mathbb{R}^{dN_{f}}. \tag{2}$$

By setting a large number of frequency features $N_{f}$ and a series of scale parameters $\{\sigma^{2}_{k}\}$, we can sample a variety of frequency patterns in the input domain. The combination of these two strategies achieves high-frequency, low-dimensional regression, empowering the coordinate-based MLPs to learn complex details with high resolution.
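As a rough illustration of Eqs. (1) and (2), the following NumPy sketch builds the concatenated random Fourier features and runs one forward pass through a ReLU input layer, sine hidden layers, and a linear output layer. The function names (`crf_encode`, `init_params`, `forward`), the Gaussian weight initialization, and all sizes are assumptions made for this example; they are not taken from the authors' implementation.

```python
import numpy as np

def crf_encode(v, B_list):
    """Concatenated random Fourier features, following Eq. (2).

    v: (M, c_in) coordinates; B_list: N_f matrices of shape (d/2, c_in).
    Returns an (M, d * N_f) feature matrix.
    """
    feats = []
    for B in B_list:
        proj = 2.0 * np.pi * v @ B.T          # (M, d/2)
        feats.append(np.sin(proj))
        feats.append(np.cos(proj))
    return np.concatenate(feats, axis=-1)

def init_params(rng, d_in, hidden, d_out, n_layers):
    """Hypothetical initialization: Gaussian weights scaled by 1/sqrt(fan_in)."""
    sizes = [d_in] + [hidden] * n_layers + [d_out]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        W = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out))
        params.append((W, np.zeros(fan_out)))
    return params

def forward(v, B_list, params, omega0):
    """Eq. (1): ReLU on the encoded input, sine hidden layers, linear output."""
    W0, b0 = params[0]
    h = np.maximum(crf_encode(v, B_list) @ W0 + b0, 0.0)
    for W, b in params[1:-1]:
        h = np.sin(omega0 * (h @ W) + b)
    WL, bL = params[-1]
    return h @ WL + bL

rng = np.random.default_rng(0)
c_in, d, sigmas = 2, 16, [1.0, 5.0, 10.0]     # N_f = 3 scales (illustrative values)
B_list = [rng.normal(0.0, s, size=(d // 2, c_in)) for s in sigmas]
params = init_params(rng, d * len(sigmas), hidden=32, d_out=1, n_layers=3)
v = rng.uniform(0.0, 1.0, size=(5, c_in))     # five (x, t) query coordinates
y = forward(v, B_list, params, omega0=30.0)   # predicted states, shape (5, 1)
```

Each scale $\sigma_k$ contributes $d$ features, so the encoded input has $dN_f$ columns regardless of how many coordinates are queried.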

2.2 Factorizing spatial-temporal variability

Using a single $\Phi_{\theta}$ to model entangled spatiotemporal interactions can be challenging. Therefore, we decompose the spatiotemporal process into two dimensions using variable separation:

$$\Phi(\mathbf{v})=\Phi_{x}(v_{x})\Phi_{t}(v_{t})^{\mathsf{T}},\quad \Phi_{x}:\mathcal{X}\mapsto\mathbb{R},\ v_{x}\mapsto\Phi_{x}(v_{x})\in\mathbb{R}^{d_{x}},\quad \Phi_{t}:\mathcal{T}\mapsto\mathbb{R},\ v_{t}\mapsto\Phi_{t}(v_{t})\in\mathbb{R}^{d_{t}}, \tag{3}$$

where $\Phi_{x}$ and $\Phi_{t}$ are defined by Eq. (1). Eq. (3) is an implicit counterpart of the matrix factorization model, but it can process data or functions that exist beyond the regular mesh grid of matrices. To further align the two components, we adopt a middle transform matrix $\mathbf{M}_{xt}\in\mathbb{R}^{d_{x}\times d_{t}}$ to model their interactions in the hidden manifold, which yields: $\Phi(\mathbf{v})=\Phi_{x}(v_{x})\mathbf{M}_{xt}\Phi_{t}(v_{t})^{\mathsf{T}}$.
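The factorized form can be sketched in a few lines of NumPy. Here `phi_x` and `phi_t` are hypothetical stand-ins for the spatial and temporal INRs (simple sinusoidal maps rather than trained networks), and the sizes $d_x=4$, $d_t=3$ are arbitrary; the point is only to show that the same factors can be queried on irregular coordinates or at any resolution.

```python
import numpy as np

def factorized_predict(feat_x, feat_t, M_xt):
    """Evaluate Phi(v) = Phi_x(v_x) M_xt Phi_t(v_t)^T for all (x, t) pairs.

    feat_x: (n_x, d_x) spatial factors; feat_t: (n_t, d_t) temporal factors;
    M_xt: (d_x, d_t) transform matrix. Returns an (n_x, n_t) array.
    """
    return feat_x @ M_xt @ feat_t.T

# Hypothetical stand-ins for the two INRs: any coordinate-to-vector map works here.
def phi_x(x, d_x=4):
    return np.sin(np.outer(x, np.arange(1, d_x + 1)))   # (n_x, d_x)

def phi_t(t, d_t=3):
    return np.cos(np.outer(t, np.arange(1, d_t + 1)))   # (n_t, d_t)

rng = np.random.default_rng(0)
M_xt = rng.normal(size=(4, 3))

# Irregular, off-grid query coordinates.
x_coarse = np.array([0.1, 0.5, 0.9])
t_coarse = np.array([0.0, 0.3, 0.35, 0.8])
Y_coarse = factorized_predict(phi_x(x_coarse), phi_t(t_coarse), M_xt)   # (3, 4)

# The same factors queried at a finer resolution.
x_fine = np.linspace(0.0, 1.0, 50)
t_fine = np.linspace(0.0, 1.0, 200)
Y_fine = factorized_predict(phi_x(x_fine), phi_t(t_fine), M_xt)         # (50, 200)
```

Whatever the query resolution, the resulting matrix has rank at most $\min(d_x, d_t)$, which is the sense in which the model parallels matrix factorization.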

2.3 Generalizable representation with meta-learning

Given an STTD instance, we can sample a set containing $M$ data pairs $\mathbf{x}=\{(\mathbf{v}_{i},\mathbf{y}_{i})\}_{i=1}^{M}$, where $\mathbf{v}_{i}\in\mathbb{R}^{c_{\text{in}}}$ is the input coordinate and $\mathbf{y}_{i}\in\mathbb{R}^{c_{\text{out}}}$ is the traffic state value. Then we can learn an INR using gradient descent over the loss $\min_{\theta}\mathcal{L}(\theta;\mathbf{x})=\frac{1}{M}\sum_{i=1}^{M}\|\mathbf{y}_{i}-\Phi_{\theta}(\mathbf{v}_{i})\|_{2}^{2}$. A single INR thus encodes a single data domain: the learned INR cannot be generalized to represent other data instances and requires per-sample retraining.
Given a series of data instances $\mathcal{X}=\{\mathbf{x}^{(n)}\}_{n=1}^{N}$, we assign a latent code to each instance, $\{\phi^{(n)}\in\mathbb{R}^{d_{\text{latent}}}\}_{n=1}^{N}$, to account for the instance-specific data pattern and make $\Phi_{\theta}$ a base network conditional on the latent code $\phi$ (Dupont et al., 2022). We then perform per-sample modulations to the middle INR layers:

$$\mathbf{h}^{(\ell+1)}=\sin(\omega_{0}\cdot\mathbf{W}^{(\ell)}\mathbf{h}^{(\ell)}+\mathbf{b}^{(\ell)}+\mathbf{s}^{(n)}),\quad \mathbf{s}^{(n)}=h_{\omega}^{(\ell)}(\phi^{(n)})=\mathbf{W}^{(\ell)}_{s}\phi^{(n)}+\mathbf{b}^{(\ell)}_{s}, \tag{4}$$

where $\mathbf{s}^{(n)}\in\mathbb{R}^{d_{(\ell)}}$ is the shift modulation of instance $n$ at layer $\ell$, and $h_{\omega}^{(\ell)}(\cdot|\omega\in\Theta):\mathbb{R}^{d_{\text{latent}}}\mapsto\mathbb{R}^{d_{(\ell)}}$ is a shared linear hypernetwork layer to map the latent code to layerwise modulations. Then, the loss function of the generalizable implicit neural representations (GINRs) is given as:

$$\min_{\theta,\phi}\mathcal{L}(\theta,\{\phi^{(n)}\}_{n=1}^{N};\mathcal{X})=\mathbb{E}_{\mathbf{x}\sim\mathcal{X}}[\mathcal{L}(\theta,\phi^{(n)};\mathbf{x}^{(n)})]=\frac{1}{NM}\sum_{n=1}^{N}\sum_{i=1}^{M}\|\mathbf{y}^{(n)}_{i}-\Phi_{\theta,h_{\omega}(\phi)}(\mathbf{v}_{i}^{(n)};\phi^{(n)})\|_{2}^{2}. \tag{5}$$
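The shift modulation of Eq. (4) and the loss of Eq. (5) can be illustrated with a small NumPy sketch. The dictionary layout of `net`, the random initialization, the synthetic data, and the omission of the Fourier feature encoding in the first layer are all simplifying assumptions of this example, not details from the source.

```python
import numpy as np

def modulated_forward(v, phi, net, omega0=1.0):
    """Shared base network; the latent code phi shifts each sine layer (Eq. 4)."""
    h = np.maximum(v @ net["W_in"] + net["b_in"], 0.0)
    for (W, b), (W_s, b_s) in zip(net["hidden"], net["hyper"]):
        s = W_s @ phi + b_s                    # layerwise shift from the linear hypernetwork
        h = np.sin(omega0 * (h @ W) + b + s)
    return h @ net["W_out"] + net["b_out"]

def ginr_loss(instances, codes, net, omega0=1.0):
    """Eq. (5): squared error averaged over instances and sampled points."""
    per_instance = []
    for (v, y), phi in zip(instances, codes):
        pred = modulated_forward(v, phi, net, omega0)
        per_instance.append(np.mean(np.sum((y - pred) ** 2, axis=-1)))
    return float(np.mean(per_instance))

# Illustrative sizes and random weights (assumptions for this sketch).
rng = np.random.default_rng(0)
c_in, hidden, c_out, d_latent, n_hidden = 2, 16, 1, 8, 2
net = {
    "W_in": rng.normal(size=(c_in, hidden)),
    "b_in": np.zeros(hidden),
    "hidden": [(rng.normal(size=(hidden, hidden)) / np.sqrt(hidden), np.zeros(hidden))
               for _ in range(n_hidden)],
    "hyper": [(rng.normal(size=(hidden, d_latent)) / np.sqrt(d_latent), np.zeros(hidden))
              for _ in range(n_hidden)],
    "W_out": rng.normal(size=(hidden, c_out)) / np.sqrt(hidden),
    "b_out": np.zeros(c_out),
}

N, M = 3, 20                                   # N instances, M sampled points each
instances = [(rng.uniform(size=(M, c_in)), rng.normal(size=(M, c_out))) for _ in range(N)]
codes = [rng.normal(size=d_latent) for _ in range(N)]
loss = ginr_loss(instances, codes, net)
```

All instances share the weights in `net`; only the latent code differs, so two instances evaluated at the same coordinates yield different outputs solely through the shifts.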

To learn all codes, we adopt the meta-learning strategy to achieve efficient adaptation and stable optimization. Since conditional modulations $\mathbf{s}$ are processed as functions of $\phi$, and each $\phi$ represents an individual instance, we can implicitly obtain these codes using an auto-decoding mechanism. For instance $n$, this is achieved by an iterative gradient descent process: $\phi^{(n)}\leftarrow\phi^{(n)}-\alpha\nabla_{\phi^{(n)}}\mathcal{L}(\Phi_{\theta,h_{\omega}(\phi)},\{(\mathbf{v}_{i}^{(n)},\mathbf{y}_{i}^{(n)})\}_{i\in M})$, where $\alpha$ is the learning rate, and the above process is repeated for several steps. To integrate the auto-decoding into the meta-learning procedure, inner-loop and outer-loop iterations are used to alternately update $\Phi_{\theta}$ and $\phi$.
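The alternating scheme can be sketched on a toy model with a single modulated sine layer, for which the gradients can be written by hand. The inner loop adapts a latent code from zero with the shared weights frozen; the outer loop then updates shared weights using the adapted codes. To keep the example short, only the output weights `w_out` are updated in the outer step and second-order terms are ignored (a first-order approximation). Both are simplifications of this sketch, as are the synthetic data, the zero initialization of the code, and all hyperparameter values; none of them should be read as the authors' exact procedure.

```python
import numpy as np

def forward(v, phi, p, omega0=1.0):
    """One modulated sine layer followed by a linear readout."""
    z = omega0 * (v @ p["W"]) + p["b"] + p["W_s"] @ phi + p["b_s"]
    return np.sin(z) @ p["w_out"], z

def loss_and_grads(v, y, phi, p, omega0=1.0):
    """Mean squared error with hand-derived gradients w.r.t. phi and w_out."""
    pred, z = forward(v, phi, p, omega0)
    r = pred - y
    g_pred = 2.0 * r / len(y)                          # dL/dpred, shape (M,)
    g_z = np.outer(g_pred, p["w_out"]) * np.cos(z)     # dL/dz, shape (M, H)
    g_phi = g_z.sum(axis=0) @ p["W_s"]                 # dL/dphi, shape (d_latent,)
    g_w_out = np.sin(z).T @ g_pred                     # dL/dw_out, shape (H,)
    return np.mean(r ** 2), g_phi, g_w_out

def inner_loop(v, y, p, alpha, n_steps, d_latent):
    """Auto-decoding: gradient descent on the latent code only."""
    phi = np.zeros(d_latent)
    for _ in range(n_steps):
        _, g_phi, _ = loss_and_grads(v, y, phi, p)
        phi = phi - alpha * g_phi
    return phi

rng = np.random.default_rng(0)
c_in, H, d_latent, M, N = 2, 8, 4, 30, 4
p = {
    "W": rng.normal(size=(c_in, H)), "b": np.zeros(H),
    "W_s": 0.5 * rng.normal(size=(H, d_latent)) / np.sqrt(d_latent), "b_s": np.zeros(H),
    "w_out": rng.normal(size=H) / np.sqrt(H),
}

# Synthetic instances: random coordinates with instance-specific phase shifts.
data = []
for n in range(N):
    v = rng.uniform(size=(M, c_in))
    y = np.sin(2.0 * np.pi * v[:, 0] + 0.5 * n) * np.cos(np.pi * v[:, 1])
    data.append((v, y))

alpha, beta, inner_steps, outer_steps = 0.01, 0.01, 20, 50
for _ in range(outer_steps):
    g_outer = np.zeros(H)
    for v, y in data:
        phi = inner_loop(v, y, p, alpha, inner_steps, d_latent)   # adapt code, weights frozen
        _, _, g_w_out = loss_and_grads(v, y, phi, p)
        g_outer += g_w_out / N
    p["w_out"] = p["w_out"] - beta * g_outer                      # first-order outer update
```

At test time, a new instance would be handled by running only `inner_loop` with the shared weights fixed, which is what makes adaptation cheap compared with retraining a separate INR.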

3 RESULTS

We conduct extensive experiments on real-world STTD covering scales from corridor to network, specifically including: (a) Corridor-level application: Highway traffic state estimation; (b-c) Grid-level application: Urban mesh-based flow estimation; and (d-f) Network-level application: Highway and urban network state estimation. We compare our model with SOTA low-rank models and evaluate its generalizability in different scenarios, such as different input domains, multiple resolutions, and distinct topologies. We also find that the encoding of high-frequency components is crucial for learning complex patterns (g-h). Fig. 2 briefly summarizes our results.

Figure 2: Experiments on multiscale STTD. Full results can be found in Nie et al. (2024).

4 SUMMARY

We have developed a new method for learning spatiotemporal traffic data (STTD) using implicit neural representations. This involves parameterizing STTD as deep neural networks, with INRs trained to map coordinates directly to traffic states. The versatility of this representation allows it to model various STTD types, including vehicle trajectories, origin-destination flows, grid flows, highway networks, and urban networks. Thanks to the meta-learning paradigm, this approach can be generalized to a range of data instances. Experimental results from various real-world benchmarks show that our model consistently surpasses conventional low-rank models. It also demonstrates potential for generalization across different data structures and problem contexts.

References

  • Dupont, Emilien, Kim, Hyunjik, Eslami, SM, Rezende, Danilo, & Rosenbaum, Dan. 2022. From data to functa: Your data point is a function and you can treat it like one. arXiv preprint arXiv:2201.12204.
  • Nie, Tong, Qin, Guoyang, Ma, Wei, & Sun, Jian. 2024. Spatiotemporal Implicit Neural Representation as a Generalized Traffic Data Learner. arXiv preprint arXiv:2405.03185.
  • Sitzmann, Vincent, Martel, Julien, Bergman, Alexander, Lindell, David, & Wetzstein, Gordon. 2020. Implicit neural representations with periodic activation functions. Advances in Neural Information Processing Systems, 33, 7462–7473.
  • Tancik, Matthew, Srinivasan, Pratul, Mildenhall, Ben, Fridovich-Keil, Sara, Raghavan, Nithin, Singhal, Utkarsh, Ramamoorthi, Ravi, Barron, Jonathan, & Ng, Ren. 2020. Fourier features let networks learn high frequency functions in low dimensional domains. Advances in Neural Information Processing Systems, 33, 7537–7547.