Spatio-Temporal Field Neural Networks for Air Quality Inference

Yutong Feng1    Qiongyan Wang1    Yutong Xia2    Junlin Huang1    Siru Zhong1    Yuxuan Liang1,3 (corresponding author; email: [email protected])
1Hong Kong University of Science and Technology (Guangzhou), China
2National University of Singapore, Singapore
3State Key Lab of Resources and Environmental Information System, China
{yfeng083,jhuang688,szhong691}@connect.hkust-gz.edu.cn,
{yutong.x,qiongyanwang,yuxliang}@outlook.com
Abstract

Air quality inference aims to utilize historical data from a limited number of observation sites to infer the air quality index at unknown locations. Because monitoring stations are costly to maintain and observations are therefore sparse, a good inference algorithm can reduce cost and refine data granularity. While spatio-temporal graph neural networks have made excellent progress on this problem, their non-Euclidean and discrete modeling of reality limits their potential. In this work, we make the first attempt to combine two different spatio-temporal perspectives, fields and graphs, by proposing a new model, Spatio-Temporal Field Neural Network, and its corresponding new framework, Pyramidal Inference. Extensive experiments validate that our model achieves state-of-the-art performance in nationwide air quality inference in the Chinese Mainland, demonstrating the superiority of our proposed model and framework.

1 Introduction

Real-time monitoring of air quality, such as PM2.5, PM10, and NO2 concentrations, is crucial for air pollution control and protecting human health, with air pollution contributing to seven million deaths annually according to the WHO Vallero (2014). However, the deployment of air quality stations in urban areas is limited due to high costs, requiring around 200,000 USD for construction and 30,000 USD annually for maintenance Zheng et al. (2013). Additionally, these stations need significant land and dedicated personnel for upkeep, further limiting their prevalence in cities.

Figure 1: Spatio-Temporal Graph vs. Spatio-Temporal Field.
Figure 2: (a)-(d): User interface of our STFNN system for air quality inference. (e): An illustration of Ring Estimation.

In the past decade, substantial research has been directed towards air quality inference Han et al. (2023), which seeks to infer real-time air quality at locations without monitoring stations by leveraging data from existing sites, as shown in Figure 2(b)-(c). With recent advancements in deep learning, Graph Neural Networks (GNN) Kipf and Welling (2016a) have become dominant for non-Euclidean data representation, particularly in learning complex spatial correlations among air quality monitoring stations. Integrating GNNs with temporal learning modules (e.g., RNN Graves (2013), TCN Bai et al. (2018), ODE Liang et al. (2022)) has led to the development of Spatio-Temporal Graph Neural Networks (STGNN) Wang et al. (2020); ** et al. (2023), which address the dynamic nature of air quality data across the spatial and temporal dimensions. STGNNs, exemplified in studies such as Han et al. (2021); Hu et al. (2023b), offer superior representation extraction and flexibility in cross-domain data fusion.

Though promising, STGNNs simply treat air quality data as a Spatio-Temporal Graph (STG), as shown in Figure 1(a). In doing so, they overlook a crucial property, continuity, which manifests across both the spatial and temporal dimensions. In reality, station readings are sampled from a continuous Euclidean space and cannot be fully captured by a discrete graph structure processed with GNNs. Meanwhile, the temporal modules in STGNNs (e.g., RNN, TCN) are likewise discrete, rendering them incapable of capturing continuous-time dynamics within the data. To better represent the continuous and evolving nature of real-world air quality phenomena, a more powerful approach is needed, one that goes beyond the discrete representation of STGNNs.

In this paper, we draw inspiration from Field Theory McMullin (2002) and formulate air quality inference from a field perspective, treating air quality as a physical quantity described by a new concept called Spatio-Temporal Fields (STF), as depicted in Figure 1(b). These fields span three dimensions (i.e., latitude, longitude, time) and assign a distinct value to each point in spacetime. In contrast to STGs, STFs are regular, continuous, and unified (that is, the field representation accounts for variation across both locations in space and points in time within a single framework), offering a representation more aligned with reality. Under this perspective, we can recast the air quality inference problem as reconstructing STFs from available readings using coordinate-based neural networks, particularly Implicit Neural Representations (INR) Sitzmann et al. (2020); Xie et al. (2022).

While INRs effectively handle the continuity property of air quality data, they inevitably confront two primary challenges. Firstly, the generation process of air quality data is extremely complex and influenced by various factors (such as humidity and wind speed/direction), which poses a challenge for reconstructing the underlying STFs through neural representation methods. Secondly, empirical studies Xu et al. (2019); Sitzmann et al. (2020) verify that INRs always exhibit a bias towards learning low-frequency functions, which will disregard locally varying high-frequency information and higher-order derivatives even with dense supervision.

To this end, we present, for the first time, Spatio-Temporal Field Neural Networks (STFNN), opening new avenues for modeling spatio-temporal fields and achieving state-of-the-art performance in nationwide air quality inference in the Chinese Mainland. Targeting the first challenge, we pivot our focus from reconstructing the value of each entry in STFs to learning the derivative (i.e., gradient) of each entry. This shift is inspired by the observation that learning a residual is often easier than learning the original value directly, as exemplified by ResNet He et al. (2016). Such a vector field shows not only how the pollutant concentration varies across time and space but also the direction of diffusion. To tackle the second challenge, we augment our STFNN with local context knowledge during air quality inference at a specific location. Specifically, we combine the STGNN's capability to capture local spatio-temporal dependencies with STFNN's ability to learn global, unified spatio-temporal representations. This integration results in what we term Pyramidal Inference, a hybrid framework that leverages the strengths of both models to achieve a more comprehensive inference of air quality dynamics with both high-frequency and low-frequency components. Overall, our contributions lie in three aspects:

  • A Field Perspective. We formulate air quality as spatio-temporal fields for the first time. Compared to STGs, our STFs not only capture the continuity and Euclidean structure of air quality data, but also achieve a unified representation across both space and time.

  • Spatio-Temporal Field Neural Networks. We propose a groundbreaking network called STFNN to model STF data. STFNN pioneers an implicit representation of the STF's gradient, deviating from conventional direct estimation approaches. Moreover, it preserves high-frequency information via Pyramidal Inference.

  • Empirical Evidence. We conduct extensive experiments to evaluate the effectiveness of our STFNN. The results validate that STFNN outperforms prior art by a significant margin and exhibits compelling properties. A system, shown in Figure 2, has been deployed to demonstrate its practicality in the Chinese Mainland.

2 Preliminary

Definition 1 (Air Quality Reading) We use $\mathbf{x}_t^i \in \mathbb{R}^D$ and $\mathbf{y}_t^i \in \mathbb{R}$ to denote the air quality readings and the PM2.5 concentration, respectively, from the $i$-th monitoring station at time $t$. Here $D$ encompasses various measurements, such as concentrations of other air pollutants (e.g., PM10, NO2) and meteorological properties (e.g., humidity, weather, and wind speed). $\mathbf{X}_t = \left(\mathbf{x}_t^1, \mathbf{x}_t^2, \ldots, \mathbf{x}_t^N\right) \in \mathbb{R}^{N \times D}$ denotes the observations of all stations at a specified time $t$. $\mathcal{X} = \left(\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_T\right) \in \mathbb{R}^{T \times N \times D}$ denotes the observations of all stations at all times. Similar definitions apply to $\mathbf{Y}_t$ and $\mathcal{Y}$, mirroring $\mathbf{X}_t$ and $\mathcal{X}$, respectively.

Definition 2 (Coordinates) A coordinate $\mathbf{c} = \left[lng, lat, t\right] \in \mathbb{R}^3$ represents the spatial and temporal properties of an air quality reading or a location, comprising longitude, latitude, and timestamp. Coordinates are of two types: a source coordinate $\mathbf{c}^{src}$ is associated with readings or locations that have an existing air quality monitoring station, while a target coordinate $\mathbf{c}^{tar}$ corresponds to an unobserved location requiring inference. Notably, $\mathbf{c}_t^i$ represents the coordinate of the corresponding $\mathbf{x}_t^i$ and $\mathbf{y}_t^i$. $\mathbf{C}_t$ and $\mathcal{C}$ are defined in parallel with $\mathbf{X}_t$ and $\mathcal{X}$, respectively.

Problem Definition The air quality inference problem is to use historical data and real-time readings from a limited number of air quality monitoring stations to infer the real-time air quality anywhere, especially at unobserved locations. Traditional strategies Hou et al. (2022); Hu et al. (2023a) employ graphs to represent the relationship between stations and locations, and the task is cast as recovering the masked nodes (target locations), as shown in Figure 3 (a). In this paper, we revisit the problem from the field perspective, as shown in Figure 3 (b). Specifically, our goal is to reconstruct a spatio-temporal field $G$ for air quality that is capable of mapping any arbitrary coordinate, especially $\mathbf{c}^{tar}$, to the corresponding PM2.5 concentration $\mathbf{y}^{tar}$. Additional inputs, such as $\mathcal{X}$, may be used to enhance the inference process.

Figure 3: Paradigms for air quality inference. (a) A spatio-temporal graph perspective. (b) A spatio-temporal field perspective.

3 Methodology

3.1 Global View: Spatio-Temporal Field

A Spatio-Temporal Field (STF) is a global modeling of air quality that encompasses all stations and observation times. The STF function, denoted as $f(\cdot): \mathbf{c} \longmapsto \mathbf{q}$, assigns a unique physical quantity $\mathbf{q}$ to each coordinate. When $\mathbf{q}$ is a scalar, $f(\cdot)$ represents a scalar field. Conversely, if $\mathbf{q}$ is a vector, with magnitude and direction, it denotes a vector field.

Specifically, our focus lies on a scalar field $G: \mathbb{R}^3 \rightarrow \mathbb{R}$ for air quality inference, where $G$ maps the coordinates to the corresponding PM2.5 concentration. This representation facilitates a continuous and unified spacetime perspective, allowing for the inference of air quality at any location and time by inputting the coordinates.

Directly modeling $G$ is challenging due to its complexity and nonlinearity. Alternatively, it is often more feasible to learn its derivative, that is, the gradient field of $G$ in spacetime. We denote the gradient field as $\mathbf{F} \triangleq \nabla G$, which is a vector field. Notably, for a given $\mathbf{F}$, a whole family of solutions $G$ exists unless an initial value is specified. We use $\mathbf{y}^{src}$ and $\mathbf{y}^{tar}$ to denote the PM2.5 concentration at $\mathbf{c}^{src}$ and $\mathbf{c}^{tar}$, respectively. Our primary focus lies in inferring $\mathbf{y}^{tar}$, since the true value of $\mathbf{y}^{src}$ is known and recorded while $\mathbf{y}^{tar}$ remains unknown. To infer $\mathbf{y}^{tar}$, we use a $\mathbf{y}^{src}$ as the initial value and assume $l$ is a piecewise smooth curve in $\mathbb{R}^3$ that points from $\mathbf{c}^{src}$ to $\mathbf{c}^{tar}$. Then we have

$$
\begin{split}
\mathbf{y}^{tar} &= G\left(\mathbf{c}^{tar}\right) = G\left(\mathbf{c}^{src}\right) + \int_{l} \nabla G(\mathbf{r}) \cdot d\mathbf{r} \\
&= \mathbf{y}^{src} + \int_{a}^{b} \mathbf{F}\big(\mathbf{r}(z)\big) \cdot \mathbf{r}^{\prime}(z)\, dz
\end{split}
\tag{1}
$$

where $\cdot$ is the dot product, and $\mathbf{r}: \left[a, b\right] \rightarrow l$ represents the position vector. The endpoints of $l$ are given by $\mathbf{r}(a)$ and $\mathbf{r}(b)$, with $a < b$. Because $\mathbf{F}$ is usually too intricate to be formulated explicitly, our objective becomes finding an Implicit Neural Representation (INR) that fits $\mathbf{F}$.
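As a sanity check on Eq. (1), the following minimal sketch recovers a target value from one source value by numerically integrating a gradient field along a straight path. The toy field `G`, its analytic gradient `F`, the midpoint rule, and the helper name `integrate_along_line` are all illustrative assumptions; in the paper, $\mathbf{F}$ is fitted by a neural network and $G$ is unknown.

```python
import numpy as np

def G(c):
    """Toy scalar field over (lng, lat, t); stands in for the unknown PM2.5 field."""
    lng, lat, t = c
    return np.sin(lng) * np.cos(lat) + 0.5 * t

def F(c):
    """Analytic gradient of the toy field G (the role played by the learned INR)."""
    lng, lat, t = c
    return np.array([np.cos(lng) * np.cos(lat), -np.sin(lng) * np.sin(lat), 0.5])

def integrate_along_line(F, c_src, c_tar, steps=1000):
    """Midpoint-rule approximation of the line integral of F from c_src to c_tar.

    The path is parameterized as r(z) = c_src + z * (c_tar - c_src), z in [0, 1].
    """
    c_src = np.asarray(c_src, dtype=float)
    c_tar = np.asarray(c_tar, dtype=float)
    dr = (c_tar - c_src) / steps              # r'(z) dz for one sub-interval
    z_mid = (np.arange(steps) + 0.5) / steps  # midpoints of the sub-intervals
    return sum(F(c_src + z * (c_tar - c_src)) @ dr for z in z_mid)

c_src = np.array([0.2, 0.1, 0.0])
c_tar = np.array([1.0, 0.7, 2.0])
y_src = G(c_src)                              # known reading at the source
y_tar_hat = y_src + integrate_along_line(F, c_src, c_tar)
print(abs(y_tar_hat - G(c_tar)))              # discretization error only
```

Because the toy gradient is exact, the remaining error comes only from discretizing the path; with a learned $\mathbf{F}$, approximation error would accumulate along the path as well.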

3.2 Local View: Spatio-Temporal Graph

The formulation presented in Eq. (1) ensures that the value at $\mathbf{c}^{tar}$ can be recovered for arbitrary coordinates through curve integration, using only a single initial value. However, this approach yields an excessively coarse representation of the entire STF, resulting in the loss of numerous local details and high-frequency components. The impact is particularly pronounced when $\mathbf{y}^{src}$ is situated at a considerable distance from $\mathbf{y}^{tar}$, as the increase in the length of $l$ introduces a significant cumulative error. In response to this limitation, we leverage the learning capabilities of STGNN to capture local spatio-temporal correlations effectively. We employ the local spatio-temporal graph (STG) to model the spatio-temporal dependencies of the given coordinates and their neighboring air monitoring stations with their histories. The corresponding design can be found in the foundational work Song et al. (2020) and is not reiterated here.

Figure 4: Implementation of STFNN

3.3 Hybrid Framework: Pyramidal Inference

We intend to integrate the continuous and uniform global modeling of spacetime provided by STF with the local detailing capabilities of STG, thereby establishing a hybrid framework that leverages the strengths of both approaches. Within the local STG, the estimation of $\mathbf{y}^{tar}$ is achieved by leveraging information from its neighboring nodes through Eq. (1). By calculating estimates of $\mathbf{y}^{tar}$ from these neighbors and assigning a learnable weight $w_i$ to each estimation ($\sum_{i=1}^{K} w_i = 1$), we enhance the precision of the inference result for a specific coordinate. This operation can be formulated as

$$
\begin{split}
\hat{\mathbf{y}}^{tar} &= \sum_{i=1}^{K} \left[ w_i \cdot \left( \mathbf{y}_i^{src} + \int_{l_i} \mathbf{F}(\mathbf{r}) \cdot d\mathbf{r} \right) \right] \\
&= \sum_{i=1}^{K} \left[ w_i \cdot \left( \mathbf{y}_i^{src} + \int_{a_i}^{b_i} \mathbf{F}\big(\mathbf{r}(z)\big) \cdot \mathbf{r}^{\prime}(z)\, dz \right) \right]
\end{split}
\tag{2}
$$

where $\hat{\mathbf{y}}^{tar}$ is the joint estimation of $\mathbf{y}^{tar}$ by neighbors. $w_i$ and $\mathbf{y}_i^{src}$ represent the weight and PM2.5 concentration of the $i$-th neighbor, respectively. $l_i$ is the integral path in $\mathbb{R}^3$ that points from the coordinates of the $i$-th neighbor to the target coordinate.

We call the inference strategy represented by Eq. (2) Pyramidal Inference, which is the framework of the STFNN we propose. To demonstrate its sophistication, we deconstruct Eq. (2) into two steps:

$$
\hat{\mathbf{Y}}^{tar} = \left[\begin{array}{c} \hat{\mathbf{y}}_1 \\ \vdots \\ \hat{\mathbf{y}}_K \end{array}\right] = \left[\begin{array}{c} \mathbf{y}_1^{src} + \int_{l_1} \mathbf{F}(\mathbf{r}) \cdot d\mathbf{r} \\ \vdots \\ \mathbf{y}_K^{src} + \int_{l_K} \mathbf{F}(\mathbf{r}) \cdot d\mathbf{r} \end{array}\right]
\tag{3}
$$

and

$$
\hat{\mathbf{y}}^{tar} = \left[\begin{array}{ccc} w_1 & \ldots & w_K \end{array}\right] \cdot \hat{\mathbf{Y}}^{tar} = \mathbf{W}^{T} \cdot \hat{\mathbf{Y}}^{tar},
\tag{4}
$$

where $\hat{\mathbf{y}}_i$ is the estimate of $\mathbf{y}^{tar}$ by the $i$-th neighbor, and $\mathbf{W}$ and $\hat{\mathbf{Y}}^{tar}$ represent the vector forms of $w_i$ and $\hat{\mathbf{y}}_i$, respectively. These two operations can be viewed as follows. In Eq. (3), a curve integral over the gradient field is used to estimate $\mathbf{y}^{tar}$ from each neighbor in a continuous and spatio-temporally uniform way, which takes advantage of the STF. In Eq. (4), the information from the neighbors is aggregated through $\mathbf{W}$, which considers the spatio-temporal dependencies between the nodes and takes advantage of the STG. In this way, the Pyramidal Inference framework combines the two different spacetime perspectives into a distinctive new paradigm.
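The two steps of Eq. (3) and Eq. (4) can be illustrated with a small sketch. Everything here is a hypothetical stand-in: `grad_field` replaces the learned $\mathbf{F}$ with the gradient of a known toy field, the weights come from a softmax over fixed logits (one simple way to satisfy $\sum_i w_i = 1$, not necessarily how the model produces $\mathbf{W}$), and the paths are straight lines.

```python
import numpy as np

def toy_field(c):
    """Toy scalar field G(lng, lat, t) = lng**2 + lat * t, used only for checking."""
    return c[0] ** 2 + c[1] * c[2]

def grad_field(c):
    """Gradient of toy_field; a stand-in for the learned gradient field F."""
    lng, lat, t = c
    return np.array([2.0 * lng, t, lat])

def line_integral(F, c_src, c_tar, steps=200):
    """Midpoint-rule line integral of F along the straight path c_src -> c_tar."""
    dr = (c_tar - c_src) / steps
    z_mid = (np.arange(steps) + 0.5) / steps
    return sum(F(c_src + z * (c_tar - c_src)) @ dr for z in z_mid)

def pyramidal_inference(F, c_srcs, y_srcs, c_tar, logits):
    # Eq. (3): each of the K neighbors gives its own estimate of y_tar.
    Y_hat = np.array([y + line_integral(F, c, c_tar)
                      for c, y in zip(c_srcs, y_srcs)])
    # Eq. (4): aggregate with weights that sum to one (softmax is an assumption).
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ Y_hat, Y_hat, w

c_tar = np.array([0.5, 0.4, 1.0])
c_srcs = np.array([[0.1, 0.2, 0.0], [0.9, 0.3, 0.5], [0.4, 0.8, 1.5]])
y_srcs = np.array([toy_field(c) for c in c_srcs])   # known neighbor readings
logits = np.array([0.2, 1.0, -0.3])                 # placeholder for learned scores

y_hat, Y_hat, w = pyramidal_inference(grad_field, c_srcs, y_srcs, c_tar, logits)
print(Y_hat, w, y_hat)
```

With an exact gradient field every neighbor returns the same estimate, so the weights have no effect; they matter once $\mathbf{F}$ is only approximate and the per-neighbor estimates disagree.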

4 Implementation

Upon introducing the formulation of Pyramidal Inference, we proceed to its implementation through a meticulously designed model architecture, depicted in Figure 4. The model comprises three pivotal components:

  • Spatio-Temporal Encoding. This component transforms coordinates into coded vectors endowed with representational meaning, enhancing the network’s ability to comprehend and leverage the spatio-temporal characteristics of coordinates.

  • Ring Estimation: Implementation of Eq. (3), mapping the encoded vector to the gradient of the STF. This process yields each neighbor's estimate of the PM2.5 concentration for the target coordinates through a path integral.

  • Neighbor Aggregation: Implementation of Eq. (4), utilizing the coded vectors of neighbors and target coordinates as inputs to derive the estimated weights for each neighbor concerning the target coordinates.

In the following parts, we will provide a detailed exposition of each module, elucidating their functionalities step by step.

4.1 Spatio-Temporal Encoding

We revisit the previously introduced local STG, which amalgamates nodes across different time steps into a unified graph, potentially obscuring the inherent temporal properties of individual nodes. In essence, this local STG places nodes from diverse time steps into a shared environment without discerning their temporal distinctions. However, this issue can be mitigated through meticulous positional coding of nodes Gehring et al. (2017); Song et al. (2020).

We use $\mathbf{p} \in \mathbb{R}^{10}$ to denote the coding vector, essential for accurately describing the spatio-temporal characteristics of a node or coordinate. This vector is expressed as the concatenation $\mathbf{p} = \left[\mathbf{p}_S, \mathbf{p}_T\right]$, where $\mathbf{p}_S$ represents spatial coding and $\mathbf{p}_T$ represents temporal coding. In the spatial dimension, a node's properties can be captured by its absolute position, represented by z-normalized longitude $lng_z$ and latitude $lat_z$, forming $\mathbf{p}_S = \left[lng_z, lat_z\right]$. For encoding temporal information $\mathbf{p}_T$, sinusoidal functions with different periods are employed, reflecting the periodic nature of temporal phenomena. We utilize a set of periods $\mathbf{T} = \left\{1a, 7a, 30.5a, 365a\right\}$, with $a$ as the scaling index, to represent days, weeks, months, and years. The temporal coding $\mathbf{p}_T$ is then represented as

𝐩(T,i)={sin(2πt/𝐓int(i/2)+1)imod2=0cos(2πt/𝐓int(i/2)+1)imod20subscript𝐩𝑇𝑖cases𝑠𝑖𝑛2𝜋𝑡subscript𝐓𝑖𝑛𝑡𝑖21modulo𝑖20𝑐𝑜𝑠2𝜋𝑡subscript𝐓𝑖𝑛𝑡𝑖21modulo𝑖20\mathbf{p}_{(T,i)}=\begin{cases}sin\left(2\pi t/\mathbf{T}_{int(i/2)+1}\right)% &\text{$i\bmod 2=0$}\\ cos\left(2\pi t/\mathbf{T}_{int(i/2)+1}\right)&\text{$i\bmod 2\neq 0$}\end{cases}bold_p start_POSTSUBSCRIPT ( italic_T , italic_i ) end_POSTSUBSCRIPT = { start_ROW start_CELL italic_s italic_i italic_n ( 2 italic_π italic_t / bold_T start_POSTSUBSCRIPT italic_i italic_n italic_t ( italic_i / 2 ) + 1 end_POSTSUBSCRIPT ) end_CELL start_CELL italic_i roman_mod 2 = 0 end_CELL end_ROW start_ROW start_CELL italic_c italic_o italic_s ( 2 italic_π italic_t / bold_T start_POSTSUBSCRIPT italic_i italic_n italic_t ( italic_i / 2 ) + 1 end_POSTSUBSCRIPT ) end_CELL start_CELL italic_i roman_mod 2 ≠ 0 end_CELL end_ROW (5)

where $\mathbf{p}_{(T,i)}$ denotes the value of the $i$-th dimension ($1\leq i\leq 8$) of $\mathbf{p}_{T}$, $int(i/2)$ denotes dividing $i$ by 2 and rounding down, and $\mathbf{T}_{int(i/2)+1}$ is the $(int(i/2)+1)$-th period of $\mathbf{T}$.
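As a minimal sketch of how such a coding vector could be assembled, the following Python snippet builds the 10-dimensional vector from a normalized position and a timestamp. The function names, the default `a=1.0`, and the choice of days as the time unit are illustrative assumptions. The text states $1\leq i\leq 8$; the sketch instead uses a zero-based index $i=0,\ldots,7$ so that $int(i/2)+1$ selects each of the four periods exactly twice, which is also an assumption.

```python
import math

# Base periods in days (day, week, month, year), following the set T in the text.
PERIODS = (1.0, 7.0, 30.5, 365.0)


def temporal_coding(t, a=1.0):
    """Return the 8-dimensional temporal code for timestamp t.

    Assumption: the index i runs from 0 to 7, so i // 2 picks each period
    twice (even i -> sin, odd i -> cos).
    """
    code = []
    for i in range(8):
        period = PERIODS[i // 2] * a
        angle = 2.0 * math.pi * t / period
        code.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return code


def coding_vector(lng_z, lat_z, t, a=1.0):
    """Concatenate the spatial code [lng_z, lat_z] with the temporal code."""
    return [lng_z, lat_z] + temporal_coding(t, a)


p = coding_vector(lng_z=0.3, lat_z=-1.2, t=3.5)
print(len(p), p)
```

Pairing a sine and a cosine for each period lets a downstream network recover the phase within that period without ambiguity.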

4.2 Ring Estimation

Motivation. We employ the continuous approach of curve integration over the gradient to determine the PM2.5 concentration at the target coordinate in Eq. (3). However, due to inherent limitations in numerical accuracy within computing systems, achieving true continuity in STF becomes unattainable. Therefore, we have adopted an incremental approach. For convenience, we set the integral path to be a straight line from the neighbor’s coordinate $\mathbf{c}^{src}$ to the target coordinate $\mathbf{c}^{tar}$; the unit direction vector $\vec{\mathbf{r}}$ along the path can then be written as $\vec{\mathbf{r}}\triangleq\left(\mathbf{c}^{tar}-\mathbf{c}^{src}\right)/\|\mathbf{c}^{tar}-\mathbf{c}^{src}\|$. After that, we replace the integral operation with a summation operation and modify Eq. (3) to a discrete form

$$\hat{\mathbf{y}}^{tar}=\sum_{i=1}^{K}\left[w_{i}\cdot\left(\mathbf{y}_{i}^{src}+\sum_{j=1}^{m}\mathbf{D}_{i,j}\cdot\vec{\mathbf{r}_{i}}\right)\right]\tag{6}$$

where $m$ represents the step size of the summation, i.e., the number of summation steps, and $\mathbf{c}_{i}^{src}$ is the coordinate of the $i$-th node in the local STG. $\mathbf{D}_{i,j}\in\mathbb{R}^{3}$ represents the difference at the $j$-th step of the $i$-th node, which is the discrete approximation of the gradient. Our objective is to build a module for estimating $\mathbf{D}_{i,j}$, which is the only unknown in Eq. (6).
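The discrete form in Eq. (6) can be sketched in a few lines of Python. The snippet below is an illustration under stated assumptions: coordinates are 3-vectors (longitude, latitude, time), the differences $\mathbf{D}_{i,j}$ are supplied as plain lists rather than produced by a network, and the function names `unit_direction` and `discrete_inference` are hypothetical.

```python
import math


def unit_direction(c_src, c_tar):
    """Unit vector pointing from the source coordinate to the target."""
    diff = [t - s for s, t in zip(c_src, c_tar)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def discrete_inference(y_src, weights, diffs, c_src, c_tar):
    """Evaluate Eq. (6).

    y_src[i]    : observed value at neighbor i
    weights[i]  : aggregation weight w_i
    diffs[i][j] : 3-vector D_{i,j} (here given, not learned)
    c_src[i]    : coordinate of neighbor i
    c_tar       : target coordinate
    """
    total = 0.0
    for y_i, w_i, d_i, c_i in zip(y_src, weights, diffs, c_src):
        r_i = unit_direction(c_i, c_tar)
        # Accumulate the m increments D_{i,j} . r_i along the straight path.
        increment = sum(sum(d * r for d, r in zip(d_ij, r_i)) for d_ij in d_i)
        total += w_i * (y_i + increment)
    return total


y_hat = discrete_inference(
    y_src=[10.0, 20.0],
    weights=[0.5, 0.5],
    diffs=[[[-1.0, 0.0, 0.0]] * 2, [[0.0, 0.5, 0.0]] * 2],
    c_src=[(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
    c_tar=(0.0, 0.0, 0.0),
)
print(y_hat)
```

Each neighbor contributes its own observed value plus the accumulated change along its path, and the weights blend the $K$ resulting estimates into one.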

Overview. Towards this objective, we introduce a pivotal module named Ring Estimation, designed for the joint estimation of the differences $\left[\mathbf{D}_{\cdot,j}\right]_{:K}=\left[\mathbf{D}_{1,j},\cdots,\mathbf{D}_{K,j}\right]\in\mathbb{R}^{K\times 3}$ at the $j$-th step of all neighbors. We posit that simultaneous estimation of $\left[\mathbf{D}_{\cdot,j}\right]_{:K}$ enhances inference efficiency and captures correlations between them, thereby reducing estimation errors compared to individually estimating $\mathbf{D}_{i,j}$ for a single neighbor $K$ times. Specifically, the Ring Estimation module divides the polygon formed by the $K$ neighbors around the target coordinate into $m$ ring zones from the outermost to the innermost, with a total of $mK$ transition nodes (coordinates) uniformly spaced along the inference paths. The inner edge of the $j$-th ring zone serves as the outer edge of the $(j+1)$-th zone. Like the target coordinate, the transition nodes lack features and labels (PM2.5 concentration). They serve as the intermediary states and springboards in the process of estimating $\hat{\mathbf{y}}^{tar}$. By increasing the value of $m$, the Ring Estimation block facilitates the inference in an approximately continuous manner.
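To make the ring layout concrete, the sketch below places transition nodes on the straight paths from each neighbor to the target and groups them by ring. It is only an illustration: the text does not specify the exact spacing, so the snippet assumes the $j$-th node sits at fraction $j/m$ of the path (so the $m$-th node coincides with the target), and the names `transition_nodes` and `ring_layout` are hypothetical.

```python
def transition_nodes(c_src, c_tar, m):
    """Return m evenly spaced points on the segment from c_src to c_tar.

    Assumption: point j (1 <= j <= m) lies at fraction j / m of the path.
    """
    nodes = []
    for j in range(1, m + 1):
        frac = j / m
        nodes.append(tuple(s + frac * (t - s) for s, t in zip(c_src, c_tar)))
    return nodes


def ring_layout(neighbors, c_tar, m):
    """Group transition nodes by ring: rings[j][i] belongs to neighbor i."""
    paths = [transition_nodes(c, c_tar, m) for c in neighbors]
    return [[path[j] for path in paths] for j in range(m)]


rings = ring_layout(
    neighbors=[(4.0, 0.0, 0.0), (0.0, 2.0, 0.0)],
    c_tar=(0.0, 0.0, 0.0),
    m=4,
)
for j, ring in enumerate(rings, start=1):
    print("ring", j, ring)
```

With $K$ neighbors and $m$ rings this yields $mK$ transition nodes, and all $K$ nodes of one ring can be processed together, which matches the joint estimation described above.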

4.3 Neighbor Aggregation

It is advisable to assign varying weights $\mathbf{W}=\left[w_{1},\ldots,w_{K}\right]$ to the estimations of the target node based on the different spatio-temporal scenarios in which the neighbors are situated. To this end, we present the Neighbor Aggregation module, which takes into account the coding of the coordinates of the neighbors and the target coordinate and employs end-to-end learning to compute the estimation weight $w_{i}$ of each neighbor on the target node. To obtain $\mathbf{W}$, we first multiply the decoder output by $W_{N}\in\mathbb{R}^{10\times 1}$ to transform it into a logit score. Then, we apply the Softmax operation to ensure that the weights sum to one. The Neighbor Aggregation can thus be written as

$$\mathbf{W}=\mathrm{Softmax}\left(W_{N}\cdot \mathrm{Decoder}(\mathbf{P}^{src},\mathbf{P}^{tar})\right)\tag{7}$$
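The projection and normalization in Eq. (7) can be illustrated as follows. This is a sketch, not the model's implementation: the decoder is replaced by a hypothetical list of $K$ feature rows of width 10, and `w_n` stands in for the learned projection $W_{N}$.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neighbor_weights(decoder_out, w_n):
    """Project each neighbor's 10-dim decoder output to a logit, then normalize.

    decoder_out : K rows of 10 values (stand-in for the decoder output)
    w_n         : 10 projection weights (stand-in for W_N)
    """
    logits = [sum(h * w for h, w in zip(row, w_n)) for row in decoder_out]
    return softmax(logits)


weights = neighbor_weights(
    decoder_out=[[1.0] * 10, [1.0] * 10, [2.0] * 10],
    w_n=[0.1] * 10,
)
print(weights, sum(weights))
```

Because the softmax output is non-negative and sums to one, the result of Eq. (6) is a convex combination of the per-neighbor estimates.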

5 Experiments

In this section, we delve into our experimental methodology aimed at evaluating the performance and validating the efficacy of the STFNN. Specifically, our experiments are designed to explore the following research questions, elucidating key aspects of our approach and its applicability in real-world scenarios:

  • RQ1: How does STFNN’s approach, focusing on inferring concentration gradients for indirect concentration value inference, outperform traditional methods in terms of accuracy and effectiveness?

  • RQ2: What specific contributions do the individual components of STFNN make to its effectiveness in inferring air pollutant concentrations?

  • RQ3: How do variations in each hyperparameter impact the overall performance of STFNN?

  • RQ4: Do the three-dimensional hidden states learned by the model accurately represent the gradient of the spatio-temporal field?

  • RQ5: Can our model demonstrate proficient performance in inferring concentrations of various air pollutants, including NO2?

| Model | Year | #Param (M) | 25% MAE | 25% Δ | 25% RMSE | 25% MAPE | 50% MAE | 50% Δ | 50% RMSE | 50% MAPE | 75% MAE | 75% Δ | 75% RMSE | 75% MAPE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| KNN | 1967 | - | 30.50 | +146.0% | 65.40 | 1.36 | 30.25 | +145.5% | 72.23 | 0.71 | 34.07 | +194.0% | 74.55 | 0.64 |
| RF | 2001 | - | 29.22 | +135.6% | 68.95 | 0.76 | 29.71 | +141.2% | 71.61 | 0.75 | 29.82 | +157.3% | 70.99 | 0.74 |
| MCAM | 2021 | 0.408 | 23.94 | +93.1% | 36.25 | 0.95 | 25.01 | +103.0% | 37.94 | 0.92 | 25.19 | +117.3% | 37.82 | 1.04 |
| SGNP | 2019 | 0.114 | 23.60 | +90.3% | 37.58 | 0.83 | 24.06 | +95.3% | 37.08 | 0.93 | 21.68 | +87.1% | 33.68 | 0.84 |
| STGNP | 2022 | 0.108 | 23.21 | +87.2% | 38.13 | 0.62 | 21.95 | +78.2% | 37.13 | 0.67 | 19.58 | +68.9% | 31.95 | 0.69 |
| VAE | 2013 | 0.011 | 28.49 | +129.8% | 67.11 | 0.94 | 28.92 | +134.7% | 69.67 | 0.94 | 29.00 | +150.2% | 69.11 | 0.93 |
| GAE | 2016 | 0.073 | 12.63 | +1.9% | 23.80 | 0.46 | 12.78 | +3.7% | 24.11 | 0.46 | 12.57 | +8.5% | 23.73 | 0.46 |
| GraphMAE | 2022 | 0.073 | 12.40 | - | 23.20 | 0.46 | 12.32 | - | 23.11 | 0.46 | 11.59 | - | 21.51 | 0.43 |
| STFNN | - | 0.208 | 11.14 | -10.2% | 19.75 | 0.39 | 11.32 | -8.1% | 19.91 | 0.42 | 11.27 | -2.8% | 19.86 | 0.41 |

Table 1: Model comparison on the nationwide dataset. The parameter count, denoted as #Param, is in millions (M). The column prefixes 25%, 50%, and 75% denote the mask ratio, i.e., the proportion of unobserved nodes to all nodes. The symbol Δ represents the relative change in MAE compared to GraphMAE; negative values indicate a reduction.
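The Δ column of Table 1 can be reproduced from the MAE values with a one-line formula. The helper below is a small illustrative sketch (the function name is hypothetical) that computes the relative change of a model's MAE with respect to the GraphMAE reference.

```python
def relative_mae_change(mae, mae_ref):
    """Relative change in MAE, in percent, with respect to a reference model."""
    return 100.0 * (mae - mae_ref) / mae_ref


# MAE values at mask ratio 25%, taken from Table 1.
graphmae = 12.40
for name, mae in (("KNN", 30.50), ("GAE", 12.63), ("STFNN", 11.14)):
    print(name, round(relative_mae_change(mae, graphmae), 1))
```

A negative value means the model's MAE is lower than that of GraphMAE, which is the case only for STFNN.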

5.1 Experimental Settings

5.1.1 Datasets

The study obtained a nationwide air quality dataset Liang et al. (2023) from January 1st, 2018, to December 31st, 2018. This dataset includes air quality and meteorological data. The input data can be divided into two classes: continuous and categorical data. Continuous data includes critical parameters such as air pollutant concentrations (e.g., PM2.5, CO), temperature, wind speed, and others. Categorical data encompasses weather, wind direction, and time. In the event of an unanticipated occurrence, such as a power outage, some data may be unavailable.

5.2 Baselines for Comparison

We compare STFNN with baselines from the following four categories:

  • Statistical models: KNN Guo et al. (2003) utilizes non-parametric, instance-based learning, inferring air quality by considering data from the nearest neighbors. Random Forest (RF) Fawagreh et al. (2014) aggregates interpolation from diverse decision trees, each trained on different dataset subsets, providing robust results.

  • Neural Network based models: MCAM Han et al. (2021) introduces multi-channel attention blocks capturing static and dynamic correlations.

  • Neural Processes based models: SGNP, a modification of Sequential Neural Processes (SNP) Singh et al. (2019), incorporates a cross-set graph network before aggregation, enhancing air quality inference. STGNP Hu et al. (2023a) employs a Bayesian graph aggregator for context aggregation that considers uncertainties and graph structure.

  • AutoEncoder based models: VAE Kingma and Welling (2022) applies variational inference to air quality inference, utilizing reconstruction for target node inference. GAE Kipf and Welling (2016b) reconstructs node features within a graph structure, while GraphMAE Hou et al. (2022) introduces a masking strategy for innovative node feature reconstruction.

5.3 Hyperparameters & Setting

To mitigate the impact of missing data, instances in which more than 50% of the data were missing at a given time were omitted from our analysis. Our dataset was partitioned into three segments: a 60% training set, a 20% validation set, and a 10% test set. During training, in each epoch, we randomly select stations with ratio $\alpha$, mask their features and historical information, and let them act as target nodes. We ignore all locations where PM2.5 (or NO2 in the case of RQ5) is missing. The model is trained with the Adam optimizer, starting with a learning rate of 1e-3 that is halved every 40 epochs over the 200 training epochs. The batch size for training is set to 32. The hidden dimension of the MLP and all the Transformer-Decoder networks is fixed at 64. For Ring Estimation, the number of neighbors is set to 6, incorporating the past 6 timesteps, and the iteration step $m$ is set to 16.
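The per-epoch masking and the step-wise learning-rate decay described above can be sketched without any deep-learning framework. The snippet assumes zero-based epoch indices (so the rate is first halved at epoch 40), rounds the number of masked stations to the nearest integer, and uses hypothetical helper names and station ids.

```python
import random


def learning_rate(epoch, base_lr=1e-3, step=40, factor=0.5):
    """Step decay: the rate is multiplied by `factor` every `step` epochs."""
    return base_lr * factor ** (epoch // step)


def sample_target_stations(station_ids, alpha, rng):
    """Pick a fraction alpha of stations to mask and treat as target nodes."""
    n_targets = max(1, round(alpha * len(station_ids)))
    return set(rng.sample(station_ids, n_targets))


rng = random.Random(0)       # fixed seed, for reproducibility of the sketch only
stations = list(range(8))    # hypothetical station ids
for epoch in (0, 39, 40, 199):
    targets = sample_target_stations(stations, alpha=0.25, rng=rng)
    observed = [s for s in stations if s not in targets]
    print(epoch, learning_rate(epoch), sorted(targets), observed)
```

Resampling the masked stations in every epoch exposes the model to many different observed/target splits of the same station set.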

5.4 Model Comparison (RQ1)

To address RQ1, we compare the models on three evaluation metrics: MAE, RMSE, and MAPE. The results on the nationwide air quality dataset are reported in Table 1.
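For reference, the three metrics can be computed as in the sketch below. It assumes the standard definitions, reports MAPE as a fraction rather than a percentage (which appears consistent with the magnitudes in Table 1, though the text does not say so explicitly), and assumes all ground-truth values are non-zero.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error, as a fraction; y_true must be non-zero."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [10.0, 20.0, 40.0]   # illustrative values, not taken from the dataset
y_pred = [12.0, 18.0, 40.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), mape(y_true, y_pred))
```

RMSE penalizes large individual errors more heavily than MAE, while MAPE weights each error by the magnitude of the true concentration.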

The results indicate that STFNN consistently outperforms the existing baseline models across the evaluation metrics. Table 1 shows that, in comparison to GraphMAE, our approach reduces MAE by 10.2%, 8.1%, and 2.8% under the three mask ratios (25%, 50%, and 75%), establishing a new State-of-the-Art (SOTA) in nationwide PM2.5 concentration inference in the Chinese Mainland. In our view, there are three main reasons for this. First, the gradient field is a better representation of reality. Second, the spatial and temporal modules of STFNN capture both types of information together, avoiding bias or information loss. Finally, our Pyramidal Inference framework captures global and local spatio-temporal properties, which helps us model the pollutant concentration field more accurately.

Figure 5: (a) ablation study (b) the variation of curl

5.5 Ablation Study (RQ2)

To assess the contributions of individual components to the performance of our model and address RQ2, we conducted ablation studies. The findings from these studies are presented in Figure 5 (a).

Effects of meteorological features. To analyze the impact of meteorological features on the accuracy of the final model, we removed them from the raw data. As a result, the gradient was obtained solely from the spatio-temporal coordinates of neighboring stations fed into the gradient encoder. Figure 5 (a) shows that removing the meteorological features changed the model’s mean absolute error (MAE) only modestly, and the reduced model still outperformed all baseline models.

Effects of Neighbor Aggregation. To investigate the impact of a dynamic and learnable implicit graph structure on the model, we substituted the model’s Neighbor Aggregation module with IDW and SES, a non-parametric approach inspired by Zheng et al. (2013). This approach relies on static implicit graph relations. The results depicted in Figure 5 (a) demonstrate that utilizing the Neighbor Aggregation module results in a significant decrease in MAE.
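To illustrate what a static weighting looks like in contrast to the learned weights, the sketch below computes inverse-distance weights. It assumes that IDW refers to inverse distance weighting (the text does not expand the acronym); the power parameter of 2 and the function name are likewise assumptions, and SES is not covered.

```python
def idw_weights(distances, power=2.0):
    """Normalized inverse-distance weights; all distances must be positive."""
    inverse = [1.0 / (d ** power) for d in distances]
    total = sum(inverse)
    return [v / total for v in inverse]


# Two neighbors at distances 1 and 2 from the target (arbitrary units).
print(idw_weights([1.0, 2.0]))
```

These weights depend only on geometry, so they stay fixed regardless of weather or time of day, whereas the Neighbor Aggregation module can adapt its weights to the spatio-temporal context.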

5.6 Hyperparameters Study (RQ3)

In this section, we comprehensively explore the effects of various hyperparameters on the model’s performance, thereby addressing RQ3.

Effects of Hidden & FFD Dimension. We varied the hidden dimension of the Spatio-Temporal Encoding module and the feed-forward (FFD) dimension of the Transformer-Decoder structure used by the Ring Estimation and Neighbor Aggregation modules from 16 to 64. The results in Figure 6 (a-b) show that adjusting these dimensions has little effect on the absolute values of MAE and RMSE but significantly decreases MAPE.

Effects of Step Size. We vary the value of the accumulation step size of the Ring Estimation in {2,4,8,16}24816\left\{2,4,8,16\right\}{ 2 , 4 , 8 , 16 }. The result is shown in Figure 6 (c). It has been observed that as the step size increases, the model’s performance initially declines before improving. Additionally, when comparing 2 steps to 16 steps, we note that the model’s training time per round is approximately 50% longer for 16 steps.

Effects of Neighbors Number. We vary the number of neighbors from 2 to 8, and the result is shown in Figure 6 (d). We observe that the performance of our model improves as the number of neighbors increases. Notably, our proposed STFNN performs well even with a small number of neighbors: because it learns global spatio-temporal patterns, the model can use global information for inference even when only a few neighbors are present.

5.7 Interpretability (RQ4)

To confirm that the network’s learned vector is the gradient of the spatio-temporal field, we track how its curl varies with the number of training epochs. Because the curl of any gradient field is identically zero, a learned vector field whose curl approaches zero is consistent with a gradient field. We use the method of Yang et al. (2023) to quantify the curl and present the experimental results in Figure 5 (b). The curl of the vector field obtained by the network decreases as training progresses, indicating successful learning of the gradient field.
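The reasoning behind the curl check can be illustrated numerically. The sketch below is not the method of Yang et al. (2023); it is a simplified two-dimensional finite-difference estimate, with hypothetical example fields, showing that a gradient field has (numerically) zero curl while a rotational field does not.

```python
def curl_z(field, x, y, h=1e-4):
    """Central-difference estimate of dFy/dx - dFx/dy at the point (x, y)."""
    dfy_dx = (field(x + h, y)[1] - field(x - h, y)[1]) / (2 * h)
    dfx_dy = (field(x, y + h)[0] - field(x, y - h)[0]) / (2 * h)
    return dfy_dx - dfx_dy


def gradient_field(x, y):
    """Gradient of the scalar potential phi(x, y) = x**2 * y."""
    return (2 * x * y, x ** 2)


def rotational_field(x, y):
    """A rigid rotation; it is not the gradient of any scalar potential."""
    return (-y, x)


print(curl_z(gradient_field, 1.5, -0.7))
print(curl_z(rotational_field, 1.5, -0.7))
```

A curl magnitude that shrinks during training therefore suggests that the learned vectors are becoming consistent with the gradient of a single underlying scalar field.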

| Model | 25% MAE | 25% RMSE | 50% MAE | 50% RMSE | 75% MAE | 75% RMSE |
|---|---|---|---|---|---|---|
| KNN | 18.10 | 62.51 | 18.47 | 64.22 | 20.18 | 62.86 |
| RF | 16.90 | 61.25 | 17.60 | 64.91 | 17.36 | 63.70 |
| MCAM | 18.25 | 27.80 | 17.75 | 27.42 | 21.17 | 29.41 |
| SGNP | 17.66 | 25.43 | 19.17 | 26.36 | 16.57 | 24.11 |
| STGNP | 16.43 | 27.85 | 15.62 | 26.06 | 15.70 | 26.23 |
| VAE | 29.85 | 112.81 | 31.43 | 119.59 | 30.82 | 117.33 |
| GAE | 12.80 | 30.16 | 12.77 | 30.16 | 12.77 | 30.00 |
| GraphMAE | 12.76 | 30.25 | 12.60 | 29.53 | 12.48 | 29.30 |
| STFNN | 11.34 | 23.65 | 11.52 | 24.93 | 11.97 | 25.81 |

Table 2: Experiment results on NO2. The column prefixes 25%, 50%, and 75% denote the mask ratio.

5.8 Generalizability (RQ5)

Our model not only excels in inferring PM2.5 but also achieves SOTA performance in inferring the concentration of NO2, as shown in Table 2: STFNN attains the lowest MAE at all three mask ratios and the lowest RMSE at the 25% and 50% mask ratios. This outcome underscores the versatility of our model across distinct air quality parameters and its capability to handle diverse pollutants effectively.

Figure 6: Hyperparameter Study

6 Related Works

Traditional methods Hasenfratz et al. (2014); Jumaah et al. (2019) rely on linear spatial assumptions. These models consider only simple spatial relationships and do not adapt to complex changes in air quality. Recently, there has been growing interest in Spatio-Temporal Graphs (STGs) as a way to capture the intricate spatial and temporal relationships involved in air quality inference. STGNNs Jiang et al. (2021); Salim and Haque (2015); Wang et al. (2020); Sun et al. (2020); Wang et al. (2021), which integrate the strengths of GNNs, have emerged as the leading approach for uncovering intricate relationships in STG data. Some follow-ups Li et al. (2017); Yu et al. (2017); Geng et al. (2019) introduce temporal components such as Recurrent Neural Networks (RNN) Graves (2013) and Temporal Convolutional Networks (TCN) Bai et al. (2018) to better address the spatio-temporal dependencies. However, the limitation of STGNNs lies in their lack of consideration for continuity and Euclidean spatial structures.

7 Conclusion

In this work, we introduced a novel perspective for air quality inference, framing it as a problem of reconstructing Spatio-Temporal Fields (STFs) to better capture the continuous and unified nature of air quality data. Our proposed Spatio-Temporal Field Neural Network (STFNN) breaks away from the limitations of Spatio-Temporal Graph Neural Networks (STGNNs) by focusing on implicit representations of gradients, offering a more faithful representation of the dynamic evolution of air quality phenomena.

Acknowledgements

This work is supported by a grant from State Key Laboratory of Resources and Environmental Information System. This study is also funded by the Guangzhou-HKUST(GZ) Joint Funding Program (No. 2024A03J0620).

References

  • Bai et al. [2018] Shaojie Bai, J Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271, 2018.
  • Fawagreh et al. [2014] Khaled Fawagreh, Mohamed Medhat Gaber, and Eyad Elyan. Random forests: from early developments to recent advancements. Systems Science & Control Engineering: An Open Access Journal, 2(1):602–609, 2014.
  • Gehring et al. [2017] Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N Dauphin. Convolutional sequence to sequence learning. In International conference on machine learning, pages 1243–1252. PMLR, 2017.
  • Geng et al. [2019] Xu Geng, Yaguang Li, Leye Wang, Lingyu Zhang, Qiang Yang, Jieping Ye, and Yan Liu. Spatiotemporal multi-graph convolution network for ride-hailing demand forecasting. In Proceedings of the AAAI conference on artificial intelligence, volume 33, pages 3656–3663, 2019.
  • Graves [2013] Alex Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
  • Guo et al. [2003] Gongde Guo, Hui Wang, David Bell, Yaxin Bi, and Kieran Greer. Knn model-based approach in classification. In On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE: OTM Confederated International Conferences, CoopIS, DOA, and ODBASE 2003, Catania, Sicily, Italy, November 3-7, 2003. Proceedings, pages 986–996. Springer, 2003.
  • Han et al. [2021] Qilong Han, Dan Lu, and Rui Chen. Fine-grained air quality inference via multi-channel attention model. In IJCAI, pages 2512–2518, 2021.
  • Han et al. [2023] Jindong Han, Weijia Zhang, Hao Liu, and Hui Xiong. Machine learning for urban air quality analytics: A survey. arXiv preprint arXiv:2310.09620, 2023.
  • Hasenfratz et al. [2014] David Hasenfratz, Olga Saukh, Christoph Walser, Christoph Hueglin, Martin Fierz, and Lothar Thiele. Pushing the spatio-temporal resolution limit of urban air pollution maps. In 2014 IEEE International Conference on Pervasive Computing and Communications (PerCom), pages 69–77. IEEE, 2014.
  • He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  • Hou et al. [2022] Zhenyu Hou, Xiao Liu, Yukuo Cen, Yuxiao Dong, Hongxia Yang, Chunjie Wang, and Jie Tang. Graphmae: Self-supervised masked graph autoencoders, 2022.
  • Hu et al. [2023a] Junfeng Hu, Yuxuan Liang, Zhencheng Fan, Hongyang Chen, Yu Zheng, and Roger Zimmermann. Graph neural processes for spatio-temporal extrapolation. arXiv preprint arXiv:2305.18719, 2023.
  • Hu et al. [2023b] Junfeng Hu, Yuxuan Liang, Zhencheng Fan, Li Liu, Yifang Yin, and Roger Zimmermann. Decoupling long-and short-term patterns in spatiotemporal inference. IEEE Transactions on Neural Networks and Learning Systems, 2023.
  • Jiang et al. [2021] Renhe Jiang, Du Yin, Zhaonan Wang, Yizhuo Wang, Jiewen Deng, Hangchen Liu, Zekun Cai, Jinliang Deng, Xuan Song, and Ryosuke Shibasaki. Dl-traff: Survey and benchmark of deep learning models for urban traffic prediction. In Proceedings of the 30th ACM international conference on information & knowledge management, pages 4515–4525, 2021.
  • Jin et al. [2023] Guangyin Jin, Yuxuan Liang, Yuchen Fang, Jincai Huang, Junbo Zhang, and Yu Zheng. Spatio-temporal graph neural networks for predictive learning in urban computing: A survey. arXiv preprint arXiv:2303.14483, 2023.
  • Jumaah et al. [2019] Huda Jamal Jumaah, Mohammed Hashim Ameen, Bahareh Kalantar, Hossein Mojaddadi Rizeei, and Sarah Jamal Jumaah. Air quality index prediction using idw geostatistical technique and ols-based gis technique in kuala lumpur, malaysia. Geomatics, Natural Hazards and Risk, 10(1):2185–2199, 2019.
  • Kingma and Welling [2022] Diederik P Kingma and Max Welling. Auto-encoding variational bayes, 2022.
  • Kipf and Welling [2016a] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.
  • Kipf and Welling [2016b] Thomas N. Kipf and Max Welling. Variational graph auto-encoders, 2016.
  • Li et al. [2017] Yaguang Li, Rose Yu, Cyrus Shahabi, and Yan Liu. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926, 2017.
  • Liang et al. [2022] Yuxuan Liang, Kun Ouyang, Yiwei Wang, Zheyi Pan, Yifang Yin, Hongyang Chen, Junbo Zhang, Yu Zheng, David S Rosenblum, and Roger Zimmermann. Mixed-order relation-aware recurrent neural networks for spatio-temporal forecasting. IEEE Transactions on Knowledge and Data Engineering, 2022.
  • Liang et al. [2023] Yuxuan Liang, Yutong Xia, Songyu Ke, Yiwei Wang, Qingsong Wen, Junbo Zhang, Yu Zheng, and Roger Zimmermann. Airformer: Predicting nationwide air quality in china with transformers. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 14329–14337, 2023.
  • McMullin [2002] Ernan McMullin. The origins of the field concept in physics. Physics in Perspective, 4:13–39, 2002.
  • Salim and Haque [2015] Flora Salim and Usman Haque. Urban computing in the wild: A survey on large scale participation and citizen engagement with ubiquitous computing, cyber physical systems, and internet of things. International Journal of Human-Computer Studies, 81:31–48, 2015.
  • Singh et al. [2019] Gautam Singh, Jaesik Yoon, Youngsung Son, and Sungjin Ahn. Sequential neural processes, 2019.
  • Sitzmann et al. [2020] Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein. Implicit neural representations with periodic activation functions. Advances in neural information processing systems, 33:7462–7473, 2020.
  • Song et al. [2020] Chao Song, Youfang Lin, Shengnan Guo, and Huaiyu Wan. Spatial-temporal synchronous graph convolutional networks: A new framework for spatial-temporal network data forecasting. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 914–921, 2020.
  • Sun et al. [2020] Junkai Sun, Junbo Zhang, Qiaofei Li, Xiuwen Yi, Yuxuan Liang, and Yu Zheng. Predicting citywide crowd flows in irregular regions using multi-view graph convolutional networks. IEEE Transactions on Knowledge and Data Engineering, 34(5):2348–2359, 2020.
  • Vallero [2014] Daniel A Vallero. Fundamentals of air pollution. Academic press, 2014.
  • Wang et al. [2020] Senzhang Wang, Jiannong Cao, and Philip S Yu. Deep learning for spatio-temporal data mining: A survey. IEEE transactions on knowledge and data engineering, 34(8):3681–3700, 2020.
  • Wang et al. [2021] Huandong Wang, Qiaohong Yu, Yu Liu, Depeng Jin, and Yong Li. Spatio-temporal urban knowledge graph enabled mobility prediction. Proceedings of the ACM on interactive, mobile, wearable and ubiquitous technologies, 5(4):1–24, 2021.
  • Xie et al. [2022] Yiheng Xie, Towaki Takikawa, Shunsuke Saito, Or Litany, Shiqin Yan, Numair Khan, Federico Tombari, James Tompkin, Vincent Sitzmann, and Srinath Sridhar. Neural fields in visual computing and beyond. In Computer Graphics Forum, volume 41, pages 641–676. Wiley Online Library, 2022.
  • Xu et al. [2019] Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, Yanyang Xiao, and Zheng Ma. Frequency principle: Fourier analysis sheds light on deep neural networks. arXiv preprint arXiv:1901.06523, 2019.
  • Yang et al. [2023] Xianghui Yang, Guosheng Lin, Zhenghao Chen, and Luping Zhou. Neural vector fields: Generalizing distance vector fields by codebooks and zero-curl regularization. arXiv preprint arXiv:2309.01512, 2023.
  • Yu et al. [2017] Bing Yu, Haoteng Yin, and Zhanxing Zhu. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875, 2017.
  • Zheng et al. [2013] Yu Zheng, Furui Liu, and Hsun-Ping Hsieh. U-air: When urban air quality inference meets big data. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1436–1444, 2013.