FlightPatchNet: Multi-Scale Patch Network with Differential Coding for Flight Trajectory Prediction

Lan Wu, Xuebin Wang, Ruijuan Chu, Guangyi Liu, Yingchun Chen
**g Zhang, Linyu Wang
Information Engineering University
{lanwundsc,xuebinwang2024,liuguangyi1982,springer_2002}@163.com
{linyu18,zhurj18}@mails.jlu.edu.cn
Abstract

Accurate multi-step flight trajectory prediction plays an important role in Air Traffic Control, where it helps ensure the safety of air transportation. Two main issues limit the flight trajectory prediction performance of existing works. The first is the negative impact on prediction accuracy caused by the significant differences in data range across variables. The second is that real-world flight trajectories involve underlying temporal dependencies, yet existing methods fail to reveal the hidden complex temporal variations and extract features from only a single time scale. To address these issues, we propose FlightPatchNet, a multi-scale patch network with differential coding for flight trajectory prediction. Specifically, FlightPatchNet first uses differential coding to encode the original longitude and latitude values into first-order differences and generates embeddings for all variables at each time step. Then, a global temporal attention is introduced to explore the dependencies between different time steps. To fully explore the diverse temporal patterns in flight trajectories, a multi-scale patch network is carefully designed to serve as the backbone. The multi-scale patch network uses stacked patch mixer blocks to capture inter- and intra-patch dependencies under different time scales, and further integrates multi-scale temporal features across different scales and variables. Finally, FlightPatchNet ensembles multiple predictors to make direct multi-step predictions. Extensive experiments on ADS-B datasets demonstrate that our model outperforms the competitive baselines. Code is available at: https://github.com/FlightTrajectoryResearch/FlightPatchNet.

1 Introduction

Flight Trajectory Prediction (FTP) is an essential task in the Air Traffic Control (ATC) procedure, which can be applied to various scenarios such as air traffic flow prediction Abadi6878453 ; LIN2019105113 , aircraft conflict detection AdeepGaussian , and arrival time estimation WANG2018280 . Accurate FTP can ensure the safety of air transportation and improve real-time airspace management LIN8846596 ; Shi9136843 . Generally, FTP tasks can be divided into three categories: long-term Jeong8190764 ; Runle7999617 , medium-term Yuan7554828 ; Chen2016ShortmediumtermPF , and short-term huang2017short ; duan2018unified . Among them, short-term trajectory prediction has the greatest impact on ATC and is increasingly in demand for air transportation. In this paper, we mainly focus on the short-term FTP task, which aims to predict future flight trajectories based on historical observations.

In the ATC domain, multi-step trajectory prediction has more practical applications than single-step prediction LIN8846596 . It can be divided into Iterated Multi-Step (IMS) prediction and Direct Multi-Step (DMS) prediction. IMS-based methods Yan6972562 ; Zhang2023FlightTP ; Guo2023FlightBERT make multi-step predictions recursively: they learn a single-step model and iteratively feed the predicted values back as observations to forecast the next trajectory point. Because of error accumulation and the step-by-step prediction scheme, these methods often perform poorly in multi-step prediction and cannot meet real-time requirements. By contrast, DMS-based methods Guo2023FlightBERT++ ; wuhan2023bi generate all future trajectory points at once, which avoids error accumulation and improves prediction efficiency. Therefore, this paper performs the short-term FTP task in a DMS manner.
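The difference between the two schemes can be illustrated with a toy sketch. It assumes a hypothetical single-step model that extrapolates at constant velocity but carries a fixed error `eps`, and a hypothetical direct predictor with the same per-output error; neither is a model from the cited works. Under these assumptions, feeding predictions back (IMS) makes the error grow with the horizon, while emitting all points at once (DMS) keeps it flat.

```python
from typing import Callable, List

def biased_one_step(history: List[float], eps: float = 0.1) -> float:
    # Hypothetical single-step model: constant-velocity extrapolation
    # from the last two points, plus a fixed error eps.
    return history[-1] + (history[-1] - history[-2]) + eps

def ims_predict(history: List[float], horizon: int,
                step_model: Callable[[List[float]], float]) -> List[float]:
    # Iterated multi-step: call the single-step model `horizon` times and feed
    # each prediction back in as if it were an observation.
    window = list(history)
    preds = []
    for _ in range(horizon):
        nxt = step_model(window)
        preds.append(nxt)
        window.append(nxt)
    return preds

def dms_predict(history: List[float], horizon: int, eps: float = 0.1) -> List[float]:
    # Hypothetical direct predictor: emits all `horizon` points in one call,
    # with the same fixed error eps on each output and no feedback loop.
    v = history[-1] - history[-2]
    return [history[-1] + (k + 1) * v + eps for k in range(horizon)]

history = [0.0, 1.0]   # the true continuation is 2, 3, 4
truth = [2.0, 3.0, 4.0]
ims_err = [round(p - t, 3) for p, t in zip(ims_predict(history, 3, biased_one_step), truth)]
dms_err = [round(p - t, 3) for p, t in zip(dms_predict(history, 3), truth)]
print(ims_err)  # [0.1, 0.3, 0.6]
print(dms_err)  # [0.1, 0.1, 0.1]
```

The IMS loop also needs `horizon` sequential model calls, which is the latency concern mentioned above; the DMS predictor needs one.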

However, two main issues are not well addressed in existing works Yan6972562 ; Zhang2023FlightTP ; Guo2023FlightBERT++ ; wuhan2023bi , limiting trajectory prediction performance. The first issue is the negative impact on prediction accuracy caused by the significant differences in data range. In general, longitude and latitude are expressed in degrees but altitude in meters. Since one degree is approximately 111 kilometers, the data ranges of longitude and latitude are extremely different from that of altitude. Some previous works CNN-LSTM9145522 ; LSTM8489734 directly used normalization algorithms to scale variables into the same range, e.g., from 0 to 1. However, errors that look small after normalization may be unacceptable for the FTP task once they are evaluated in the raw data range. FlightBERT Guo2023FlightBERT and FlightBERT++ Guo2023FlightBERT++ proposed a binary encoding (BE) representation that converts variables from rounded decimal numbers to binary vectors, which casts the FTP task as multiple binary classification problems. Although BE representation avoids the vulnerability caused by normalization algorithms, it introduces a serious limitation: misclassifying a high-order bit leads to a large absolute error in the decimal value.
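A minimal sketch of why a single wrong high-order bit is costly. It assumes a hypothetical 16-bit, most-significant-bit-first integer code and a hypothetical rounded altitude of 9150 m; the actual bit widths used by FlightBERT and FlightBERT++ are not given here.

```python
def to_bits(value: int, n_bits: int) -> list:
    # Fixed-width binary code, most significant bit first.
    return [(value >> (n_bits - 1 - i)) & 1 for i in range(n_bits)]

def from_bits(bits: list) -> int:
    # Decode an MSB-first bit list back to an integer.
    out = 0
    for b in bits:
        out = (out << 1) | b
    return out

def flip(bits: list, index: int) -> list:
    # Simulate one misclassified bit.
    wrong = list(bits)
    wrong[index] ^= 1
    return wrong

altitude = 9150                 # hypothetical altitude in meters, already rounded
bits = to_bits(altitude, 16)    # hypothetical 16-bit code
low_bit_error = abs(from_bits(flip(bits, 15)) - altitude)   # least significant bit wrong
high_bit_error = abs(from_bits(flip(bits, 1)) - altitude)   # second bit from the top wrong
print(low_bit_error, high_bit_error)  # 1 16384
```

Both cases are a single misclassification out of 16 binary decisions, yet the decoded error differs by four orders of magnitude.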

Figure 1: The original and first-order difference series in real-world flight trajectories: (a) original series; (b) first-order difference series.

The second issue is that real-world flight trajectories involve underlying temporal dependencies, and existing methods Shi9136843 ; Guo2023FlightBERT ; Guo2023FlightBERT++ fail to reveal the hidden complex temporal variations and extract features from only a single time scale. As shown in Figure 1, the original series of longitude and latitude are overly smooth and obscure abundant temporal variations, which can be clearly observed in the first-order difference series. Besides, the temporal variation patterns of longitude and latitude are quite distinct from those of altitude, which has an obvious global trend but suffers from intense local fluctuations. For example, slight turbulence can exert a significant influence on altitude but have a negligible effect on longitude and latitude. A single-scale model cannot simultaneously capture both local temporal details and global trends, which calls for powerful multi-scale temporal modeling capacity. Furthermore, if the learned multi-scale temporal patterns are simply aggregated, the model fails to focus on the patterns that contribute most. Meanwhile, it is essential to explore relationships across variables; e.g., the velocity at the current time step directly affects the location at the next time step. Thus, scale-wise correlations and inter-variable relationships should be fully considered when modeling multi-scale temporal patterns.

Based on the above analysis, this paper proposes a multi-scale patch network with differential coding (FlightPatchNet) to address these issues. Specifically, we use differential coding to encode the original longitude and latitude values into first-order differences and retain the original values of the other variables as inputs. Because of the dependencies between nearby and distant time steps, we introduce a global temporal embedding to explore the correlations between time steps. Then, a multi-scale patch network is proposed to provide powerful and complete temporal modeling. The multi-scale patch network divides the trajectory series into patches of different sizes and uses stacked patch mixer blocks to capture global trends across patches and local details within patches. To further promote multi-scale temporal modeling, a multi-scale aggregator is introduced to capture scale-wise correlations and inter-variable relationships. Finally, FlightPatchNet ensembles multiple predictors to make direct multi-step forecasts, which benefits from complementary multi-scale temporal features and improves generalization. The main contributions are summarized as follows:

  • We utilize differential coding to effectively reduce the differences in data range and reveal the underlying temporal variations in real-world flight trajectories. Our empirical studies show that using differential values of longitude and latitude can greatly improve prediction accuracy.

  • We propose FlightPatchNet to fully explore underlying multi-scale temporal patterns. A multi-scale patch network is designed to capture inter- and intra-patch dependencies under different time scales, and integrate multi-scale temporal features across scales and variables. To our knowledge, this is the first work that introduces multi-scale modeling for flight trajectory prediction.

  • We conduct extensive experiments on a real-world dataset. The experimental results demonstrate that our proposed model significantly outperforms the most competitive baselines.

2 Related Work

Kinetics-and-Aerodynamics Methods

The Kinetics-and-Aerodynamics methods thipphavong2013adaptive ; soler2015multiphase ; benavides2014implementation ; tang20154d divide the entire flight process into several phases and establish motion equations for each phase to formulate the flight status. For example, wang2009prediction adopted basic flight models to construct horizontal, vertical, and velocity profiles based on the characteristics of different flight phases. Zhi**g7867472 combined dynamics-and-kinematics models with grayscale theory to predict future trajectories; the grayscale theory addresses the missing-parameter problem in dynamics-and-kinematics models and improves prediction performance. Because aircraft have numerous unknown and time-varying flight parameters, these fixed-parameter methods cannot accurately describe the flight status, leading to poor performance and limited application scenarios.

State-Estimation Methods

The Kalman Filter and its variants Yan6972562 ; wang20144d ; xi2008simulation are the typical single-model state-estimation algorithms for the FTP task; they apply predefined state equations to estimate the next flight status from the current observation. For example, xi2008simulation applied the Kalman Filter to track discrete flight trajectories by calculating a continuous state transition matrix. However, single-model algorithms cannot adapt to the complex ATC environment. To address this issue, Interacting Multiple Model (IMM) algorithms hwang2003flight ; li2005survey have been proposed and successfully applied to trajectory analysis. Although multi-model algorithms achieve better prediction performance, their high computational complexity makes it difficult to satisfy real-time requirements.
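For readers unfamiliar with the state-estimation family, the following is a textbook one-dimensional constant-velocity Kalman filter, not the formulation of any cited work. The state is position and velocity, only position is measured, and the noise levels `q` and `r` (with process noise Q = qI) are assumed values chosen for illustration.

```python
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]

def kf_step(x: Vec, P: Mat, z: float, dt: float = 1.0,
            q: float = 0.01, r: float = 1.0) -> Tuple[Vec, Mat]:
    """One predict + update cycle of a 1-D constant-velocity Kalman filter.

    x = [position, velocity], P = 2x2 state covariance, z = measured position.
    q and r are hypothetical process / measurement noise variances (Q = q * I).
    """
    # Predict with the state-transition matrix F = [[1, dt], [0, 1]].
    px = [x[0] + dt * x[1], x[1]]
    # Predicted covariance P' = F P F^T + Q.
    p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + q
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q
    # Update with the observation matrix H = [1, 0] (only position is measured).
    y = z - px[0]                # innovation
    s = p00 + r                  # innovation variance
    k0, k1 = p00 / s, p10 / s    # Kalman gain
    new_x = [px[0] + k0 * y, px[1] + k1 * y]
    new_P = [[(1 - k0) * p00, (1 - k0) * p01],
             [p10 - k1 * p00, p11 - k1 * p01]]
    return new_x, new_P

# Track a target moving at 2 m/s from noise-free position readings.
x, P = [0.0, 0.0], [[10.0, 0.0], [0.0, 10.0]]
for t in range(1, 101):
    x, P = kf_step(x, P, 2.0 * t)
next_position = x[0] + 1.0 * x[1]  # one-step-ahead prediction
```

The fixed matrices F and H are what make such filters "single-model": an IMM approach would run several such filters with different motion models and mix their estimates, which multiplies the per-step cost.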

Deep Learning Methods

With the rapid development of deep learning, there has been a surge of deep learning methods for the FTP task Guo2023FlightBERT ; Guo2023FlightBERT++ ; pang2022bayesian ; xu2021multi ; Sahadevan . These learning-based approaches can extract high-dimensional features from raw data and have achieved better performance than previous methods. For example, Sahadevan used a Bi-directional Long Short-Term Memory (Bi-LSTM) network to explore both forward and backward dependencies in sequential trajectory data. FlightBERT Guo2023FlightBERT employed binary encoding to represent the attributes of trajectory points and treated the FTP task as multiple binary classification problems. However, these works predict future trajectories recursively and suffer from serious error accumulation. Recently, FlightBERT++ Guo2023FlightBERT++ was introduced for DMS prediction; it considers prior horizon information and directly predicts the differential values between adjacent points.

3 Methodology

The FTP task can be formulated as a Multivariate Time Series (MTS) forecasting problem. Formally, given a sequence of historical observations $\mathbf{X}=\{\mathbf{x}_{1},\dots,\mathbf{x}_{L}\}\in\mathbb{R}^{C\times L}$, where $C$ is the state dimension, $L$ is the look-back window size, and $\mathbf{x}_{t}\in\mathbb{R}^{C\times 1}$ denotes the flight state at time step $t$, the task is to predict the future $T$ time steps $\hat{\mathbf{Y}}=\{\hat{\mathbf{x}}_{L+1},\dots,\hat{\mathbf{x}}_{L+T}\}\in\mathbb{R}^{C'\times T}$, where $C'$ is the predicted state dimension.
Specifically, in this work, the flight state $\mathbf{x}_{t}$ consists of longitude, latitude, altitude, and the velocities along these three dimensions, i.e., $\mathbf{x}_{t}=(Lon_{t}, Lat_{t}, Alt_{t}, Vx_{t}, Vy_{t}, Vz_{t})^{\top}$.
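The shapes in this formulation can be made concrete with a small sliding-window sketch that turns one trajectory into training pairs $(\mathbf{X}, \hat{\mathbf{Y}})$. The placeholder data, the stride of one step, and the choice of longitude, latitude, and altitude as the $C'$ predicted variables are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

def make_windows(states: Sequence[Sequence[float]], L: int, T: int,
                 target_idx: Sequence[int]) -> List[Tuple[List[List[float]], List[List[float]]]]:
    """Slide over a trajectory of per-step states (each of length C) and return
    (X, Y) pairs, with X laid out as C x L and Y as C' x T."""
    C = len(states[0])
    pairs = []
    for start in range(len(states) - L - T + 1):
        past = states[start:start + L]             # look-back window x_1..x_L
        future = states[start + L:start + L + T]   # targets x_{L+1}..x_{L+T}
        X = [[s[c] for s in past] for c in range(C)]         # C x L
        Y = [[s[c] for s in future] for c in target_idx]     # C' x T
        pairs.append((X, Y))
    return pairs

# Toy trajectory: 10 steps of 6 variables (Lon, Lat, Alt, Vx, Vy, Vz); values are placeholders.
traj = [[float(t * 10 + c) for c in range(6)] for t in range(10)]
pairs = make_windows(traj, L=4, T=2, target_idx=[0, 1, 2])
```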

Figure 2: FlightPatchNet architecture. (a) Global Temporal Embedding to explore the correlations between different time steps. (b) Multi-Scale Patch Network to capture inter- and intra-patch dependencies under different time scales and integrate temporal features across scales and variables. (c) Predictors to exploit complementary temporal features and make direct multi-step predictions.

The overall architecture of FlightPatchNet, shown in Figure 2, consists of Global Temporal Embedding, a Multi-Scale Patch Network, and Predictors. Global Temporal Embedding first uses differential coding to transform the original longitude and latitude values into first-order differences and embeds all variables of the same time step into a temporal token. A global temporal attention is then introduced to capture the inherent dependencies between different tokens. The Multi-Scale Patch Network serves as the backbone and is composed of stacked patch mixer blocks and a multi-scale aggregator. The stacked patch mixer blocks divide the trajectory series into patches of different sizes, from large scale to small scale. On the divided patches, each patch mixer block uses a patch encoder and decoder to capture inter- and intra-patch dependencies, giving the model strong temporal modeling capability. To further integrate multi-scale temporal patterns, a multi-scale aggregator captures scale-wise correlations and inter-variable relationships. The Predictors provide direct multi-step trajectory forecasts; each predictor is a fully connected network, and the outputs of all predictors are aggregated to generate the final prediction.
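The text does not specify how the predictor outputs are aggregated, so the sketch below rests on two labeled assumptions: predictor $k$ receives its own feature vector (for instance, from one scale), and the outputs are combined by a simple average. Each predictor is modeled as a single fully connected layer with hand-picked weights, purely to show the data flow of a direct multi-step ensemble.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], List[float]]

def make_fc_predictor(weights: List[List[float]], bias: List[float]) -> Predictor:
    # One fully connected layer mapping a flattened feature vector to T outputs.
    def predict(features: Sequence[float]) -> List[float]:
        return [sum(w * f for w, f in zip(row, features)) + b
                for row, b in zip(weights, bias)]
    return predict

def ensemble_predict(predictors: List[Predictor],
                     features: List[Sequence[float]]) -> List[float]:
    # Assumption: predictor k receives feature vector k, and the predictors'
    # outputs are aggregated by a simple average over predictors.
    outputs = [p(f) for p, f in zip(predictors, features)]
    horizon = len(outputs[0])
    return [sum(o[t] for o in outputs) / len(outputs) for t in range(horizon)]

p1 = make_fc_predictor([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
p2 = make_fc_predictor([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0])
print(ensemble_predict([p1, p2], [[2.0, 4.0], [1.0, 1.0]]))  # [2.0, 2.0]
```

Every horizon step is produced in the same forward pass, which is what makes the scheme direct rather than iterated.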

3.1 Global Temporal Embedding

Differential Coding

In the WGS84 coordinate system, longitude and latitude are limited to the intervals $[-180^{\circ}, 180^{\circ}]$ and $[-90^{\circ}, 90^{\circ}]$, respectively, while altitude can span from 0 up to tens of thousands of meters. The significant differences in data range caused by physical units may impair trajectory prediction performance. Normalization algorithms are generally applied to address this issue. However, the normalized prediction errors must be transformed back into the raw data range to evaluate the actual performance. For example, if the absolute prediction error of longitude is $10^{-4}$ after Min-Max normalization, the actual prediction error is $10^{-4}\times 360^{\circ} = 0.036^{\circ}$ (approximately 4,000 meters), which is unacceptable for the FTP task. Moreover, as shown in Figure 1, the original series of longitude and latitude are overly smooth and only reflect the overall flight trend over a period. If temporal patterns are learned from the original longitude and latitude values, the model fails to explore the implicit semantic information and cannot focus on short-term temporal variations in flight trajectories.
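The arithmetic behind the 4,000-meter figure can be checked with a short sketch. It assumes the Min-Max range spans the full longitude interval of 360° and uses a mean Earth radius of 6,371 km, converting along a great circle; for longitude the true ground distance is further scaled by the cosine of the latitude, so the figure is an upper bound that holds at the equator.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius

def minmax_error_to_degrees(norm_err: float, lo: float = -180.0, hi: float = 180.0) -> float:
    # Min-Max maps [lo, hi] onto [0, 1], so an error in normalized units scales by (hi - lo).
    return norm_err * (hi - lo)

def degrees_to_meters(deg: float, R: float = EARTH_RADIUS_M) -> float:
    # Arc length of `deg` degrees along a great circle: deg * pi * R / 180.
    return deg * math.pi * R / 180.0

deg_err = minmax_error_to_degrees(1e-4)
m_err = degrees_to_meters(deg_err)
print(round(deg_err, 3), round(m_err))  # 0.036 4003
```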

To address the above issues, we use first-order differences for longitude and latitude and original values for the other variables, and then convert the differential values into meters. This process can be formulated as:

$$
\begin{cases}
\Delta_{Lon} = (Lon_{t} - Lon_{t-1}) \times \dfrac{\pi R}{180}\cos\theta \quad (\mathrm{m}) \\
\Delta_{Lat} = (Lat_{t} - Lat_{t-1}) \times \dfrac{\pi R}{180} \quad (\mathrm{m})
\end{cases} \tag{1}
$$

where $\theta$ is the latitude at time step $t$ and $R$ is the radius of the Earth. Differential coding of longitude and latitude effectively reduces the differences in data range. For example, in our dataset, the range of latitude in the original data is about $[-46^{\circ}, 70^{\circ}]$, while that of the differential data is about $[-3860\,\mathrm{m}, 3860\,\mathrm{m}]$, which spans a data range similar to that of altitude. Besides, compared with the original sequences, the differential series explicitly reflect the underlying temporal variations, which is essential for short-term temporal modeling. Note that we use the original values of altitude as inputs rather than differential values. One important reason is that altitude is more susceptible to noise, and its differential values fail to reflect the actual temporal variations. As a result, the flight state at time step $t$ becomes $\mathbf{x}_{t}=(\Delta_{Lon}, \Delta_{Lat}, Alt_{t}, Vx_{t}, Vy_{t}, Vz_{t})^{\top}$.
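Eq. (1) can be sketched as a pair of encode/decode functions. The value of $R$ (mean Earth radius, 6,371 km), dropping the first step because it has no predecessor, and the decoding routine (needed to turn predicted differences back into coordinates) are assumptions of this sketch rather than details taken from the text.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6371000.0                      # assumed mean Earth radius R
M_PER_DEG = math.pi * EARTH_RADIUS_M / 180.0    # pi * R / 180, meters per degree of arc

def encode(lons: List[float], lats: List[float]) -> List[Tuple[float, float]]:
    """Eq. (1): first-order lon/lat differences converted to meters.
    theta is the latitude at step t; the first step has no predecessor and is dropped."""
    deltas = []
    for t in range(1, len(lons)):
        theta = math.radians(lats[t])
        d_lon = (lons[t] - lons[t - 1]) * M_PER_DEG * math.cos(theta)
        d_lat = (lats[t] - lats[t - 1]) * M_PER_DEG
        deltas.append((d_lon, d_lat))
    return deltas

def decode(lon0: float, lat0: float, deltas: List[Tuple[float, float]]):
    """Invert Eq. (1) by accumulating the differences from a known starting point."""
    lons, lats = [lon0], [lat0]
    for d_lon, d_lat in deltas:
        # Recover latitude first, since the longitude term uses cos(theta_t).
        lat = lats[-1] + d_lat / M_PER_DEG
        lon = lons[-1] + d_lon / (M_PER_DEG * math.cos(math.radians(lat)))
        lats.append(lat)
        lons.append(lon)
    return lons, lats

lons = [116.000, 116.010, 116.025]   # hypothetical track
lats = [39.900, 39.905, 39.912]
deltas = encode(lons, lats)
rec_lons, rec_lats = decode(lons[0], lats[0], deltas)
```

A latitude step of 0.001° encodes to roughly 111 m regardless of position, while the same longitude step shrinks toward the poles through the $\cos\theta$ factor.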

Global Temporal Attention

Given the trajectory series $\mathbf{X}\in\mathbb{R}^{C\times L}$, we first project the flight state at each time step into a $d$-dimensional space to generate temporal embeddings $\mathbf{T}^{0}\in\mathbb{R}^{L\times d}$. Then, we apply multi-head self-attention (MSA) Vaswani2017AttentionIA along the dimension $L$ to capture the dependencies across all time steps. After attention, the embedding at each time step is enriched with temporal information from the other time steps. This process is formulated as:

$$
\begin{aligned}
\mathbf{T}^{0} &= \mathit{TimeEmbedding}(\mathbf{X}^{\top}) \\
\mathbf{T}^{i} &= \mathit{LayerNorm}\big(\mathbf{T}^{i-1} + \mathit{MSA}(\mathbf{T}^{i-1})\big), \quad i=1,\dots,l \\
\mathbf{T}^{i} &= \mathit{LayerNorm}\big(\mathbf{T}^{i} + \mathit{FC}(\mathbf{T}^{i})\big), \quad i=1,\dots,l \\
\mathbf{Z} &= \big(\mathit{Linear}(\mathbf{T}^{l})\big)^{\top}
\end{aligned} \tag{2}
$$

where $l$ is the number of attention layers, $\mathit{LayerNorm}$ denotes layer normalization ba2016layer, which has been widely adopted to address non-stationary issues, $\mathit{MSA}$ is the multi-head self-attention layer, $\mathit{FC}$ denotes a fully connected layer, and $\mathit{Linear}$ projects the embedding of each time step back to dimension $C$, i.e., $\mathbb{R}^{d}\rightarrow\mathbb{R}^{C}$, so that $\mathbf{Z}\in\mathbb{R}^{C\times L}$.
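To show what attention "along the dimension $L$" means, here is a simplified sketch of one attention sublayer of Eq. (2). It swaps in single-head scaled dot-product attention with identity query/key/value projections and a LayerNorm without learnable gain or bias; the actual MSA has multiple heads and learned projections, and the FC sublayer is omitted here.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(T: Matrix) -> Matrix:
    """Single-head scaled dot-product attention over the L time-step tokens of T (L x d).
    Q = K = V = T here (identity projections), a simplification of MSA."""
    d = len(T[0])
    out = []
    for q in T:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in T]
        w = softmax(scores)  # attention weights over all L time steps
        out.append([sum(w[j] * T[j][c] for j in range(len(T))) for c in range(d)])
    return out

def layer_norm(row: List[float], eps: float = 1e-5) -> List[float]:
    # Normalize one token to zero mean and unit variance (no learnable gain/bias).
    mu = sum(row) / len(row)
    var = sum((x - mu) ** 2 for x in row) / len(row)
    return [(x - mu) / math.sqrt(var + eps) for x in row]

def attention_layer(T: Matrix) -> Matrix:
    # Residual connection followed by LayerNorm, mirroring LayerNorm(T + MSA(T)).
    A = self_attention(T)
    return [layer_norm([t + a for t, a in zip(tr, ar)]) for tr, ar in zip(T, A)]

tokens = [[1.0, 0.0, 2.0, -1.0],   # L = 3 time steps, d = 4 (toy values)
          [0.5, 1.5, -0.5, 0.0],
          [2.0, -1.0, 0.0, 1.0]]
out = attention_layer(tokens)
```

Each output token is a weighted mix of all $L$ input tokens, so distant time steps can influence one another in a single layer.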

3.2 Multi-Scale Patch Network

Since different temporal patterns favor different time scales, the multi-scale patch network first uses a stack of $K$ patch mixer blocks to capture underlying temporal patterns from large scale to small scale. A large time scale reflects slowly varying flight trends, while a smaller scale retains fine-grained local details. To further promote the collaboration of diverse temporal features, a multi-scale aggregator is introduced to weigh the contributing scales and dominant variables. This multi-scale design gives the model strong and complete temporal modeling capability and helps preserve multi-scale characteristics.

3.2.1 Patch Mixer Block

Patching

Considering only a single time step is insufficient for the FTP task, since it carries limited semantic information and cannot accurately reflect variations in the flight trajectory. Inspired by 2021An ; Yuqietal-2023-PatchTST , the trajectory representation $\mathbf{Z}\in\mathbb{R}^{C\times L}$ is segmented into non-overlapping patches along the temporal dimension, generating a sequence of patches $\mathbf{Z}_{p}\in\mathbb{R}^{C\times P\times N}$, where $P$ is the length of each patch, $N$ is the number of patches, and $N=\lceil L/P\rceil$. The patching process is formulated as:

$$\mathbf{Z}_{p} = \mathit{Reshape}\big(\mathit{ZeroPadding}(\mathbf{Z})\big) \tag{3}$$

where $\mathit{ZeroPadding}(\cdot)$ pads the series with zeros at the beginning so that its length is divisible by $P$.
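A minimal sketch of Eq. (3) on nested lists, assuming the $C\times P\times N$ layout stated above (position $p$ inside patch $n$ is stored at index `[c][p][n]`).

```python
import math
from typing import List

def patchify(Z: List[List[float]], P: int) -> List[List[List[float]]]:
    """Split each channel of Z (C x L) into N = ceil(L / P) non-overlapping patches of
    length P, left-padding with zeros so the padded length N * P is divisible by P.
    Returns a C x P x N nested list: out[c][p][n] is position p inside patch n."""
    L = len(Z[0])
    N = math.ceil(L / P)
    pad = N * P - L
    out = []
    for row in Z:
        padded = [0.0] * pad + list(row)  # zeros at the beginning, as in Eq. (3)
        out.append([[padded[n * P + p] for n in range(N)] for p in range(P)])
    return out

# One channel of length L = 5 with P = 2 gives N = 3 patches: [0, 1], [2, 3], [4, 5].
Zp = patchify([[1.0, 2.0, 3.0, 4.0, 5.0]], P=2)
```

Padding at the beginning rather than the end keeps the most recent observations aligned at the end of the last patch, which is the part of the window closest to the forecast horizon.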

Patch Encoder-Decoder

Based on the divided patches $\mathbf{Z}_{p}$, we use a patch encoder and decoder to capture temporal features in flight trajectories. Specifically, the patch encoder captures inter-patch features (i.e., the global correlations across patches) and intra-patch features (i.e., the local details within patches). These features are then reconstructed to the original dimension by the patch decoder. Motivated by the strong performance of linear models for MTS forecasting chen2023tsmixer ; zeng2023transformers , the patch encoders and decoders are built purely from multi-layer perceptrons (MLPs) for temporal modeling.

Figure 3: The structure of Patch Mixer Block

As illustrated in Figure 3, a patch encoder consists of an inter-patch MLP, an intra-patch MLP, and a linear projection. Each MLP consists of two fully connected layers with a GELU non-linearity between them, followed by a dropout layer, and is wrapped in a residual connection.
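A sketch of one such residual MLP, applied independently to every vector along the last axis of a $C\times A\times B$ tensor; applied to $\mathbf{Z}_{p}$ this mixes along $N$, as in Eq. (4). The hidden width, the random initialization, and using inverted dropout only when `p_drop > 0` are assumptions of this sketch.

```python
import math
import random
from typing import List, Tuple

Params = Tuple[List[List[float]], List[float]]  # (weights out x in, bias)

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def init_fc(n_in: int, n_out: int, rng: random.Random) -> Params:
    # Hypothetical small random init; a trained model would learn these weights.
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def fc(x: List[float], params: Params) -> List[float]:
    w, b = params
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def residual_mlp(x, fc1, fc2, p_drop=0.0, rng=None):
    # x + Dropout(FC(GELU(FC(x)))) on one vector taken along the mixed axis.
    y = fc([gelu(v) for v in fc(x, fc1)], fc2)
    if p_drop > 0.0 and rng is not None:
        # Inverted dropout (training only); skipped at inference when p_drop == 0.
        y = [0.0 if rng.random() < p_drop else v / (1.0 - p_drop) for v in y]
    return [a + b for a, b in zip(x, y)]

def mix_last_axis(t, fc1, fc2):
    # Apply the same MLP to every vector along the last axis of a C x A x B nested list.
    return [[residual_mlp(vec, fc1, fc2) for vec in plane] for plane in t]

rng = random.Random(0)
C, P, N, hidden = 2, 3, 4, 8          # hidden width is an assumed value
Zp = [[[rng.gauss(0, 1) for _ in range(N)] for _ in range(P)] for _ in range(C)]
fc1, fc2 = init_fc(N, hidden, rng), init_fc(hidden, N, rng)
N_inter = mix_last_axis(Zp, fc1, fc2)  # Eq. (4): mixing along N, shape stays C x P x N
```

Because of the residual connection, an MLP whose weights are all zero reduces to the identity, so each block can only add to, never discard, its input.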

Given the patch-divided series $\mathbf{Z}_{p}$, an inter-patch MLP operates on the dimension $N$ to capture the dependencies between different patches, mapping $\mathbb{R}^{N}\rightarrow\mathbb{R}^{N}$ to obtain the inter-patch mixed representation $\mathbf{N}_{inter}\in\mathbb{R}^{C\times P\times N}$:

$$\mathbf{N}_{inter} = \mathbf{Z}_{p} + \mathit{Dropout}\big(\mathit{FC}(\sigma(\mathit{FC}(\mathbf{Z}_{p})))\big) \tag{4}$$

where $\sigma$ denotes the GELU non-linearity, $\mathit{Dropout}$ denotes a dropout layer, and $\mathbf{N}_{inter}$ reflects the global correlations across patches. After that, an intra-patch MLP operates on the dimension $P$ to capture the dependencies across different time steps within each patch, mapping $\mathbb{R}^{P}\rightarrow\mathbb{R}^{P}$ to obtain the intra-patch mixed representation $\mathbf{N}_{intra}\in\mathbb{R}^{C\times N\times P}$:

$$\mathbf{N}_{intra} = \mathbf{N}_{inter}^{\top} + \mathit{Dropout}\big(\mathit{FC}(\sigma(\mathit{FC}(\mathbf{N}_{inter}^{\top})))\big) \tag{5}$$

where $\mathbf{N}_{intra}$ reflects the local details across time steps within each patch, and $^{\top}$ swaps the last two dimensions (e.g., $C\times P\times N\rightarrow C\times N\times P$). Then, we apply a linear projection to $\mathbf{N}_{intra}^{\top}$ to obtain the final inter- and intra-patch mixed representation $\mathbf{E}\in\mathbb{R}^{C\times P\times 1}$:

$$\mathbf{E} = \mathit{Linear}(\mathbf{N}_{intra}^{\top}) \tag{6}$$

After this patch encoding process, the correlations between nearby time steps within patches and between distant time steps across patches are captured. Then, a patch decoder reconstructs the original sequence. The patch decoder comprises the same components as the encoder in reverse order, formulated as follows:

$$
\begin{aligned}
\mathbf{D} &= \mathit{Linear}(\mathbf{E}) \\
\mathbf{P}_{intra} &= \mathbf{D}^{\top} + \mathit{Dropout}\big(\mathit{FC}(\sigma(\mathit{FC}(\mathbf{D}^{\top})))\big) \\
\mathbf{P} &= \mathbf{P}_{intra}^{\top} + \mathit{Dropout}\big(\mathit{FC}(\sigma(\mathit{FC}(\mathbf{P}_{intra}^{\top})))\big)
\end{aligned} \tag{7}
$$

where $\mathit{Linear}$ performs a dimensional projection to obtain $\mathbf{D} \in \mathbb{R}^{C\times P\times N}$ for reconstructing the original sequence, $\mathbf{P}_{intra} \in \mathbb{R}^{C\times N\times P}$ is the reconstructed intra-patch mixed representation, and $\mathbf{P} \in \mathbb{R}^{C\times P\times N}$ is the final reconstructed intra- and inter-patch mixed representation.
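The shape flow through Eqs. (6) and (7) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: we assume $\mathbf{N}_{intra}$ has shape $C\times N\times P$ (mirroring $\mathbf{P}_{intra}$), every $\mathit{Linear}$ and $\mathit{FC}$ acts on the last axis, ReLU stands in for the unspecified activation $\sigma$, dropout is omitted as at inference time, and the helper names (`linear`, `residual_mlp`, `init`) and the hidden width are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def linear(x, w, b):
    """Dense layer applied along the last axis: (..., d_in) -> (..., d_out)."""
    return x @ w + b


def residual_mlp(x, w1, b1, w2, b2):
    """x + FC(sigma(FC(x))) along the last axis; ReLU stands in for sigma, dropout omitted."""
    hidden = np.maximum(linear(x, w1, b1), 0.0)
    return x + linear(hidden, w2, b2)


def init(d_in, d_out):
    """Hypothetical random initialisation for one dense layer."""
    return rng.normal(scale=0.1, size=(d_in, d_out)), np.zeros(d_out)


C, P, N, hidden_dim = 6, 10, 6, 32  # 6 variables, patch size 10, 60 / 10 = 6 patches

# Encoder tail, Eq. (6): N_intra (C, N, P) -> transpose (C, P, N) -> project N -> 1.
n_intra = rng.normal(size=(C, N, P))
w_e, b_e = init(N, 1)
E = linear(np.swapaxes(n_intra, 1, 2), w_e, b_e)  # (C, P, 1)

# Decoder, Eq. (7): expand 1 -> N, then intra-patch mixing (over P), then inter-patch mixing (over N).
w_d, b_d = init(1, N)
D = linear(E, w_d, b_d)  # (C, P, N)
P_intra = residual_mlp(np.swapaxes(D, 1, 2), *init(P, hidden_dim), *init(hidden_dim, P))  # (C, N, P)
P_out = residual_mlp(np.swapaxes(P_intra, 1, 2), *init(N, hidden_dim), *init(hidden_dim, N))  # (C, P, N)
```

The transposes are what let the same last-axis dense layer alternate between mixing positions inside a patch and mixing patches.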

3.2.2 Multi-Scale Aggregator

To model multiple scales more completely, we introduce a multi-scale aggregator that integrates different temporal patterns. It contains two components: scale fusion and channel fusion. Scale fusion identifies critical time scales and captures scale-wise correlations, while channel fusion discovers the dominant variables affecting temporal variations and explores inter-variable relationships. Together, these two components help the model learn a robust multi-scale representation and improve its generalization ability.

Given the $K$ scale-specific temporal representations $\{\mathbf{P}_1, \mathbf{P}_2, \dots, \mathbf{P}_K\}$, we first stack them and rearrange the data to combine the three dimensions of channel size $C$, patch size $P$ and patch quantity $N$, resulting in $\mathbf{S}^{0} \in \mathbb{R}^{K\times(C\times L)}$, where $L = P\times N$ is the same for every scale even though $P$ and $N$ differ. Then we apply MSA on the scale dimension $K$ to learn the contribution of each time scale. This process is formulated as:

$$
\begin{aligned}
\mathbf{S}^{0} &= \mathit{Reshape}(\mathit{Stack}(\mathbf{P}_1, \mathbf{P}_2, \dots, \mathbf{P}_K)) \\
\mathbf{S}^{i} &= \mathit{LayerNorm}(\mathbf{S}^{i-1} + \mathit{MSA}(\mathbf{S}^{i-1})), \quad i = 1, \dots, l \\
\mathbf{S}^{i} &= \mathit{LayerNorm}(\mathbf{S}^{i} + \mathit{FC}(\mathbf{S}^{i})), \quad i = 1, \dots, l
\end{aligned} \tag{8}
$$

where $\mathbf{S}^{l}$ is the final multi-scale fusion representation within variables. Inspired by iTransformer [39], we consider each variable as a token and apply MSA to explore dependencies between different variables. We first reshape $\mathbf{S}^{l}$ into $\mathbf{C}^{0} \in \mathbb{R}^{C\times(K\times L)}$ and perform multi-head self-attention on the channel dimension $C$ to identify dominant variables. This process is formulated as follows:

$$
\begin{aligned}
\mathbf{C}^{0} &= \mathit{Reshape}(\mathbf{S}^{l}) \\
\mathbf{C}^{i} &= \mathit{LayerNorm}(\mathbf{C}^{i-1} + \mathit{MSA}(\mathbf{C}^{i-1})), \quad i = 1, \dots, l \\
\mathbf{C}^{i} &= \mathit{LayerNorm}(\mathbf{C}^{i} + \mathit{FC}(\mathbf{C}^{i})), \quad i = 1, \dots, l \\
\mathbf{H} &= \mathit{Reshape}(\mathbf{C}^{l})
\end{aligned} \tag{9}
$$

where $\mathbf{H} \in \mathbb{R}^{C\times L\times K}$ is the final multi-scale representation, which captures cross-scale correlations and inter-variable relationships.
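The reshaping in Eqs. (8) and (9) is the subtle part, because each $\mathbf{P}_k$ has its own patch size and patch count while $L = P\times N$ stays fixed. The NumPy sketch below shows one plausible layout under stated assumptions, not the authors' implementation: each $\mathbf{P}_k$ is first flattened back to $C\times L$ in time order, MSA is replaced by single-head attention without an output projection, LayerNorm has no learnable parameters, $l = 1$, weights are random, and the patch sizes follow Table 3.

```python
import numpy as np

rng = np.random.default_rng(0)


def layer_norm(x, eps=1e-5):
    """LayerNorm over the last axis, without learnable scale or shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the rows (tokens) of x."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def fusion_block(x):
    """One layer of Eq. (8)/(9): LayerNorm(x + MSA(x)), then LayerNorm(x + FC(x))."""
    d = x.shape[-1]
    wq, wk, wv, w_fc = (rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(4))
    x = layer_norm(x + self_attention(x, wq, wk, wv))
    return layer_norm(x + x @ w_fc)


C, L = 6, 60
patch_sizes = [30, 20, 10, 6, 2]
K = len(patch_sizes)

# Stand-ins for the decoder outputs P_k of shape (C, patch size, patch count).
P_list = [rng.normal(size=(C, p, L // p)) for p in patch_sizes]

# Scale fusion, Eq. (8): flatten each P_k to (C, L) in time order, stack to (K, C, L),
# then merge C and L so each of the K scales is one token of dimension C * L.
S = np.stack([np.swapaxes(p_k, 1, 2).reshape(C, L) for p_k in P_list])  # (K, C, L)
S = fusion_block(S.reshape(K, C * L))

# Channel fusion, Eq. (9): regroup so each of the C variables is a token of dimension K * L.
Cmat = S.reshape(K, C, L).transpose(1, 0, 2).reshape(C, K * L)
Cmat = fusion_block(Cmat)
H = Cmat.reshape(C, K, L).transpose(0, 2, 1)  # (C, L, K)
```

Attention over $K$ tokens in the first block and over $C$ tokens in the second is the only difference between the two fusions; the regrouping between them decides which axis plays the role of the token.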

3.3 Direct Multi-Step Prediction

We ensemble $K$ predictors to directly obtain the future flight trajectory series, which can exploit complementary information from different temporal patterns. The objective of our model is to predict the differential values of longitude and latitude relative to the last observation, together with the raw absolute values of altitude, i.e., $\hat{\mathbf{Y}} = \{\hat{\mathbf{x}}_{L+1}, \dots, \hat{\mathbf{x}}_{L+T}\}$, where $\hat{\mathbf{x}}_{L+i} = (\hat{\Delta}^{\mathit{Lon}}(L+i, L), \hat{\Delta}^{\mathit{Lat}}(L+i, L), \widehat{Alt}_{L+i})^{\top}$ for $i = 1, \dots, T$.
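The prediction target can be illustrated with a small pure-Python sketch. It assumes $\Delta^{\mathit{Lon}}(L+i, L) = \mathit{Lon}_{L+i} - \mathit{Lon}_{L}$ (and likewise for latitude), uses 0-based list indices so the last observation sits at index `L - 1`, and the helper names are hypothetical; the paper's own definition of $\Delta$ appears in its earlier section and may differ in detail.

```python
def make_targets(lon, lat, alt, L, T):
    """Hypothetical helper: T-step targets for a window whose last observation is index L - 1.

    Longitude/latitude become differences relative to the last observation;
    altitude stays absolute.
    """
    last = L - 1
    return [
        (lon[last + i] - lon[last], lat[last + i] - lat[last], alt[last + i])
        for i in range(1, T + 1)
    ]


def recover_positions(targets, last_lon, last_lat):
    """Invert the differential targets by adding back the last observed position."""
    return [(last_lon + d_lon, last_lat + d_lat, a) for d_lon, d_lat, a in targets]


# Toy trajectory: the first 3 points are observed, the next 2 are predicted.
lon = [116.0, 116.1, 116.2, 116.35, 116.5]
lat = [39.0, 39.05, 39.1, 39.15, 39.2]
alt = [1000.0, 1100.0, 1200.0, 1300.0, 1400.0]
targets = make_targets(lon, lat, alt, L=3, T=2)
positions = recover_positions(targets, lon[2], lat[2])
```

Because every step is referenced to the same last observation, an error at one horizon does not propagate to the next when converting back to absolute positions.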
We split the final multi-scale representation $\mathbf{H} \in \mathbb{R}^{C\times L\times K}$ into a sequence $\{\mathbf{H}_{*,1}, \mathbf{H}_{*,2}, \dots, \mathbf{H}_{*,K}\}$, where $\mathbf{H}_{*,i} \in \mathbb{R}^{C\times L}$ for $i = 1, \dots, K$, and feed each $\mathbf{H}_{*,i}$ to a predictor. Each predictor has two MLPs. The first, $\mathit{MLP}_{C_i}$, transforms the input channel $C$ into the output channel $C'$, and the second, $\mathit{MLP}_{T_i}$, projects the historical input sequence of length $L$ to the prediction horizon $T$:

$$
\begin{aligned}
\hat{\mathbf{Y}}_i &= \mathit{MLP}_{T_i}(\mathit{MLP}_{C_i}(\mathbf{H}_{*,i})) \\
\hat{\mathbf{Y}} &= \sum_{i=1}^{K} \hat{\mathbf{Y}}_i
\end{aligned} \tag{10}
$$

Finally, the outputs of all predictors are summed to generate the final prediction, which can enhance the stability and generalization of our model.
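Eq. (10) can be sketched in NumPy as follows. This is an illustration only: each MLP is reduced to a single bias-free linear map, the output channel count $C' = 3$ (for $\hat{\Delta}^{\mathit{Lon}}$, $\hat{\Delta}^{\mathit{Lat}}$ and $\widehat{Alt}$) is our assumption, random weights stand in for trained ones, and the output is laid out as $C'\times T$ (variables by time steps), i.e., transposed relative to the sequence notation for $\hat{\mathbf{Y}}$.

```python
import numpy as np

rng = np.random.default_rng(0)

C, L, K = 6, 60, 5  # input variables, look-back window, number of scales
C_out, T = 3, 15    # predicted variables (dLon, dLat, Alt) and horizon


def make_predictor():
    """Hypothetical predictor: one linear map over channels (C -> C') and one over time (L -> T)."""
    w_c = rng.normal(scale=C ** -0.5, size=(C_out, C))
    w_t = rng.normal(scale=L ** -0.5, size=(L, T))
    return lambda h: (w_c @ h) @ w_t  # (C, L) -> (C', L) -> (C', T)


H = rng.normal(size=(C, L, K))  # stand-in for the aggregator output
predictors = [make_predictor() for _ in range(K)]

# Feed the i-th scale slice H[:, :, i] to the i-th predictor and sum the K outputs.
per_scale = [predictors[i](H[:, :, i]) for i in range(K)]
Y_hat = np.sum(per_scale, axis=0)  # (C', T)
```

All $T$ steps come out of one forward pass, which is what distinguishes this direct multi-step scheme from iterating a one-step model.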

4 Experiments

4.1 Dataset and Experimental Setup

Datasets

To evaluate the performance of FlightPatchNet, we conduct extensive experiments on ADS-B data from 2020 to 2022 provided by OpenSky (https://opensky-network.org/datasets/states/). In this paper, six key attributes are extracted from the original data: longitude, latitude, altitude, and the velocities in the x, y and z dimensions. The dataset is divided into training, validation and test sets with a ratio of 8:1:1.

Baselines and Setup

We compare our model with five competitive models: four IMS-based models, namely LSTM [19], Bi-LSTM [32], CNN-LSTM [18] and FlightBERT [15], and one DMS-based model, FlightBERT++ [16]. These models cover mainstream deep learning architectures, including Transformer (FlightBERT, FlightBERT++), CNN (CNN-LSTM) and RNN (LSTM, Bi-LSTM, CNN-LSTM), providing a comprehensive comparison. For fairness, all models follow the same experimental setup, with lookback window $L = 60$ and prediction horizon $T \in \{1, 3, 9, 15\}$. Our model is trained with the MSE loss using the Adam optimizer [40]. We adopt the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) as evaluation metrics. More details about the dataset, baselines, implementation and hyper-parameters are provided in Appendix A.
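For reference, the two evaluation metrics can be written in a few lines of pure Python (function names are ours). The MSE training loss is the same mean of squared errors before the square root is taken. The toy altitude values are hypothetical and show why a single large miss inflates RMSE far more than MAE, and why RMSE can never fall below MAE.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy altitudes in metres: three small errors and one large miss.
truth = [1000.0, 1000.0, 1000.0, 1000.0]
pred = [1010.0, 990.0, 1010.0, 1400.0]
print(mae(truth, pred))   # 107.5
print(rmse(truth, pred))  # about 200.19
```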

4.2 Main results

Comprehensive flight trajectory prediction results are reported in Table 1 (see Appendix B.2 for error bars). FlightPatchNet achieves the best performance on longitude and latitude across all prediction horizons in terms of both MAE and RMSE, while it does not achieve the best altitude results compared with strong baselines such as FlightBERT++. For brevity, we consider the prediction horizon $T = 15$ and compare our model with the second-best model. FlightPatchNet achieves an 18.62% reduction in MAE and a 49.63% reduction in RMSE for longitude, and a 35.31% reduction in MAE and a 44.80% reduction in RMSE for latitude. For altitude, FlightBERT++ outperforms our model by 45.51 meters in MAE but has a larger RMSE, which may be caused by high-bit errors in its binary-encoded predictions. FlightPatchNet obtains the smallest RMSE for all variables, indicating that our model provides more robust and stable predictions. Furthermore, as the prediction horizon increases, DMS-based models greatly outperform IMS-based models, which suffer from serious performance degradation due to error accumulation. Note that longitude, latitude and altitude jointly determine the position of an aircraft, so poor performance on any single variable is unacceptable for the short-term FTP task. Overall, FlightPatchNet therefore achieves the most competitive performance.

Table 1: Flight trajectory prediction results. A lower MAE or RMSE represents a better prediction. The prediction horizon $T \in \{1, 3, 9, 15\}$ and the look-back window size $L = 60$ for all experiments. The best results are highlighted in bold and the second best are underlined. Note that $0.00001^{\circ}$ is about 1 m.
Model Metric Lon(°) Lat(°) Alt(m)
1 3 9 15 1 3 9 15 1 3 9 15
IMS LSTM MAE 0.00056 0.00427 0.01747 0.03132 0.00049 0.00493 0.02116 0.03717 92.27 159.30 549.55 882.86
RMSE 0.00095 0.00691 0.02597 0.04578 0.00089 0.00740 0.02956 0.05143 142.05 233.39 763.84 1205.78
Bi-LSTM MAE 0.00155 0.00747 0.02319 0.03890 0.00137 0.00824 0.02711 0.04404 432.50 761.50 1648.68 2006.21
RMSE 0.00202 0.01124 0.03387 0.05532 0.00181 0.01142 0.03639 0.05982 563.74 953.37 2132.91 2420.74
CNN-LSTM MAE 0.00139 0.00700 0.02282 0.04149 0.00131 0.00801 0.02623 0.05139 520.03 746.67 1569.68 1136.80
RMSE 0.00240 0.01033 0.03263 0.05981 0.00212 0.01130 0.03559 0.07353 1176.96 926.40 1936.63 1658.53
FlightBERT MAE 0.00123 0.00241 0.01162 0.02407 0.00088 0.00158 0.00963 0.01238 24.67 35.67 78.58 134.29
RMSE 0.00241 0.00526 0.02189 0.03969 0.00154 0.00286 0.01904 0.03093 234.17 272.59 384.22 462.28
DMS FlightBERT++ MAE 0.00173 0.00317 0.00871 0.01187 0.00085 0.00210 0.00612 0.01048 9.39 21.89 47.84 78.46
RMSE 0.00360 0.00659 0.01846 0.03131 0.00148 0.00425 0.00959 0.02127 175.29 167.16 327.93 384.18
FlightPatchNet (Ours) MAE 0.00048 0.00153 0.00546 0.00966 0.00032 0.00105 0.00381 0.00678 13.34 32.65 78.57 123.97
RMSE 0.00087 0.00233 0.00885 0.01577 0.00064 0.00175 0.00652 0.01174 123.78 121.48 174.63 244.34
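The relative reductions reported in the main results follow directly from the $T = 15$ columns of Table 1; the short check below reproduces them, together with the 45.51 m altitude MAE gap to FlightBERT++ (the helper name is ours).

```python
def relative_reduction(baseline, ours):
    """Percentage reduction of an error metric relative to a baseline."""
    return 100.0 * (baseline - ours) / baseline


# T = 15 values from Table 1: (second best = FlightBERT++, FlightPatchNet).
t15 = {
    "Lon MAE": (0.01187, 0.00966),
    "Lon RMSE": (0.03131, 0.01577),
    "Lat MAE": (0.01048, 0.00678),
    "Lat RMSE": (0.02127, 0.01174),
}
reductions = {k: round(relative_reduction(*v), 2) for k, v in t15.items()}
alt_mae_gap = round(123.97 - 78.46, 2)  # FlightPatchNet minus FlightBERT++, in metres
print(reductions)   # {'Lon MAE': 18.62, 'Lon RMSE': 49.63, 'Lat MAE': 35.31, 'Lat RMSE': 44.8}
print(alt_mae_gap)  # 45.51
```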

4.3 Effectiveness of Differential Coding

The results in Table 2 show that using differential coding for longitude and latitude significantly improves their prediction performance, but it can decrease the accuracy of altitude (e.g., for LSTM and FlightPatchNet). The differential coding reveals the temporal variations of longitude and latitude, which helps temporal modeling of flight trajectories. However, the variations of altitude in the original series may come from unexpected noise. FlightPatchNet has a strong capacity for modeling temporal variations and tends to focus on these noise points during altitude prediction, leading to a large deviation from the ground truth. Further analysis is presented in Appendix C.

Table 2: Flight trajectory prediction results with longitude and latitude in original data (×) and differential data (✓) for prediction horizon $T = 15$. The best results are highlighted in bold. Note that altitude and velocities are always in original data.
Models Diff Lon(°) Lat(°) Alt(m)
MAE RMSE MAE RMSE MAE RMSE
LSTM ✓ 0.03132 0.04578 0.03717 0.05143 882.86 1205.78
× 0.82230 1.20424 0.12008 2.44136 768.45 1053.21
Bi-LSTM ✓ 0.03890 0.05532 0.04404 0.05982 2006.21 2420.74
× 1.71433 2.43607 0.19014 0.27621 2091.19 2665.51
CNN-LSTM ✓ 0.04149 0.05981 0.05139 0.07353 1136.80 1658.53
× 8.59512 23.07600 1.95957 8.15418 1638.02 2113.49
FlightPatchNet (ours) ✓ 0.00966 0.01577 0.00678 0.01174 123.97 244.34
× 0.19348 0.26243 0.05385 0.07457 60.63 169.57
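A minimal sketch of first-order differencing and its inverse illustrates why differential coding shrinks the data range: raw longitudes around 116° become per-step differences of a few thousandths of a degree. The toy values and helper names are hypothetical, and we do not reproduce how FlightPatchNet treats the first time step (which has no predecessor); this sketch simply drops it.

```python
def first_order_diff(values):
    """Differential coding: x_t - x_{t-1} for every t after the first point."""
    return [b - a for a, b in zip(values, values[1:])]


def undo_diff(first_value, diffs):
    """Invert first-order differencing with a running sum."""
    out = [first_value]
    for d in diffs:
        out.append(out[-1] + d)
    return out


lon = [116.5800, 116.5842, 116.5885, 116.5929, 116.5974]  # toy longitudes (degrees)
d_lon = first_order_diff(lon)      # roughly [0.0042, 0.0043, 0.0044, 0.0045]
restored = undo_diff(lon[0], d_lon)
```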

4.4 Effectiveness of Multiple Scales

Table 3: Flight trajectory prediction results with a single scale (one patch size) and with multiple scales. The best results are highlighted in bold.
Lon(°) Lat(°) Alt(m)
Patch Size Metric 1 3 9 15 1 3 9 15 1 3 9 15
2 MAE 0.00050 0.00156 0.00564 0.01038 0.00034 0.00105 0.00388 0.00711 16.50 23.39 93.28 106.63
RMSE 0.00091 0.00248 0.00907 0.01675 0.00065 0.00176 0.00654 0.01218 122.41 105.19 202.30 225.08
6 MAE 0.00052 0.00158 0.00583 0.01027 0.00036 0.00106 0.00402 0.00715 12.72 22.57 120.51 127.15
RMSE 0.00091 0.00251 0.00937 0.01656 0.00067 0.00178 0.00670 0.01227 106.48 106.58 226.89 247.28
10 MAE 0.00049 0.00162 0.00570 0.01019 0.00033 0.00108 0.00382 0.00700 12.16 35.77 107.07 178.91
RMSE 0.00088 0.00256 0.00912 0.01652 0.00064 0.00180 0.00654 0.01201 126.31 101.96 216.02 306.40
20 MAE 0.00054 0.00163 0.00562 0.01032 0.00036 0.00110 0.00391 0.00707 15.58 25.57 95.56 152.48
RMSE 0.00091 0.00257 0.00907 0.01660 0.00066 0.00182 0.00659 0.01211 99.66 106.19 202.99 286.20
30 MAE 0.00050 0.00161 0.00580 0.01065 0.00033 0.00108 0.00399 0.00716 10.35 24.75 75.23 144.55
RMSE 0.00090 0.00256 0.00935 0.01719 0.00065 0.00180 0.00676 0.01220 104.48 99.18 182.47 273.72
30,20,10,6,2 MAE 0.00048 0.00153 0.00546 0.00966 0.00032 0.00105 0.00381 0.00678 13.34 32.65 78.57 123.97
RMSE 0.00087 0.00233 0.00885 0.01577 0.00064 0.00175 0.00652 0.01174 129.65 121.78 174.63 244.34

To investigate the effect of multi-scale modeling, we conduct experiments with a single scale, using each patch size in {2, 6, 10, 20, 30} individually. The results in Table 3 illustrate the critical contribution of multiple scales to our model. We observe that different variables prefer distinct time scales. For example, patch size 10 obtains the second-best prediction performance on longitude and latitude but the worst performance on altitude when the prediction horizon $T = 15$. This indicates that longitude, latitude and altitude have distinct temporal patterns, and that different scales extract diverse, complementary features, which can be effectively leveraged to obtain competitive and robust prediction results.
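The five scales in Table 3 correspond to splitting the $L = 60$ look-back window into patches of different sizes. Assuming non-overlapping patches (an assumption of this sketch), each patch size divides 60 exactly and yields $N = L / P$ patches, as the hypothetical helper below shows.

```python
def split_into_patches(series, patch_size):
    """Split a sequence into non-overlapping patches of equal length."""
    if len(series) % patch_size != 0:
        raise ValueError("patch size must divide the sequence length")
    return [series[i:i + patch_size] for i in range(0, len(series), patch_size)]


L = 60
window = list(range(L))  # stand-in for one variable's look-back window
patch_counts = {p: len(split_into_patches(window, p)) for p in (2, 6, 10, 20, 30)}
print(patch_counts)  # {2: 30, 6: 10, 10: 6, 20: 3, 30: 2}
```

Small patches expose fine local detail with many patches to mix, while large patches summarise longer trends with only a few, which is one way to read the per-variable preferences in Table 3.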

4.5 Ablation Study

We conduct ablation studies by removing the corresponding modules from FlightPatchNet. Specifically, the w/o global temporal attention variant does not capture the correlations between time steps, the w/o scale fusion variant treats each time scale as equally important, and the w/o channel fusion variant does not explore the relationships between variables. Table 4 shows the contribution of each component.

Table 4: Performance comparisons on ablative variants. The best results are highlighted in bold.
Case Horizon Lon(°) Lat(°) Alt(m)
1 3 9 15 1 3 9 15 1 3 9 15
w/o global temporal attention MAE 0.00051 0.00190 0.00667 0.01232 0.00034 0.00132 0.00466 0.00876 20.59 28.15 66.62 112.43
RMSE 0.00090 0.00308 0.01085 0.02005 0.00066 0.00222 0.00791 0.01486 136.79 112.10 163.78 220.83
w/o scale fusion MAE 0.00053 0.00169 0.00609 0.01112 0.00035 0.00114 0.00409 0.00759 24.17 32.70 91.06 162.20
RMSE 0.00092 0.00268 0.00975 0.01787 0.00067 0.00188 0.00688 0.01280 130.05 99.54 194.25 282.07
w/o channel fusion MAE 0.00050 0.00166 0.00573 0.01059 0.00034 0.00112 0.00398 0.00727 20.04 29.22 73.23 131.98
RMSE 0.00089 0.00265 0.00924 0.01707 0.00065 0.00187 0.00667 0.01240 159.47 122.32 174.36 250.07
FlightPatchNet MAE 0.00048 0.00153 0.00546 0.00966 0.00032 0.00105 0.00381 0.00678 13.34 32.65 78.57 123.97
RMSE 0.00087 0.00233 0.00885 0.01577 0.00064 0.00175 0.00652 0.01174 123.78 121.48 174.63 244.34

Removing the global temporal attention substantially degrades the multi-step prediction of longitude and latitude (e.g., the longitude MAE at $T = 15$ rises from 0.00966 to 0.01232), demonstrating the necessity of capturing the correlations between different time steps. Scale fusion effectively improves prediction accuracy, indicating that different time scales of trajectory series contain rich and diverse temporal variation information. Channel fusion also improves model performance, suggesting the importance of exploring relationships between different variables in complex temporal modeling.
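The size of each ablation effect can be quantified directly from Table 4. The short calculation below computes the relative change in MAE at $T = 15$ when each module is removed; positive values mean the ablated variant is worse, negative values mean it is better (as happens for altitude without global temporal attention). The helper name is ours.

```python
def relative_change(full, ablated):
    """Percentage change of an error metric when a module is removed (positive = worse)."""
    return 100.0 * (ablated - full) / full


# T = 15 MAE values from Table 4: (full FlightPatchNet, ablated variant).
t15_mae = {
    "Lon, w/o global temporal attention": (0.00966, 0.01232),
    "Lon, w/o scale fusion": (0.00966, 0.01112),
    "Lon, w/o channel fusion": (0.00966, 0.01059),
    "Alt, w/o global temporal attention": (123.97, 112.43),
}
changes = {k: round(relative_change(*v), 2) for k, v in t15_mae.items()}
# Lon: +27.54, +15.11, +9.63 percent; Alt: -9.31 percent
```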

5 Conclusion

In this paper, we propose FlightPatchNet, a multi-scale patch network with differential coding for short-term FTP task. The differential coding is leveraged to reduce the significant differences in original data range and reflect the temporal variations in realistic flight trajectories. The multi-scale patch network is designed to explore global trends and local details based on divided patches of different sizes, and integrate scale-wise correlations and inter-variable relationships for complete temporal modeling. Extensive experiments on a real-world dataset demonstrate that FlightPatchNet achieves the most competitive performance.

References

  • [1] Afshin Abadi, Tooraj Rajabioun, and Petros A. Ioannou. Traffic flow prediction for road transportation networks with limited traffic data. IEEE Transactions on Intelligent Transportation Systems, 16(2):653–662, 2015.
  • [2] Yi Lin, Jian wei Zhang, and Hong Liu. Deep learning based short-term air traffic flow prediction considering temporal–spatial correlation. Aerospace Science and Technology, 93:105–113, 2019.
  • [3] Zhengmao Chen, Dongyue Guo, and Yi Lin. A deep gaussian process-based flight trajectory prediction approach and its application on conflict detection. Algorithms, 13(11):293, 2020.
  • [4] Zhengyi Wang, Man Liang, and Daniel Delahaye. A hybrid machine learning model for short-term estimated time of arrival prediction in terminal manoeuvring area. Transportation Research Part C: Emerging Technologies, 95:280–294, 2018.
  • [5] Yi Lin, Linjie Deng, Zhengmao Chen, Xi** Wu, Jianwei Zhang, and Bo Yang. A real-time atc safety monitoring framework using a deep learning approach. IEEE Transactions on Intelligent Transportation Systems, 21(11):4572–4581, 2020.
  • [6] Zhiyuan Shi, Min Xu, and Quan Pan. 4-d flight trajectory prediction with constrained lstm network. IEEE Transactions on Intelligent Transportation Systems, 22(11):7242–7255, 2021.
  • [7] Donggi Jeong, Min** Baek, and Sang-Sun Lee. Long-term prediction of vehicle trajectory based on a deep neural network. In 2017 International Conference on Information and Communication Technology Convergence (ICTC), pages 725–727. IEEE, 2017.
  • [8] Du Runle, Liu Jiaqi, Gao Lu, Li Zhifeng, and Zhang Li. Long term trajectory prediction based on advanced guidance law recognition. In 2017 IEEE International Workshop on Metrology for AeroSpace (MetroAeroSpace), pages 456–461. IEEE, 2017.
  • [9] Chengjue Yuan, Dewei Li, and Dewei Xi. Medium-term prediction of urban traffic states using probability tree. In 2016 35th Chinese Control Conference (CCC), pages 9246–9251. IEEE, 2016.
  • [10] Dan Chen, Minghua Hu, Ke Han, Honghai Zhang, and Jianan Yin. Short/medium-term prediction for the aviation emissions in the en route airspace considering the fluctuation in air traffic demand. Transportation Research Part D: Transport and Environment, 48:46–62, 2016.
  • [11] Darong Huang, Zhen** Deng, Ling Zhao, and Bo Mi. A short-term traffic flow forecasting method based on markov chain and grey verhulst model. In 2017 6th Data Driven Control and Learning Systems (DDCLS), pages 606–610. IEEE, 2017.
  • [12] Peibo Duan, Guoqiang Mao, Weifa Liang, and Degan Zhang. A unified spatio-temporal model for short-term traffic flow prediction. IEEE Transactions on Intelligent Transportation Systems, 20(9):3212–3223, 2018.
  • [13] Honglei Yan, Genghua Huang, Haiwei Wang, and Rong Shu. Application of unscented kalman filter for flying target tracking. In 2013 International Conference on Information Science and Cloud Computing, pages 61–66, 2013.
  • [14] Zheng Zhang, Dongyue Guo, Shizhong Zhou, Jianwei Zhang, and Yi Lin. Flight trajectory prediction enabled by time-frequency wavelet transform. Nature Communications, 14(1):5258, 2023.
  • [15] Dongyue Guo, Edmond Q. Wu, Yuankai Wu, Jianwei Zhang, Rob Law, and Yi Lin. Flightbert: Binary encoding representation for flight trajectory prediction. IEEE Transactions on Intelligent Transportation Systems, 24(2):1828–1842, 2023.
  • [16] Dongyue Guo, Zheng Zhang, Zhen Yan, Jianwei Zhang, and Yi Lin. Flightbert++: A non-autoregressive multi-horizon flight trajectory prediction framework. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 127–134, 2024.
  • [17] Han Wu, Yan Liang, Bin Zhou, and Hao Sun. A bi-lstm and autoencoder based framework for multi-step flight trajectory prediction. In 2023 8th International Conference on Control and Robotics Engineering (ICCRE), pages 44–50. IEEE, 2023.
  • [18] Lan Ma and Shan Tian. A hybrid cnn-lstm model for aircraft 4d trajectory prediction. IEEE Access, 8:134668–134680, 2020.
  • [19] Zhiyuan Shi, Min Xu, Quan Pan, Bing Yan, and Haimin Zhang. Lstm-based flight trajectory prediction. In 2018 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2018.
  • [20] David P Thipphavong, Charles A Schultz, Alan G Lee, and Steven H Chan. Adaptive algorithm to improve trajectory prediction accuracy of climbing aircraft. Journal of Guidance, Control, and Dynamics, 36(1):15–24, 2013.
  • [21] Manuel Soler, Alberto Olivares, and Ernesto Staffetti. Multiphase optimal control framework for commercial aircraft four-dimensional flight-planning problems. Journal of Aircraft, 52(1):274–286, 2015.
  • [22] Jose V Benavides, John Kaneshige, Shivanjli Sharma, Ramesh Panda, and Mieczyslaw Steglinski. Implementation of a trajectory prediction function for trajectory based operations. In AIAA Atmospheric Flight Mechanics Conference, page 2198, 2014.
  • [23] Xin-min Tang, Long Zhou, Zhi-yuan Shen, and Miao Tang. 4d trajectory prediction of aircraft taxiing based on fitting velocity profile. In Aeronautical Computing Technique, volume 45, pages 1–12. 2015.
  • [24] Chao Wang, Jiuxia Guo, and Zhipeng Shen. Prediction of 4d trajectory based on basic flight models. Journal of southwest jiaotong university, 44(2):295–300, 2009.
  • [25] Zhi**g Zhou, **liang Chen, Beibei Shen, Zhigang Xiong, Hua Shen, and Fangyue Guo. A trajectory prediction method based on aircraft motion model and grey theory. In 2016 IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), pages 1523–1527, 2016.
  • [26] Taobo Wang. 4d flight trajectory prediction model based on improved kalman filter. Journal of Computer Applications, 34(6), 2014.
  • [27] Lin Xi, Zhang Jun, Zhu Yanbo, and Liu Wei. Simulation study of algorithms for aircraft trajectory prediction based on ads-b technology. In 2008 Asia Simulation Conference-7th International Conference on System Simulation and Scientific Computing, pages 322–327. IEEE, 2008.
  • [28] Inseok Hwang, Jesse Hwang, and Claire Tomlin. Flight-mode-based aircraft conflict detection using a residual-mean interacting multiple model algorithm. In AIAA guidance, navigation, and control conference and exhibit, page 5340, 2003.
  • [29] X Rong Li and Vesselin P Jilkov. Survey of maneuvering target tracking. part v. multiple-model methods. IEEE Transactions on aerospace and electronic systems, 41(4):1255–1321, 2005.
  • [30] Yutian Pang, Xinyu Zhao, Jueming Hu, Hao Yan, and Yongming Liu. Bayesian spatio-temporal graph transformer network (b-star) for multi-aircraft trajectory prediction. Knowledge-Based Systems, 249, 2022.
  • [31] Zhengfeng Xu, Weili Zeng, Xiao Chu, and Puwen Cao. Multi-aircraft trajectory collaborative prediction based on social long short-term memory network. Aerospace, 8(4), 2021.
  • [32] Deepudev Sahadevan, Palanisamy Ponnusamy, Varun P Gopi, and Manjunath K Nelli. Ground-based 4d trajectory prediction using bi-directional lstm networks. Applied Intelligence, 52(14):16417–16434, 2022.
  • [33] Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Neural Information Processing Systems, 2017.
  • [34] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016.
  • [35] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, and Sylvain Gelly. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021.
  • [36] Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. In International Conference on Learning Representations, 2023.
  • [37] Si-An Chen, Chun-Liang Li, Sercan O Arik, Nathanael Christian Yoder, and Tomas Pfister. Tsmixer: An all-mlp architecture for time series forecasting. Transactions on Machine Learning Research, 2023.
  • [38] Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. Are transformers effective for time series forecasting? In Proceedings of the AAAI conference on artificial intelligence, volume 37, pages 11121–11128, 2023.
  • [39] Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. itransformer: Inverted transformers are effective for time series forecasting. In The Twelfth International Conference on Learning Representations, 2023.
  • [40] Diederik P. Kingma and Jimmy Ba. Adam: a method for stochastic optimization. In The Second International Conference on Learning Representations, 2014.

Appendix A Experimental Details

A.1 Dataset Preprocessing and Description

This paper exploits real-world datasets provided by OpenSky from 2020 to 2022 to validate our proposed model. The data preprocessing steps are as follows:

(1) Data Extraction: We extract seven features from the raw data, including timestamp, longitude, latitude, altitude, horizontal flight speed, horizontal flight angle, and vertical speed. The timestamp is used to identify whether the trajectory points are continuous, and the other six features are further processed as inputs to the model.

(2) Data Filtering: Because the raw dataset contains many missing values and outliers, we select 100 consecutive points without missing values as a complete flight trajectory. Then, we adopt the z-score method to identify outliers. If a flight trajectory contains any outlier, we discard the whole trajectory. The z-score is computed as follows:

$$z = \frac{x - \mu}{\sigma} \tag{1}$$

where $x$ is the value of a feature point, and $\mu$ and $\sigma$ are the mean and standard deviation of that feature, respectively.

(3) Format Transformation: We decompose the horizontal velocity into $V_x$ and $V_y$ according to the horizontal flight angle, where $V_x$ is the velocity in the longitude dimension and $V_y$ is the velocity in the latitude dimension. In this way, the features become longitude, latitude, altitude, $V_x$, $V_y$ and $V_z$, where $V_z$ is the vertical speed.

(4) Data Segmentation: The dataset is randomly divided into three parts with a ratio of 8:1:1 for training, validation, and testing.
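
A random 8:1:1 split can be sketched as follows. The seed and the use of NumPy's `default_rng` are assumptions; integer arithmetic keeps the split sizes independent of floating-point rounding. Applied to the 274,605 trajectories reported below, it yields the sizes shown in the comment.

```python
import numpy as np


def split_indices(n, ratio=(8, 1, 1), seed=0):
    """Randomly partition range(n) into train/validation/test index arrays."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    total = sum(ratio)
    n_train = n * ratio[0] // total
    n_val = n * ratio[1] // total
    # The test set takes the remainder, so no index is lost to rounding.
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]


train_idx, val_idx, test_idx = split_indices(274_605)
print(len(train_idx), len(val_idx), len(test_idx))  # 219684 27460 27461
```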

After the above preprocessing, 274,605 flight trajectories are retained in our dataset. The ranges of longitude, latitude, and altitude are $[-179.86396^{\circ}, 178.82147^{\circ}]$, $[-46.42435^{\circ}, 70.32590^{\circ}]$, and $[0, 21031.00\,\mathrm{m}]$, respectively. The interval between two adjacent flight trajectory points is 10 seconds.
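
The wide coordinate ranges above are what motivates encoding longitude and latitude as first-order differences. The sketch below shows such an encoding and its inverse on a short synthetic track; keeping the first point as an anchor for reconstruction is an assumption of this sketch rather than a detail stated in this appendix, and the function names are hypothetical.

```python
import numpy as np


def diff_encode(coords):
    """Encode a (T, 2) array of [lon, lat] degrees as an anchor point plus (T-1, 2) first-order differences."""
    coords = np.asarray(coords, dtype=float)
    return coords[0], np.diff(coords, axis=0)


def diff_decode(anchor, deltas):
    """Invert diff_encode by cumulatively summing the differences onto the anchor."""
    return np.vstack([anchor, anchor + np.cumsum(deltas, axis=0)])


track = np.array([[120.000, 30.000], [120.010, 30.005], [120.025, 30.012]])
anchor, deltas = diff_encode(track)
print(deltas)                                           # small step-to-step changes
print(np.allclose(diff_decode(anchor, deltas), track))  # True
```

Over a 10-second step the differences typically stay within a small interval wherever the aircraft is, whereas absolute longitudes span almost 360°.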

A.2 Baseline Methods

We briefly describe the five competitive baselines as follows:

  • LSTM [19]: Two LSTM layers (with 30 and 60 units, respectively) encode each trajectory point, and a fully connected layer predicts the future trajectory.

  • Bi-LSTM [32]: Two Bi-LSTM layers (with 200 and 50 units, respectively) encode each trajectory point, and a fully connected layer predicts the future trajectory.

  • CNN-LSTM [18]: Two one-dimensional convolutional layers (kernel size 1×3) and two LSTM layers (with 50 units) encode each trajectory point, and a fully connected layer predicts the future trajectory.

  • FlightBERT [15]: It uses a binary encoding (BE) representation to convert the scalar attributes of the flight trajectory into binary vectors, treating the FTP task as a multi-binary classification problem. It uses 18, 16, 11, and 11 bits to encode the real (decimal) values of longitude, latitude, altitude, and velocities into the BE representation, respectively.

  • FlightBERT++ [16]: It inherits the BE representation from FlightBERT and introduces a differential prediction paradigm, which predicts the differential values of the trajectory attributes instead of their absolute values.
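
To make the BE representation concrete, the sketch below quantizes a value to an integer level and writes that level as a fixed number of bits, using the 11-bit width stated for altitude. The lower bound of 0 m and the 10 m resolution are hypothetical choices for illustration; FlightBERT's actual scaling is not described here.

```python
def be_encode(value, lower, resolution, n_bits):
    """Quantize `value` to an integer level and return its n_bits binary digits, most significant first."""
    level = round((value - lower) / resolution)
    if not 0 <= level < 2 ** n_bits:
        raise ValueError(f"{value} is outside the range representable with {n_bits} bits")
    return [(level >> k) & 1 for k in range(n_bits - 1, -1, -1)]


def be_decode(bits, lower, resolution):
    """Rebuild the integer level from its bits and map it back to a real value."""
    level = 0
    for bit in bits:
        level = (level << 1) | bit
    return lower + level * resolution


bits = be_encode(3570.0, lower=0.0, resolution=10.0, n_bits=11)
print(bits)                                          # [0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
print(be_decode(bits, lower=0.0, resolution=10.0))   # 3570.0
```

In this framing each bit becomes one binary classification target; under the differential paradigm of FlightBERT++, the same kind of encoding would apply to differences between consecutive points rather than to absolute values.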

A.3 Implementation Details

For fairness, all models follow the same experimental setup with look-back window $L=60$ and prediction horizon $T\in\{1,3,9,15\}$; that is, the observation time is 10 minutes and the forecasting times are 10 seconds, 30 seconds, 1.5 minutes, and 2.5 minutes, respectively. The patch sizes in the multi-scale patch mixer blocks are set to {30, 20, 10, 6, 2}. The dimension of the temporal embedding $d$ is 128. For all multi-head self-attention (MSA) modules in this paper, the number of heads is 8 and the number of attention layers $l$ is 3. The learning rate is set to $10^{-4}$ for all experiments. Our method is trained with the MSE loss using the Adam optimizer [40]. Training runs for at most 30 epochs and is stopped early if the validation loss does not decrease for three consecutive epochs. The model is implemented in PyTorch 2.2.1 and trained on a single NVIDIA RTX 4080 GPU with 16 GB of memory.
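
The stopping rule above (at most 30 epochs, patience of three epochs on the validation loss) can be sketched independently of any framework. `run_epoch` is a hypothetical callback that would train for one epoch (for example with Adam at a learning rate of $10^{-4}$ and the MSE loss) and return the validation loss; reading "does not decrease" as "is not strictly below the best value so far" is an assumption.

```python
def train_with_early_stopping(run_epoch, max_epochs=30, patience=3):
    """Call run_epoch(epoch) until max_epochs or `patience` epochs without a new best validation loss."""
    best = float("inf")
    stale = 0            # consecutive epochs without a new best validation loss
    history = []
    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)
        history.append(val_loss)
        if val_loss < best:
            best, stale = val_loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best, history


# Illustrative validation losses, not results from the paper.
fake_val_losses = [1.00, 0.80, 0.70, 0.72, 0.71, 0.75, 0.60]
best, history = train_with_early_stopping(lambda epoch: fake_val_losses[epoch])
print(best, len(history))   # 0.7 6
```

In a real training loop one would usually also save the model weights whenever `best` improves, so that the checkpoint from the best epoch is the one evaluated.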

A.4 Evaluation Metrics

Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are used to evaluate the proposed model and the baselines. They are defined as:

$$\mathrm{MAE}=\frac{1}{T}\sum_{i=1}^{T}\left|\mathbf{Y}_{i}-\hat{\mathbf{Y}}_{i}\right|$$
$$\mathrm{RMSE}=\sqrt{\frac{1}{T}\sum_{i=1}^{T}\left(\mathbf{Y}_{i}-\hat{\mathbf{Y}}_{i}\right)^{2}}$$

where $\mathbf{Y}_{i}$ and $\hat{\mathbf{Y}}_{i}$ are the ground truth and the prediction for the $i$-th future point, respectively.
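
The two metrics translate directly into NumPy. This sketch assumes predictions are stored as arrays of shape (samples, T, variables) and that each variable is scored separately by averaging over samples and horizon steps (`axis=(0, 1)`); how the paper aggregates over test samples is an assumption here.

```python
import numpy as np


def mae(y_true, y_pred, axis=None):
    """Mean absolute error, averaged over `axis` (all elements by default)."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)), axis=axis)


def rmse(y_true, y_pred, axis=None):
    """Root mean squared error, averaging over `axis` before taking the square root."""
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2, axis=axis))


# One sample, T = 3 steps, one variable: the absolute errors are 0, 1 and 2.
y = np.array([[[1.0], [2.0], [3.0]]])
y_hat = np.array([[[1.0], [3.0], [5.0]]])
print(mae(y, y_hat, axis=(0, 1)))   # [1.]
print(rmse(y, y_hat, axis=(0, 1)))  # [1.29099445], i.e. sqrt(5/3)
```

Because the errors are squared before averaging, RMSE weights large errors more heavily than MAE; here the single error of 2 pushes RMSE above MAE.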

Appendix B Additional Experimental Results

B.1 Hyper-Parameter Sensitivity

Number of Scales

We vary the number of scales and report the MAE and RMSE results. As shown in Figure 4, the performance of FlightPatchNet improves consistently as the number of scales increases from 2 to 5, because FlightPatchNet can capture diverse global and local temporal patterns at different scales. When the number of scales increases to 6, the performance starts to deteriorate. This indicates that a moderate number of scales is sufficient for temporal modeling, and excessive scales may lead to overfitting.

Figure 4: MAE and RMSE with different numbers of scales for prediction horizon $T\in\{1,3,9,15\}$
Number of Attention Layers

We vary the number of attention layers over $\{1,2,3,6\}$ for global temporal attention, scale fusion, and channel fusion. The results are shown in Figure 5(a), Figure 5(b), and Figure 5(c). As the number of attention layers increases from 1 to 3, MAE and RMSE decrease, indicating that more attention layers help our model capture the dependencies between different time steps, the scale-wise correlations, and the inter-variable relationships. Increasing the number of layers to 6 brings no further improvement in accuracy. We therefore use three attention layers in each of these components.

(a) MAE and RMSE with different numbers of attention layers in global temporal attention.
(b) MAE and RMSE with different numbers of attention layers in scale fusion.
(c) MAE and RMSE with different numbers of attention layers in channel fusion.
Figure 5: MAE and RMSE with different numbers of attention layers for prediction horizon $T=15$
Look-Back Window Size $L$

Figure 6 shows the MAE and RMSE of our model with different look-back window sizes. We set the window size $L$ to $\{10,20,30,40,50,60,70,80\}$. The overall performance of FlightPatchNet improves significantly as the window size increases from 10 to 60, indicating that FlightPatchNet can capture the temporal dependencies in long flight trajectories. In contrast, the altitude performance fluctuates as the window size increases, suggesting that the altitude series is non-stationary and easily affected by unexpected noise. We therefore set $L$ to 60 to achieve the best overall performance.

Figure 6: MAE and RMSE with different look-back window sizes for prediction horizon $T=15$
Order of Scales

We compare ascending and descending orders of patch sizes and report the MAE and RMSE results in Table 5. Arranging the patch sizes in descending order yields equal or lower longitude and latitude errors in every setting, while the altitude results are mixed. This suggests that the macro knowledge from coarser scales can guide the temporal modeling of finer scales.
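
To make the notion of scale concrete, the sketch below splits a look-back window of $L=60$ steps into non-overlapping patches for each patch size in the default configuration {30, 20, 10, 6, 2}, visited in descending (coarse-to-fine) order. Non-overlapping patching and the (L, variables) input layout are assumptions of this sketch; sizes that do not divide $L$ (such as 40 in Table 5) would need padding or truncation, which is not covered here.

```python
import numpy as np


def make_patches(x, patch_size):
    """Reshape a (L, n_features) window into (L // patch_size, patch_size, n_features) patches."""
    length, n_features = x.shape
    if length % patch_size != 0:
        raise ValueError(f"window length {length} is not divisible by patch size {patch_size}")
    # Row-major reshape keeps consecutive time steps together inside each patch.
    return x.reshape(length // patch_size, patch_size, n_features)


window = np.arange(60 * 6, dtype=float).reshape(60, 6)   # L = 60 steps, 6 variables
for p in sorted([2, 6, 10, 20, 30], reverse=True):       # descending: coarse to fine
    patches = make_patches(window, p)
    print(p, patches.shape)
```

The coarsest scale sees only 2 long patches of 30 steps, while the finest sees 30 short patches of 2 steps, which is the range over which the inter- and intra-patch dependencies are modeled.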

Table 5: The results of flight trajectory prediction with scales in ascending and descending order. ↑ denotes scales in ascending order and ↓ denotes scales in descending order. The better results are highlighted in bold.
Patch sizes  Order  Metric  Lon (°)  Lat (°)  Alt (m)
Horizon  1 3 9 15  1 3 9 15  1 3 9 15
2,6,10,20,30 ↑ MAE 0.00098 0.00155 0.00548 0.01008 0.00099 0.00106 0.00385 0.00697 54.02 33.47 79.39 127.51
2,6,10,20,30 ↑ RMSE 0.00187 0.00241 0.00887 0.01642 0.00131 0.00183 0.00656 0.01197 81.92 110.68 184.54 248.16
2,6,10,20,30 ↓ MAE 0.00048 0.00153 0.00546 0.00966 0.00032 0.00105 0.00381 0.00678 13.34 32.65 78.57 123.97
2,6,10,20,30 ↓ RMSE 0.00087 0.00233 0.00885 0.01577 0.00064 0.00175 0.00652 0.01174 129.65 121.78 174.63 244.34
3,4,6,20,40 ↑ MAE 0.00098 0.00155 0.00556 0.00997 0.00064 0.00105 0.00383 0.00704 39.77 32.15 76.38 124.18
3,4,6,20,40 ↑ RMSE 0.00188 0.00247 0.00901 0.01631 0.00131 0.00175 0.00655 0.01210 64.86 124.20 177.46 243.46
3,4,6,20,40 ↓ MAE 0.00097 0.00153 0.00542 0.00963 0.00063 0.00104 0.00369 0.00670 43.46 28.96 79.27 128.13
3,4,6,20,40 ↓ RMSE 0.00187 0.00245 0.00879 0.01582 0.00130 0.00174 0.00631 0.01167 64.50 115.76 176.96 251.72
3,6,40 ↑ MAE 0.00048 0.00156 0.00536 0.00994 0.00035 0.00105 0.00370 0.00691 14.96 31.42 79.07 117.43
3,6,40 ↑ RMSE 0.00087 0.00248 0.00876 0.01628 0.00065 0.00176 0.00634 0.01193 107.43 118.36 177.40 238.87
3,6,40 ↓ MAE 0.00048 0.00153 0.00534 0.00988 0.00033 0.00103 0.00368 0.00685 16.81 31.26 71.96 118.66
3,6,40 ↓ RMSE 0.00087 0.00244 0.00870 0.01620 0.00064 0.00173 0.00633 0.01186 145.25 114.33 175.63 236.32

B.2 Error Bar

We repeat every experiment five times. Table 6 reports the mean and standard deviation of our model and of the second-best model.

Table 6: Error bars of FlightPatchNet and the second-best model, FlightBERT++.
Model  Horizon  Lon (°)  Lat (°)  Alt (m)
MAE RMSE MAE RMSE MAE RMSE
FlightBERT++ 1 0.00173 ± 6.45e-5 0.00360 ± 8.28e-5 0.00085 ± 3.33e-5 0.00148 ± 1.32e-4 9.39 ± 1.79 175.29 ± 29.09
FlightBERT++ 3 0.00317 ± 2.61e-4 0.00659 ± 4.35e-5 0.00210 ± 3.05e-4 0.00425 ± 1.24e-4 21.89 ± 5.58 167.16 ± 46.39
FlightBERT++ 9 0.00871 ± 1.74e-4 0.01846 ± 4.45e-4 0.00612 ± 6.07e-4 0.00959 ± 2.19e-4 47.84 ± 2.87 327.93 ± 52.84
FlightBERT++ 15 0.01187 ± 5.91e-5 0.03131 ± 5.33e-4 0.01048 ± 3.69e-4 0.02127 ± 2.01e-4 78.46 ± 8.13 384.18 ± 51.82
FlightPatchNet (Ours) 1 0.00048 ± 1.24e-5 0.00087 ± 1.02e-5 0.00032 ± 1.06e-5 0.00064 ± 8.45e-6 13.34 ± 9.43 123.78 ± 15.13
FlightPatchNet (Ours) 3 0.00153 ± 3.19e-5 0.00233 ± 5.44e-4 0.00105 ± 1.19e-5 0.00175 ± 2.36e-5 32.65 ± 1.76 121.48 ± 2.81
FlightPatchNet (Ours) 9 0.00546 ± 1.54e-4 0.00885 ± 2.16e-4 0.00381 ± 7.47e-5 0.00652 ± 7.76e-5 78.57 ± 2.66 174.63 ± 6.87
FlightPatchNet (Ours) 15 0.00966 ± 3.65e-4 0.01577 ± 5.49e-4 0.00678 ± 2.58e-4 0.01174 ± 3.53e-4 123.97 ± 5.72 244.34 ± 6.91
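
Each entry in Table 6 is a mean and a standard deviation over five runs. A minimal sketch of that computation with Python's `statistics` module; the paper does not state whether it uses the sample (n−1) or the population standard deviation, so the choice of `statistics.stdev` (sample) is an assumption, and the five values below are illustrative, not results from the paper.

```python
import statistics


def error_bar(values):
    """Return (mean, sample standard deviation) of repeated-run results."""
    return statistics.mean(values), statistics.stdev(values)


runs = [0.00050, 0.00048, 0.00047, 0.00049, 0.00046]   # illustrative MAE values from five runs
mean, std = error_bar(runs)
print(f"{mean:.5f} ± {std:.2e}")
```

Switching to `statistics.pstdev` would give the population form, which is smaller by a factor of $\sqrt{4/5}$ for five runs.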

B.3 Model Complexity

As shown in Table 7, our proposed FlightPatchNet achieves the shortest running time per iteration and has relatively few parameters. For multi-step prediction, the models based on direct multi-step (DMS) prediction (FlightPatchNet, FlightBERT++) run considerably faster than the models based on iterated multi-step (IMS) prediction (FlightBERT, LSTM, Bi-LSTM, CNN-LSTM). In addition, FlightPatchNet is lightweight compared with FlightBERT++ and FlightBERT in both parameters and FLOPs, which indicates that our model can provide a practical solution for real-time air transportation management.

Table 7: Model complexity comparisons. The look-back window size is $L=60$ and the prediction horizon is $T=15$ for all models.
Models Parameters (M) FLOPs (M) Running Time (s/iter)
FlightPatchNet (ours) 5.69 64.38 0.0069
FlightBERT++ 44.26 3000 0.0112
FlightBERT 25.31 1620 0.2406
LSTM 0.03 1.67 0.0583
Bi-LSTM 0.51 31.15 0.1241
CNN-LSTM 0.04 1.22 0.0429
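
The LSTM and Bi-LSTM entries in Table 7 can be sanity-checked against the layer sizes listed in A.2, and the result is consistent with counts in millions of parameters. The sketch assumes six input features per step (longitude, latitude, altitude, $V_x$, $V_y$, $V_z$), a Keras-style LSTM with one bias vector per gate, and a fully connected output layer of six units; these layer details are assumptions beyond what A.2 states.

```python
def lstm_params(input_dim, units, bidirectional=False):
    """Keras-style LSTM count: 4 gates, each with input weights, recurrent weights and one bias."""
    per_direction = 4 * (units * (input_dim + units) + units)
    return per_direction * (2 if bidirectional else 1)


def dense_params(input_dim, units):
    """Fully connected layer: weight matrix plus bias."""
    return input_dim * units + units


F = 6  # assumed input features per time step

# LSTM baseline: 30-unit and 60-unit layers, then a dense output layer.
lstm = lstm_params(F, 30) + lstm_params(30, 60) + dense_params(60, F)
# Bi-LSTM baseline: 200-unit and 50-unit bidirectional layers (outputs concatenated).
bilstm = lstm_params(F, 200, True) + lstm_params(400, 50, True) + dense_params(100, F)
print(f"LSTM: {lstm / 1e6:.2f} M, Bi-LSTM: {bilstm / 1e6:.2f} M")  # LSTM: 0.03 M, Bi-LSTM: 0.51 M
```

A PyTorch-style LSTM, which keeps two bias vectors per gate, gives the same rounded values (about 27 thousand and 514 thousand parameters).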

Appendix C Visualization

Visualization of FlightPatchNet Predictions

Figure 7 shows that FlightPatchNet can comprehensively capture the temporal variations of longitude and latitude, but fails to fully capture the temporal patterns of the original altitude series.

Figure 7: Visualization of the ground truth and predictions of FlightPatchNet. Altitude is shown as original values, while longitude and latitude are shown as differential values; all values are in meters.
Visualization of FlightPatchNet Predictions for Altitude

Figure 8 presents the predictions of FlightPatchNet and the ground truth for altitude. When the altitude series is relatively smooth and stationary with obvious global trends, FlightPatchNet can effectively capture these trends and make accurate predictions. When the series contains many change points caused by frequent abrupt fluctuations, as depicted in Figure 8, FlightPatchNet tends to focus on the irregular change points during prediction, leading to large deviations from the ground truth. As a result, FlightPatchNet struggles to capture the real temporal variations of altitude in such cases and fails to provide accurate predictions.

Figure 8: Visualization of predictions of FlightPatchNet and ground truth for altitude.
3D Trajectory Visualization

We visualize the flight trajectory prediction results of FlightPatchNet and all baselines for prediction horizon $T=15$ (2.5 minutes). As shown in Figure 9, FlightPatchNet provides stable predictions that are the most accurate in longitude and latitude, while its altitude predictions show slight fluctuations.

Figure 9: Visualization of flight trajectory prediction results for prediction horizon $T=15$ and look-back window size $L=60$.

Appendix D Limitation and Future Work

FlightPatchNet achieves the most competitive performance for flight trajectory prediction among the compared methods. It is easy to implement and has relatively few parameters, making it a promising solution for real-time air traffic control applications. However, our work has limitations. Since the original altitude series contains many fluctuations caused by unexpected noise, our focus on modeling temporal variations leads to a large bias in the altitude predictions. In the future, we will further explore temporal modeling of altitude and consider graph networks to overcome the prediction bias caused by unexpected noise.