Generative Modeling with Phase Stochastic Bridges

Tianrong Chen1, Jiatao Gu2, Laurent Dinh2, Evangelos A. Theodorou1
Joshua Susskind2, Shuangfei Zhai2
1Georgia Tech, 2Apple
{tianrong.chen, evangelos.theodorou}@gatech.edu, {jgu32,l_dinh,jsusskind,szhai}@apple.com
work done while Tianrong Chen is an intern at Apple
Abstract

We introduce a novel generative modeling framework grounded in phase space dynamics, taking inspiration from the principles underlying Critically damped Langevin Dynamics and Bridge Matching. Leveraging insights from Stochastic Optimal Control, we construct a more favorable path measure in the phase space that is highly advantageous for efficient sampling. A distinctive feature of our approach is the early-stage data prediction capability within the context of propagating generative Ordinary Differential Equations or Stochastic Differential Equations. This early prediction, enabled by the model’s unique structural characteristics, sets the stage for more efficient data generation, leveraging additional velocity information along the trajectory. This innovation has spurred the exploration of a novel avenue for mitigating sampling complexity by quickly converging to realistic data samples. Our model yields comparable results in image generation and notably outperforms baseline methods, particularly when faced with a limited Number of Function Evaluations. Furthermore, our approach rivals the performance of diffusion models equipped with efficient sampling techniques, underscoring its potential in the realm of generative modeling. Code is available at https://github.com/apple/ml-agm.

1 Introduction

Diffusion Models (DMs; Song et al. (2020a); Ho et al. (2020)) constitute an instrumental technique in generative modeling. They formulate a particular Stochastic Differential Equation (SDE) linking the data distribution with a tractable prior distribution. Initially, a DM diffuses data towards the prior distribution via a predetermined linear SDE. To reverse the process, a neural network is trained to approximate the score function, for which an analytically available regression target exists. Subsequently, the approximated score is used to conduct the time reversal (Anderson, 1982; Haussmann & Pardoux, 1986) of this diffusion process, ultimately generating data. Recently, Critically-damped Langevin Dynamics (CLD; Dockhorn et al. (2021)) extended the SDE framework of DM into phase space (whereas DMs operate in position space) by introducing an auxiliary velocity variable, which is defined by tractable Gaussian distributions at the initial and terminal time steps. This augmentation induces a smoother trajectory in position space, as stochasticity is introduced only into the velocity space. The distinctive structure of CLD has been shown to enhance empirical performance and sample efficiency. However, despite the success of CLD, sampling remains inefficient because of unnecessary curvature in the dynamics (Fig. 1): the process has to converge to equilibrium in order to sample from the tractable prior.

The remarkable accomplishments of DM have also catalyzed recent advancements in generative modeling, leading to the development of Bridge Matching (BM; Peluchetti, 2021; Liu et al., 2022; 2023) and Flow Matching (FM; Lipman et al., 2022). These models leverage dynamic transport maps built on SDEs or ODEs. Unlike DM, Bridge and Flow Matching relax the reliance on a forward diffusion process that converges to a prior distribution only asymptotically, over an infinite time horizon. Moreover, they are more versatile, enabling the construction of transport maps between two arbitrary distributions by drawing upon insights from domains such as optimal transport (Pooladian et al., 2023), normalizing flow (Tong et al., 2023b), and optimal control (Liu et al., 2023).

In this paper, we focus on enhancing the sample efficiency of velocity-based generative modeling (e.g., CLD) by utilizing Stochastic Optimal Control (SOC) theory. Specifically, we leverage results on the stochastic bridge for linear momentum systems (Chen & Georgiou, 2015) to construct a path measure bridging the data and prior distributions. The resulting path exhibits straighter position and velocity trajectories than CLD (Fig. 1), making it more amenable to efficient sampling. Within the broader landscape of dynamic generative modeling (i.e., ODE/SDE-based generative models), a data point can often be represented as a linear combination of a scaled intermediate state of the dynamics and Gaussian noise. In our work, we re-establish this property, enabling the estimation of target data points by leveraging both state and velocity information. In DM and FM, the estimate of the target data relies exclusively on position information, whereas our method also incorporates velocity, which improves the precision of the estimate. Our model is also able to generate high-fidelity images at early time steps (Fig. 2). In addition, we propose a sampling technique which demonstrates competitive results with a small Number of Function Evaluations (NFEs), e.g., 5 to 10. Table 1 summarizes the design differences among the aforementioned models. In summary, our paper presents the following contributions:

  1. We propose Acceleration Generative Modeling (AGM), which is built on SOC theory and yields trajectories that are more favorable for efficient sampling than those of existing second-order momentum-dynamics generative models such as CLD.

  2. As a result of the structural characteristics of AGM, it becomes possible to estimate a realistic data point at an early time point, a concept we refer to as sampling-hop. This approach not only yields a significant reduction in sampling complexity but also offers a novel perspective on accelerating sampling in generative modeling by leveraging additional information from the dynamics.

  3. We achieve competitive results compared to DM approaches equipped with specifically designed fast sampling techniques on image datasets, particularly in small NFE settings.

Figure 1: Pixel-wise trajectory comparison with CLD (Dockhorn et al., 2021). The left figures show the trajectories over time of 16 randomly sampled pixels, for position and velocity. Our model learns straighter trajectories, which is beneficial for reducing sampling complexity.

2 Preliminary

Notation: Let $\mathbf{x}_t \in \mathbb{R}^{d}$ and $\mathbf{v}_t \in \mathbb{R}^{d}$ denote the $d$-dimensional position and velocity variables of a particle $\mathbf{m}_t = [\mathbf{x}_t, \mathbf{v}_t]^{\mathsf{T}} \in \mathbb{R}^{2d}$ at time $t$. We denote the discretized time series as $0 \leq t_0 < \dots < t_n < \dots < t_N < 1$. The Wiener process is denoted as $\mathbf{w}_t$. The identity matrix is denoted as $\mathbf{I}_d \in \mathbb{R}^{d \times d}$. We define $\bm{\Sigma}_t$ as the covariance matrix of $\mathbf{x}_t$ and $\mathbf{v}_t$ at time step $t$.

2.1 Dynamical Generative Modeling

The generative modeling approaches rooted in dynamical systems, including ODE and SDE, have garnered significant attention. Here, we present three noteworthy dynamical generative models: Diffusion Model (DM), Flow Matching (FM) and Bridge Matching (BM).

Table 1: Comparison between models in terms of the boundary distributions $p_0$ and $p_1$. Our AGM generalizes DM beyond Gaussian priors to phase space, similar to CLD. However, unlike CLD, AGM does not need to converge to a Gaussian at equilibrium, which causes curved trajectories (see Fig. 1); instead, the velocity distribution is the convolution of the data distribution with a Gaussian.

| Models | DM/FM | CLD | AGM (ours) |
|---|---|---|---|
| $p_0(\cdot)$ | $p_{\rm data}(x)$ | $p_{\rm data}(x) \times \mathcal{N}(\mathbf{0}, \bm{I}_d)$ | $\mathcal{N}(\mathbf{0}, \bm{\Sigma}_0 \times \bm{I}_{2d})$ |
| $p_1(\cdot)$ | $\mathcal{N}(\mathbf{0}, \bm{I}_d)$ | $\mathcal{N}(\mathbf{0}, \bm{I}_d) \times \mathcal{N}(\mathbf{0}, \bm{I}_d)$ | $p_{\rm data}(x) \times p_{\rm data}(x) * \mathcal{N}(\mathbf{0}, \bm{\Sigma}_1 \otimes \bm{I}_{2d})$ |

Diffusion Model: In the framework of DM, given $\mathbf{x}_0$ drawn from a data distribution $p_{\rm data}$, the model constructs an SDE,

$$\mathrm{d}\mathbf{x}_t = f_t(\mathbf{x}_t)\,\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}_t, \quad \mathbf{x}_0 \sim p_{\rm data}(\mathbf{x}) \tag{1}$$

whose terminal distribution at $t=1$ is approximately Gaussian, i.e. $\mathbf{x}_1 \sim \mathcal{N}(\mathbf{0}, \bm{I}_d)$. This is achieved through careful selection of the diffusion coefficient $g_t$ (also written $g(t)$) and the base drift $f_t(\mathbf{x}_t)$. The time-reversal (Anderson, 1982) of (1) results in another SDE:

$$\mathrm{d}\mathbf{x}_t = \left[f_t(\mathbf{x}_t) - g_t^2 \nabla_{\mathbf{x}} \log p(\mathbf{x}_t, t)\right]\mathrm{d}t + g(t)\,\mathrm{d}\mathbf{w}_t, \quad \mathbf{x}_1 \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_d) \tag{2}$$

where $p(\cdot, t)$ is the marginal density of (1) at time $t$ and $\nabla_{\mathbf{x}} \log p_t$ is known as the score function. SDE (2) can be regarded as the time-reversal of (1) in the sense that the path measure it induces is almost surely equivalent to the one induced by (1). As a consequence, these two SDEs share identical marginals over time. In practice, it is feasible to sample $\mathbf{x}_t$ analytically given $t$ and $\mathbf{x}_0$. Additionally, we can leverage a neural network to learn the score function by regressing on the scaled Stein score, $\mathbb{E}_{\mathbf{x}_t, t} \lVert \mathbf{s}_t^{\theta}(\mathbf{x}_t, t; \theta) - \nabla_{\mathbf{x}} \log p(\mathbf{x}_t, t \mid \mathbf{x}_0) \rVert_2^2$, for the purpose of propagating (2). The learned score can then be plugged into SDE (2) to simulate the generation of data that follows the target data distribution, starting from the prior distribution. Meanwhile, (2) also has a corresponding ODE which shares the same marginals over time (though not the same path measure):

$$\mathrm{d}\mathbf{x}_t = \left[f_t(\mathbf{x}_t) - \frac{1}{2} g_t^2 \nabla_{\mathbf{x}} \log p(\mathbf{x}_t, t)\right]\mathrm{d}t, \quad \mathbf{x}_1 \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_d) \tag{3}$$

which motivates the popular samplers introduced in (Zhang & Chen, 2022; Zhang et al., 2022; Bao et al., 2022) to solve the ODE (3) efficiently.
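As an illustration of how an ODE of the form (3) transports the prior back to the data distribution, the following is a minimal sketch on a one-dimensional toy problem. Everything in it is an assumption made for the example and not part of the paper: the drift $f_t(x) = -\tfrac{1}{2}\beta x$, the constant $g = \sqrt{\beta}$, the Gaussian "data" distribution, and the names `BETA`, `MU`, `SIGMA0`, `marginal`, `score`, and `sample_probability_flow`. Because the data are Gaussian, the score is available in closed form and stands in for a trained network. The samples at $t=1$ are drawn from the exact marginal rather than from $\mathcal{N}(0, 1)$, so the only error is the Euler discretization.

```python
import numpy as np

BETA = 4.0              # assumed constant noise rate, so g(t)^2 = BETA
MU, SIGMA0 = 2.0, 0.5   # assumed 1-D Gaussian "data" distribution N(MU, SIGMA0^2)


def marginal(t):
    """Mean and variance of x_t under dx = -0.5*BETA*x dt + sqrt(BETA) dw."""
    decay = np.exp(-BETA * t)
    mean = MU * np.sqrt(decay)
    var = SIGMA0**2 * decay + (1.0 - decay)
    return mean, var


def score(x, t):
    """Exact score of the Gaussian marginal; a trained network would replace this."""
    mean, var = marginal(t)
    return -(x - mean) / var


def sample_probability_flow(n_samples=20000, n_steps=200, seed=0):
    """Integrate the ODE dx = [f - 0.5 g^2 score] dt from t = 1 down to t = 0."""
    rng = np.random.default_rng(seed)
    mean1, var1 = marginal(1.0)
    x = mean1 + np.sqrt(var1) * rng.standard_normal(n_samples)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = 1.0 - k * dt
        drift = -0.5 * BETA * x - 0.5 * BETA * score(x, t)
        x = x - drift * dt  # explicit Euler step backwards in time
    return x


samples = sample_probability_flow()
print(samples.mean(), samples.std())  # should be close to MU and SIGMA0
```

The sketch uses plain Euler steps for readability; the samplers cited above use more elaborate integrators for this kind of ODE.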

Bridge Matching and Flow Matching: An alternative to exploring the time-reversal of a forward noising process is to "build bridges" between two distinct distributions $p_0(\cdot)$ and $p_1(\cdot)$. This method entails learning a mimicking diffusion process, commonly referred to as bridge matching (Peluchetti, 2021; Shi et al., 2022). Here we consider an SDE of the form:

$$\mathrm{d}\mathbf{x}_t = \mathbf{v}_t(\mathbf{x}, t)\,\mathrm{d}t + g_t\,\mathrm{d}\mathbf{w}_t \quad \text{s.t.} \quad (x_0, x_1) \sim \Pi_{0,1}(\mathbf{x}_0, \mathbf{x}_1) := p_0 \times p_1 \tag{4}$$

which is pinned down at an initial and a terminal point $x_0, x_1$ that are sampled independently from the predefined $p_0$ and $p_1$. This is commonly known as the reciprocal projection of $x_0$ and $x_1$ in the literature (Shi et al., 2023; Peluchetti, 2023; Liu et al., 2022; Léonard et al., 2014). The construction of such an SDE is accomplished by careful design of $\mathbf{v}_t$. A widely adopted choice is $\mathbf{v}_t := (\mathbf{x}_1 - \mathbf{x}_t)/(1-t)$, which induces the well-known Brownian Bridge (Liu et al., 2023; Somnath et al., 2023). Similar to the approach in DM, and owing to the linear structure of the dynamics, one can efficiently estimate this drift by employing a neural network parameterized by weights $\theta$ for regression on $\mathbb{E}_{\mathbf{x}_t, t} \lVert \mathbf{v}_t^{\theta}(\mathbf{x}_t, t; \theta) - \mathbf{v}_t(\mathbf{x}_t, t) \rVert_2^2$ given $\mathbf{x}_1$ and $t$. As extensively discussed in previous studies (Liu et al., 2023; Shi et al., 2022), this bridge matching framework takes on the characteristics of FM (Lipman et al., 2022) when the diffusion coefficient $g_t$ tends to zero.
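The pinning effect of the drift $(\mathbf{x}_1 - \mathbf{x}_t)/(1-t)$ can be checked numerically. The sketch below is an illustration under assumptions that are not taken from the paper: a constant diffusion coefficient `g`, an Euler-Maruyama discretization, and the hypothetical helper names `bridge_drift` and `simulate_brownian_bridge`. The function `bridge_drift` is also the quantity a network would regress onto in the loss above.

```python
import numpy as np


def bridge_drift(x_t, x1, t):
    """Brownian-bridge drift (x1 - x_t) / (1 - t), the regression target."""
    return (x1 - x_t) / (1.0 - t)


def simulate_brownian_bridge(x0, x1, g=1.0, n_steps=1000, seed=0):
    """Euler-Maruyama simulation of dx = (x1 - x)/(1 - t) dt + g dw on [0, 1 - dt]."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    # Stop one step before t = 1, where the drift becomes singular.
    for k in range(n_steps - 1):
        t = k * dt
        noise = rng.standard_normal(x.shape)
        x = x + bridge_drift(x, x1, t) * dt + g * np.sqrt(dt) * noise
    return x


rng = np.random.default_rng(1)
x0 = rng.standard_normal(1000)          # arbitrary starting points
x1 = 3.0 + 0.5 * rng.standard_normal(1000)  # arbitrary end points
x_end = simulate_brownian_bridge(x0, x1)
print(np.max(np.abs(x_end - x1)))       # small: every path is pinned near its x1
```

Setting `g=0.0` in this sketch removes the noise and leaves straight-line interpolation between `x0` and `x1`, which mirrors the statement that bridge matching approaches FM as $g_t$ tends to zero.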

Remark 1.

The practice of constraining a stochastic process to specific initial and terminal conditions is a well-established setup in SOC. For a gentle introduction to its connection with the Brownian Bridge and the Schrödinger Bridge, please see Appendix C. From this perspective, one can derive the Brownian Bridge, as elaborated in Appendix D.1. The SOC framework serves as the basis on which we develop our algorithm.

3 Acceleration Generative Model

We apply SOC to characterize the twisted trajectory of momentum dynamics induced by CLD (Dockhorn et al., 2021). Flow matching, diffusion modeling, and bridge matching all allow an estimate of the target data point, denoted as $\mathbf{x}_1$, to be constructed from the intermediate state of the dynamics, $\mathbf{x}_t$. Our additional objective is to expedite the estimation of a plausible $\mathbf{x}_1$ by incorporating additional dynamics-related information, such as velocity, thereby reducing the required time integration.

In this section, we introduce the proposed method, termed the Acceleration Generative Model (AGM), rooted in SOC theory. Building upon (Chen & Georgiou, 2015), we extend the framework by incorporating a time-varying diffusion coefficient and accommodating arbitrary boundary conditions, ultimately arriving at an analytical solution suited for generative modeling. We demonstrate its efficacy in rectifying the trajectory of CLD, and its ability to accurately estimate the target data at an early timestep $t_i$, thereby enabling fast sampling.

As suggested by the BM approach, we need to formulate a trajectory that bridges two data points sampled from $p_0$ and $p_1$ respectively. Ideally, the intermediate trajectory should be smooth and close to linear, since this makes the dynamical system easier to simulate. To tackle this challenge and to enhance the estimation of the data point $\mathbf{x}_1$ by incorporating velocity components, we cast the problem in an SOC framework formulated in phase space, which reads:

Definition 2 (Stochastic Bridge problem of linear momentum system (Chen & Georgiou, 2015)).

$$\begin{aligned}
&\min_{\mathbf{a}_t} \int_{\tau}^{1} \lVert \mathbf{a}_t \rVert_2^2\,\mathrm{d}t + (\mathbf{m}_1 - m_1)^{\mathsf{T}} \mathbf{R}\, (\mathbf{m}_1 - m_1) \\
&\text{s.t.} \quad \underbrace{\begin{bmatrix} \mathrm{d}\mathbf{x}_t \\ \mathrm{d}\mathbf{v}_t \end{bmatrix}}_{\mathrm{d}\mathbf{m}_t} = \underbrace{\begin{bmatrix} \mathbf{v}_t \\ \mathbf{a}_t(\mathbf{x}_t, \mathbf{v}_t, t) \end{bmatrix}}_{\mathbf{f}(\mathbf{m}, t)} \mathrm{d}t + \underbrace{\begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & g_t \end{bmatrix}}_{\mathbf{g}_t} \mathrm{d}\mathbf{w}_t, \\
&\mathbf{m}_{\tau} := \begin{bmatrix} \mathbf{x}_{\tau} \\ \mathbf{v}_{\tau} \end{bmatrix} = \begin{bmatrix} x_{\tau} \\ v_{\tau} \end{bmatrix}, \quad \mathbf{R} = \begin{bmatrix} \mathbf{r} & \mathbf{0} \\ \mathbf{0} & \mathbf{r} \end{bmatrix} \otimes \bm{I}_d, \quad x_1 \sim p_{\rm data}.
\end{aligned} \tag{5}$$

In this context, the matrix $\mathbf{R}$ is the terminal cost matrix, which measures the proximity between the propagated $\mathbf{m}_1$ and the ground truth $m_1$ at the terminal time $t=1$. As the parameter $\mathbf{r}$ approaches positive infinity, the trajectory converges toward the state $x_1$, and the problem becomes one of constrained dynamics in which the system is pinned at two predetermined boundaries, namely $m_0$ and $m_1$. This configuration matches the construction of a feasible bridge advocated by BM. This interpolation approach is a natural extension (Chen & Georgiou, 2015) of the well-established Brownian Bridge (Revuz & Yor, 2013), which has been employed in trajectory inference (Somnath et al., 2023; Tong et al., 2023a) and image inpainting tasks (Liu et al., 2023), and whose connection with diffusion has been discussed in Liu et al. (2023). The target velocity is not specified by this problem, which leaves flexibility in the design space of our approach. We opt for the linear interpolation between the intermediate point and the target point, $\mathbf{v}_1 = (\mathbf{x}_1 - \mathbf{x}_t)/(1-t)$, as the terminal velocity, which is also the optimal control in the original space (see Appendix D.1). This choice is made because it yields a straight trajectory. Conceptually, the acceleration $\mathbf{a}_t$ continually guides the dynamics towards the linear interpolation of the two data points, which mitigates the impact of the injected stochasticity. In contrast to previous bridge matching frameworks, the boundary condition on the velocity in our approach varies over time, since it depends on the state $\mathbf{x}_t$ and on $t$. The velocity variable serves solely as an auxiliary component aimed at straightening the trajectories. The solution of this SOC problem is as follows.

Proposition 3 (Phase Space Brownian Bridge).

When $\mathbf{r} \rightarrow +\infty$, the solution of the optimization problem (5) is

$$\mathbf{a}^{*}(\mathbf{m}_t, t) = g_t^2 P_{11} \left( \frac{\mathbf{x}_1 - \mathbf{x}_t}{1-t} - \mathbf{v}_t \right) \quad \text{where:} \quad P_{11} = \frac{-4}{g_t^2 (t-1)}. \tag{6}$$
Proof.

Please see Appendix D.2. ∎
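To make Eq. (6) concrete, the following minimal NumPy sketch evaluates the optimal acceleration for a known target point. The function names `p11` and `optimal_acceleration` are hypothetical, and the diffusion coefficient is passed as a plain number `g` standing for the value of $g_t$ at the queried time; neither is taken from the authors' code.

```python
import numpy as np

def p11(t, g):
    """P_11 from Eq. (6): -4 / (g^2 (t - 1)); positive for t in [0, 1)."""
    return -4.0 / (g**2 * (t - 1.0))

def optimal_acceleration(x_t, v_t, x_1, t, g=1.0):
    """Optimal control a*(m_t, t) of Eq. (6) for a known target x_1.

    g is the value of g_t at time t (an assumption of this sketch).
    """
    # Straight-line velocity that would reach x_1 exactly at t = 1.
    target_velocity = (x_1 - x_t) / (1.0 - t)
    return g**2 * p11(t, g) * (target_velocity - v_t)

x_t = np.array([0.0, 1.0])
v_t = np.array([0.5, 0.0])
x_1 = np.array([1.0, 1.0])
a = optimal_acceleration(x_t, v_t, x_1, t=0.5)
```

Since $g_t^2 P_{11} = 4/(1-t)$, the acceleration in this sketch does not depend on the value chosen for `g`, and it vanishes whenever the current velocity already equals the straight-line velocity $(\mathbf{x}_1 - \mathbf{x}_t)/(1-t)$.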

Remark 4.

$P_{11}$ denotes the second diagonal component of the matrix $P_t$, a solution of the Lyapunov equation (see Lemma 9), and serves as an implicit representation of the optimality of the control. Its value depends on the uncontrolled dynamics, obtained by setting $\mathbf{a}_t$ to the zero vector in (5), and changes when the uncontrolled dynamics change.

Figure 2: Data estimation comparison with EDM (Karras et al., 2022). When the network is endowed with supplementary velocity, AGM gains the capacity to estimate the target data point during the early stages of the trajectory. One can use the estimated image $\tilde{\mathbf{x}}_1$ at $t_i < t_N$ as the generated result and allocate more NFE within the interval $[0, t_i]$, which results in a smaller discretization error.

3.1 Training

By plugging the optimal control (6) back into the dynamics (5), we obtain the desired SDE. As suggested by Song et al. (2020b) and Dockhorn et al. (2021), such an SDE has a corresponding probabilistic ODE that shares the same marginals over time, in which the drift term has an additional score term $\nabla_{\mathbf{v}} \log p(\mathbf{m}_t, t)$. We summarize the force term for the SDE and the ODE as:

$$
\begin{bmatrix} \mathrm{d}\mathbf{x}_t \\ \mathrm{d}\mathbf{v}_t \end{bmatrix} = \begin{bmatrix} \mathbf{v}_t \\ \mathbf{F}_t \end{bmatrix} \mathrm{d}t + \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & h_t \end{bmatrix} \mathrm{d}\mathbf{w}_t \quad \text{s.t.} \quad \mathbf{m}_0 := \begin{bmatrix} \mathbf{x}_0 \\ \mathbf{v}_0 \end{bmatrix} \sim \mathcal{N}(\bm{\mu}_0, \bm{\Sigma}_0), \tag{7}
$$
$$
\text{Bridge Matching SDE}: \quad \mathbf{F}_t := \mathbf{F}_t^{b}(\mathbf{m}_t, t) \equiv \mathbf{a}_t^{*}(\mathbf{m}_t, t), \quad h(t) := g(t),
$$
$$
\text{Probabilistic ODE}: \quad \mathbf{F}_t := \mathbf{F}_t^{p}(\mathbf{m}_t, t) \equiv \mathbf{a}_t^{*}(\mathbf{m}_t, t) - \frac{1}{2} g_t^2 \nabla_{\mathbf{v}} \log p(\mathbf{m}_t, t), \quad h(t) := 0.
$$

Henceforth, we refer to the dynamics associated with the Bridge Matching SDE as AGM-SDE, and to its ODE counterpart as AGM-ODE. Meanwhile, the linearity of the system implies that the intermediate state $\mathbf{m}_t$ and the closed-form solution of the score term are analytically available. In particular, the mean $\bm{\mu}_t$ and covariance matrix $\bm{\Sigma}_t$ of the intermediate marginal $p_t(\mathbf{m}_t | \mathbf{x}_1) = \mathcal{N}(\bm{\mu}_t, \bm{\Sigma}_t)$ of such a system can be computed analytically, with $\bm{\Sigma}_t = \begin{bmatrix} \Sigma^{xx}_t & \Sigma^{xv}_t \\ \Sigma^{xv}_t & \Sigma^{vv}_t \end{bmatrix} \otimes \bm{I}_d$ and $\bm{\mu}_t = \begin{bmatrix} \mu^{x}_t \\ \mu^{v}_t \end{bmatrix}$, provided the boundary conditions $\bm{\mu}_0$ and $\bm{\Sigma}_0$ are in place, as outlined in Särkkä & Solin (2019). Please see Appendix D.3 for details. To sample from such a multivariate Gaussian, one needs to factor the covariance matrix by Cholesky decomposition, and $\mathbf{m}_t$ is reparameterized as:

$$
\mathbf{m}_t = \bm{\mu}_t + \mathbf{L}_t \bm{\epsilon} = \bm{\mu}_t + \begin{bmatrix} L_t^{xx} \bm{\epsilon}_0 \\ L_t^{xv} \bm{\epsilon}_0 + L_t^{vv} \bm{\epsilon}_1 \end{bmatrix}, \quad \nabla_{\mathbf{v}} \log p_t := -\ell_t \bm{\epsilon}_1 \tag{8}
$$

where $\bm{\Sigma}_t = \mathbf{L}_t \mathbf{L}_t^{\mathsf{T}}$, $\bm{\epsilon} = \begin{bmatrix} \bm{\epsilon}_0 \\ \bm{\epsilon}_1 \end{bmatrix} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_{2d})$ and $\ell_t = \sqrt{\frac{\Sigma^{xx}_t}{\Sigma^{xx}_t \Sigma^{vv}_t - (\Sigma^{xv}_t)^2}}$.
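The reparameterization in Eq. (8) can be sketched in a few lines of NumPy. This is an illustrative sketch only: the function names are hypothetical, and the mean and covariance entries are passed in as scalars shared across all $d$ dimensions (mirroring the $\otimes \bm{I}_d$ structure), whereas their actual time-dependent values are given in Appendix D.3.

```python
import numpy as np

def cholesky_2x2(s_xx, s_xv, s_vv):
    """Lower-triangular Cholesky factor of [[s_xx, s_xv], [s_xv, s_vv]]."""
    l_xx = np.sqrt(s_xx)
    l_xv = s_xv / l_xx
    l_vv = np.sqrt(s_vv - l_xv**2)
    return l_xx, l_xv, l_vv

def sample_phase_state(mu_x, mu_v, s_xx, s_xv, s_vv, rng):
    """Draw m_t = mu_t + L_t eps and the velocity score of Eq. (8)."""
    l_xx, l_xv, l_vv = cholesky_2x2(s_xx, s_xv, s_vv)
    eps0 = rng.standard_normal(mu_x.shape)
    eps1 = rng.standard_normal(mu_x.shape)
    x_t = mu_x + l_xx * eps0
    v_t = mu_v + l_xv * eps0 + l_vv * eps1
    # ell_t as defined below Eq. (8); it equals 1 / L^vv_t.
    ell = np.sqrt(s_xx / (s_xx * s_vv - s_xv**2))
    score_v = -ell * eps1
    return x_t, v_t, score_v

# Made-up covariance entries and zero means, for illustration only.
rng = np.random.default_rng(0)
mu_x, mu_v = np.zeros(3), np.zeros(3)
x_t, v_t, score_v = sample_phase_state(mu_x, mu_v, 2.0, 0.5, 1.0, rng)
```

Because $\mathbf{L}_t$ is lower triangular, the velocity component of $-\bm{\Sigma}_t^{-1}(\mathbf{m}_t - \bm{\mu}_t)$ depends only on $\bm{\epsilon}_1$, which is why the score in Eq. (8) reduces to $-\ell_t \bm{\epsilon}_1$ with $\ell_t = 1/L_t^{vv}$.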

Parameterization: The Force term can be represented as a composite of the data point and Gaussian noise. Specifically,

$$
\mathbf{a}^{*}(\mathbf{m}_t, t) = 4 \mathbf{x}_1 (1-t)^2 - g_t^2 P_{11} \left[ \left( \frac{L_t^{xx}}{1-t} + L_t^{xv} \right) \bm{\epsilon}_0 + L_t^{vv} \bm{\epsilon}_1 \right]. \tag{9}
$$

We express the force term as $\mathbf{F}_t^{\theta} = \mathbf{s}_t^{\theta} \cdot \mathbf{z}_t$. Here, $\mathbf{z}_t$ regulates the output of the network $\mathbf{s}_t^{\theta}$, ensuring that the variance of the network output is normalized to unity. For the detailed formulation of the normalizer $\mathbf{z}_t$, please refer to Appendix D.8. In a manner similar to the BM approach, one can formulate the objective function for regressing the force term as follows:

$$
\min_{\theta} \mathbb{E}_{t \in [0,1]} \mathbb{E}_{\mathbf{x}_1 \sim p_{\rm data}} \mathbb{E}_{\mathbf{m}_t \sim p_t(\mathbf{m}_t | \mathbf{x}_1)} \lambda(t) \left[ \lVert \mathbf{F}_t^{\theta}(\mathbf{m}_t, t; \theta) - \mathbf{F}_t(\mathbf{m}_t, t) \rVert_2^2 \right] \tag{10}
$$

where $\lambda(t)$ is the weighting of the objective function across the time horizon. We defer the derivation of $\ell_t$ and the presentation of $\mathbf{L}_t$, $\lambda(t)$ and $\mathbf{a}_t$ to Appendix D.

3.2 Sampling from AGM

Once the parameterized force term $\mathbf{F}_t^{\theta}$ is trained, we can simulate the dynamics to generate samples by plugging it back into the dynamics (7). Any SDE or ODE sampler can be used to propagate the learnt system. Here we list our choice of sampler for AGM-SDE and AGM-ODE.

Stochastic Sampler: To simulate the SDE, prior works rely mainly on Euler-Maruyama (EM) (Kloeden et al., 1992) and related methods. We adopt the Symmetric Splitting Sampler (SSS) from Dockhorn et al. (2021) for AGM-SDE because of its strong performance on momentum systems.

Deterministic Sampler: This system is inherently underactuated because the force term is injected only into the velocity component, while velocity in turn drives the position, which is the variable of primary interest in generative modeling. More specifically, after discretizing the time horizon, the force applied at time step $t_i$ does not immediately affect the position; its effect appears only at the subsequent time step $t_{i+1}$. At time $t_0$, it is therefore undesirable to propagate the state $\mathbf{x}_0$ using an initially uncontrolled velocity over an extended time interval $\delta_0$. This delay also matters whenever the time interval $\delta_t$ is large, which impedes our ability to reduce the NFE during sampling. We therefore adopt the Exponential Integrator (EI) approach of Zhang & Chen (2022), which we find empirically to align well with our model. As an illustration, AGM-ODE combined with EI injects the learnt network into both the velocity and position channels simultaneously:

$$
\begin{bmatrix} \mathbf{x}_{t_{i+1}} \\ \mathbf{v}_{t_{i+1}} \end{bmatrix} = \Phi(t_{i+1}, t_i) \begin{bmatrix} \mathbf{x}_{t_i} \\ \mathbf{v}_{t_i} \end{bmatrix} + \sum_{j=0}^{w} \begin{bmatrix} \int_{t_i}^{t_{i+1}} (t_{i+1} - \tau)\, \mathbf{z}_{\tau} \cdot \mathbf{M}_{i,j}(\tau)\, \mathrm{d}\tau \cdot \mathbf{s}_t^{\theta}(\mathbf{m}_{t_{i-j}}, t_{i-j}) \\ \int_{t_i}^{t_{i+1}} \mathbf{z}_{\tau} \cdot \mathbf{M}_{i,j}(\tau)\, \mathrm{d}\tau \cdot \mathbf{s}_t^{\theta}(\mathbf{m}_{t_{i-j}}, t_{i-j}) \end{bmatrix} \tag{11}
$$
$$
\text{where} \quad \mathbf{M}_{i,j}(\tau) = \prod_{k \neq j} \left( \frac{\tau - t_{i-k}}{t_{i-j} - t_{i-k}} \right), \quad \text{and} \quad \Phi(t, s) = \begin{bmatrix} 1 & t - s \\ 0 & 1 \end{bmatrix}.
$$
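The ingredients of Eq. (11) can be illustrated with a short NumPy sketch. It implements the transition matrix $\Phi$, the Lagrange-type coefficient $\mathbf{M}_{i,j}(\tau)$, and a single step for the simplest case $w = 0$. The function names are hypothetical, and the step additionally assumes $\mathbf{z}_\tau = 1$ so that the integrals have a closed form; the actual normalizer is defined in Appendix D.8, so this is a simplification rather than the authors' sampler.

```python
import numpy as np

def transition_matrix(t, s):
    """Phi(t, s) for the uncontrolled dynamics dx = v dt, dv = 0."""
    return np.array([[1.0, t - s], [0.0, 1.0]])

def multistep_coefficient(tau, times, i, j, w):
    """Lagrange basis M_{i,j}(tau) over the nodes t_i, t_{i-1}, ..., t_{i-w}."""
    coeff = 1.0
    for k in range(w + 1):
        if k != j:
            coeff *= (tau - times[i - k]) / (times[i - j] - times[i - k])
    return coeff

def ei_step_order0(x, v, s_out, t_i, t_next):
    """One step of Eq. (11) with w = 0 and z_tau = 1 (both assumptions)."""
    delta = t_next - t_i
    state = transition_matrix(t_next, t_i) @ np.stack([x, v])
    # With w = 0 the empty product gives M = 1, so the integrals reduce to
    #   position channel: int (t_next - tau) dtau = delta^2 / 2
    #   velocity channel: int 1 dtau            = delta
    x_next = state[0] + 0.5 * delta**2 * s_out
    v_next = state[1] + delta * s_out
    return x_next, v_next

x_next, v_next = ei_step_order0(
    np.array([0.0]), np.array([1.0]), np.array([2.0]), t_i=0.0, t_next=0.5
)
```

In this sketch the network output `s_out` changes the position within the same step through the `0.5 * delta**2` term, whereas an explicit Euler step would update the position from the old velocity only. This is the delay that the exponential integrator is meant to remove.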

In Eq. (11), $\Phi(t, s)$ denotes the transition matrix of our system, while $\mathbf{M}_{i,j}(\tau)$ represents the $w$-order multistep coefficient (Hochbruck & Ostermann, 2010). For a comprehensive derivation of these terms, please refer to Appendix D.9. It is worth noting that mapping $\mathbf{s}_{\theta}$ into both the position and velocity channels significantly alleviates the errors introduced by discretization delays.

Sampling-hop: CLD (Dockhorn et al., 2021) focuses on estimating the score function w.r.t. velocity, which essentially corresponds to estimating a scaled $\bm{\epsilon}_1$ in our notation. However, this information alone is not sufficient for estimating the data point $\mathbf{x}_1$; knowledge of $\bm{\epsilon}_0$ is also required. In our case, the training objective implicitly includes both $\bm{\epsilon}_0$ and $\bm{\epsilon}_1$ (see Eq. (9)), hence one can recover $\mathbf{x}_1$ by Proposition 5. We observe that when the network is given the additional velocity information, it can estimate the target data point during the early stages of the trajectory, as illustrated in Fig. 2. This estimation can be seamlessly integrated into AGM-SDE and AGM-ODE, and we name it sampling-hop. Specifically,

Proposition 5 (Sampling-Hop).

Given the state, velocity and trained force term $\mathbf{F}_t^{\theta}$ at time step $t$ in the sampling phase, the estimated data point $\tilde{\mathbf{x}}_1$ can be represented as

$$
\tilde{\mathbf{x}}_1^{SDE} = \frac{(1-t)(\mathbf{F}_t^{\theta} + \mathbf{v}_t)}{g_t^2 P_{11}} + \mathbf{x}_t, \quad \text{or} \quad \tilde{\mathbf{x}}_1^{ODE} = \frac{\mathbf{F}_t^{\theta} + g_t^2 P_{11} (\alpha_t \mathbf{x}_t + \beta_t \mathbf{v}_t)}{4(t-1)^2 + g_t^2 P_{11} (\alpha_t \mu^{x}_t + \beta_t \mu^{v}_t)} \tag{12}
$$

for AGM-SDE and AGM-ODE dynamics respectively, with $\beta_t = L^{vv}_t + \frac{1}{2 P_{11}}$ and $\alpha_t = \frac{\left( \frac{L_t^{xx}}{1-t} + L_t^{xv} \right) - \beta_t L^{xv}_t}{L^{xx}_t}$.

Proof.

See Appendix D.10. ∎
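The mechanism behind the SDE estimate can be checked numerically with a small sketch. This is not the authors' implementation: it substitutes the exact bridge force of Eq. (6) for the trained network and then solves that same equation for $\mathbf{x}_1$. The function names and the constant `g` are assumptions of the sketch, and the grouping of terms follows this direct rearrangement of Eq. (6); the authors' own derivation is in Appendix D.10.

```python
import numpy as np

def bridge_force(x_t, v_t, x_1, t, g=1.0):
    """Exact force of Eq. (6); stands in for the trained network F^theta."""
    p11 = -4.0 / (g**2 * (t - 1.0))
    return g**2 * p11 * ((x_1 - x_t) / (1.0 - t) - v_t)

def estimate_x1(x_t, v_t, force, t, g=1.0):
    """Solve Eq. (6) for x_1 given the current state and a force value."""
    p11 = -4.0 / (g**2 * (t - 1.0))
    return x_t + (1.0 - t) * (force / (g**2 * p11) + v_t)

rng = np.random.default_rng(0)
x_1 = rng.standard_normal(4)   # hypothetical data point
x_t = rng.standard_normal(4)   # hypothetical current position
v_t = rng.standard_normal(4)   # hypothetical current velocity
t = 0.2                        # early in the trajectory

force = bridge_force(x_t, v_t, x_1, t)
x_1_hat = estimate_x1(x_t, v_t, force, t)
```

With the exact force, the target is recovered at any $t < 1$, including early times. With a trained network, the quality of the estimate depends on how well $\mathbf{F}_t^{\theta}$ approximates the true force at that time.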

This property empowers us to allocate the NFE budget selectively within the time interval $t \in [0, t_i]$, where $t_i < t_N$, effectively reducing the discretization error while maintaining the sampling quality. This insight paves the way for the efficient low-NFE sampling strategies presented later. We summarize the training and sampling procedures of our method in Algorithm 1 and Algorithm 2, respectively.

Algorithm 1 Training
1: Input: data distribution $p_{\rm data}(\cdot)$
2: while not converged do
3:   $t \sim \mathcal{U}([0,1])$, $\mathbf{x}_1 \sim p_{\rm data}(\mathbf{x}_1)$
4:   Compute mean and covariance $\bm{\mu}_t$ and $\bm{\Sigma}_t$ (Appendix D.3).
5:   Sample $\mathbf{m}_t = \bm{\mu}_t + \mathbf{L}_t \bm{\epsilon}$ (eq. 8).
6:   Compute target $\mathbf{F}_t$ (eq. 7) using the optimal acceleration (eq. 9).
7:   Compute loss $\mathbb{E}\left[\lambda \lVert \mathbf{F}_t^{\theta} - \mathbf{F}_t \rVert_2^2\right]$ (eq. 10).
8:   Take a gradient descent step on the parameters $\theta$ of $\mathbf{F}_t^{\theta}(\mathbf{m}_t, t; \theta)$.
9: end while

Algorithm 2 Sampling
1: Input: trained $\mathbf{F}(\cdot,\cdot;\theta)$; discretized time steps $[t_0, \cdots, t_i]$; a sampler chosen from [SSS (SDE), EI (ODE)]; prior mean and covariance $\bm{\mu}_0$, $\bm{\Sigma}_0$.
2: Sample $\mathbf{m}_0 \sim p_0(\mathbf{m}; \bm{\mu}_0, \bm{\Sigma}_0)$.
3: for $n = 0$ to $i$ do
4:   Estimate $\mathbf{F}_{t_n}^{\theta}(\mathbf{m}_{t_n}, t_n)$.
5:   $\mathbf{m}_{t_{n+1}} = \textbf{Sampler}(\mathbf{m}_{t_n}, \mathbf{F}_{t_n}^{\theta}, t_n)$
6:   Reconstruct $\hat{\mathbf{x}}_1$ using Proposition 5.
7: end for
8: Return $\hat{\mathbf{x}}_1$
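The control flow of Algorithm 2 can be sketched as follows. This is a minimal skeleton under stated assumptions, not the released implementation: `force_fn`, `sampler_step`, and `reconstruct_x1` are hypothetical callables standing in for the trained network, the SSS/EI update, and the reconstruction of Proposition 5, and their argument lists (including the extra `t_next`) are illustrative choices. The sketch also assumes that the $2 \times 2$ covariance $\bm{\Sigma}_0$ is shared independently across data dimensions.

```python
import numpy as np

def sample_agm(force_fn, sampler_step, reconstruct_x1, times, mu0, sigma0, dim, rng):
    """Skeleton of the sampling loop; all callables are hypothetical placeholders."""
    # Draw m_0 = (x_0, v_0): every data dimension gets a (position, velocity)
    # pair with mean mu0 (shape (2,)) and covariance sigma0 (shape (2, 2)).
    chol = np.linalg.cholesky(sigma0)
    eps = rng.standard_normal((2, dim))
    m = mu0[:, None] + chol @ eps              # row 0 = position, row 1 = velocity
    x1_hat = None
    for n in range(len(times) - 1):
        t, t_next = times[n], times[n + 1]
        f = force_fn(m, t)                     # network estimate of F at (m_t, t)
        m = sampler_step(m, f, t, t_next)      # one SDE or ODE update (assumed signature)
        x1_hat = reconstruct_x1(m, t_next)     # data estimate (assumed signature)
    return x1_hat
```

Because the time grid `times` may stop at some $t_i < t_N$, the whole NFE budget is spent on $[0, t_i]$ and the final data estimate comes from the reconstruction step rather than from integrating to the terminal time.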

Figure 3: The standard deviation $\sigma$ of the terminal marginal for uncontrolled dynamics. We empirically selected the hyperparameter $k = -0.2$. This choice induces a terminal marginal distribution whose $\sigma$ covers the data range under uncontrolled dynamics.

4 Experimental Results

Architectures and Hyperparameters: We parameterize $\mathbf{s}_t^{\theta}(\cdot,\cdot;\theta)$ using the modified NCSN++ model provided in Karras et al. (2022). We employ six input channels, accounting for both position and velocity variables, as opposed to the standard three channels used for CIFAR-10 (Krizhevsky et al., 2009), AFHQv2 (Choi et al., 2020), and ImageNet (Deng et al., 2009); this leads to a negligible increase in network parameters. For comparison with CLD on the toy datasets, we adopt the same ResNet-based architecture used in CLD. Throughout all of our experiments, we use a monotonically decreasing diffusion coefficient, given by $g(t) = 3(1-t)$. For the detailed experimental setup, please refer to Appendix E.

Evaluation: To assess the performance and the sampling speed of various algorithms, we employ the Fréchet Inception Distance score (FID; Heusel et al. (2017)) and the Number of Function Evaluations (NFE) as our metrics. For FID evaluation, we use the reference statistics of all datasets obtained from EDM (Karras et al., 2022) and evaluate on 50k generated samples. Additionally, we re-evaluate the FID of CLD and EDM using the same reference statistics to ensure consistency in our comparisons. All other reported values are taken directly from the respective referenced papers.

Selection of $\bm{\Sigma}_0$: The choice of the initial covariance $\bm{\Sigma}_0$ directly influences the path measure of the trajectory. In our case, we set $\bm{\Sigma}_0 := \bigl[\begin{smallmatrix} 1 & k \\ k & 1 \end{smallmatrix}\bigr]$ with hyperparameter $k$. We observe that trajectories tend to exhibit pronounced curvature under specific conditions: when $k$ is positive and the absolute value of the position is large. This behavior is particularly noticeable when dealing with images, where the data scale ranges from $-1$ to $1$. We aim for favorable uncontrolled dynamics, as this can potentially lead to better controlled dynamics. Our strategy is to choose $k$ such that the marginal distribution of the uncontrolled dynamics at $t_N = 1$ effectively covers the range of image data values while $k$ remains negative. We can express the marginal of the uncontrolled dynamics by leveraging the transition matrix $\Phi(1,0)$, which gives us $\mathbf{x}_1 := \mathbf{x}_0 + \mathbf{v}_0$. Figure 3 illustrates the standard deviation of $\mathbf{x}_1$ for various values of $k$. Based on our empirical observations, we choose $k = -0.2$ for all experiments, as it effectively covers the data range. The subsequent controlled dynamics (eq. 7) are constructed on top of the uncontrolled dynamics established in this way.
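The dependence of the terminal spread on $k$ can be checked with a few lines of code. The sketch below takes the expression $\mathbf{x}_1 := \mathbf{x}_0 + \mathbf{v}_0$ at face value and assumes $(\mathbf{x}_0, \mathbf{v}_0)$ is Gaussian with covariance $\bm{\Sigma}_0$ per dimension; it does not model any additional contribution from the Wiener noise, so it need not reproduce Figure 3 exactly. The function name `terminal_std` is an illustrative choice.

```python
import numpy as np

def terminal_std(k):
    """Std of x_1 = x_0 + v_0 when Cov[(x_0, v_0)] = [[1, k], [k, 1]]."""
    sigma0 = np.array([[1.0, k], [k, 1.0]])
    ones = np.array([1.0, 1.0])               # x_1 = [1, 1] @ (x_0, v_0)
    # Var[x_1] = [1, 1] Sigma_0 [1, 1]^T = 2 + 2k
    return float(np.sqrt(ones @ sigma0 @ ones))
```

Under these assumptions $\sigma = \sqrt{2 + 2k}$, so `terminal_std(-0.2)` is $\sqrt{1.6} \approx 1.265$: a more negative $k$ shrinks the terminal marginal, while a positive $k$ widens it.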

Figure 4: Comparison with EDM (Karras et al., 2022) on AFHQv2 dataset. AGM-ODE exhibits superior generative performance when NFE is exceedingly low, owing to its unique dynamics architecture that incorporates velocity when predicting the estimated data point.

Figure 5: We showcase that AGM can generate conditional results from an unconditional model by injecting the conditional information into the velocity $\mathbf{v}_0$, thus leading to a new initial velocity $\mathbf{v}_0^{cond}$.

Table 2: FID ↓ comparison with CLD (Dockhorn et al., 2021) using the same SSS sampler on CIFAR-10.

| NFE ↓ | CLD-SDE | AGM-SDE |
|---|---|---|
| 20 | >100 | 7.9 |
| 50 | 19.93 | 3.21 |
| 150 | 2.99 | 2.68 |
| 1000 | 2.44 | 2.46 |

Stochastic Sampling: In experiments, we emphasize the advantages of using the AGM-SDE compared with CLD. Firstly, we show that our model exhibits superior performance when NFE is significantly lower than that of CLD, particularly in toy dataset scenarios. For evaluation, we utilized the multi-modal Mixture of Gaussian and Multi-Swiss-Roll datasets. The results obtained from the toy dataset, as shown in Fig.8, demonstrate that AGM-SDE is capable of generating data that closely aligns with the ground truth, while requiring NFE that is around one order of magnitude lower than CLD. Furthermore, our findings reveal that AGM-SDE outperforms CLD in the context of CIFAR-10 image generation tasks, especially when faced with limited NFE, as illustrated in Table 2.

Deterministic Sampling: We validate our algorithm on high-dimensional image generation with a deterministic sampler. We provide uncurated samples from CIFAR-10, AFHQv2 and ImageNet-64 with varying NFE in Appendix H. Regarding the quantitative evaluation, Table 3 and Table 4 summarize the FID together with the NFE used for sampling on CIFAR-10 and ImageNet-64. Notably, AGM-ODE achieves an FID of 2.46 with 50 NFE on CIFAR-10, and an FID of 10.55 with 20 NFE on unconditional ImageNet-64, which is comparable to existing dynamical generative models.

We underscore the effectiveness of sampling-hop, especially under a constrained NFE budget, in comparison to baselines. We validate it on the CIFAR-10 and AFHQv2 datasets. Fig. 4 illustrates that AGM-ODE is able to generate plausible images even when NFE $= 5$, and that it outperforms EDM (Karras et al., 2022) both visually and numerically on the AFHQv2 dataset when NFE is extremely small (NFE $< 15$). We also compare with other fast sampling algorithms built upon DMs in Table 5 on the CIFAR-10 dataset, where AGM-ODE demonstrates competitive performance. Notably, AGM-ODE outperforms the baseline CLD with the same EI sampler by a large margin. We suspect that the improvement stems from the rectified trajectory, which is friendlier to the ODE solver.

Conditional Generation: We showcase the capability of AGM to generate conditional samples using an unconditional model (Fig. 5) by incorporating conditional information into the prior velocity variable $\mathbf{v}_0$. Instead of employing a randomly sampled $\mathbf{v}_0$, we use a linear combination of $\mathbf{v}_0$ and the desired velocity $\mathbf{v}_1 = (\mathbf{x}_1 - \mathbf{x}_{t_0})/(1 - t_0)$, where $\mathbf{x}_1$ is the conditioning data. Thus, at $t_0$, the initial velocity is defined as $\mathbf{v}_0^{cond} := (1-\xi)\mathbf{v}_0 + \xi\mathbf{v}_1$, with $\xi$ serving as a mixing coefficient. Fig. 5 shows that AGM can generate conditional data without augmentation or additional fine-tuning. This property can be extended to the inpainting task as well; the details can be found in Appendix F.
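The mixing rule for the conditional initial velocity is simple enough to state in code. The following is a minimal sketch of that formula only; the function and argument names are illustrative, and how the resulting velocity is fed into the sampler is not shown.

```python
import numpy as np

def conditional_velocity(v0, x_t0, x1_cond, xi, t0=0.0):
    """Blend a sampled prior velocity with the velocity pointing at the condition."""
    # Desired velocity: straight line from the initial position to the conditioning data.
    v1 = (x1_cond - x_t0) / (1.0 - t0)
    # xi = 0 recovers the unconditional prior velocity, xi = 1 uses v1 only.
    return (1.0 - xi) * v0 + xi * v1
```

Intermediate values of $\xi$ trade off sample diversity, inherited from the random $\mathbf{v}_0$, against fidelity to the conditioning data $\mathbf{x}_1$.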

Table 3: Unconditional CIFAR-10 generative performance

| Type | Model Name | NFE ↓ | FID ↓ |
|---|---|---|---|
| ODE | EDM (Karras et al., 2022) | 35 | 1.84 |
| ODE | CLD+EI (Zhang et al., 2022) | 50 | 2.26 |
| ODE | FM-OT (Lipman et al., 2022) | 142 | 6.35 |
| ODE | AGM-ODE (ours) | 50 | 2.46 |
| SDE | VP (Song et al., 2020b) | 1000 | 2.66 |
| SDE | VE (Song et al., 2020b) | 1000 | 2.43 |
| SDE | CLD (Dockhorn et al., 2021) | 1000 | 2.44 |
| SDE | AGM-SDE (ours) | 1000 | 2.46 |

Table 4: Unconditional ImageNet-64 generative performance

| Model | NFE ↓ | FID ↓ |
|---|---|---|
| FM-OT (Lipman et al., 2022) | 138 | 14.45 |
| MFM (Pooladian et al., 2023) | 132 | 11.82 |
| MFM (Pooladian et al., 2023) | 40 | 12.97 |
| AGM-ODE (ours) | 40 | 10.10 |
| AGM-ODE (ours) | 30 | 10.07 |
| AGM-ODE (ours) | 20 | 10.55 |

Table 5: Performance comparison with fast sampling algorithms using the FID ↓ metric on CIFAR-10

| Dynamics Order | Model Name | NFE = 5 | NFE = 10 | NFE = 20 |
|---|---|---|---|---|
| 1st order dynamics | EDM (Karras et al., 2022) | >100 | 15.78 | 2.23 |
| 1st order dynamics | VP+EI (Zhang & Chen, 2022) | 15.37 | 4.17 | 3.03 |
| 1st order dynamics | DDIM (Song et al., 2020a) | 26.91 | 11.14 | 3.50 |
| 1st order dynamics | Analytic-DPM (Bao et al., 2022) | 51.47 | 14.06 | 6.74 |
| 2nd order dynamics | CLD+EI (Zhang et al., 2022) | N/A | 13.41 | 3.39 |
| 2nd order dynamics | AGM-ODE (ours) | 11.93 | 4.60 | 2.60 |

5 Conclusion and Limitation

In this paper, we introduce a novel Acceleration Generative Modeling (AGM) framework rooted in SOC theory. Within this framework, we devise more favorable, straight trajectories for the momentum system. Leveraging the intrinsic characteristics of the momentum system, we capitalize on additional velocity to expedite the sampling process by using the sampling-hop technique, significantly reducing the time required to converge to accurate predictions of realistic data points. Our experimental results, conducted on both toy and image datasets in unconditional generative tasks, demonstrate promising outcomes for fast sampling.

However, it is essential to acknowledge that our approach’s performance lags behind state-of-the-art methods in scenarios with sufficient NFE. This observation suggests avenues for enhancing AGM performance. Such improvements could be achieved by enhancing the training quality through the adoption of techniques proposed in Karras et al. (2022) including data augmentation, fine-tuned noise scheduling, and network preconditioning, among others.

References

  • Anderson (1982) Brian DO Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982.
  • Bao et al. (2022) Fan Bao, Chongxuan Li, Jun Zhu, and Bo Zhang. Analytic-dpm: an analytic estimate of the optimal reverse variance in diffusion probabilistic models. arXiv preprint arXiv:2201.06503, 2022.
  • Bryson (1975) Arthur Earl Bryson. Applied optimal control: optimization, estimation and control. CRC Press, 1975.
  • Chen et al. (2023) Tianrong Chen, Guan-Horng Liu, Molei Tao, and Evangelos A Theodorou. Deep momentum multi-marginal schrödinger bridge. arXiv preprint arXiv:2303.01751, 2023.
  • Chen & Georgiou (2015) Yongxin Chen and Tryphon Georgiou. Stochastic bridges of linear systems. IEEE Transactions on Automatic Control, 61(2):526–531, 2015.
  • Choi et al. (2020) Yunjey Choi, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. Stargan v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp.  8188–8197, 2020.
  • De Bortoli et al. (2023) Valentin De Bortoli, Guan-Horng Liu, Tianrong Chen, Evangelos A Theodorou, and Weilie Nie. Augmented bridge matching. arXiv preprint arXiv:2311.06978, 2023.
  • Deng et al. (2009) Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp.  248–255. Ieee, 2009.
  • Dhariwal & Nichol (2021) Prafulla Dhariwal and Alex Nichol. Diffusion models beat gans on image synthesis. arXiv preprint arXiv:2105.05233, 2021.
  • Dockhorn et al. (2021) Tim Dockhorn, Arash Vahdat, and Karsten Kreis. Score-based generative modeling with critically-damped langevin diffusion. arXiv preprint arXiv:2112.07068, 2021.
  • Haussmann & Pardoux (1986) Ulrich G Haussmann and Etienne Pardoux. Time reversal of diffusions. The Annals of Probability, pp.  1188–1205, 1986.
  • Heng et al. (2021) Jeremy Heng, Valentin De Bortoli, Arnaud Doucet, and James Thornton. Simulating diffusion bridges with score matching. arXiv preprint arXiv:2111.07243, 2021.
  • Heusel et al. (2017) Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, 30, 2017.
  • Ho et al. (2020) Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239, 2020.
  • Hochbruck & Ostermann (2010) Marlis Hochbruck and Alexander Ostermann. Exponential integrators. Acta Numerica, 19:209–286, 2010.
  • Inc. (2022) The MathWorks Inc. Matlab version: 9.13.0 (r2022b), 2022. URL https://www.mathworks.com.
  • Kappen (2008) HJ Kappen. Stochastic optimal control theory. ICML, Helsinki, Radbound University, Nijmegen, Netherlands, 2008.
  • Karras et al. (2022) Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35:26565–26577, 2022.
  • Kirk (2004) Donald E Kirk. Optimal control theory: an introduction. Courier Corporation, 2004.
  • Kloeden et al. (1992) Peter E Kloeden and Eckhard Platen. Stochastic differential equations. Springer, 1992.
  • Krizhevsky et al. (2009) Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
  • Léonard et al. (2014) Christian Léonard, Sylvie Rœlly, and Jean-Claude Zambrini. Reciprocal processes. a measure-theoretical point of view. 2014.
  • Lipman et al. (2022) Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022.
  • Liu et al. (2023) Guan-Horng Liu, Arash Vahdat, De-An Huang, Evangelos A Theodorou, Weili Nie, and Anima Anandkumar. I2sb: Image-to-image schrödinger bridge. arXiv preprint arXiv:2302.05872, 2023.
  • Liu et al. (2022) Xingchao Liu, Lemeng Wu, Mao Ye, and Qiang Liu. Let us build bridges: Understanding and extending diffusion generative models. arXiv preprint arXiv:2208.14699, 2022.
  • Loshchilov & Hutter (2017) Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
  • O’Connell (2003) Neil O’Connell. Conditioned random walks and the rsk correspondence. Journal of Physics A: Mathematical and General, 36(12):3049, 2003.
  • Øksendal (2003) Bernt Øksendal. Stochastic differential equations. In Stochastic differential equations, pp.  65–84. Springer, 2003.
  • Pandey et al. (2023) Kushagra Pandey, Maja Rudolph, and Stephan Mandt. Efficient integrators for diffusion generative models. arXiv preprint arXiv:2310.07894, 2023.
  • Peluchetti (2021) Stefano Peluchetti. Non-denoising forward-time diffusions. 2021.
  • Peluchetti (2023) Stefano Peluchetti. Diffusion bridge mixture transports, schrödinger bridge problems and generative modeling. arXiv preprint arXiv:2304.00917, 2023.
  • Pooladian et al. (2023) Aram-Alexandre Pooladian, Heli Ben-Hamu, Carles Domingo-Enrich, Brandon Amos, Yaron Lipman, and Ricky Chen. Multisample flow matching: Straightening flows with minibatch couplings. arXiv preprint arXiv:2304.14772, 2023.
  • Revuz & Yor (2013) Daniel Revuz and Marc Yor. Continuous martingales and Brownian motion, volume 293. Springer Science & Business Media, 2013.
  • Särkkä & Solin (2019) Simo Särkkä and Arno Solin. Applied stochastic differential equations, volume 10. Cambridge University Press, 2019.
  • Shi et al. (2022) Yuyang Shi, Valentin De Bortoli, George Deligiannidis, and Arnaud Doucet. Conditional simulation using diffusion schrödinger bridges. In Uncertainty in Artificial Intelligence, pp.  1792–1802. PMLR, 2022.
  • Shi et al. (2023) Yuyang Shi, Valentin De Bortoli, Andrew Campbell, and Arnaud Doucet. Diffusion schrödinger bridge matching. arXiv preprint arXiv:2303.16852, 2023.
  • Somnath et al. (2023) Vignesh Ram Somnath, Matteo Pariset, Ya-** Hsieh, Maria Rodriguez Martinez, Andreas Krause, and Charlotte Bunne. Aligned diffusion schrödinger bridges. arXiv preprint arXiv:2302.11419, 2023.
  • Song et al. (2020a) Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020a.
  • Song et al. (2020b) Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020b.
  • Song et al. (2021) Yang Song, Conor Durkan, Iain Murray, and Stefano Ermon. Maximum likelihood training of score-based diffusion models. arXiv e-prints, pp.  arXiv–2101, 2021.
  • Stengel (1994) Robert F Stengel. Optimal control and estimation. Courier Corporation, 1994.
  • Tong et al. (2023a) Alexander Tong, Nikolay Malkin, Kilian Fatras, Lazar Atanackovic, Yanlei Zhang, Guillaume Huguet, Guy Wolf, and Yoshua Bengio. Simulation-free schrödinger bridges via score and flow matching. arXiv preprint arXiv:2307.03672, 2023a.
  • Tong et al. (2023b) Alexander Tong, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Kilian Fatras, Guy Wolf, and Yoshua Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport. In ICML Workshop on New Frontiers in Learning, Control, and Dynamical Systems, 2023b.
  • Yong & Zhou (1999) Jiongmin Yong and Xun Yu Zhou. Stochastic controls: Hamiltonian systems and HJB equations, volume 43. Springer Science & Business Media, 1999.
  • Zhang & Chen (2022) Qinsheng Zhang and Yongxin Chen. Fast sampling of diffusion models with exponential integrator. arXiv preprint arXiv:2204.13902, 2022.
  • Zhang et al. (2022) Qinsheng Zhang, Molei Tao, and Yongxin Chen. gddim: Generalized denoising diffusion implicit models. arXiv preprint arXiv:2206.05564, 2022.
  • Zhang et al. (2023) Qinsheng Zhang, Jiaming Song, and Yongxin Chen. Improved order analysis and design of exponential integrator for diffusion models sampling. arXiv preprint arXiv:2308.02157, 2023.

Appendix A Supplementary Summary

We state the assumptions in Appendix B. We provide the technical details of Section 3 in Appendix D. The details of the experiments can be found in Appendix E. Visualizations of the generated samples can be found in Appendix H.

Appendix B Assumptions

We will use the following assumptions to construct the proposed method. These assumptions are adopted from the stochastic analysis for SGM (Song et al., 2021; Yong & Zhou, 1999; Anderson, 1982):

  (i) $p_0$ and $p_1$ have finite second-order moments.

  (ii) $g_t$ is a continuous function, and $|g(t)|^2 > 0$ is uniformly lower-bounded w.r.t. $t$.

  (iii) $\forall t \in [0,1]$, $\nabla_{\mathbf{v}} \log p_t(\mathbf{m}_t, t)$ is Lipschitz and has at most linear growth w.r.t. $\mathbf{x}$ and $\mathbf{v}$.

Assumptions (i) and (ii) are standard conditions in stochastic analysis that ensure the existence and uniqueness of solutions to the SDEs; hence they also appear in SGM analysis (Song et al., 2021).

Appendix C Stochastic Optimal Control (SOC) in the Wild

In this section, we provide a gentle introduction to Stochastic Optimal Control (SOC). Our work relies heavily on the prior work of Chen & Georgiou (2015), in which some technical details are omitted. Here we first clarify some core derivations that may help a broader audience understand Chen & Georgiou (2015) and our work.

C.1 Linear Quadratic Stochastic Optimal Control

SOC has wide applications in finance, robotics, and manufacturing. Here we focus on Linear Quadratic SOC, usually referred to as the Linear Quadratic Regulator because the dynamics are linear and the objective function is quadratic (Bryson, 1975; Stengel, 1994). The problem is stated as:

$$
\begin{aligned}
\min_{\mathbf{u}_t}\ & \int_0^1 \frac{1}{2}\lVert \mathbf{u}_t \rVert_2^2 \,\mathrm{d}t + \mathbf{x}_1^{\mathsf{T}} R\, \mathbf{x}_1 \\
\text{s.t.}\ \ & \mathrm{d}\mathbf{x}_t = [A(t)\mathbf{x}_t + g_t \mathbf{u}_t]\,\mathrm{d}t + g_t\,\mathrm{d}w_t, \quad \mathbf{x}_0 = x_0.
\end{aligned}
\tag{13}
$$

In this formulation, $\mathbf{x}_t$ denotes the state and $\mathbf{u}_t$ is the control variable. Conceptually, the SOC problem aims to design the controller $\mathbf{u}_t$ to drive the system from the point $x_0$ to $x_1 \equiv 0$ with minimum effort. For a first-order system, the control is the optimal vector field $\mathbf{v}_t^{*}$, and for a second-order system, the control is the optimal acceleration $\mathbf{a}_t^{*}$. The presence of stochasticity, introduced by the Wiener process $\mathrm{d}w_t$, prevents the system from converging precisely to the Dirac mass at $x_1$. In order to strike a balance between the objective of converging to $x_1$ and minimizing the overall control effort $\int \lVert \mathbf{u}_t \rVert_2^2 \,\mathrm{d}t$, the terminal cost $\mathbf{x}_1^{\mathsf{T}} R\, \mathbf{x}_1$ is imposed.
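To make the formulation concrete, the following minimal sketch simulates a scalar instance of (13) with the Euler–Maruyama scheme and estimates the expected objective by Monte Carlo. It is illustrative only: the function name, the constant drift coefficient `a`, the feedback `policy`, and the discretization are assumptions introduced here and are not part of the method.

```python
import numpy as np

def simulate_controlled_sde(policy, x0, a=0.0, g=lambda t: 1.0, r=1.0,
                            n_steps=200, n_paths=1000, seed=0):
    """Euler-Maruyama for dx = (a*x + g(t)*u) dt + g(t) dw on [0, 1], scalar state."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.full(n_paths, float(x0))
    effort = np.zeros(n_paths)                 # running control cost per path
    for i in range(n_steps):
        t = i * dt
        u = policy(x, t)                       # feedback control u_t = policy(x_t, t)
        effort += 0.5 * u ** 2 * dt
        dw = rng.standard_normal(n_paths) * np.sqrt(dt)
        x = x + (a * x + g(t) * u) * dt + g(t) * dw
    cost = effort + r * x ** 2                 # control effort plus terminal cost
    return x, cost.mean()
```

Comparing, for example, `policy = lambda x, t: np.zeros_like(x)` with a linear feedback such as `lambda x, t: -x` shows the trade-off described above: the feedback pays control effort in exchange for a smaller terminal penalty, and a larger `r` shifts the balance toward stronger control.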

One special case is $R \rightarrow \infty$. Intuitively, this means the controlled dynamics should converge precisely to $x_1$. However, the stochastic trajectory that connects $x_0$ and $x_1$ is not unique in this case. Under this constraint (the process is pinned at $x_0$ and $x_1$ at the two boundaries), the SOC problem selects the solution with minimum control effort $\mathbf{u}_t$, which can be understood as a regularization of the trajectories; with this regularization on the controller, the optimal stochastic trajectory is unique. One can also connect such a pinned-down SDE to the well-known Doob $h$-transform. For readers who are not familiar with these topics, Heng et al. (2021) and O’Connell (2003) are useful references.

The classical procedure to solve the SOC problem includes:

  1. Write down the Hamilton–Jacobi–Bellman equation (HJB PDE), which explicitly represents the propagation of the value function over time.

  2. Construct the Riccati/Lyapunov equation.

  3. Solve the Riccati/Lyapunov equation and obtain the optimal control.
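As an illustration of steps 2 and 3, consider a scalar instance of (13) with constant $A = a$. The following is a minimal sketch under stated assumptions, not the derivation used in this paper: it assumes the quadratic ansatz $V(t,x) = p(t)x^2 + c(t)$, under which the HJB equation reduces to the Riccati ODE $\dot{p} = -2ap + 2g_t^2 p^2$ with terminal condition $p(1) = R$, and the optimal control becomes $u_t^{*} = -2 g_t\, p(t)\, x$. The function name and the choice of an RK4 integrator are illustrative.

```python
import numpy as np

def solve_scalar_riccati(a, g, r, n_steps=1000):
    """Integrate dp/dt = -2*a*p + 2*g(t)**2 * p**2 backward from p(1) = r with RK4."""
    ts = np.linspace(1.0, 0.0, n_steps + 1)
    h = ts[1] - ts[0]                          # negative step: march from t=1 to t=0

    def rhs(t, p):
        return -2.0 * a * p + 2.0 * g(t) ** 2 * p ** 2

    p = np.empty(n_steps + 1)
    p[0] = r                                   # terminal condition at t = 1
    for i in range(n_steps):
        t, pi = ts[i], p[i]
        k1 = rhs(t, pi)
        k2 = rhs(t + 0.5 * h, pi + 0.5 * h * k1)
        k3 = rhs(t + 0.5 * h, pi + 0.5 * h * k2)
        k4 = rhs(t + h, pi + h * k3)
        p[i + 1] = pi + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return ts[::-1], p[::-1]                   # reorder so that time increases
```

With $a = 0$ and $g_t \equiv 1$ the ODE has the closed form $p(t) = R/(1 + 2R(1-t))$, against which the sketch can be checked. As $R \rightarrow \infty$ this tends to $1/(2(1-t))$, so that $u_t^{*} = -x/(1-t)$, the familiar Brownian-bridge drift toward $x_1 \equiv 0$.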

C.2 Value Function, Hamilton–Jacobi–Bellman Equation, and Riccati Equation

We adopt the classical SOC notation for the value function. Specifically, a subscript on the value function $V$ denotes a partial derivative. For example, $V_t$ and $V_x$ denote the first-order derivatives of $V$ w.r.t. time $t$ and state $\mathbf{x}$, and $V_{xx}$ denotes the second-order derivative of $V$ w.r.t. $\mathbf{x}$. We first define the value function as:

$$
V(t, \mathbf{x}_t) = \inf_{\mathbf{u}} \mathbb{E}\left[\int_t^1 \frac{1}{2}\lVert \mathbf{u}_\tau \rVert_2^2 \,\mathrm{d}\tau + \mathbf{x}_1^{\mathsf{T}} R\, \mathbf{x}_1\right]
$$

and the dynamics are

$$
\mathrm{d}\mathbf{x}_t = (A\mathbf{x}_t + g_t \mathbf{u}_t)\,\mathrm{d}t + g_t\,\mathrm{d}\mathbf{w}_t .
$$

Applying Bellman’s principle to the value function, one obtains:

$$
\begin{aligned}
V(t,\mathbf{x}_t)
&=\inf_{\mathbf{u}}\mathbb{E}\left[V(t+\mathrm{d}t,\mathbf{x}_{t+\mathrm{d}t})+\int_t^{t+\mathrm{d}t}\frac{1}{2}\lVert\mathbf{u}_t\rVert_2^2\,\mathrm{d}\tau\right]\\
&=\inf_{\mathbf{u}}\mathbb{E}\left[\frac{1}{2}\lVert\mathbf{u}_t\rVert_2^2\,\mathrm{d}t+V(t,\mathbf{x}_t)+V_t(t,\mathbf{x}_t)\,\mathrm{d}t+V_x(t,\mathbf{x})\,\mathrm{d}\mathbf{x}+\frac{1}{2}\operatorname{tr}\left[V_{xx}gg^{\mathsf{T}}\right]\mathrm{d}t\right]\\
&\quad\text{(plug in the dynamics } \mathrm{d}\mathbf{x}_t=\cdots\text{)}\\
&=\inf_{\mathbf{u}}\mathbb{E}\left[\frac{1}{2}\lVert\mathbf{u}_t\rVert_2^2\,\mathrm{d}t+V(t,\mathbf{x}_t)+V_t(t,\mathbf{x}_t)\,\mathrm{d}t+V_x(t,\mathbf{x})^{\mathsf{T}}\big((A\mathbf{x}_t+g_t\mathbf{u}_t)\,\mathrm{d}t+g\,\mathrm{d}\mathbf{w}_t\big)+\frac{1}{2}\operatorname{tr}\left[V_{xx}gg^{\mathsf{T}}\right]\mathrm{d}t\right]\\
&=\inf_{\mathbf{u}}\left[\frac{1}{2}\lVert\mathbf{u}_t\rVert_2^2\,\mathrm{d}t+V(t,\mathbf{x}_t)+V_t(t,\mathbf{x}_t)\,\mathrm{d}t+V_x(t,\mathbf{x})^{\mathsf{T}}(A\mathbf{x}_t+g_t\mathbf{u}_t)\,\mathrm{d}t+\frac{1}{2}\operatorname{tr}\left[V_{xx}gg^{\mathsf{T}}\right]\mathrm{d}t\right]
\end{aligned}
$$

One obtains:

$$V_t+\inf_{\mathbf{u}}\left[\frac{1}{2}\lVert\mathbf{u}_t\rVert_2^2+V_x^{\mathsf{T}}(A\mathbf{x}_t+g_t\mathbf{u}_t)\right]+\frac{1}{2}\operatorname{tr}\left[V_{xx}gg^{\mathsf{T}}\right]=0$$

Setting the gradient of the bracketed term with respect to $\mathbf{u}_t$ to zero gives the optimal control:

$$\mathbf{u}_t^{*}=-g_tV_x$$

Plugging it back, one can obtain the HJB PDE:

$$V_t-\frac{1}{2}V_xgg^{\mathsf{T}}V_x+V_x^{\mathsf{T}}A\mathbf{x}_t+\frac{1}{2}\operatorname{tr}\left[V_{xx}gg^{\mathsf{T}}\right]=0$$
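The step from the infimum to the HJB PDE can be spelled out. The worked minimization below is a sketch that treats $g_t$ as a matrix (for scalar $g_t$ it reduces to the expression $-g_tV_x$ used above); the shorthand $f(\mathbf{u})$ is introduced here for illustration only and does not appear in the original derivation.

```latex
\begin{aligned}
f(\mathbf{u}) &:= \tfrac{1}{2}\lVert\mathbf{u}\rVert_2^2 + V_x^{\mathsf{T}} g_t\,\mathbf{u}, \\
\nabla_{\mathbf{u}} f &= \mathbf{u} + g_t^{\mathsf{T}} V_x = 0
  \;\Longrightarrow\; \mathbf{u}^{*} = -g_t^{\mathsf{T}} V_x, \\
f(\mathbf{u}^{*}) &= \tfrac{1}{2} V_x^{\mathsf{T}} g_t g_t^{\mathsf{T}} V_x - V_x^{\mathsf{T}} g_t g_t^{\mathsf{T}} V_x
  = -\tfrac{1}{2} V_x^{\mathsf{T}} g_t g_t^{\mathsf{T}} V_x .
\end{aligned}
```

The remaining terms $V_x^{\mathsf{T}}A\mathbf{x}_t$ and $\frac{1}{2}\operatorname{tr}[V_{xx}gg^{\mathsf{T}}]$ do not depend on $\mathbf{u}$, so they carry over unchanged into the HJB PDE.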

We assume that there exists a matrix $Q(t)$ such that $V(\mathbf{x},t)\equiv\frac{1}{2}\mathbf{x}^{\mathsf{T}}Q\mathbf{x}+\Xi(t)$. Matching terms of equal power in $\mathbf{x}$ in the HJB equation, one can write:

$$-\dot{\Xi}-\frac{1}{2}\mathbf{x}^{\mathsf{T}}\dot{Q}\mathbf{x}=-\frac{1}{2}\mathbf{x}^{\mathsf{T}}Qgg^{\mathsf{T}}Q\mathbf{x}+\mathbf{x}^{\mathsf{T}}A^{\mathsf{T}}Q\mathbf{x}+\frac{1}{2}\operatorname{tr}\left[Qgg^{\mathsf{T}}\right] \tag{14}$$

with boundary condition:

$$\Xi(1)=0,\quad Q(1)=R \tag{15}$$

Since $\mathbf{x}^{\mathsf{T}}A^{\mathsf{T}}Q\mathbf{x}=\mathbf{x}^{\mathsf{T}}QA\mathbf{x}$, one arrives at the Riccati equation:

$$-\dot{Q}=A^{\mathsf{T}}Q+QA-Qgg^{\mathsf{T}}Q \tag{16}$$
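The matching of powers that leads from eq. (14) to eq. (16) can be written out term by term. This is a sketch that assumes $Q$ is symmetric; the integral expression for $\Xi$ follows from the boundary condition (15) and is not stated explicitly in the original text.

```latex
\begin{aligned}
\text{quadratic in } \mathbf{x}:\quad
-\tfrac{1}{2}\mathbf{x}^{\mathsf{T}}\dot{Q}\mathbf{x}
  &= -\tfrac{1}{2}\mathbf{x}^{\mathsf{T}}Qgg^{\mathsf{T}}Q\mathbf{x}
     + \tfrac{1}{2}\mathbf{x}^{\mathsf{T}}\left(A^{\mathsf{T}}Q + QA\right)\mathbf{x}
  \;\Longrightarrow\; -\dot{Q} = A^{\mathsf{T}}Q + QA - Qgg^{\mathsf{T}}Q, \\
\text{constant in } \mathbf{x}:\quad
-\dot{\Xi} &= \tfrac{1}{2}\operatorname{tr}\left[Qgg^{\mathsf{T}}\right]
  \;\Longrightarrow\; \Xi(t) = \tfrac{1}{2}\int_t^1 \operatorname{tr}\left[Q(s)\,gg^{\mathsf{T}}\right]\mathrm{d}s .
\end{aligned}
```

Only the quadratic part influences the optimal control, since $\Xi(t)$ does not depend on $\mathbf{x}$ and therefore drops out of $V_x$.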

Recall that the optimal control is $\mathbf{u}_t^{*}=-g_tV_x$ and $V:=\frac{1}{2}\mathbf{x}^{\mathsf{T}}Q\mathbf{x}+\Xi(t)$. The optimal control can therefore be expressed through the solution of the Riccati equation: $\mathbf{u}_t^{*}=-g^{\mathsf{T}}Q(t)\mathbf{x}_t$.
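As a numerical illustration of the result above, the following sketch integrates eq. (16) backward in time from the boundary condition $Q(1)=R$ and evaluates the feedback law $\mathbf{u}_t^{*}=-g^{\mathsf{T}}Q(t)\mathbf{x}_t$. The function names, the RK4 integrator, and the assumption that $g$ is a constant matrix are choices made for this example and are not taken from the paper or its code release.

```python
import numpy as np


def riccati_rhs(Q, A, g):
    """dQ/dt implied by eq. (16): -dQ/dt = A^T Q + Q A - Q g g^T Q."""
    return -(A.T @ Q + Q @ A - Q @ g @ g.T @ Q)


def solve_riccati(A, g, R, n_steps=2000):
    """Integrate eq. (16) backward from Q(1) = R down to t = 0 with classical RK4.

    Returns the time grid (descending from 1 to 0) and Q on that grid.
    g is treated as a constant matrix (an assumption of this sketch).
    """
    ts = np.linspace(1.0, 0.0, n_steps + 1)
    h = ts[1] - ts[0]  # negative step: we march backward in time
    Q = np.array(R, dtype=float)
    Qs = [Q.copy()]
    for _ in range(n_steps):
        k1 = riccati_rhs(Q, A, g)
        k2 = riccati_rhs(Q + 0.5 * h * k1, A, g)
        k3 = riccati_rhs(Q + 0.5 * h * k2, A, g)
        k4 = riccati_rhs(Q + h * k3, A, g)
        Q = Q + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        Qs.append(Q.copy())
    return ts, np.stack(Qs)


def optimal_control(Q, g, x):
    """Feedback law u* = -g^T Q(t) x."""
    return -g.T @ Q @ x
```

In the scalar case $A=0$, $g=1$, $R=r$, eq. (16) reduces to $\dot{Q}=Q^2$ with $Q(1)=r$, whose solution is $Q(t)=r/(1+r(1-t))$; this gives a convenient sanity check for the integrator.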

C.3 Riccati Equation and Lyapunov Equation

Here we provide the connection between the Riccati equation and the Lyapunov equation in the current setup.

Lemma 6.

Define $P(t):=Q(t)^{-1}$, where $Q(t)$ is the solution of the Riccati equation (eq. 16). Then $P(t)$ solves the Lyapunov equation:

$$\dot{P}=AP+PA^{\mathsf{T}}-gg^{\mathsf{T}} \tag{17}$$

For notational consistency, we denote the elements of the matrix $P$ as

$$P=\begin{bmatrix}P_{00}&P_{01}\\ P_{10}&P_{11}\end{bmatrix}$$
Proof.

Plugging $P(t):=Q(t)^{-1}$ into the Lyapunov equation, one gets:

$$
\begin{aligned}
\dot{Q^{-1}} &= AQ^{-1}+Q^{-1}A^{\mathsf{T}}-gg^{\mathsf{T}}\\
\Leftrightarrow\; -Q^{-1}\dot{Q}Q^{-1} &= AQ^{-1}+Q^{-1}A^{\mathsf{T}}-gg^{\mathsf{T}}\\
\Leftrightarrow\; -\dot{Q} &= QA+A^{\mathsf{T}}Q-Qgg^{\mathsf{T}}Q
\end{aligned}
$$

By Lemma 6, the optimal control can also be expressed through the solution of the Lyapunov equation: $\mathbf{u}_t^{*}=-g^{\mathsf{T}}P(t)^{-1}\mathbf{x}_t$. This is the optimal control term used in Chen & Georgiou (2015) after adopting their notation, and it is the same as the optimal control term used in Lemma 12 without the base dynamics compensation.
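The algebra in the proof of Lemma 6 can be checked numerically. The sketch below takes an arbitrary symmetric positive-definite $Q$, computes $\dot{Q}$ from the Riccati equation (16), forms $\dot{P}=-Q^{-1}\dot{Q}Q^{-1}$, and compares it with the right-hand side of the Lyapunov equation (17). The function names are illustrative, and the specific $A$ and $g$ suggested in the usage note are an assumed instance of the linear momentum system, inferred from the matrices appearing in Appendix D.2.

```python
import numpy as np


def riccati_qdot(Q, A, g):
    """dQ/dt implied by eq. (16): -dQ/dt = A^T Q + Q A - Q g g^T Q."""
    return -(A.T @ Q + Q @ A - Q @ g @ g.T @ Q)


def lyapunov_residual(Q, A, g):
    """Largest absolute entry of dP/dt - (A P + P A^T - g g^T), where P = Q^{-1}.

    dP/dt is obtained from the identity d(Q^{-1})/dt = -Q^{-1} (dQ/dt) Q^{-1},
    with dQ/dt supplied by the Riccati equation.
    """
    P = np.linalg.inv(Q)
    P_dot = -P @ riccati_qdot(Q, A, g) @ P
    lyapunov_rhs = A @ P + P @ A.T - g @ g.T
    return float(np.max(np.abs(P_dot - lyapunov_rhs)))
```

For example, with `A = np.array([[0., 1.], [0., 0.]])`, `g = np.array([[0.], [1.5]])`, and any invertible symmetric `Q`, the residual should vanish up to floating-point round-off.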

C.4 SOC Connection with Schrödinger Bridge

The optimal control solution is also the solution of the Schrödinger Bridge when the terminal condition degenerates to a point mass (see the Brownian Bridge example in Appendix D.1). It is also the solution of the Schrödinger Bridge when the optimal pairing is available (see Proposition 2 of De Bortoli et al. (2023)).

Hence, in our case, we are not solving the momentum Schrödinger Bridge of Chen et al. (2023) (see also Fig. 6), even though the problem formulation is similar. Specifically, AGM is a special case of the momentum Schrödinger Bridge in which the boundary conditions degenerate to Dirac distributions.

Figure 6: Momentum Schrödinger Bridge versus AGM.

Appendix D Technical Details for Section 3

D.1 Brownian Bridge as the solution of Stochastic Optimal Control

We adopt the presentation from Kappen (2008) and consider the control problem:

$$
\begin{aligned}
\min_{\mathbf{u}_t}\;&\int_t^1\frac{1}{2}\lVert\mathbf{u}_t\rVert_2^2\,\mathrm{d}t+\frac{\mathbf{r}}{2}\lVert\mathbf{x}_1-x_1\rVert_2^2\\
\text{s.t.}\quad&\mathrm{d}\mathbf{x}_t=\mathbf{u}_t\,\mathrm{d}t,\quad\mathbf{x}_0=x_0
\end{aligned}
$$

where $\mathbf{r}$ is the terminal cost coefficient. Following the Pontryagin Maximum Principle (PMP; Kirk (2004)), one can construct the Hamiltonian:

$$H(t,\mathbf{x},\mathbf{u},\gamma)=-\frac{1}{2}\lVert\mathbf{u}_t\rVert_2^2+\gamma\mathbf{u}_t$$

By setting:

$$\frac{\partial H}{\partial\mathbf{u}_t}=0,$$

the optimized Hamiltonian is:

$$H(t,\mathbf{x},\mathbf{u},\gamma)^{*}=\frac{1}{2}\gamma^2,\quad\text{where}\quad\mathbf{u}_t=\gamma$$

Then we solve the Hamiltonian equations of motion:

$$
\begin{aligned}
\frac{\mathrm{d}\mathbf{x}_t}{\mathrm{d}t}&=\frac{\partial H^{*}}{\partial\gamma}=\gamma\\
\frac{\mathrm{d}\gamma}{\mathrm{d}t}&=\frac{\partial H^{*}}{\partial\mathbf{x}}=0\\
\text{where}\quad\mathbf{x}_0&=x_0\quad\text{and}\quad\gamma_1=-\mathbf{r}\cdot(\mathbf{x}_1-x_1)
\end{aligned}
$$

Note that the solution for $\gamma_t$ is the constant $\gamma_t=\gamma=-\mathbf{r}\cdot(\mathbf{x}_1-x_1)$, hence $\mathbf{x}_t$ moves along a straight line with constant velocity $\gamma$. Taking $\mathbf{x}_0$ as the state at the current time $t$, the terminal state is $\mathbf{x}_1=\mathbf{x}_0+(1-t)\gamma$, so that

$$
\begin{aligned}
\gamma&=-\mathbf{r}(\mathbf{x}_1-x_1)=-\mathbf{r}(\mathbf{x}_0+(1-t)\gamma-x_1)\\
\rightarrow\quad\mathbf{u}^{*}_t&:=\gamma=\frac{\mathbf{r}(x_1-\mathbf{x}_0)}{1+\mathbf{r}(1-t)}
\end{aligned}
$$

When $\mathbf{r}\rightarrow+\infty$, we arrive at the optimal control $\mathbf{u}_t^{*}=\frac{x_1-\mathbf{x}_0}{1-t}$. By certainty equivalence, this is also the optimal control law for

$$\mathrm{d}\mathbf{x}_t=\mathbf{u}_t\,\mathrm{d}t+\mathrm{d}\mathbf{w}_t$$

By plugging it back into the dynamics, we obtain the well-known Brownian Bridge:

$$\mathrm{d}\mathbf{x}_t=\frac{x_1-\mathbf{x}_t}{1-t}\,\mathrm{d}t+\mathrm{d}\mathbf{w}_t$$
Remark 7.

Without the stochasticity $\mathrm{d}\mathbf{w}_t$, one gets $\mathbf{u}_t:=\frac{x_1-\mathbf{x}_t}{1-t}=x_1-\mathbf{x}_0$, which is the vector field constructed by Lipman et al. (2022) during training.
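The Brownian Bridge derived in this subsection can be simulated directly. The sketch below implements the finite-$\mathbf{r}$ control law and its $\mathbf{r}\rightarrow+\infty$ limit, and integrates the bridge SDE with an Euler-Maruyama scheme for scalar states. The function names, the discretization, and the choice to stop one step before $t=1$ (to avoid dividing by zero in the drift) are assumptions made for this illustration.

```python
import numpy as np


def bridge_control(x, x1, t, r=np.inf):
    """Control r (x1 - x) / (1 + r (1 - t)); r = inf gives (x1 - x) / (1 - t)."""
    if np.isinf(r):
        return (x1 - x) / (1.0 - t)
    return r * (x1 - x) / (1.0 + r * (1.0 - t))


def simulate_brownian_bridge(x0, x1, n_steps=1000, n_paths=256, seed=0):
    """Euler-Maruyama simulation of dx = (x1 - x) / (1 - t) dt + dw on [0, 1 - dt].

    Returns the simulated states at time 1 - dt, one entry per path.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.full(n_paths, float(x0))
    # Stop one step early: the drift 1 / (1 - t) is singular at t = 1.
    for k in range(n_steps - 1):
        t = k * dt
        drift = bridge_control(x, x1, t)
        x = x + drift * dt + np.sqrt(dt) * rng.standard_normal(n_paths)
    return x
```

All simulated paths should concentrate around the target $x_1$ as $t$ approaches 1, with a spread that shrinks as the step size is reduced. Evaluating `bridge_control` with a large finite `r` gives values close to the limiting control.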

D.2 Proof of Proposition 3

Proposition 8.

The solution of the stochastic bridge problem of the linear momentum system (Chen & Georgiou, 2015) is

$$\mathbf{a}^{*}(\mathbf{m}_t,t)=g_t^2P_{11}\left(\frac{\mathbf{x}_1-\mathbf{x}_t}{1-t}-\mathbf{v}_t\right)\quad\text{where}\quad P_{11}=\frac{-4}{g_t^2(t-1)}. \tag{18}$$
Proof.

By Lemma 12, the optimal control for this problem is

$$\mathbf{u}^{*}_t=-\mathbf{g}\mathbf{g}^{\mathsf{T}}\mathbf{P}_t^{-1}\left(\mathbf{m}_t-\Phi(t,1)\mathbf{m}_1\right)$$

where the state transition function $\Phi$ is given in Lemma 11, $\mathbf{P}_t$ is the solution of the Lyapunov equation, and $\mathbf{P}_t^{-1}$ is given in Lemma 9.

Then we have:

𝐮tsuperscriptsubscript𝐮𝑡\displaystyle{\mathbf{u}}_{t}^{*}bold_u start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT =𝐠𝐠𝖳𝐏t1(𝐦tΦ(t,1)𝐦1)absentsuperscript𝐠𝐠𝖳superscriptsubscript𝐏𝑡1subscript𝐦𝑡Φ𝑡1subscript𝐦1\displaystyle=-{\mathbf{g}}{\mathbf{g}}^{\mathsf{T}}{\mathbf{P}}_{t}^{-1}\left% ({\mathbf{m}}_{t}-\Phi(t,1){\mathbf{m}}_{1}\right)= - bold_gg start_POSTSUPERSCRIPT sansserif_T end_POSTSUPERSCRIPT bold_P start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ( bold_m start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT - roman_Φ ( italic_t , 1 ) bold_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT )
=𝐠𝐠𝖳𝐏t1𝐦t+𝐠𝐠𝖳𝐏t1Φ(t,1)𝐦1absentsuperscript𝐠𝐠𝖳superscriptsubscript𝐏𝑡1subscript𝐦𝑡superscript𝐠𝐠𝖳superscriptsubscript𝐏𝑡1Φ𝑡1subscript𝐦1\displaystyle=-{\mathbf{g}}{\mathbf{g}}^{\mathsf{T}}{\mathbf{P}}_{t}^{-1}{% \mathbf{m}}_{t}+{\mathbf{g}}{\mathbf{g}}^{\mathsf{T}}{\mathbf{P}}_{t}^{-1}\Phi% (t,1){\mathbf{m}}_{1}= - bold_gg start_POSTSUPERSCRIPT sansserif_T end_POSTSUPERSCRIPT bold_P start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_m start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + bold_gg start_POSTSUPERSCRIPT sansserif_T end_POSTSUPERSCRIPT bold_P start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT roman_Φ ( italic_t , 1 ) bold_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT
=[000g2]𝐏t1𝐦t+𝐠𝐠𝖳𝐏t1[1t101]𝐦1absentmatrix000superscript𝑔2superscriptsubscript𝐏𝑡1subscript𝐦𝑡superscript𝐠𝐠𝖳superscriptsubscript𝐏𝑡1matrix1𝑡101subscript𝐦1\displaystyle=-\begin{bmatrix}0&0\\ 0&g^{2}\\ \end{bmatrix}{\mathbf{P}}_{t}^{-1}{\mathbf{m}}_{t}+{\mathbf{g}}{\mathbf{g}}^{% \mathsf{T}}{\mathbf{P}}_{t}^{-1}\begin{bmatrix}1&t-1\\ 0&1\\ \end{bmatrix}{\mathbf{m}}_{1}= - [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL italic_g start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ] bold_P start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_m start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + bold_gg start_POSTSUPERSCRIPT sansserif_T end_POSTSUPERSCRIPT bold_P start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT [ start_ARG start_ROW start_CELL 1 end_CELL start_CELL italic_t - 1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 1 end_CELL end_ROW end_ARG ] bold_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT
=gt2[00P10P11]𝐦t+[000gt2][P00P01P10P11][1t101]𝐦1absentsuperscriptsubscript𝑔𝑡2matrix00subscript𝑃10subscript𝑃11subscript𝐦𝑡matrix000subscriptsuperscript𝑔2𝑡matrixsubscript𝑃00subscript𝑃01subscript𝑃10subscript𝑃11matrix1𝑡101subscript𝐦1\displaystyle=-g_{t}^{2}\begin{bmatrix}0&0\\ P_{10}&P_{11}\\ \end{bmatrix}{\mathbf{m}}_{t}+\begin{bmatrix}0&0\\ 0&g^{2}_{t}\end{bmatrix}\begin{bmatrix}P_{00}&P_{01}\\ P_{10}&P_{11}\\ \end{bmatrix}\begin{bmatrix}1&t-1\\ 0&1\\ \end{bmatrix}{\mathbf{m}}_{1}= - italic_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL italic_P start_POSTSUBSCRIPT 10 end_POSTSUBSCRIPT end_CELL start_CELL italic_P start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] bold_m start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + [ start_ARG start_ROW start_CELL 0 end_CELL start_CELL 0 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL italic_g start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL italic_P start_POSTSUBSCRIPT 00 end_POSTSUBSCRIPT end_CELL start_CELL italic_P start_POSTSUBSCRIPT 01 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_P start_POSTSUBSCRIPT 10 end_POSTSUBSCRIPT end_CELL start_CELL italic_P start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ] [ start_ARG start_ROW start_CELL 1 end_CELL start_CELL italic_t - 1 end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL 1 end_CELL end_ROW end_ARG ] bold_m start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT
$$
\begin{aligned}
&= -g_t^2 \begin{bmatrix} 0 & 0 \\ P_{10} & P_{11} \end{bmatrix} \mathbf{m}_t + g_t^2 \begin{bmatrix} 0 & 0 \\ P_{10} & P_{11} \end{bmatrix} \begin{bmatrix} 1 & t-1 \\ 0 & 1 \end{bmatrix} \mathbf{m}_1 \\
&= -g_t^2 \begin{bmatrix} 0 & 0 \\ P_{10} & P_{11} \end{bmatrix} \mathbf{m}_t + g_t^2 \begin{bmatrix} 0 & 0 \\ P_{10} & P_{10}(t-1) + P_{11} \end{bmatrix} \mathbf{m}_1 \\
&= \begin{bmatrix} 0 \\ g_t^2 P_{10}(\mathbf{x}_1 - \mathbf{x}_t) + g_t^2 P_{10}(t-1)\cdot\mathbf{v}_1 + g_t^2 P_{11}(\mathbf{v}_1 - \mathbf{v}_t) \end{bmatrix} \\
&\quad \text{Plug in } \mathbf{v}_1 := \frac{\mathbf{x}_1 - \mathbf{x}_t}{1-t} \\
&= \begin{bmatrix} 0 \\ g_t^2 P_{11}\left(\frac{\mathbf{x}_1 - \mathbf{x}_t}{1-t} - \mathbf{v}_t\right) \end{bmatrix}
\end{aligned}
$$
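
To spell out the step between the last two lines (a worked restatement that uses only the symbols above, not an additional result): once $\mathbf{v}_1 := \frac{\mathbf{x}_1 - \mathbf{x}_t}{1-t}$ is substituted, the two $P_{10}$ terms cancel and only the $P_{11}$ term survives.

$$
\begin{aligned}
g_t^2 P_{10}(\mathbf{x}_1 - \mathbf{x}_t) + g_t^2 P_{10}(t-1)\,\frac{\mathbf{x}_1 - \mathbf{x}_t}{1-t}
&= g_t^2 P_{10}(\mathbf{x}_1 - \mathbf{x}_t) - g_t^2 P_{10}(\mathbf{x}_1 - \mathbf{x}_t) = 0, \\
g_t^2 P_{11}(\mathbf{v}_1 - \mathbf{v}_t)
&= g_t^2 P_{11}\left(\frac{\mathbf{x}_1 - \mathbf{x}_t}{1-t} - \mathbf{v}_t\right).
\end{aligned}
$$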

Lemma 9.

The Lyapunov equation corresponding to the optimization problem shown in Lemma 12,

$$
\begin{aligned}
\mathbf{u}^*_t &\in \operatorname*{arg\,min}_{\mathbf{u}_t \in \mathcal{U}} \mathbb{E}\left[\int_0^T \frac{1}{2}\lVert \mathbf{u}_t \rVert^2\right]\mathrm{d}t + \mathbf{x}_1^{\mathsf{T}} \mathbf{R}\, \mathbf{x}_1 \\
\text{s.t.}\quad & \mathrm{d}\mathbf{m}_t = \underbrace{\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}}_{A} \mathbf{m}_t\,\mathrm{d}t + \mathbf{u}_t\,\mathrm{d}t + \mathbf{g}\,\mathrm{d}\mathbf{w}_t, \\
& \mathbf{m}_0 = m_0, \quad \mathbf{m}_1 = m_1,
\end{aligned}
$$

is given by

$$
\dot{\mathbf{P}} = A\mathbf{P} + \mathbf{P}A^{\mathsf{T}} - \mathbf{g}\mathbf{g}^{\mathsf{T}}. \tag{19}
$$

When $\mathbf{g} = \begin{bmatrix} 0 \\ g \end{bmatrix}$, consider the Lyapunov equation above with terminal condition

$$
\mathbf{P}_1 = \mathbf{R}^{-1} = \lim_{\mathbf{r} \to \infty} \begin{bmatrix} \mathbf{r} & 0 \\ 0 & \mathbf{r} \end{bmatrix}^{-1} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}. \tag{20}
$$

However, the velocity does not need to converge exactly to $\mathbf{v}_1$, because we only care about the quality of the generated $\mathbf{x}_1$. Here we give a more general case in which the velocity channel of the terminal condition keeps a small value $\omega$:

$$
\mathbf{P}_1 = \mathbf{R}^{-1} = \begin{bmatrix} 0 & 0 \\ 0 & \omega \end{bmatrix}. \tag{21}
$$

Then the solution is given by

$$
\mathbf{P}_t = \begin{bmatrix} \omega(t-1)^2 - \frac{1}{3}g^2(t-1)^3 & \omega(t-1) - \frac{1}{2}g^2(t-1)^2 \\ \omega(t-1) - \frac{1}{2}g^2(t-1)^2 & g^2(1-t) + \omega \end{bmatrix}
$$

and the inverse of $\mathbf{P}_t$ is

$$
\mathbf{P}_t^{-1} = \frac{1}{g^2\left(-4\omega + g^2(t-1)\right)(t-1)} \begin{bmatrix} \frac{12\left(\omega - g^2(t-1)\right)}{(t-1)^2} & \frac{6\left(-2\omega + g^2(t-1)\right)}{t-1} \\ \frac{6\left(-2\omega + g^2(t-1)\right)}{t-1} & 12\omega - 4g^2(t-1) \end{bmatrix}.
$$

Thus, the bottom-row entries of $\mathbf{P}_t^{-1}$, denoted $P_{10}$ and $P_{11}$, are

$$
\begin{aligned}
P_{10} &= \frac{-12\omega + 6g^2(t-1)}{g^2\left[-4\omega + g^2(t-1)\right](t-1)^2} = \frac{-12\omega}{g^2\left[-4\omega + g^2(t-1)\right](t-1)^2} + \frac{6}{\left[-4\omega + g^2(t-1)\right](t-1)}, \\
P_{11} &= \frac{12\omega - 4g^2(t-1)}{g^2\left[-4\omega + g^2(t-1)\right](t-1)} = \frac{12\omega}{g^2\left[-4\omega + g^2(t-1)\right](t-1)} + \frac{-4}{\left[-4\omega + g^2(t-1)\right]}.
\end{aligned}
$$
Proof.

Substituting the expression for $\mathbf{P}_t$ into the Lyapunov equation (eq. 19) verifies that it is indeed the solution.
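
The substitution can also be checked numerically. The following is a minimal sketch, not the authors' code: it assumes scalar position and velocity, a constant diffusion coefficient `g`, and arbitrary test values for `t`, `g` and `omega`; the helper names are hypothetical. It approximates $\dot{\mathbf{P}}_t$ by central differences and compares it with the right-hand side of eq. 19, and it multiplies $\mathbf{P}_t$ by the stated inverse.

```python
# Sketch under stated assumptions: scalar x and v, constant g, illustrative names.

def P(t, g, omega):
    """Closed-form P_t from Lemma 9 as a 2x2 nested list."""
    s = t - 1.0
    p00 = omega * s**2 - g**2 * s**3 / 3.0
    p01 = omega * s - g**2 * s**2 / 2.0
    p11 = g**2 * (1.0 - t) + omega
    return [[p00, p01], [p01, p11]]

def P_inv(t, g, omega):
    """Closed-form inverse of P_t from Lemma 9."""
    s = t - 1.0
    scale = 1.0 / (g**2 * (-4.0 * omega + g**2 * s) * s)
    off = 6.0 * (-2.0 * omega + g**2 * s) / s
    return [[scale * 12.0 * (omega - g**2 * s) / s**2, scale * off],
            [scale * off, scale * (12.0 * omega - 4.0 * g**2 * s)]]

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def lyapunov_residual(t, g, omega, h=1e-5):
    """Largest entry of |dP/dt - (A P + P A^T - g g^T)|, dP/dt by central differences."""
    A = [[0.0, 1.0], [0.0, 0.0]]
    A_T = [[0.0, 0.0], [1.0, 0.0]]
    ggT = [[0.0, 0.0], [0.0, g**2]]
    P_plus, P_minus, P_now = P(t + h, g, omega), P(t - h, g, omega), P(t, g, omega)
    AP, PAT = matmul2(A, P_now), matmul2(P_now, A_T)
    return max(
        abs((P_plus[i][j] - P_minus[i][j]) / (2.0 * h)
            - (AP[i][j] + PAT[i][j] - ggT[i][j]))
        for i in range(2) for j in range(2)
    )

if __name__ == "__main__":
    t, g, omega = 0.3, 1.5, 0.2  # arbitrary test point
    print("Lyapunov residual:", lyapunov_residual(t, g, omega))
    print("P_t @ P_t^{-1}:", matmul2(P(t, g, omega), P_inv(t, g, omega)))
    print("P_1:", P(1.0, g, omega))
```

A residual near zero and a product near the identity are consistent with the lemma; a finite-difference check of this kind is a sanity test rather than a proof.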

Remark 10.

Here we provide a general form in which the terminal condition of the Lyapunov equation is not the zero matrix. Explicitly, the velocity is then not required to converge exactly to the predefined $\mathbf{v}_1$. Setting $\omega = 0$ recovers the results shown in the paper.
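
As a worked check of the $\omega = 0$ case (a short calculation from the expressions for $P_{10}$ and $P_{11}$ in Lemma 9, assuming a constant $g$), the first term of each expression vanishes and the second simplifies to

$$
P_{10}\big|_{\omega=0} = \frac{6}{g^2 (t-1)^2}, \qquad P_{11}\big|_{\omega=0} = \frac{-4}{g^2 (t-1)} = \frac{4}{g^2 (1-t)},
$$

so that $g^2 P_{11} = \frac{4}{1-t}$.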

Lemma 11.

The state transition function $\Phi(t,s)$ of the following dynamics,

$$
\mathrm{d}\mathbf{m}_t = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \mathbf{m}_t\,\mathrm{d}t,
$$

is

$$
\Phi(t,s) = \begin{bmatrix} 1 & t-s \\ 0 & 1 \end{bmatrix}.
$$

Proof.

One can easily verify that such $\Phi$ satisfies $\partial\Phi/\partial t = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\Phi$. ∎
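
A small sketch of what this transition matrix does, assuming scalar position and velocity (the function names are illustrative and not from the paper): applying $\Phi(t,s)$ to $[x, v]$ moves the position at constant velocity and leaves the velocity unchanged.

```python
# Sketch: the transition matrix of Lemma 11 for scalar position/velocity.
# Helper names (phi, matmul2, propagate) are illustrative, not from the paper.

def phi(t, s):
    """Phi(t, s) = [[1, t - s], [0, 1]] as a nested list."""
    return [[1.0, t - s], [0.0, 1.0]]

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def propagate(m_s, s, t):
    """Apply Phi(t, s) to m_s = [x, v]: constant-velocity motion from time s to t."""
    p = phi(t, s)
    return [p[0][0] * m_s[0] + p[0][1] * m_s[1],
            p[1][0] * m_s[0] + p[1][1] * m_s[1]]

if __name__ == "__main__":
    # Phi(t, 1) m_1 is the term that appears in the optimal control of Lemma 12:
    # it propagates the terminal state m_1 back to time t.
    m_1 = [2.0, 0.5]
    print(propagate(m_1, 1.0, 0.25))
```

The composition property $\Phi(t,s)\Phi(s,r) = \Phi(t,r)$ can be checked with `matmul2` in the same way.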

Lemma 12 (Chen & Georgiou (2015)).

When $R \rightarrow \infty$, the optimal control $\mathbf{u}^*_t$ of the following problem,

$$
\begin{aligned}
\mathbf{u}^*_t = \begin{bmatrix} \mathbf{0} \\ \mathbf{a}_t \end{bmatrix} &\in \operatorname*{arg\,min}_{\mathbf{u}_t \in \mathcal{U}} \int_0^T \frac{1}{2}\lVert \mathbf{u}_t \rVert^2\,\mathrm{d}t + \mathbf{x}_1^{\mathsf{T}} R\, \mathbf{x}_1 \\
\text{s.t.}\quad & \mathrm{d}\mathbf{m}_t = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \mathbf{m}_t\,\mathrm{d}t + \mathbf{u}_t\,\mathrm{d}t + \mathbf{g}_t\,\mathrm{d}\mathbf{w}_t, \\
& \mathbf{m}_0 = m_0,
\end{aligned}
$$

is given by

$$
\mathbf{u}^*_t = -\mathbf{g}\mathbf{g}^{\mathsf{T}} \mathbf{P}_t^{-1}\left(\mathbf{m}_t - \Phi(t,1)\,\mathbf{m}_1\right),
$$

where $\mathbf{P}_t$ follows the Lyapunov equation (eq. 19) with boundary condition $\mathbf{P}_1 = \mathbf{0}$, and $\Phi(t,s)$ is the transition matrix from time $s$ to time $t$ under the uncontrolled dynamics.

Moreover, it is the stochastic bridge of the following system:

$$
\begin{align}
\mathrm{d}\mathbf{m}_t &= \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \mathbf{m}_t\,\mathrm{d}t + \mathbf{u}_t\,\mathrm{d}t + g\,\mathrm{d}\mathbf{w}_t, \tag{22} \\
\mathbf{m}_0 &= m_0, \quad \mathbf{m}_1 = m_1. \tag{23}
\end{align}
$$
Proof.

See page 8 in Chen & Georgiou (2015). ∎
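
The control formula can be evaluated directly for a scalar position and velocity. The sketch below is illustrative only: it assumes a constant `g`, uses the bottom-row entries of $\mathbf{P}_t^{-1}$ from Lemma 9, and introduces hypothetical function names. It also compares the result with the simplified force $g_t^2 P_{11}\left(\frac{\mathbf{x}_1 - \mathbf{x}_t}{1-t} - \mathbf{v}_t\right)$ for $\omega = 0$, where $g^2 P_{11} = 4/(1-t)$.

```python
# Sketch under stated assumptions: scalar x and v, constant g, illustrative names.

def optimal_control(m_t, m_1, t, g, omega=0.0):
    """u*_t = -g g^T P_t^{-1} (m_t - Phi(t, 1) m_1), returned as [u_x, u_v]."""
    x_t, v_t = m_t
    x_1, v_1 = m_1
    s = t - 1.0
    denom = g**2 * (-4.0 * omega + g**2 * s)
    p10 = (-12.0 * omega + 6.0 * g**2 * s) / (denom * s**2)  # bottom-left of P_t^{-1}
    p11 = (12.0 * omega - 4.0 * g**2 * s) / (denom * s)      # bottom-right of P_t^{-1}
    # Residual m_t - Phi(t, 1) m_1, with Phi(t, 1) = [[1, t - 1], [0, 1]].
    dx = x_t - (x_1 + s * v_1)
    dv = v_t - v_1
    # g g^T = diag(0, g^2): only the velocity channel is actuated.
    return [0.0, -g**2 * (p10 * dx + p11 * dv)]

def simplified_force(x_t, v_t, x_1, t):
    """Velocity-channel force for omega = 0, using g^2 P_11 = 4 / (1 - t)."""
    return 4.0 / (1.0 - t) * ((x_1 - x_t) / (1.0 - t) - v_t)

if __name__ == "__main__":
    x_t, v_t, x_1, t, g = 0.2, -0.5, 1.3, 0.4, 2.0  # arbitrary test values
    v_1 = (x_1 - x_t) / (1.0 - t)                   # the substitution used in the derivation
    print(optimal_control([x_t, v_t], [x_1, v_1], t, g))
    print(simplified_force(x_t, v_t, x_1, t))
```

With $\mathbf{v}_1$ chosen this way the position residual vanishes, so the two printed velocity-channel values should agree up to rounding.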

D.3 Mean and Covariance of SDE

By plugging the optimal control into the system, one can obtain the system as:

$$
\begin{aligned}
\mathrm{d}\mathbf{m}_t &= \begin{bmatrix} \mathbf{v}_t \\ \mathbf{F}_t \end{bmatrix}\mathrm{d}t + \mathbf{g}_t\,\mathrm{d}\mathbf{w}_t \\
&= \begin{bmatrix} \mathbf{v}_t \\ g_t^2 P_{11}\left(\frac{\mathbf{x}_1 - \mathbf{x}_t}{1-t} - \mathbf{v}_t\right) \end{bmatrix}\mathrm{d}t + \mathbf{g}_t\,\mathrm{d}\mathbf{w}_t \\
&= \underbrace{\begin{bmatrix} \mathbf{0} & \mathbf{1} \\ -\frac{g_t^2 P_{11}}{1-t} & -g_t^2 P_{11} \end{bmatrix}}_{\tilde{\mathbf{F}}_t} \begin{bmatrix} \mathbf{x}_t \\ \mathbf{v}_t \end{bmatrix}\mathrm{d}t + \underbrace{\begin{bmatrix} \mathbf{0} \\ \frac{g_t^2 P_{11}}{1-t}\mathbf{x}_1 \end{bmatrix}}_{\tilde{\mathbf{D}}_t}\mathrm{d}t + \mathbf{g}_t\,\mathrm{d}\mathbf{w}_t
\end{aligned}
$$

We follow the recipe of Särkkä & Solin (2019). The mean $\boldsymbol{\mu}_t$ and covariance $\boldsymbol{\Sigma}_t$ of the random variable $\mathbf{m}_t$ obey the following ordinary differential equations (ODEs), respectively:

$$
\begin{aligned}
\mathrm{d}\boldsymbol{\mu}_t &= \tilde{\mathbf{F}}_t \boldsymbol{\mu}_t\,\mathrm{d}t + \tilde{\mathbf{D}}_t\,\mathrm{d}t \\
\mathrm{d}\boldsymbol{\Sigma}_t &= \tilde{\mathbf{F}}_t \boldsymbol{\Sigma}_t\,\mathrm{d}t + \left[\tilde{\mathbf{F}}_t \boldsymbol{\Sigma}_t\right]^{\mathsf{T}}\mathrm{d}t + \mathbf{g}\mathbf{g}^{\mathsf{T}}\,\mathrm{d}t
\end{aligned}
$$

One can solve them by numerically integrating the two ODEs, whose dimension is just two. Alternatively, one can use software such as Inc. (2022) to obtain analytic solutions. With the latter approach, one obtains:

$$
\begin{aligned}
\mu^x_t &= \frac{1}{3}\mathbf{x}_1 t^2 (t^2 - 4t + 6) \\
\mu^v_t &= \frac{4t\,\mathbf{x}_1}{3}(t^2 - 3t + 3) \\
\Sigma^{xx}_t &= -\frac{1}{9}\left\{(-1+t)^2\left[-9 + 2(-1+k)\,t\left(3 + (-3+t)t\right)\left(3 + t\left[3 + (-3+t)t\right]\right)\right]\right\} \\
\Sigma^{xv}_t &= \frac{1}{9}\left\{(-1+t)\left[t\left(3 + (-3+t)t\right)\left(9 + 8t\left(3 + (-3+t)t\right)\right) + k\left(9 - t\left(3 + (-3+t)t\right)\left(9 + 8t\left(3 + (-3+t)t\right)\right)\right)\right]\right\} \\
\Sigma^{vv}_t &= 1 - \frac{8}{9}(-1+k)\,t\left[3 + (-3+t)t\right]\left\{-3 + 4t\left(3 + (-3+t)t\right)\right\}
\end{aligned}
$$
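
The closed-form mean can be cross-checked against the first of the two approaches, numerical integration of the mean ODE. The sketch below is not the authors' implementation (their code is referenced in Appendix E.1). It assumes $\omega = 0$ with $g_t^2 P_{11} = 4/(1-t)$, a zero initial mean, a scalar $\mathbf{x}_1$, and a hand-written fourth-order Runge-Kutta integrator; all function names are hypothetical. Integration stops before $t = 1$, where the drift coefficient is singular.

```python
# Sketch under stated assumptions: omega = 0, g_t^2 P_11 = 4 / (1 - t),
# zero initial mean, scalar x_1. Names are illustrative, not from the paper.

def mean_closed_form(t, x1):
    """Closed-form (mu^x_t, mu^v_t) as stated in the text."""
    mu_x = x1 * t**2 * (t**2 - 4.0 * t + 6.0) / 3.0
    mu_v = 4.0 * t * x1 * (t**2 - 3.0 * t + 3.0) / 3.0
    return mu_x, mu_v

def mean_rhs(t, mu_x, mu_v, x1):
    """Right-hand side of d mu / dt = F~_t mu + D~_t."""
    c = 4.0 / (1.0 - t)  # assumed value of g_t^2 P_11
    return mu_v, c * ((x1 - mu_x) / (1.0 - t) - mu_v)

def integrate_mean(t_end, x1, steps=2000):
    """Integrate the mean ODE from t = 0 to t_end < 1 with classical RK4."""
    h = t_end / steps
    t, mu_x, mu_v = 0.0, 0.0, 0.0
    for _ in range(steps):
        k1 = mean_rhs(t, mu_x, mu_v, x1)
        k2 = mean_rhs(t + h / 2, mu_x + h / 2 * k1[0], mu_v + h / 2 * k1[1], x1)
        k3 = mean_rhs(t + h / 2, mu_x + h / 2 * k2[0], mu_v + h / 2 * k2[1], x1)
        k4 = mean_rhs(t + h, mu_x + h * k3[0], mu_v + h * k3[1], x1)
        mu_x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        mu_v += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += h
    return mu_x, mu_v

if __name__ == "__main__":
    x1, t_end = 1.5, 0.8  # arbitrary test values
    print("numerical  :", integrate_mean(t_end, x1))
    print("closed form:", mean_closed_form(t_end, x1))
```

The covariance expressions could be checked the same way, but that requires the definitions of $k$ and of the diffusion coefficient, which are not restated in this section.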
Remark 13.

The expressions above are rather complicated. Hence, we provide Python code in Appendix E.1, with general initial covariance and diffusion coefficient, for easy copy-paste. The equations above are the ones used throughout this paper; feel free to experiment with other hyperparameters.

D.4 Derivation from SDE to ODE for phase dynamics

One can represent dynamics of the form

$$
\begin{bmatrix} \mathrm{d}\mathbf{x}_t \\ \mathrm{d}\mathbf{v}_t \end{bmatrix} = \begin{bmatrix} \mathbf{v}_t \\ \mathbf{F}_t \end{bmatrix}\mathrm{d}t + \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & g_t \end{bmatrix}\mathrm{d}\mathbf{w}_t \quad \text{s.t.} \quad \mathbf{m}_0 := \begin{bmatrix} \mathbf{x}_0 \\ \mathbf{v}_0 \end{bmatrix} \sim \mathcal{N}(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0) \tag{24}
$$

as

$$
\mathrm{d}\mathbf{m}_t = f(\mathbf{m}_t)\,\mathrm{d}t + \mathbf{g}_t\,\mathrm{d}\mathbf{w}_t.
$$

And its corresponding Fokker-Planck Partial Differential Equation Øksendal (2003) reads,

ptt=d𝐦i[fi(𝐦,t)pt(𝐦t)]+12d2𝐦i𝐦j[d𝐠t𝐠t𝖳pt(𝐦t)]subscript𝑝𝑡𝑡subscript𝑑subscript𝐦𝑖delimited-[]subscript𝑓𝑖𝐦𝑡subscript𝑝𝑡subscript𝐦𝑡12subscript𝑑superscript2subscript𝐦𝑖subscript𝐦𝑗delimited-[]subscript𝑑subscript𝐠𝑡superscriptsubscript𝐠𝑡𝖳subscript𝑝𝑡subscript𝐦𝑡\displaystyle\frac{\partial p_{t}}{\partial t}=-\sum_{d}\frac{\partial}{% \partial{\mathbf{m}}_{i}}[f_{i}({\mathbf{m}},t)p_{t}({\mathbf{m}}_{t})]+\frac{% 1}{2}\sum_{d}\frac{\partial^{2}}{\partial{\mathbf{m}}_{i}{\mathbf{m}}_{j}}% \left[\sum_{d}{\mathbf{g}}_{t}{\mathbf{g}}_{t}^{\mathsf{T}}p_{t}({\mathbf{m}}_% {t})\right]divide start_ARG ∂ italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG ∂ italic_t end_ARG = - ∑ start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT divide start_ARG ∂ end_ARG start_ARG ∂ bold_m start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_ARG [ italic_f start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( bold_m , italic_t ) italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_m start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ] + divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∑ start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT divide start_ARG ∂ start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT end_ARG start_ARG ∂ bold_m start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT bold_m start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT end_ARG [ ∑ start_POSTSUBSCRIPT italic_d end_POSTSUBSCRIPT bold_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT sansserif_T end_POSTSUPERSCRIPT italic_p start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( bold_m start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) ] (25)

According to Eq. (37) in Song et al. (2020b), one can rewrite this PDE as

\begin{align}
\frac{\partial p_t}{\partial t}
&= -\sum_{d}\frac{\partial}{\partial \mathbf{m}_i}\left\{ f_i(\mathbf{m}_t,t)\,p_t(\mathbf{m}_t) - \frac{1}{2}\left[ p(\mathbf{m}_t)\,\nabla_{\mathbf{m}}\cdot(\mathbf{g}_t\mathbf{g}_t^{\mathsf{T}}) + p(\mathbf{m}_t)\,\mathbf{g}_t\mathbf{g}_t^{\mathsf{T}}\nabla_{\mathbf{m}}\log p(\mathbf{m}_t) \right] \right\} \tag{26} \\
&\quad \text{due to the fact } \mathbf{g}_t \equiv \begin{bmatrix} \mathbf{0} & \mathbf{0} \\ \mathbf{0} & g_t \end{bmatrix} \tag{27} \\
&= -\sum_{d}\frac{\partial}{\partial \mathbf{m}_i}\left\{ f_i(\mathbf{m}_t,t)\,p_t(\mathbf{m}_t) - \frac{1}{2}\,p(\mathbf{m}_t)\left[ g_t^2\,\nabla_{\mathbf{v}}\log p(\mathbf{m}_t) \right] \right\} \tag{28}
\end{align}

Then one can get the equivalent ODE:

$$\mathrm{d}\mathbf{m}_t = \left[ f(\mathbf{m}_t,t) - \frac{1}{2}\,g_t^2\,\nabla_{\mathbf{v}}\log p(\mathbf{m}_t,t) \right]\mathrm{d}t \tag{29}$$
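As an illustration of eq. (29), the following is a minimal sketch of one explicit Euler step for a scalar position and velocity pair. The function name and the three callables `drift`, `score_v`, and `g` are hypothetical placeholders, not part of the released code. The sketch assumes, following eq. (27), that the score correction acts on the velocity component only.

```python
def probability_flow_euler_step(x, v, t, dt, drift, score_v, g):
    """One explicit Euler step of the ODE in eq. (29) for a scalar phase-space state.

    drift(x, v, t)   -> (f_x, f_v), the two components of f(m_t, t)  [hypothetical callable]
    score_v(x, v, t) -> d/dv log p_t(x, v)                           [hypothetical callable]
    g(t)             -> scalar diffusion coefficient g_t             [hypothetical callable]
    """
    f_x, f_v = drift(x, v, t)
    # g_t g_t^T is nonzero only in the velocity block (eq. 27),
    # so only the velocity update receives the score correction.
    dv = f_v - 0.5 * g(t) ** 2 * score_v(x, v, t)
    return x + f_x * dt, v + dv * dt


# Toy usage with assumed dynamics (not the paper's): dx = v dt, no drift on v,
# and a standard-normal velocity marginal, so that d/dv log p = -v.
x_next, v_next = probability_flow_euler_step(
    x=0.0, v=1.0, t=0.0, dt=0.1,
    drift=lambda x, v, t: (v, 0.0),
    score_v=lambda x, v, t: -v,
    g=lambda t: 1.0,
)
```

In a real sampler the score would come from the learned model, and a higher-order integrator could replace the Euler update.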

D.5 Decomposition of Covariance Matrix and representation of score

Here we follow the procedure in Dockhorn et al. (2021). Given the covariance matrix $\bm{\Sigma}_t$, the decomposition of this symmetric positive definite matrix is,

$$\bm{\Sigma}_t = \mathbf{L}_t\mathbf{L}_t^{\mathsf{T}} \tag{30}$$

where

$$\mathbf{L}_t = \begin{bmatrix} L^{xx}_t & 0 \\ L^{xv}_t & L^{vv}_t \end{bmatrix} = \begin{bmatrix} \sqrt{\Sigma_t^{xx}} & 0 \\ \dfrac{\Sigma_t^{xv}}{\sqrt{\Sigma_t^{xx}}} & \sqrt{\dfrac{\Sigma_t^{xx}\Sigma_t^{vv} - (\Sigma_t^{xv})^2}{\Sigma_t^{xx}}} \end{bmatrix} \tag{31}$$

Borrowing results from Dockhorn et al. (2021), the score function reads,

\begin{align*}
\nabla_{\mathbf{m}}\log p(\mathbf{m}_t|\mathbf{m}_1)
&= -\nabla_{\mathbf{m}_t}\frac{1}{2}(\mathbf{m}_t - \bm{\mu}_t)^{\mathsf{T}}\bm{\Sigma}_t^{-1}(\mathbf{m}_t - \bm{\mu}_t) \\
&= -\bm{\Sigma}_t^{-1}(\mathbf{m}_t - \bm{\mu}_t) \\
&\quad \text{(Cholesky decomposition of } \bm{\Sigma}_t \text{)} \\
&= -\mathbf{L}_t^{-\mathsf{T}}\mathbf{L}_t^{-1}(\mathbf{m}_t - \bm{\mu}_t) \\
&= -\mathbf{L}_t^{-\mathsf{T}}\bm{\epsilon}, \qquad \bm{\epsilon} := \mathbf{L}_t^{-1}(\mathbf{m}_t - \bm{\mu}_t)
\end{align*}

The form of $\mathbf{L}_t$ reads,

$$\mathbf{L}_t = \begin{bmatrix} \sqrt{\Sigma_t^{xx}} & 0 \\ \dfrac{\Sigma_t^{xv}}{\sqrt{\Sigma_t^{xx}}} & \sqrt{\dfrac{\Sigma_t^{xx}\Sigma_t^{vv} - (\Sigma_t^{xv})^2}{\Sigma_t^{xx}}} \end{bmatrix}$$

and the inverse transpose of $\mathbf{L}_t$ reads,

$$\mathbf{L}_t^{-\mathsf{T}} = \begin{bmatrix} \dfrac{1}{\sqrt{\Sigma_t^{xx}}} & \dfrac{-\Sigma_t^{xv}}{\sqrt{\Sigma_t^{xx}}\,\sqrt{\Sigma_t^{xx}\Sigma_t^{vv} - (\Sigma_t^{xv})^2}} \\ 0 & \dfrac{\sqrt{\Sigma_t^{xx}}}{\sqrt{\Sigma_t^{xx}\Sigma_t^{vv} - (\Sigma_t^{xv})^2}} \end{bmatrix}$$

Hence, the velocity component of the score function reads,

$$\nabla_{\mathbf{v}}\log p(\mathbf{m}_t|\mathbf{m}_1) = -\underbrace{\frac{\sqrt{\Sigma_t^{xx}}}{\sqrt{\Sigma_t^{xx}\Sigma_t^{vv} - (\Sigma_t^{xv})^2}}}_{\ell_t}\,\bm{\epsilon}_1$$
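The closed forms in this subsection can be sanity-checked numerically. The sketch below is for illustration only and treats the scalar (one-dimensional) case. The function names are hypothetical and the covariance values are arbitrary. It builds the lower-triangular factor, then compares the velocity score $-\ell_t\bm{\epsilon}_1$ with a direct evaluation of $-\bm{\Sigma}_t^{-1}(\mathbf{m}_t - \bm{\mu}_t)$.

```python
import math


def cholesky_2x2(s_xx, s_xv, s_vv):
    """Entries (L_xx, L_xv, L_vv) of the lower-triangular factor of
    Sigma = [[s_xx, s_xv], [s_xv, s_vv]], so that L @ L.T == Sigma."""
    l_xx = math.sqrt(s_xx)
    l_xv = s_xv / l_xx
    l_vv = math.sqrt((s_xx * s_vv - s_xv ** 2) / s_xx)
    return l_xx, l_xv, l_vv


def velocity_score(s_xx, s_xv, s_vv, eps_1):
    """Velocity component of -L^{-T} eps, i.e. -ell_t * eps_1."""
    ell_t = math.sqrt(s_xx) / math.sqrt(s_xx * s_vv - s_xv ** 2)
    return -ell_t * eps_1


def direct_score(s_xx, s_xv, s_vv, dx, dv):
    """-Sigma^{-1} (m - mu) for the deviation (dx, dv), via the 2x2 inverse."""
    det = s_xx * s_vv - s_xv ** 2
    score_x = -(s_vv * dx - s_xv * dv) / det
    score_v = -(-s_xv * dx + s_xx * dv) / det
    return score_x, score_v


# Arbitrary illustrative values (assumptions, not taken from the paper).
s_xx, s_xv, s_vv = 2.0, 0.5, 1.0
eps_0, eps_1 = 0.3, -1.2

l_xx, l_xv, l_vv = cholesky_2x2(s_xx, s_xv, s_vv)
dx = l_xx * eps_0                  # x_t - mu_t^x
dv = l_xv * eps_0 + l_vv * eps_1   # v_t - mu_t^v

_, score_v_direct = direct_score(s_xx, s_xv, s_vv, dx, dv)
score_v_closed = velocity_score(s_xx, s_xv, s_vv, eps_1)
print(score_v_direct, score_v_closed)  # the two values should agree up to rounding
```

The agreement shows why only $\bm{\epsilon}_1$ enters the velocity score: the bottom row of $\mathbf{L}_t^{-\mathsf{T}}$ has a zero in its first entry.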

D.6 Representation of acceleration $\mathbf{a}_t$

As shown in Proposition 3, the optimal control can be represented as,

𝐚tsuperscriptsubscript𝐚𝑡\displaystyle{\mathbf{a}}_{t}^{*}bold_a start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT =gt2P11(𝐱1𝐱t1t𝐯t)absentsuperscriptsubscript𝑔𝑡2subscript𝑃11subscript𝐱1subscript𝐱𝑡1𝑡subscript𝐯𝑡\displaystyle=g_{t}^{2}P_{11}\left(\frac{{\mathbf{x}}_{1}-{\mathbf{x}}_{t}}{1-% t}-{\mathbf{v}}_{t}\right)= italic_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_P start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT ( divide start_ARG bold_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - bold_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG 1 - italic_t end_ARG - bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT )
=gt2P11𝐱11tgt2P11(𝐱t1t+𝐯t)absentsuperscriptsubscript𝑔𝑡2subscript𝑃11subscript𝐱11𝑡superscriptsubscript𝑔𝑡2subscript𝑃11subscript𝐱𝑡1𝑡subscript𝐯𝑡\displaystyle=g_{t}^{2}P_{11}\frac{{\mathbf{x}}_{1}}{1-t}-g_{t}^{2}P_{11}\left% (\frac{{\mathbf{x}}_{t}}{1-t}+{\mathbf{v}}_{t}\right)= italic_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_P start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT divide start_ARG bold_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG 1 - italic_t end_ARG - italic_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_P start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT ( divide start_ARG bold_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG 1 - italic_t end_ARG + bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT )
=gt2P11𝐱11tgt2P11(μtx+Ltxxϵ01t+(μtv+Ltxvϵ0+Ltvvϵ1))absentsuperscriptsubscript𝑔𝑡2subscript𝑃11subscript𝐱11𝑡superscriptsubscript𝑔𝑡2subscript𝑃11subscriptsuperscript𝜇𝑥𝑡subscriptsuperscript𝐿𝑥𝑥𝑡subscriptbold-italic-ϵ01𝑡subscriptsuperscript𝜇𝑣𝑡subscriptsuperscript𝐿𝑥𝑣𝑡subscriptbold-italic-ϵ0subscriptsuperscript𝐿𝑣𝑣𝑡subscriptbold-italic-ϵ1\displaystyle=g_{t}^{2}P_{11}\frac{{\mathbf{x}}_{1}}{1-t}-g_{t}^{2}P_{11}\left% (\frac{{\mu^{x}_{t}}+{L^{xx}_{t}}{\bm{\epsilon}_{0}}}{1-t}+({\mu^{v}_{t}}+{L^{% xv}_{t}}{\bm{\epsilon}_{0}}+{L^{vv}_{t}}{\bm{\epsilon}_{1}})\right)= italic_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_P start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT divide start_ARG bold_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG 1 - italic_t end_ARG - italic_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_P start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT ( divide start_ARG italic_μ start_POSTSUPERSCRIPT italic_x end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_L start_POSTSUPERSCRIPT italic_x italic_x end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_italic_ϵ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_ARG start_ARG 1 - italic_t end_ARG + ( italic_μ start_POSTSUPERSCRIPT italic_v end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT + italic_L start_POSTSUPERSCRIPT italic_x italic_v end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_italic_ϵ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_L start_POSTSUPERSCRIPT italic_v italic_v end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_italic_ϵ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) )
=gt2P11[(𝐱1μtx1tμtv)(Ltxx1tϵ0+Ltxvϵ0+Ltvvϵ1)]absentsuperscriptsubscript𝑔𝑡2subscript𝑃11delimited-[]subscript𝐱1subscriptsuperscript𝜇𝑥𝑡1𝑡subscriptsuperscript𝜇𝑣𝑡subscriptsuperscript𝐿𝑥𝑥𝑡1𝑡subscriptbold-italic-ϵ0subscriptsuperscript𝐿𝑥𝑣𝑡subscriptbold-italic-ϵ0subscriptsuperscript𝐿𝑣𝑣𝑡subscriptbold-italic-ϵ1\displaystyle=g_{t}^{2}P_{11}\left[\left(\frac{{\mathbf{x}}_{1}-{\mu^{x}_{t}}}% {1-t}-{\mu^{v}_{t}}\right)-\left(\frac{{L^{xx}_{t}}}{1-t}{\bm{\epsilon}_{0}}+{% L^{xv}_{t}}{\bm{\epsilon}_{0}}+{L^{vv}_{t}}{\bm{\epsilon}_{1}}\right)\right]= italic_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_P start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT [ ( divide start_ARG bold_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - italic_μ start_POSTSUPERSCRIPT italic_x end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG 1 - italic_t end_ARG - italic_μ start_POSTSUPERSCRIPT italic_v end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ) - ( divide start_ARG italic_L start_POSTSUPERSCRIPT italic_x italic_x end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG 1 - italic_t end_ARG bold_italic_ϵ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_L start_POSTSUPERSCRIPT italic_x italic_v end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_italic_ϵ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_L start_POSTSUPERSCRIPT italic_v italic_v end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_italic_ϵ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ]
solving eq.D.3 we can get:μtx=13𝐱1t2(t24t+6),μtv=4t𝐱13(t23t+3):solving eq.D.3 we can getformulae-sequencesubscriptsuperscript𝜇𝑥𝑡13subscript𝐱1superscript𝑡2superscript𝑡24𝑡6subscriptsuperscript𝜇𝑣𝑡4𝑡subscript𝐱13superscript𝑡23𝑡3\displaystyle\text{solving eq.\ref{Appendix:mean-cov} we can get}:{\mu^{x}_{t}% }=\frac{1}{3}{\mathbf{x}}_{1}t^{2}(t^{2}-4t+6),{\mu^{v}_{t}}=\frac{4t{\mathbf{% x}}_{1}}{3}(t^{2}-3t+3)solving eq. we can get : italic_μ start_POSTSUPERSCRIPT italic_x end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG 3 end_ARG bold_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 4 italic_t + 6 ) , italic_μ start_POSTSUPERSCRIPT italic_v end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = divide start_ARG 4 italic_t bold_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG 3 end_ARG ( italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 3 italic_t + 3 )
Plug in𝐱t,𝐯tPlug insubscript𝐱𝑡subscript𝐯𝑡\displaystyle\text{Plug in}{\mathbf{x}}_{t},{\mathbf{v}}_{t}Plug in bold_x start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , bold_v start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT
=gt2P11[(𝐱113𝐱1t2(64t+t2)1t4t𝐱13(t23t+3))(Ltxx1tϵ0+Ltxvϵ0+Ltvvϵ1)]absentsuperscriptsubscript𝑔𝑡2subscript𝑃11delimited-[]subscript𝐱113subscript𝐱1superscript𝑡264𝑡superscript𝑡21𝑡4𝑡subscript𝐱13superscript𝑡23𝑡3subscriptsuperscript𝐿𝑥𝑥𝑡1𝑡subscriptbold-italic-ϵ0subscriptsuperscript𝐿𝑥𝑣𝑡subscriptbold-italic-ϵ0subscriptsuperscript𝐿𝑣𝑣𝑡subscriptbold-italic-ϵ1\displaystyle=g_{t}^{2}P_{11}\left[\left(\frac{{\mathbf{x}}_{1}-\frac{1}{3}{% \mathbf{x}}_{1}t^{2}\left(6-4t+t^{2}\right)}{1-t}-\frac{4t{\mathbf{x}}_{1}}{3}% (t^{2}-3t+3)\right)-\left(\frac{{L^{xx}_{t}}}{1-t}{\bm{\epsilon}_{0}}+{L^{xv}_% {t}}{\bm{\epsilon}_{0}}+{L^{vv}_{t}}{\bm{\epsilon}_{1}}\right)\right]= italic_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_P start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT [ ( divide start_ARG bold_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - divide start_ARG 1 end_ARG start_ARG 3 end_ARG bold_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ( 6 - 4 italic_t + italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ) end_ARG start_ARG 1 - italic_t end_ARG - divide start_ARG 4 italic_t bold_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_ARG start_ARG 3 end_ARG ( italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 3 italic_t + 3 ) ) - ( divide start_ARG italic_L start_POSTSUPERSCRIPT italic_x italic_x end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG 1 - italic_t end_ARG bold_italic_ϵ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_L start_POSTSUPERSCRIPT italic_x italic_v end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_italic_ϵ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_L start_POSTSUPERSCRIPT italic_v italic_v end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_italic_ϵ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ]
=gt2P11[((t4+4t36t2+3)3(1t)4t3(t23t+3))𝐱1(Ltxx1tϵ0+Ltxvϵ0+Ltvvϵ1)]absentsuperscriptsubscript𝑔𝑡2subscript𝑃11delimited-[]superscript𝑡44superscript𝑡36superscript𝑡2331𝑡4𝑡3superscript𝑡23𝑡3subscript𝐱1subscriptsuperscript𝐿𝑥𝑥𝑡1𝑡subscriptbold-italic-ϵ0subscriptsuperscript𝐿𝑥𝑣𝑡subscriptbold-italic-ϵ0subscriptsuperscript𝐿𝑣𝑣𝑡subscriptbold-italic-ϵ1\displaystyle=g_{t}^{2}P_{11}\left[\left(\frac{(-t^{4}+4t^{3}-6t^{2}+3)}{3(1-t% )}-\frac{4t}{3}(t^{2}-3t+3)\right){\mathbf{x}}_{1}-\left(\frac{{L^{xx}_{t}}}{1% -t}{\bm{\epsilon}_{0}}+{L^{xv}_{t}}{\bm{\epsilon}_{0}}+{L^{vv}_{t}}{\bm{% \epsilon}_{1}}\right)\right]= italic_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_P start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT [ ( divide start_ARG ( - italic_t start_POSTSUPERSCRIPT 4 end_POSTSUPERSCRIPT + 4 italic_t start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT - 6 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 3 ) end_ARG start_ARG 3 ( 1 - italic_t ) end_ARG - divide start_ARG 4 italic_t end_ARG start_ARG 3 end_ARG ( italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 3 italic_t + 3 ) ) bold_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - ( divide start_ARG italic_L start_POSTSUPERSCRIPT italic_x italic_x end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG 1 - italic_t end_ARG bold_italic_ϵ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_L start_POSTSUPERSCRIPT italic_x italic_v end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_italic_ϵ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_L start_POSTSUPERSCRIPT italic_v italic_v end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_italic_ϵ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ]
=gt2P11[((t1)(t33t2+3t+3)3(1t)4t3(t23t+3))𝐱1(Ltxx1tϵ0+Ltxvϵ0+Ltvvϵ1)]absentsuperscriptsubscript𝑔𝑡2subscript𝑃11delimited-[]𝑡1superscript𝑡33superscript𝑡23𝑡331𝑡4𝑡3superscript𝑡23𝑡3subscript𝐱1subscriptsuperscript𝐿𝑥𝑥𝑡1𝑡subscriptbold-italic-ϵ0subscriptsuperscript𝐿𝑥𝑣𝑡subscriptbold-italic-ϵ0subscriptsuperscript𝐿𝑣𝑣𝑡subscriptbold-italic-ϵ1\displaystyle=g_{t}^{2}P_{11}\left[\left(\frac{-(t-1)(t^{3}-3t^{2}+3t+3)}{3(1-% t)}-\frac{4t}{3}(t^{2}-3t+3)\right){\mathbf{x}}_{1}-\left(\frac{{L^{xx}_{t}}}{% 1-t}{\bm{\epsilon}_{0}}+{L^{xv}_{t}}{\bm{\epsilon}_{0}}+{L^{vv}_{t}}{\bm{% \epsilon}_{1}}\right)\right]= italic_g start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT italic_P start_POSTSUBSCRIPT 11 end_POSTSUBSCRIPT [ ( divide start_ARG - ( italic_t - 1 ) ( italic_t start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT - 3 italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT + 3 italic_t + 3 ) end_ARG start_ARG 3 ( 1 - italic_t ) end_ARG - divide start_ARG 4 italic_t end_ARG start_ARG 3 end_ARG ( italic_t start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT - 3 italic_t + 3 ) ) bold_x start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - ( divide start_ARG italic_L start_POSTSUPERSCRIPT italic_x italic_x end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT end_ARG start_ARG 1 - italic_t end_ARG bold_italic_ϵ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_L start_POSTSUPERSCRIPT italic_x italic_v end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_italic_ϵ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT + italic_L start_POSTSUPERSCRIPT italic_v italic_v end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT bold_italic_ϵ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ]
\begin{align*}
&= g_t^2 P_{11}\left[\left(\frac{t^3 - 3t^2 + 3t + 3}{3} - \frac{1}{3}\left(4t^3 - 12t^2 + 12t\right)\right)\mathbf{x}_1 - \left(\frac{L^{xx}_t}{1-t}\bm{\epsilon}_0 + L^{xv}_t\bm{\epsilon}_0 + L^{vv}_t\bm{\epsilon}_1\right)\right] \\
&= g_t^2 P_{11}\left[(1-t)^3\,\mathbf{x}_1 - \left(\frac{L^{xx}_t}{1-t}\bm{\epsilon}_0 + L^{xv}_t\bm{\epsilon}_0 + L^{vv}_t\bm{\epsilon}_1\right)\right] \\
&= 4(1-t)^2\,\mathbf{x}_1 - g_t^2 P_{11}\left(\frac{L^{xx}_t}{1-t}\bm{\epsilon}_0 + L^{xv}_t\bm{\epsilon}_0 + L^{vv}_t\bm{\epsilon}_1\right)
\end{align*}
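As a quick sanity check on the last two steps, the sketch below numerically verifies that the polynomial coefficient of $\mathbf{x}_1$ simplifies to $(1-t)^3$ and that multiplying by $g_t^2 P_{11}$ gives $4(1-t)^2$. The identity $g_t^2 P_{11} = 4/(1-t)$ is an assumption here, inferred from the final line of the derivation; the function names are illustrative only.

```python
def poly_coefficient(t: float) -> float:
    """Coefficient of x_1 inside the brackets, before simplification."""
    return (t**3 - 3 * t**2 + 3 * t + 3) / 3 - (4 * t**3 - 12 * t**2 + 12 * t) / 3


def g2_p11(t: float) -> float:
    """Assumed value of g_t^2 * P_11 (inferred from the derivation, not stated in this section)."""
    return 4.0 / (1.0 - t)


for t in (0.0, 0.25, 0.5, 0.9):
    # The bracketed polynomial collapses to (1 - t)^3 ...
    assert abs(poly_coefficient(t) - (1 - t) ** 3) < 1e-12
    # ... and the prefactor turns it into 4 (1 - t)^2.
    assert abs(g2_p11(t) * (1 - t) ** 3 - 4 * (1 - t) ** 2) < 1e-12
```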

D.7 Loss Reweight

In practice, we use the following loss function

\begin{align*}
\mathcal{L} &= \min_{\theta}\, \mathbb{E}_{t\in[0,1]}\, \mathbb{E}_{\mathbf{x}_1\sim p_{\rm data}}\, \mathbb{E}_{\mathbf{m}_t\sim p_t(\mathbf{m}_t|\mathbf{x}_1)}\, \lambda(t)\left[\left\lVert \mathbf{F}_t^{\theta}(\mathbf{m}_t,t;\theta) - \mathbf{F}_t(\mathbf{m}_t,t)\right\rVert_2^2\right] \tag{32} \\
&\propto \min_{\theta}\, \mathbb{E}_{t\in[0,1]}\, \mathbb{E}_{\mathbf{x}_1\sim p_{\rm data}}\, \mathbb{E}_{\mathbf{m}_t\sim p_t(\mathbf{m}_t|\mathbf{x}_1)}\, \frac{1}{1-t}\left[\left\lVert \mathbf{F}_t^{\theta}(\mathbf{m}_t,t;\theta) - \mathbf{F}_t(\mathbf{m}_t,t)/\mathbf{z}_t\right\rVert_2^2\right] \tag{33}
\end{align*}

We acknowledge that this may not be the optimal choice. The motivation is simply to increase the training weight as $t \rightarrow 1$ and to normalize the label with the normalizer $\mathbf{z}_t$.
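To make the reweighting concrete, here is a minimal single-sample sketch of the integrand in (33), using plain Python lists in place of tensors. The function name, the flattened-list representation, and the explicit guard against $t = 1$ are assumptions made for illustration, not the released implementation.

```python
from typing import Sequence


def reweighted_loss(
    pred: Sequence[float],    # network output F^theta(m_t, t), flattened
    target: Sequence[float],  # regression label F_t(m_t, t), flattened
    z_t: float,               # scalar normalizer at time t
    t: float,                 # time in [0, 1)
) -> float:
    """Single-sample integrand of Eq. (33): ||pred - target / z_t||^2 / (1 - t)."""
    if not 0.0 <= t < 1.0:
        # Assumption: t = 1 is excluded so that the weight 1 / (1 - t) stays finite.
        raise ValueError("t must lie in [0, 1)")
    sq_err = sum((p - y / z_t) ** 2 for p, y in zip(pred, target))
    return sq_err / (1.0 - t)
```

In training, the expectation would be approximated by averaging this quantity over a batch of sampled $(t, \mathbf{x}_1, \mathbf{m}_t)$ triples.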

D.8 Normalizer of AGM-SDE and AGM-ODE

The optimal control term can be represented as

\begin{align*}
\mathbf{a}^{*}(\mathbf{m}_t,t) = 4\mathbf{x}_1(1-t)^2 - g_t^2 P_{11}\left[\left(\frac{L_t^{xx}}{1-t} + L_t^{xv}\right)\bm{\epsilon}_0 + L_t^{vv}\bm{\epsilon}_1\right].
\end{align*}

We then introduce the normalizers

\begin{align*}
\mathbf{z}_{SDE} &= \sqrt{\left(4(1-t)^2\cdot\sigma_{data}\right)^2 + g_t^2 P_{11}\left[\left(\frac{L_t^{xx}}{1-t} + L_t^{xv}\right)^2 + \left(L_t^{vv}\right)^2\right]}, \\
\mathbf{z}_{ODE} &= \sqrt{\left(4(1-t)^2\cdot\sigma_{data}\right)^2 + g_t^2 P_{11} + g_t^2 P_{11}\left(\frac{L_t^{xx}}{1-t} + L_t^{xv}\right)^2 + \left[\left(g_t^2 P_{11} L_t^{vv} - \frac{1}{2} g_t^2 \ell_t\right)^2\right]},
\end{align*}

where $\ell_t := \sqrt{\dfrac{\Sigma^{xx}_t}{\Sigma^{xx}_t \Sigma^{vv}_t - (\Sigma^{xv}_t)^2}}$.
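The two normalizers can be evaluated directly once the Cholesky entries and $g_t^2$, $P_{11}$ are known at time $t$. The sketch below transcribes the expressions as printed above for scalar inputs; the function and argument names are hypothetical, and how the inputs are obtained (closed-form $L_t$, $P_{11}$) is outside its scope.

```python
import math


def ell(sigma_xx: float, sigma_xv: float, sigma_vv: float) -> float:
    """ell_t = sqrt(Sigma_xx / (Sigma_xx * Sigma_vv - Sigma_xv^2))."""
    return math.sqrt(sigma_xx / (sigma_xx * sigma_vv - sigma_xv ** 2))


def normalizer_sde(t, sigma_data, g2, p11, l_xx, l_xv, l_vv):
    """z_SDE as printed above; every argument is a scalar evaluated at time t."""
    signal = (4.0 * (1.0 - t) ** 2 * sigma_data) ** 2
    noise = g2 * p11 * ((l_xx / (1.0 - t) + l_xv) ** 2 + l_vv ** 2)
    return math.sqrt(signal + noise)


def normalizer_ode(t, sigma_data, g2, p11, l_xx, l_xv, l_vv, ell_t):
    """z_ODE as printed above; g2 is g_t^2 and ell_t comes from ell(...)."""
    g2_p11 = g2 * p11
    signal = (4.0 * (1.0 - t) ** 2 * sigma_data) ** 2
    noise = (
        g2_p11
        + g2_p11 * (l_xx / (1.0 - t) + l_xv) ** 2
        + (g2_p11 * l_vv - 0.5 * g2 * ell_t) ** 2
    )
    return math.sqrt(signal + noise)
```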

D.9 Exponential Integrator Derivation

As suggested by Zhang & Chen (2022), one can write the discretized dynamics as,

\begin{align*}
\begin{bmatrix}\mathbf{x}_{t_{i+1}} \\ \mathbf{v}_{t_{i+1}}\end{bmatrix} &= \Phi(t_{i+1}, t_i)\begin{bmatrix}\mathbf{x}_{t_i} \\ \mathbf{v}_{t_i}\end{bmatrix} + \sum_{j=0}^{r} C_{i,j}\begin{bmatrix}\mathbf{0} \\ \mathbf{s}_{\theta}(\mathbf{m}_{t_{i-j}}, t_{i-j})\end{bmatrix}, \tag{34} \\
\text{where}\quad C_{i,j} &= \int_{t}^{t+\delta_t}\Phi(t+\delta_t,\tau)\begin{bmatrix}\mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{z}_{\tau}\end{bmatrix}\prod_{k\neq j}\left[\frac{\tau - t_{i-k}}{t_{i-j} - t_{i-k}}\right]\mathrm{d}\tau, \quad \Phi(t,s) = \begin{bmatrix}1 & t-s \\ 0 & 1\end{bmatrix}.
\end{align*}

Here $t = t_i$ and $t + \delta_t = t_{i+1}$. After plugging in the transition kernel $\Phi(t,s)$, one can easily obtain the results shown in (11).
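The integral defining $C_{i,j}$ can also be evaluated numerically, which gives a way to cross-check any closed form. The sketch below uses a midpoint rule and returns, for each $j$, the position and velocity entries of $C_{i,j}$ applied to $[\mathbf{0}, \mathbf{s}]^{\top}$; since $\Phi(t+\delta_t,\tau)$ has top-right entry $t+\delta_t-\tau$, these are $\int (t+\delta_t-\tau)\,\mathbf{z}_\tau\,\ell_j(\tau)\,\mathrm{d}\tau$ and $\int \mathbf{z}_\tau\,\ell_j(\tau)\,\mathrm{d}\tau$, with $\ell_j$ the Lagrange basis polynomial. The function names, the scalar treatment of $\mathbf{z}_\tau$, and the quadrature resolution are assumptions for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def lagrange_basis(tau: float, nodes: Sequence[float], j: int) -> float:
    """Value at tau of the j-th Lagrange basis polynomial on the given nodes."""
    out = 1.0
    for k, t_k in enumerate(nodes):
        if k != j:
            out *= (tau - t_k) / (nodes[j] - t_k)
    return out


def ei_coefficients(
    t: float,
    dt: float,
    nodes: Sequence[float],       # past times t_i, t_{i-1}, ..., t_{i-r}
    z: Callable[[float], float],  # scalar normalizer z_tau
    n_quad: int = 2000,           # midpoint-rule resolution (hypothetical default)
) -> List[Tuple[float, float]]:
    """Return the (position, velocity) entries of C_{i,j} [0, 1]^T for each j."""
    h = dt / n_quad
    coeffs = []
    for j in range(len(nodes)):
        c_x = c_v = 0.0
        for n in range(n_quad):
            tau = t + (n + 0.5) * h
            w = z(tau) * lagrange_basis(tau, nodes, j) * h
            c_x += (t + dt - tau) * w  # top-right entry of Phi(t + dt, tau)
            c_v += w                   # bottom-right entry of Phi(t + dt, tau)
        coeffs.append((c_x, c_v))
    return coeffs
```

With a constant normalizer and a single node ($r = 0$), the coefficients reduce to $\delta_t^2/2$ and $\delta_t$, which is a convenient check.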

Remark 14.

Because the dynamics form a momentum system, many numerical methods exist for solving them with high accuracy. However, their practical performance in generative modeling remains untested. We refer readers to classical numerical physics textbooks or to recent momentum dynamics solvers (Pandey et al., 2023; Dockhorn et al., 2021).

D.10 Proof of Proposition 5

The estimated data point $\mathbf{x}_1$ can be represented as

\begin{align*}
\tilde{\mathbf{x}}_1^{SDE} = (1-t)\left(\frac{\mathbf{F}_t^{\theta}}{g_t^2 P_{11}} + \mathbf{v}_t\right) + \mathbf{x}_t, \quad\text{or}\quad \tilde{\mathbf{x}}_1^{ODE} = \frac{\mathbf{F}_t^{\theta} + g_t^2 P_{11}(\alpha_t \mathbf{x}_t + \beta_t \mathbf{v}_t)}{4(1-t)^2 + g_t^2 P_{11}(\alpha_t \mu^x_t + \beta_t \mu^v_t)} \tag{35}
\end{align*}

for the SDE and probabilistic ODE dynamics respectively, where $\beta_t = \dfrac{L^{vv}_t + \frac{1}{2P_{11}}}{L^{vv}_t}$ and $\alpha_t = \dfrac{\left(\frac{L_t^{xx}}{1-t} + L_t^{xv}\right) - \beta_t L^{xv}_t}{L^{xx}_t}$.

Proof.

The representation of $\mathbf{x}_1$ for the SDE follows directly from the fact that the network essentially estimates

\begin{align*}
&\mathbf{F}_t^{\theta} \approx g_t^2 P_{11}\left(\frac{\mathbf{x}_1 - \mathbf{x}_t}{1-t} - \mathbf{v}_t\right) \\
\Leftrightarrow\ &\mathbf{x}_1 \approx (1-t)\left(\frac{\mathbf{F}_t^{\theta}}{g_t^2 P_{11}} + \mathbf{v}_t\right) + \mathbf{x}_t
\end{align*}

The probabilistic ODE case is slightly more involved. We note that

\begin{align*}
\mathbf{m}_t &= \bm{\mu}_t + \mathbf{L}\bm{\epsilon} \\
\Leftrightarrow\quad \mathbf{x}_t &= \mu^x_t + L_t^{xx}\bm{\epsilon}_0, \quad \mathbf{v}_t = \mu^v_t + L^{xv}_t\bm{\epsilon}_0 + L^{vv}_t\bm{\epsilon}_1.
\end{align*}

In the probabilistic ODE case, the force term can be represented as

\begin{align*}
\mathbf{F}(\mathbf{m}_t,t) = 4\mathbf{x}_1(1-t)^2 - g_t^2 P_{11}\left[\left(\frac{L_t^{xx}}{1-t} + L_t^{xv}\right)\bm{\epsilon}_0 + L_t^{vv}\bm{\epsilon}_1\right] - \frac{1}{2} g_t^2 \ell_t\, \bm{\epsilon}_1.
\end{align*}

To represent $\mathbf{F}$ as a linear combination of $\mathbf{x}_t$ and $\mathbf{v}_t$, one needs to match the stochastic terms in $\mathbf{F}_t$ by choosing $\alpha_t$ and $\beta_t$ such that

\begin{align*}
\alpha_t L^{xx}_t + \beta_t L^{xv}_t &= \underbrace{\frac{L_t^{xx}}{1-t} + L_t^{xv}}_{\hat{\zeta}_t}, \\
\beta_t L^{vv}_t &= \underbrace{L^{vv}_t + \frac{1}{2P_{11}}}_{\zeta_t}.
\end{align*}

The solution is

\begin{align*}
\beta_t &= \frac{\zeta_t}{L^{vv}_t}, \\
\alpha_t &= \frac{\hat{\zeta}_t - \beta_t L^{xv}_t}{L^{xx}_t}.
\end{align*}

Substituting these back into $\mathbf{F}_t$, one obtains

\begin{align*}
\mathbf{F}(\mathbf{m}_t,t) &= 4\mathbf{x}_1(1-t)^2 - g_t^2 P_{11}\left[\alpha_t(\mathbf{x}_t - \mu^x_t) + \beta_t(\mathbf{v}_t - \mu^v_t)\right] \\
&= \left[4(1-t)^2 + g_t^2 P_{11}(\alpha_t \mu^x_t + \beta_t \mu^v_t)\right]\mathbf{x}_1 - g_t^2 P_{11}\left[\alpha_t \mathbf{x}_t + \beta_t \mathbf{v}_t\right] \\
\Leftrightarrow\quad \mathbf{x}_1 &= \frac{\mathbf{F}_t^{\theta} + g_t^2 P_{11}(\alpha_t \mathbf{x}_t + \beta_t \mathbf{v}_t)}{4(1-t)^2 + g_t^2 P_{11}(\alpha_t \mu^x_t + \beta_t \mu^v_t)}
\end{align*}

The second line uses that the conditional means are linear in $\mathbf{x}_1$; from that line on, $\mu^x_t$ and $\mu^v_t$ denote the scalar coefficients multiplying $\mathbf{x}_1$.
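The two data estimates can be sketched in a few lines by inverting the relations used in the proof: the SDE estimate solves $\mathbf{F}_t^{\theta} \approx g_t^2 P_{11}\left(\frac{\mathbf{x}_1 - \mathbf{x}_t}{1-t} - \mathbf{v}_t\right)$ for $\mathbf{x}_1$, and the ODE estimate uses $\alpha_t$, $\beta_t$ obtained from the matching conditions. All names below are hypothetical, inputs are treated as scalars (one coordinate at a time), and `mu_x`, `mu_v` are assumed to be the coefficients such that the conditional means are `mu_x * x1` and `mu_v * x1`.

```python
def predict_x1_sde(f, x_t, v_t, t, g2_p11):
    """Solve f = g2_p11 * ((x1 - x_t) / (1 - t) - v_t) for x1."""
    return (1.0 - t) * (f / g2_p11 + v_t) + x_t


def solve_alpha_beta(t, l_xx, l_xv, l_vv, p11):
    """Match the epsilon_0 and epsilon_1 coefficients as in the proof."""
    zeta_hat = l_xx / (1.0 - t) + l_xv   # target coefficient of epsilon_0
    zeta = l_vv + 1.0 / (2.0 * p11)      # target coefficient of epsilon_1
    beta = zeta / l_vv
    alpha = (zeta_hat - beta * l_xv) / l_xx
    return alpha, beta


def predict_x1_ode(f, x_t, v_t, t, g2_p11, alpha, beta, mu_x, mu_v):
    """ODE data estimate: ratio of the x1-free part to the x1 coefficient."""
    num = f + g2_p11 * (alpha * x_t + beta * v_t)
    den = 4.0 * (1.0 - t) ** 2 + g2_p11 * (alpha * mu_x + beta * mu_v)
    return num / den
```

A round-trip check is straightforward: build the force from a known `x1` with the corresponding formula, then confirm that the estimator returns the same `x1`.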

Appendix E Experimental Details

Training: We use the hyperparameters introduced in Section 4. We use AdamW (Loshchilov & Hutter, 2017) as the optimizer and an Exponential Moving Average (EMA) with a decay rate of 0.9999. All experiments are run on 8 $\times$ Nvidia A100 GPUs. For further details of the training setup, please refer to Table 6.

Table 6: Additional experimental details
Dataset | Training iterations | Learning rate | Batch size | Network architecture
toy | 0.05M | 1e-3 | 1024 | ResNet (Dockhorn et al., 2021)
CIFAR-10 | 0.5M | 1e-3 | 512 | NCSN++ (Karras et al., 2022)
AFHQv2 | 0.5M | 1e-3 | 512 | NCSN++ (Karras et al., 2022)
ImageNet-64 | 1.6M | 2e-4 | 512 | ADM (Dhariwal & Nichol, 2021)
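For reference, an exponential moving average with decay 0.9999 is typically maintained as a shadow copy of the model parameters that is updated after each optimizer step. The sketch below shows the standard update on a flat list of floats; applying it to the network parameters, and the function name, are assumptions, since the exact EMA variant is not specified here.

```python
from typing import List


def ema_update(ema: List[float], params: List[float], decay: float = 0.9999) -> List[float]:
    """One EMA step: ema <- decay * ema + (1 - decay) * params."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema, params)]
```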

Sampling: For Exponential Integrator, we choose the multistep order w=2𝑤2w=2italic_w = 2 consistently for all experiments. Different from previous work (Dockhorn et al., 2021; Karras et al., 2022; Zhang et al., 2023), we use quadratic timesteps scheme with κ=2𝜅2\kappa=2italic_κ = 2:

ti=(NiNt01κ+iNtN1κ)κsubscript𝑡𝑖superscript𝑁𝑖𝑁superscriptsubscript𝑡01𝜅𝑖𝑁superscriptsubscript𝑡𝑁1𝜅𝜅\displaystyle t_{i}=\left(\frac{N-i}{N}t_{0}^{\frac{1}{\kappa}}+\frac{i}{N}t_{% N}^{\frac{1}{\kappa}}\right)^{\kappa}italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ( divide start_ARG italic_N - italic_i end_ARG start_ARG italic_N end_ARG italic_t start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG italic_κ end_ARG end_POSTSUPERSCRIPT + divide start_ARG italic_i end_ARG start_ARG italic_N end_ARG italic_t start_POSTSUBSCRIPT italic_N end_POSTSUBSCRIPT start_POSTSUPERSCRIPT divide start_ARG 1 end_ARG start_ARG italic_κ end_ARG end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT italic_κ end_POSTSUPERSCRIPT

This is the opposite of the classical DM: the time discretization steps get larger as the dynamics propagate closer to the data. For numerical stability, we use $t_0 = 10^{-5}$ for all experiments. For NFE $=5$ we use $t_N=0.5$, and for NFE $=10$ we use $t_N=0.7$. For the rest of the sampling, we use $t_N=0.999$.
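As an illustration, the timestep scheme above can be sketched in a few lines of NumPy. This is a minimal sketch and not the released implementation: the function name `quadratic_timesteps` and the argument name `num_steps` (playing the role of $N$) are hypothetical, and how $N$ relates to the reported NFE depends on the solver and is not assumed here.

```python
import numpy as np

def quadratic_timesteps(num_steps, t0=1e-5, tN=0.999, kappa=2.0):
    """Return the grid t_0, ..., t_N of the kappa-power timestep scheme.

    num_steps is N in the formula; the result has N + 1 entries.
    """
    i = np.arange(num_steps + 1)
    frac = i / num_steps  # i / N, so (N - i) / N = 1 - frac
    root = (1.0 - frac) * t0 ** (1.0 / kappa) + frac * tN ** (1.0 / kappa)
    return root ** kappa

# Example with the terminal time quoted for NFE = 10.
ts = quadratic_timesteps(10, tN=0.7)
step_sizes = np.diff(ts)  # grows monotonically towards t_N (the data side)
```

With $\kappa=2$ the grid is uniform in $\sqrt{t}$, so consecutive step sizes increase as $t$ approaches $t_N$, which matches the behaviour described in the text.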

Because EDM (Karras et al., 2022) uses a second-order ODE solver, in practice we allow it one extra NFE in all reported tables.

E.1 Code Example for Covariance

We abuse notation slightly in this coding section. Here we provide example code for computing the covariance matrix. We consider the general case where $\bm{\Sigma}_0 := \begin{bmatrix} m & -k\sqrt{mn} \\ -k\sqrt{mn} & n \end{bmatrix}$ and the diffusion coefficient is $g(t) := p(tt - t)$, where $p$ is the scaling coefficient and $tt$ is the damping coefficient.

    import numpy as np
    import torch

    # Entries of the covariance matrix at time t for
    # Sigma_0 = [[m, -k*sqrt(m*n)], [-k*sqrt(m*n), n]] and g(t) = p*(tt - t).
    # t is expected to be a torch.Tensor with values in [0, 1), since torch.log(1 - t) is used.
    # k (the correlation coefficient in Sigma_0) is taken as an explicit last
    # argument here; that placement is an assumption.

    def Sigmaxx(t, p, tt, m, n, k):
        # Position-position entry; reduces to m at t = 0.
        return (
            (t - 1)**2*(30*m*(t**3 - 3*t**2 + 3*t + 3)**2
            - 60*p**2*(t - 1)**3*torch.log(1 - t)
            - t*(60*k*np.sqrt(m*n)*(t**5 - 6*t**4 + 15*t**3 - 15*t**2 + 9)
            - 30*n*t*(t**2 - 3*t + 3)**2 + p**2*(t**5*(6*tt**2 + 3*tt + 1)
            - 6*t**4*(6*tt**2 + 3*tt + 1)
            + 15*t**3*(6*tt**2 + 3*tt + 1)
            - 10*t**2*(9*tt**2 + 11) + 150*t - 60)))/270
        )

    def Sigmaxv(t, p, tt, m, n, k):
        # Position-velocity entry; reduces to -k*sqrt(m*n) at t = 0.
        return (
            (1/270 - t/270)*(30*k*np.sqrt(m*n)*(8*t**6 - 48*t**5
            + 120*t**4 - 135*t**3 + 45*t**2 + 27*t - 9) +
            150*p**2*(t - 1)**3*torch.log(1 - t)
            + t*(-120*m*(t**5 - 6*t**4 + 15*t**3 - 15*t**2 + 9)
            - 30*n*(4*t**5 - 24*t**4 + 60*t**3 - 75*t**2 + 45*t - 9)
            + p**2*(4*t**5*(6*tt**2 + 3*tt + 1) - 24*t**4*(6*tt**2 + 3*tt + 1)
            + 60*t**3*(6*tt**2 + 3*tt + 1) - 5*t**2*(81*tt**2 + 18*tt + 55)
            + 15*t*(9*tt**2 + 25) - 150)))
        )

    def Sigmavv(t, p, tt, m, n, k):
        # Velocity-velocity entry; reduces to n at t = 0.
        return (
            n*(-4*t**3 + 12*t**2 - 12*t + 3)**2/9
            - 8*p**2*(t - 1)**3*torch.log(1 - t)/9
            + t*(-120*k*np.sqrt(m*n)*(4*t**5 - 24*t**4 + 60*t**3
            - 75*t**2 + 45*t - 9) + 240*m*t*(t**2 - 3*t + 3)**2
            + p**2*(-8*t**5*(6*tt**2 + 3*tt + 1) + 48*t**4*(6*tt**2 + 3*tt + 1)
            - 120*t**3*(6*tt**2 + 3*tt + 1) + 5*t**2*(180*tt**2 + 72*tt + 53)
            - 15*t*(36*tt**2 + 9*tt + 20) + 135*tt**2 + 120))/135
        )
    

Appendix F Conditional Generation Details

Here we provide the details of conditional generation.

F.1 Stroke Based Generation

For stroke-based generation, we provide two types of conditional generation.

Initial Velocity (IV): Please refer to Section 4.
Dynamics Velocity (dyn-V): Since the mean and variance of the velocity and position are available, one can specify a valid velocity. In this case, we set the velocity as

$$v_t = \mu_t^{v_t|x_t} + \Sigma_t^{v_t|x_t}\,\epsilon \tag{36}$$

In which,

$$\mu_t^{v_t|x_t} = \mu_t^{v} + \frac{\Sigma^{xv}_t}{\Sigma^{xx}_t}\left(\mathbf{x}_t - \mu^{x}_t\right) \tag{37}$$
$$\Sigma_t^{v_t|x_t} = \Sigma^{vv}_t - \frac{\left(\Sigma^{xv}_t\right)^2}{\Sigma^{xx}_t} \tag{38}$$

when $t \leq c$, where $c$ is the guidance length. We typically set $c=0.25$.
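The conditional statistics in eqs. 37 and 38 are the standard Gaussian conditioning formulas for a scalar (per-dimension) covariance, and can be sketched as follows. This is a minimal sketch under stated assumptions, not the released implementation: the function names are hypothetical, and the sampler treats eq. 38 as a variance and therefore scales the noise by its square root, which is one possible reading of the coefficient written in eq. 36.

```python
import numpy as np

def conditional_velocity_stats(x_t, mu_x, mu_v, sigma_xx, sigma_xv, sigma_vv):
    """Mean (eq. 37) and variance (eq. 38) of v_t given x_t, per dimension."""
    cond_mean = mu_v + sigma_xv / sigma_xx * (x_t - mu_x)
    cond_var = sigma_vv - sigma_xv ** 2 / sigma_xx
    return cond_mean, cond_var

def sample_dyn_velocity(x_t, mu_x, mu_v, sigma_xx, sigma_xv, sigma_vv, rng):
    """Draw a velocity consistent with x_t (hypothetical helper).

    Assumption: the noise is scaled by sqrt(cond_var), i.e. eq. 38 is
    interpreted as a variance rather than a standard deviation.
    """
    cond_mean, cond_var = conditional_velocity_stats(
        x_t, mu_x, mu_v, sigma_xx, sigma_xv, sigma_vv
    )
    eps = rng.standard_normal(np.shape(x_t))
    return cond_mean + np.sqrt(cond_var) * eps

rng = np.random.default_rng(0)
x_t = np.zeros((2, 3))
v_t = sample_dyn_velocity(x_t, 0.0, 1.0, 2.0, 1.0, 3.0, rng)
```

In a sampler, such a draw would only be applied while $t \leq c$; after that the velocity would evolve under the learned dynamics.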

F.2 Inpainting

In the inpainting case, we apply a strategy similar to dyn-V. Specifically, in this case $\tilde{\mathbf{x}}_1$ is represented as:

$$\hat{\mathbf{x}}_1 := \text{MASK} \odot \mu_t^{x} + (1-\text{MASK}) \odot \tilde{\mathbf{x}}_1 \tag{39}$$

where MASK denotes the mask matrix, which zeroes out pixels of the original image. This $\hat{\mathbf{x}}_1$ serves as the source for estimating $\mu_t^{x}$ in eq. 37.
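The blending in eq. 39 is an elementwise operation and can be sketched as below. The function and argument names are hypothetical, and the sketch only mirrors the formula as written; it makes no assumption about which mask value marks the known region.

```python
import numpy as np

def blend_with_mask(mask, mu_x, x_tilde):
    """Elementwise blend of eq. 39: mask * mu_x + (1 - mask) * x_tilde."""
    return mask * mu_x + (1.0 - mask) * x_tilde

mask = np.array([[1.0, 0.0], [0.0, 1.0]])
mu_x = np.full((2, 2), 5.0)
x_tilde = np.full((2, 2), 2.0)
x_hat = blend_with_mask(mask, mu_x, x_tilde)
```

Entries where the mask equals one take their value from `mu_x`, and the remaining entries take their value from `x_tilde`.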

Appendix G Ablation Study of Stroke-Based Conditional Generation

To investigate the diversity and faithfulness of stroke-based conditional generation, we conduct an ablation study with respect to the hyperparameter $\xi$.

Figure 7: Ablation study for stroke-based conditional generation. When $\xi=0$, generation is unconditional. Notably, the diversity of the generated samples decays as $\xi$ increases. To balance faithfulness and diversity, one needs to tune the hyperparameter $\xi$.

Appendix H Additional Figures

We demonstrate the samples for different datasets with varying NFE.

H.1 Toy dataset compared with CLD

Figure 8: Comparison with CLD (Dockhorn et al., 2021) using the same network and the stochastic sampler SSS, on the Multi-Swiss-Roll and Mixture of Gaussian datasets. We achieve visually better results with an order of magnitude fewer NFEs.

H.2 AFHQv2 Inpainting Generation

Figure 9: AGM-ODE uncurated inpainting generation

H.3 AFHQv2 Stroke Based Generation

Figure 10: AGM-ODE uncurated stroke-based generation

H.4 CIFAR-10

Figure 11: AGM-ODE Uncurated CIFAR-10 samples with NFE=5
Figure 12: AGM-ODE Uncurated CIFAR-10 samples with NFE=10
Figure 13: AGM-ODE Uncurated CIFAR-10 samples with NFE=20
Figure 14: AGM-ODE Uncurated CIFAR-10 samples with NFE=50

H.5 AFHQv2

Figure 15: AGM-ODE Uncurated AFHQv2 samples with NFE=5
Figure 16: AGM-ODE Uncurated AFHQv2 samples with NFE=10
Figure 17: AGM-ODE Uncurated AFHQv2 samples with NFE=20
Figure 18: AGM-ODE Uncurated AFHQv2 samples with NFE=50

H.6 ImageNet-64

Figure 19: AGM-ODE Uncurated ImageNet-64 samples with NFE=10
Figure 20: AGM-ODE Uncurated ImageNet-64 samples with NFE=20
Figure 21: AGM-ODE Uncurated ImageNet-64 samples with NFE=50