Deep Neural Networks with Symplectic Preservation Properties

Qing He Wei Cai
Abstract

We propose a deep neural network architecture designed such that its output forms an invertible symplectomorphism of the input. This design draws an analogy to the real-valued non-volume-preserving (real NVP) method used in normalizing flow techniques. Utilizing this neural network type allows for learning tasks on unknown Hamiltonian systems without breaking the inherent symplectic structure of the phase space.

Key Words: Deep learning, Symplectomorphism, Structure-Preserving

AMS Classifications: 37J11, 70H15, 68T07

1 Introduction

For an unknown Hamiltonian system, our objective is to learn the flow mapping over a fixed time period $T$. Specifically, we seek to determine the map $\Phi_{T}$ that computes $(q,p)_{t=T}$ given an initial condition $(q,p)_{t=0}=(q_{0},p_{0})$. Such problems arise, for instance, when analyzing a sequence of system snapshots at times $0,T,2T,3T,\ldots$. The key information we possess about this mapping is that it is a symplectomorphism (or canonical transformation), which implies that the Jacobian of $\Phi_{T}$ belongs to the symplectic group $Sp(2n)$, where $n$ is the dimension of the system’s configuration space [2, 4].

In this study, we propose a neural network structure designed to ensure that its output is precisely a symplectomorphism of the input. "Precisely" here means that the Jacobian of the mapping defined by the neural network is exactly a symplectic matrix, up to the rounding errors inherent to floating-point arithmetic. Importantly, this framework eliminates the need to introduce an additional "deviation-from-symplecticity" penalty term in the learning objective, because the structure of the network guarantees that the symplectomorphism condition cannot be violated.

The approach draws inspiration from the real NVP method [3], which is primarily used for density estimation of probability measures and differs significantly in purpose from our intended application. Nonetheless, this work leverages real NVP’s elegant methodology for constructing explicitly invertible neural networks. The method we propose is a "symplectic adaptation" of this technique: it employs building blocks akin to those in real NVP while preserving symplecticity throughout. This adaptation involves replacing the components that could compromise the symplectic property of the mapping.

2 Preliminaries

2.1 Symplectic Structures and Symplectomorphism

On $\mathbb{R}^{2n}$, we denote the standard Cartesian coordinates by $q_{1},\cdots,q_{n},p_{1},\cdots,p_{n}$, corresponding to the "position" and "momentum" coordinates in Hamiltonian mechanics. The standard symplectic form on $\mathbb{R}^{2n}$ is the differential 2-form

$$\omega=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}p_{i}, \tag{1}$$

and a transformation $\varphi:\mathbb{R}^{2n}\rightarrow\mathbb{R}^{2n}$ is called a symplectomorphism if $\varphi^{*}\omega=\omega$. This means

$$\sum_{i=1}^{n}\mathrm{d}Q_{i}\land\mathrm{d}P_{i}=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}p_{i}, \tag{2}$$

where

$$(Q_{1},\cdots,Q_{n},P_{1},\cdots,P_{n})=\varphi(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n}), \tag{3}$$

or equivalently,

$$J_{\varphi}^{\top}\Omega J_{\varphi}=\Omega, \tag{4}$$

where

$$J_{\varphi}=\begin{pmatrix}
\frac{\partial Q_{1}}{\partial q_{1}}&\cdots&\frac{\partial Q_{1}}{\partial q_{n}}&\frac{\partial Q_{1}}{\partial p_{1}}&\cdots&\frac{\partial Q_{1}}{\partial p_{n}}\\
\vdots&\ddots&\vdots&\vdots&\ddots&\vdots\\
\frac{\partial Q_{n}}{\partial q_{1}}&\cdots&\frac{\partial Q_{n}}{\partial q_{n}}&\frac{\partial Q_{n}}{\partial p_{1}}&\cdots&\frac{\partial Q_{n}}{\partial p_{n}}\\
\frac{\partial P_{1}}{\partial q_{1}}&\cdots&\frac{\partial P_{1}}{\partial q_{n}}&\frac{\partial P_{1}}{\partial p_{1}}&\cdots&\frac{\partial P_{1}}{\partial p_{n}}\\
\vdots&\ddots&\vdots&\vdots&\ddots&\vdots\\
\frac{\partial P_{n}}{\partial q_{1}}&\cdots&\frac{\partial P_{n}}{\partial q_{n}}&\frac{\partial P_{n}}{\partial p_{1}}&\cdots&\frac{\partial P_{n}}{\partial p_{n}}
\end{pmatrix} \tag{5}$$

is the Jacobian matrix of $\varphi$, and

$$\Omega=\begin{pmatrix}0_{n\times n}&I_{n\times n}\\ -I_{n\times n}&0_{n\times n}\end{pmatrix} \tag{6}$$

is the matrix of the standard symplectic form $\omega$.
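Condition (4) is straightforward to test numerically for a given matrix. The following is a minimal sketch in plain Python and is not part of the original method; the helper names `omega` and `symplectic_defect`, as well as the two 2×2 test matrices, are illustrative assumptions.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def omega(n):
    """Matrix of the standard symplectic form, laid out as in (6)."""
    size = 2 * n
    W = [[0.0] * size for _ in range(size)]
    for i in range(n):
        W[i][n + i] = 1.0
        W[n + i][i] = -1.0
    return W


def symplectic_defect(J):
    """Largest absolute entry of J^T Omega J - Omega; zero iff (4) holds."""
    size = len(J)
    W = omega(size // 2)
    R = matmul(matmul(transpose(J), W), J)
    return max(abs(R[i][j] - W[i][j]) for i in range(size) for j in range(size))


# Two hypothetical 2x2 matrices (n = 1): a shear and a uniform scaling.
shear = [[1.0, 0.0], [3.0, 1.0]]
scale = [[2.0, 0.0], [0.0, 2.0]]

print(symplectic_defect(shear))  # 0.0: the shear satisfies (4)
print(symplectic_defect(scale))  # 3.0: the scaling gives 4*Omega, so it violates (4)
```

A defect of this kind is only a diagnostic; the architecture discussed in this paper is meant to make it vanish by construction rather than by penalization.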

The most essential property of a Hamiltonian system

$$\begin{cases}\displaystyle\frac{\mathrm{d}q_{i}}{\mathrm{d}t}=\frac{\partial H}{\partial p_{i}},\\[10pt] \displaystyle\frac{\mathrm{d}p_{i}}{\mathrm{d}t}=-\frac{\partial H}{\partial q_{i}},\end{cases}\qquad i=1,2,\cdots,n, \tag{7}$$

where

$$H=H(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n},t)\in C^{2}(\mathbb{R}^{2n+1})$$

is that its flow map defines a family of symplectomorphisms. This means that if we solve (7) from time $t_{0}$ to time $t_{1}$, then the mapping defined by $(q(t_{0}),p(t_{0}))\to(q(t_{1}),p(t_{1}))$ is an $\mathbb{R}^{2n}\to\mathbb{R}^{2n}$ symplectomorphism. The converse is also true: if the flow maps of a system of differential equations on $\mathbb{R}^{2n}$ are symplectomorphisms, then there exists a function $H\in C^{2}(\mathbb{R}^{2n+1})$ such that the system can be written as the Hamiltonian system (7).
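To make this property concrete, the sketch below uses a harmonic oscillator with $H=(q^{2}+p^{2})/2$, which is an example chosen here and not taken from the paper. For $n=1$, $J^{\top}\Omega J=\det(J)\,\Omega$, so condition (4) reduces to $\det J=1$. The exact flow is a rotation and satisfies it, whereas one explicit Euler step does not. The function names and the step size are assumptions for illustration.

```python
import math


def det2(J):
    """Determinant of a 2x2 matrix stored as a list of rows."""
    return J[0][0] * J[1][1] - J[0][1] * J[1][0]


def exact_flow_jacobian(t):
    """Jacobian of the exact flow of dq/dt = p, dp/dt = -q over time t (a rotation)."""
    c, s = math.cos(t), math.sin(t)
    return [[c, s], [-s, c]]


def explicit_euler_jacobian(h):
    """Jacobian of one explicit Euler step: q' = q + h*p, p' = p - h*q."""
    return [[1.0, h], [-h, 1.0]]


def symplectic_euler_jacobian(h):
    """Jacobian of q' = q + h*p followed by p' = p - h*q' = -h*q + (1 - h^2)*p."""
    return [[1.0, h], [-h, 1.0 - h * h]]


T = 0.5
print(det2(exact_flow_jacobian(T)))        # 1 up to rounding: symplectic
print(det2(explicit_euler_jacobian(T)))    # 1.25 = 1 + T^2: not symplectic
print(det2(symplectic_euler_jacobian(T)))  # 1.0: symplectic
```

A generic approximation of the flow map can therefore lose symplecticity even for the simplest system, which is the failure mode a structure-preserving network is meant to exclude.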

2.1.1 Example: Shearing

One of the simplest examples of a symplectomorphism comes from the symplectic Euler method for separable Hamiltonians. Suppose $F:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is a smooth function; then

$$\begin{cases}Q_{i}=q_{i}\\ P_{i}=p_{i}+\frac{\partial F}{\partial q_{i}}(q_{1},\cdots,q_{n})\end{cases} \tag{8}$$

is a symplectic transformation, because

$$\begin{aligned}
\sum_{i=1}^{n}\mathrm{d}Q_{i}\land\mathrm{d}P_{i}&=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}\left(p_{i}+\frac{\partial F}{\partial q_{i}}(q_{1},\cdots,q_{n})\right)\\
&=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}p_{i}+\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}\frac{\partial F}{\partial q_{i}}(q_{1},\cdots,q_{n})\\
&=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}p_{i}-\mathrm{d}(\mathrm{d}F(q_{1},\cdots,q_{n}))
\end{aligned}$$

and the result follows from the identity $\mathrm{d}(\mathrm{d}F)=0$. Similarly,

$$\begin{cases}Q_{i}=q_{i}+\frac{\partial G}{\partial p_{i}}(p_{1},\cdots,p_{n})\\ P_{i}=p_{i}\end{cases} \tag{9}$$

is also a symplectomorphism, where $G:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is a smooth function. We call the symplectomorphism given by (8) or (9) a symplectic shearing.
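The shearing (8) can be checked numerically by forming a finite-difference Jacobian and evaluating the residual of (4). The sketch below is an illustration under stated assumptions: the potential $F(q_{1},q_{2})=\sin(q_{1})\,q_{2}^{2}$, the non-gradient field used for contrast, the evaluation point, and all function names are hypothetical choices, not the authors' implementation.

```python
import math


def shear(z, field):
    """Map (q, p) -> (q, p + field(q)); z is the flat list q followed by p."""
    n = len(z) // 2
    q, p = z[:n], z[n:]
    return q + [pi + gi for pi, gi in zip(p, field(q))]


def jacobian(f, z, h=1e-6):
    """Central-difference Jacobian; entry [i][j] approximates d f_i / d z_j."""
    cols = []
    for j in range(len(z)):
        zp = list(z); zp[j] += h
        zm = list(z); zm[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(f(zp), f(zm))])
    return [list(row) for row in zip(*cols)]


def symplectic_defect(J):
    """Largest absolute entry of J^T Omega J - Omega, with Omega as in (6)."""
    m = len(J)
    n = m // 2
    W = [[0.0] * m for _ in range(m)]
    for i in range(n):
        W[i][n + i], W[n + i][i] = 1.0, -1.0
    # (J^T W J)[a][b] = sum over i, j of J[i][a] * W[i][j] * J[j][b]
    return max(
        abs(sum(J[i][a] * W[i][j] * J[j][b] for i in range(m) for j in range(m)) - W[a][b])
        for a in range(m) for b in range(m)
    )


def grad_F(q):
    """Gradient of the hypothetical potential F(q1, q2) = sin(q1) * q2^2."""
    return [math.cos(q[0]) * q[1] ** 2, 2.0 * math.sin(q[0]) * q[1]]


def not_a_gradient(q):
    """A vector field that is not the gradient of any function (for contrast)."""
    return [q[1], 0.0]


z0 = [0.3, -1.2, 0.7, 0.5]
defect_shear = symplectic_defect(jacobian(lambda z: shear(z, grad_F), z0))
defect_other = symplectic_defect(jacobian(lambda z: shear(z, not_a_gradient), z0))
print(defect_shear)  # small, limited only by finite-difference error
print(defect_other)  # close to 1: the shift must be a gradient for (8) to be symplectic
```

The contrast case shows why the shift in (8) is written as a gradient: the Jacobian block $\partial P/\partial q$ is then a Hessian, and its symmetry is exactly what makes the residual vanish.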

2.1.2 Example: Stretching

Another example is the "coordinate stretching" transformation. A diagonal linear transformation on $\mathbb{R}^{2n}$ is symplectic if and only if it has the form

$$(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n})\mapsto\left(k_{1}q_{1},\cdots,k_{n}q_{n},\frac{p_{1}}{k_{1}},\cdots,\frac{p_{n}}{k_{n}}\right), \tag{10}$$

where $k_{1},\cdots,k_{n}$ are nonzero constants. We now generalize this by letting each $k_{i}$ be a function of the coordinates $q_{1},\cdots,q_{n},p_{1},\cdots,p_{n}$. Then

$$\begin{aligned}
\sum_{i=1}^{n}\mathrm{d}(k_{i}q_{i})\land\mathrm{d}\frac{p_{i}}{k_{i}}&=\sum_{i=1}^{n}(k_{i}\mathrm{d}q_{i}+q_{i}\mathrm{d}k_{i})\land\left(\frac{\mathrm{d}p_{i}}{k_{i}}-\frac{p_{i}\mathrm{d}k_{i}}{k_{i}^{2}}\right)\\
&=\sum_{i=1}^{n}\left(\mathrm{d}q_{i}\land\mathrm{d}p_{i}+\frac{q_{i}}{k_{i}}\mathrm{d}k_{i}\land\mathrm{d}p_{i}-\frac{p_{i}\mathrm{d}q_{i}\land\mathrm{d}k_{i}}{k_{i}}+0\right)\\
&=\sum_{i=1}^{n}\left(\mathrm{d}q_{i}\land\mathrm{d}p_{i}-\frac{q_{i}\mathrm{d}p_{i}+p_{i}\mathrm{d}q_{i}}{k_{i}}\land\mathrm{d}k_{i}\right)\\
&=\sum_{i=1}^{n}\left(\mathrm{d}q_{i}\land\mathrm{d}p_{i}-\frac{\mathrm{d}(p_{i}q_{i})}{k_{i}}\land\mathrm{d}k_{i}\right),
\end{aligned} \tag{11}$$

therefore, a transformation of the form (10) is symplectic if and only if the condition

$$\sum_{i=1}^{n}\frac{\mathrm{d}(p_{i}q_{i})}{k_{i}}\land\mathrm{d}k_{i}=0 \tag{12}$$

is satisfied. Note that (12) can be written as

$$\sum_{i=1}^{n}\mathrm{d}(p_{i}q_{i})\land\mathrm{d}\ln|k_{i}|=0,$$

and according to Poincaré’s Lemma, (12) is satisfied if

$$\sum_{i=1}^{n}\ln|k_{i}|\,\mathrm{d}(p_{i}q_{i})=\mathrm{d}\varphi \tag{13}$$

for some smooth function $\varphi:\mathbb{R}^{2n}\rightarrow\mathbb{R}$. The condition (13) is satisfied when $\varphi$ can be expressed as

$$\varphi(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n})=\Phi(p_{1}q_{1},p_{2}q_{2},\cdots,p_{n}q_{n})$$

for some $\Phi:\mathbb{R}^{n}\rightarrow\mathbb{R}$, and

$$k_{i}=\pm\mathrm{e}^{\Phi_{i}(p_{1}q_{1},p_{2}q_{2},\cdots,p_{n}q_{n})} \tag{14}$$

holds, where $\Phi_{i}$ is the partial derivative of $\Phi$ with respect to its $i$-th argument:

$$\Phi_{i}(x_{1},\cdots,x_{n})=\frac{\partial\Phi}{\partial x_{i}}(x_{1},\cdots,x_{n}). \tag{15}$$
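As a short check of why (14) is sufficient, write $x_{i}=p_{i}q_{i}$ (a shorthand introduced here). For either sign in (14) we have $\ln|k_{i}|=\Phi_{i}(x_{1},\cdots,x_{n})$, so the chain rule gives

$$\sum_{i=1}^{n}\ln|k_{i}|\,\mathrm{d}(p_{i}q_{i})=\sum_{i=1}^{n}\frac{\partial\Phi}{\partial x_{i}}(x_{1},\cdots,x_{n})\,\mathrm{d}x_{i}=\mathrm{d}\varphi,$$

which is (13). Applying $\mathrm{d}$ to both sides and using $\mathrm{d}(\mathrm{d}\varphi)=0$ yields $\sum_{i=1}^{n}\mathrm{d}\ln|k_{i}|\land\mathrm{d}(p_{i}q_{i})=0$, which is the logarithmic form of (12) up to an overall sign.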

We call the symplectomorphism given by (10) and (14) a symplectic stretching.
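A numerical check of the stretching construction can be sketched as follows. The generating function $\Phi(x_{1},x_{2})=\tfrac{1}{2}\sin(x_{1}+2x_{2})$, the plus sign in (14), the counterexample with $k_{i}=\mathrm{e}^{q_{i}}$, and all function names are assumptions made for illustration; the Jacobian is approximated by central differences, so the residual of (4) is only expected to be small, not exactly zero.

```python
import math


def stretch(z, grad_Phi):
    """Map (q, p) -> (k*q, p/k) with k_i = exp(Phi_i(p1*q1, ..., pn*qn)), as in (10) and (14)."""
    n = len(z) // 2
    q, p = z[:n], z[n:]
    x = [pi * qi for pi, qi in zip(p, q)]
    k = [math.exp(g) for g in grad_Phi(x)]
    return [ki * qi for ki, qi in zip(k, q)] + [pi / ki for pi, ki in zip(p, k)]


def naive_stretch(z):
    """Contrast case: k_i = exp(q_i), which is not of the form (14)."""
    n = len(z) // 2
    q, p = z[:n], z[n:]
    k = [math.exp(qi) for qi in q]
    return [ki * qi for ki, qi in zip(k, q)] + [pi / ki for pi, ki in zip(p, k)]


def jacobian(f, z, h=1e-6):
    """Central-difference Jacobian; entry [i][j] approximates d f_i / d z_j."""
    cols = []
    for j in range(len(z)):
        zp = list(z); zp[j] += h
        zm = list(z); zm[j] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(f(zp), f(zm))])
    return [list(row) for row in zip(*cols)]


def symplectic_defect(J):
    """Largest absolute entry of J^T Omega J - Omega, with Omega as in (6)."""
    m = len(J)
    n = m // 2
    W = [[0.0] * m for _ in range(m)]
    for i in range(n):
        W[i][n + i], W[n + i][i] = 1.0, -1.0
    return max(
        abs(sum(J[i][a] * W[i][j] * J[j][b] for i in range(m) for j in range(m)) - W[a][b])
        for a in range(m) for b in range(m)
    )


def grad_Phi(x):
    """Gradient of the hypothetical Phi(x1, x2) = 0.5 * sin(x1 + 2*x2)."""
    u = x[0] + 2.0 * x[1]
    return [0.5 * math.cos(u), math.cos(u)]


z0 = [0.3, -1.2, 0.7, 0.5]
good = symplectic_defect(jacobian(lambda z: stretch(z, grad_Phi), z0))
bad = symplectic_defect(jacobian(naive_stretch, z0))
print(good)  # small, limited only by finite-difference error
print(bad)   # of order 1: a coordinate-dependent k_i outside (14) breaks symplecticity
```

In the contrast case each pair satisfies $\mathrm{d}Q_{i}\land\mathrm{d}P_{i}=(1+q_{i})\,\mathrm{d}q_{i}\land\mathrm{d}p_{i}$, so the residual equals the largest $|q_{i}|$ at the evaluation point.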

2.2 Real NVP

Real NVP (real-valued non-volume preserving) [3, 1] is a generative model used for density estimation. Real NVP networks use invertible transformations, allowing us to go back and forth between the original and transformed spaces. The structure of real NVP is as follows: the input and output of the network are both $N$-dimensional vectors. An $N$-dimensional vector

$$z=(z_{1},z_{2},\cdots,z_{N})$$

received as the input is partitioned into two parts

$$z=(\underbrace{z_{1},\cdots,z_{n}}_{A},\underbrace{z_{n+1},\cdots,z_{N}}_{B}):=(z_{A},z_{B}).$$

A real NVP transformation keeps one of the parts unchanged and performs an "entry-wise linear transformation" on the other part, whose coefficients are determined by the unchanged part. Specifically, the input $z$ undergoes the following transformation:

$$\begin{cases}x_A=z_A\\ x_B=\mathrm{e}^{s(z_A)}\odot z_B+b(z_A)\end{cases} \tag{16}$$

where $s,b:\mathbb{R}^{n}\rightarrow\mathbb{R}^{N-n}$ are two functions which are given as neural networks in practice, and the symbol "$\odot$" denotes the Hadamard product (entry-wise product) operator:

$$(x_1,\cdots,x_n)\odot(y_1,\cdots,y_n)=(x_1y_1,\cdots,x_ny_n).$$

The inverse of the mapping (16) is clear:

$$\begin{cases}z_A=x_A\\ z_B=\mathrm{e}^{-s(z_A)}\odot(x_B-b(x_A)).\end{cases} \tag{17}$$
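As a minimal sketch of (16) and (17), the following pure-Python snippet implements one coupling layer and its inverse and checks the round trip. The functions `s` and `b` are hypothetical closed-form stand-ins for the neural networks, and the sizes $N=5$, $n=3$ are chosen only for illustration.

```python
import math

def s(z_a):
    # Hypothetical scale function R^3 -> R^2; stands in for a neural network.
    return [math.tanh(sum(z_a)), 0.5 * math.sin(z_a[0])]

def b(z_a):
    # Hypothetical shift function R^3 -> R^2; stands in for a neural network.
    return [z_a[0] ** 2, -z_a[-1]]

def coupling_forward(z_a, z_b):
    """Equation (16): x_A = z_A, x_B = exp(s(z_A)) * z_B + b(z_A)."""
    scale, shift = s(z_a), b(z_a)
    x_b = [math.exp(si) * zi + bi for si, zi, bi in zip(scale, z_b, shift)]
    return list(z_a), x_b

def coupling_inverse(x_a, x_b):
    """Equation (17): z_A = x_A, z_B = exp(-s(x_A)) * (x_B - b(x_A))."""
    scale, shift = s(x_a), b(x_a)
    z_b = [math.exp(-si) * (xi - bi) for si, xi, bi in zip(scale, x_b, shift)]
    return list(x_a), z_b

z_a, z_b = [0.3, -1.2, 0.7], [2.0, -0.5]          # N = 5, n = 3
x_a, x_b = coupling_forward(z_a, z_b)
r_a, r_b = coupling_inverse(x_a, x_b)
# Largest componentwise difference between the input and its reconstruction.
round_trip_error = max(abs(u - v) for u, v in zip(z_a + z_b, r_a + r_b))
```

Because $x_A=z_A$, the inverse can evaluate $s$ and $b$ at $x_A$ without ever inverting the networks themselves, which is why no constraint on $s$ and $b$ is needed.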

The transformation (16) is often exhibited as a diagram like Figure 1.

Figure 1: A diagram of the transformation (16)

The apparent limitation of transformation (16) is that it does not change the part $z_A$. This can be quickly fixed by appending another real NVP block that keeps the $x_B$ part unchanged:

$$\begin{cases}y_A\leftarrow\mathrm{e}^{\tilde{s}(x_B)}\odot x_A+\tilde{b}(x_B)\\ y_B\leftarrow x_B\end{cases} \tag{18}$$

where $\tilde{s},\tilde{b}:\mathbb{R}^{N-n}\rightarrow\mathbb{R}^{n}$ are another two neural network functions, so the composed transformation from $z$ to $y$ given by (16) and (18) does not keep any component unchanged. This can be exhibited as a diagram like Figure 2.

Figure 2: A diagram of the composition transformation

Of course, we can stack more layers like this to improve the expressivity of the network.

3 Symplectomorphism Neural Network (SymplectoNet, SpNN)

3.1 Structure

For our goal of building a symplectomorphism neural network, the problem with real NVP is exhibited directly in its name: "NVP" means "non-volume-preserving", while a symplectomorphism has to be volume preserving. Indeed, to make real NVP volume preserving (from "real NVP" to "real VP"), there is a quick fix: one only needs to add an extra layer

$$(s_1,\cdots,s_N)\rightarrow(s_1,\cdots,s_N)-\overline{s}(1,\cdots,1),\quad \overline{s}=\frac{1}{N}\sum_{i=1}^{N}s_i$$

after the output layer of the network, which subtracts the average of the network's outputs. Unfortunately, the volume-preserving property alone does not guarantee symplecticity. We need further adjustments.
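The effect of this centering layer can be checked directly. As a sketch, write $m$ for the number of components over which the average is taken (a symbol introduced here only for illustration) and $\hat{s}=s-\overline{s}(1,\cdots,1)$ for the centered output. The Jacobian of (16) with $s$ replaced by $\hat{s}$ is block lower triangular, so its determinant is the product of the diagonal entries:

```latex
\frac{\partial(x_A,x_B)}{\partial(z_A,z_B)}
=\begin{pmatrix} I & 0\\ \ast & \operatorname{diag}\!\left(\mathrm{e}^{\hat{s}(z_A)}\right)\end{pmatrix},
\qquad
\det\frac{\partial(x_A,x_B)}{\partial(z_A,z_B)}
=\exp\Big(\sum_{i=1}^{m}\hat{s}_i(z_A)\Big)
=\exp\Big(\sum_{i=1}^{m}s_i(z_A)-m\,\overline{s}\Big)
=\mathrm{e}^{0}=1 .
```

A unit determinant is only one scalar condition on the Jacobian, whereas symplecticity requires the full matrix identity $J^{T}\Omega J=\Omega$, which is why volume preservation alone is not enough when $n>1$.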

Indeed, we can decompose (16) into two transformations: a "stretching"

$$\begin{cases}\xi_A=z_A\\ \xi_B=\mathrm{e}^{s(z_A)}\odot z_B,\end{cases} \tag{19}$$

and a "shearing"

$$\begin{cases}x_A=\xi_A\\ x_B=\xi_B+b(\xi_A).\end{cases} \tag{20}$$

Neither of these two transformations is guaranteed to be symplectic. Nevertheless, we have introduced their symplectic counterparts in the last section. Indeed, we can write (8), (9) and (10) (together with (14) and (15)) in a more compact form

$$\begin{cases}Q=q\\ P=p+\nabla F(q),\end{cases} \tag{21}$$
$$\begin{cases}Q=q+\nabla G(p)\\ P=p,\end{cases} \tag{22}$$
$$\begin{cases}Q=\mathrm{e}^{\nabla\Phi(q\odot p)}\odot q\\ P=\mathrm{e}^{-\nabla\Phi(q\odot p)}\odot p,\end{cases} \tag{23}$$

where $q=(q_1,\cdots,q_n)$, $p=(p_1,\cdots,p_n)$, $Q=(Q_1,\cdots,Q_n)$, $P=(P_1,\cdots,P_n)$, and "$\odot$" is the Hadamard product as before. Now their correspondence with (19) and (20) is clear: (21) is exactly (20) when $x_A$ and $x_B$ have the same dimension and $b$ is the gradient of a function, while (23) is a symmetrized version of (19):

$$\begin{cases}\xi_A=\mathrm{e}^{-s(z_A\odot z_B)}\odot z_A\\ \xi_B=\mathrm{e}^{s(z_A\odot z_B)}\odot z_B,\end{cases}$$

with $s$ being the gradient of a function. We denote the transformations defined by (21), (22), (23) as $\operatorname{qSh}_F$, $\operatorname{pSh}_G$ and $\operatorname{St}_\Phi$, which are shorthand for "q-shearing", "p-shearing" and "stretching", respectively. These become the basic building blocks of the "symplectic version of real NVP" once we take $F$, $G$ and $\Phi$ in these transformations as trainable neural networks.

Now we have introduced all the basic symplectomorphism building blocks, and a symplectomorphism neural network (SymplectoNet, or even shorter, SpNN) is a neural network designed as an arbitrary finite composition of $\operatorname{qSh}_F$, $\operatorname{pSh}_G$ and $\operatorname{St}_\Phi$, where $F$, $G$ and $\Phi$ are arbitrary neural networks with $n$-dimensional input and one-dimensional output.
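The three building blocks (21), (22), (23) can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the authors' implementation: `make_grad` is a hypothetical helper returning the hand-coded gradient of a one-hidden-layer tanh network, the weights are arbitrary fixed numbers rather than trained values, and symplecticity is checked numerically for $n=2$ through a finite-difference Jacobian $J$ and the defect of $J^{T}\Omega J=\Omega$.

```python
import math

def make_grad(weights, biases, amps):
    """Gradient of the scalar network x -> sum_j amps[j] * tanh(weights[j].x + biases[j])."""
    def grad(x):
        out = [0.0] * len(x)
        for w, c, a in zip(weights, biases, amps):
            t = math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + c)
            for i, wi in enumerate(w):
                out[i] += a * (1.0 - t * t) * wi
        return out
    return grad

def q_shear(grad_F):   # (21): Q = q, P = p + grad F(q)
    return lambda q, p: (list(q), [pi + gi for pi, gi in zip(p, grad_F(q))])

def p_shear(grad_G):   # (22): Q = q + grad G(p), P = p
    return lambda q, p: ([qi + gi for qi, gi in zip(q, grad_G(p))], list(p))

def stretch(grad_Phi):  # (23): Q = exp(grad Phi(q*p)) * q, P = exp(-grad Phi(q*p)) * p
    def f(q, p):
        g = grad_Phi([qi * pi for qi, pi in zip(q, p)])
        return ([math.exp(gi) * qi for gi, qi in zip(g, q)],
                [math.exp(-gi) * pi for gi, pi in zip(g, p)])
    return f

def compose(*blocks):
    """compose(f, g, h) applies h first, matching the notation f o g o h."""
    def f(q, p):
        for blk in reversed(blocks):
            q, p = blk(q, p)
        return q, p
    return f

def jacobian(f, q, p, h=1e-5):
    """Central finite-difference Jacobian of (q, p) -> (Q, P) as a 2n x 2n nested list."""
    n, x, cols = len(q), list(q) + list(p), []
    for k in range(2 * n):
        xp, xm = x[:], x[:]
        xp[k] += h
        xm[k] -= h
        Qp, Pp = f(xp[:n], xp[n:])
        Qm, Pm = f(xm[:n], xm[n:])
        cols.append([(u - v) / (2 * h) for u, v in zip(Qp + Pp, Qm + Pm)])
    return [[cols[k][i] for k in range(2 * n)] for i in range(2 * n)]

def symplectic_defect(J):
    """max |(J^T Omega J - Omega)_ij| with Omega = [[0, I], [-I, 0]]."""
    m = len(J)
    n = m // 2
    omega = [[0.0] * m for _ in range(m)]
    for i in range(n):
        omega[i][n + i], omega[n + i][i] = 1.0, -1.0
    OJ = [[sum(omega[i][k] * J[k][j] for k in range(m)) for j in range(m)] for i in range(m)]
    JtOJ = [[sum(J[k][i] * OJ[k][j] for k in range(m)) for j in range(m)] for i in range(m)]
    return max(abs(JtOJ[i][j] - omega[i][j]) for i in range(m) for j in range(m))

# Arbitrary illustrative weights for n = 2 (not trained values).
grad_F = make_grad([[0.5, -0.3], [0.2, 0.8]], [0.1, -0.2], [0.7, -0.4])
grad_G = make_grad([[-0.6, 0.4], [0.3, 0.3]], [0.0, 0.5], [0.5, 0.6])
grad_Phi = make_grad([[0.4, 0.1], [-0.2, 0.5]], [0.2, -0.1], [0.3, -0.5])
net = compose(p_shear(grad_G), stretch(grad_Phi), q_shear(grad_F))
defect = symplectic_defect(jacobian(net, [0.4, -0.7], [1.1, 0.3]))
```

The defect is limited only by the finite-difference error, since each block is symplectic by construction for any choice of weights. In practice the gradients would come from automatic differentiation of the scalar networks rather than from hand-coded formulas.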

Of course, the expressivity of this network depends on the complexity of the underlying neural networks $F$, $G$ and $\Phi$, and also on the number of building blocks we stack. Indeed, the latter can be even more essential: e.g. if we use fewer than four symplectic shearing blocks, we cannot even cover all the linear symplectomorphisms no matter how complicated the underlying networks $F$ and $G$ are, because the Jacobian of a shearing transformation is of the form

$$\begin{pmatrix}I&\\ B&I\end{pmatrix}\quad\text{or}\quad\begin{pmatrix}I&C\\ &I\end{pmatrix},$$

where $B$, $C$ are symmetric $n\times n$ matrices. Each of these matrices has $n(n+1)/2$ degrees of freedom, while $\dim Sp(2n)=n(2n+1)$, which is greater than $3n(n+1)/2$ for $n>1$. This is why we also designed the symplectic stretching layer $\operatorname{St}_\Phi$. A good practice is to include both the p, q-shearing and the symplectic stretching layers in the network at least once. A simplest example is a network with structure $\operatorname{pSh}_G\circ\operatorname{St}_\Phi\circ\operatorname{qSh}_F$ (see Figure 3), which is similar to the structure of a real NVP.

Figure 3: The diagram expression of $\operatorname{pSh}_G\circ\operatorname{St}_\Phi\circ\operatorname{qSh}_F$ (diagram nodes: inputs $q$, $p$; $\nabla F$, $\nabla\Phi$, $\exp$, $1/x$, $\nabla G$; blocks $\operatorname{qSh}_F$, $\operatorname{St}_\Phi$, $\operatorname{pSh}_G$)
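The degree-of-freedom count for shearing blocks can be tabulated with a short script. This is only the necessary counting bound from the argument above, not a proof that the resulting number of blocks suffices; the function names are introduced here for illustration.

```python
def shear_dof(n):
    """Degrees of freedom of a symmetric n x n matrix (B or C in the shearing Jacobian)."""
    return n * (n + 1) // 2

def sp_dim(n):
    """Dimension of the symplectic group Sp(2n)."""
    return n * (2 * n + 1)

def min_shear_blocks_by_count(n):
    """Smallest k with k * shear_dof(n) >= sp_dim(n) (ceiling division)."""
    return -(-sp_dim(n) // shear_dof(n))

# n -> (dof per shearing block, dim Sp(2n), minimal block count by dimension)
counts = {n: (shear_dof(n), sp_dim(n), min_shear_blocks_by_count(n)) for n in range(1, 6)}
```

For $n=2$, for instance, three shearing blocks provide at most $3\cdot 3=9$ parameters, fewer than $\dim Sp(4)=10$. Since $n(2n+1)\big/\tfrac{n(n+1)}{2}=4-\tfrac{2}{n+1}$ lies strictly between 3 and 4 for every $n>1$, the count always demands at least four shearing blocks in that case.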

3.2 SymplectoNet as Invertible Neural Network (INN)

One of the most important features of real NVP is that it is explicitly invertible: one can write out (or, in more technical terms, build the computation graph of) the explicit expression of the inverse of the neural network function [5]. Our SymplectoNet is inspired by real NVP, so a natural question is whether the SymplectoNet structure is explicitly invertible like real NVP. Next, we will show that the answer is yes.

Indeed, since the inverse of a composed function $f_1\circ f_2\circ\cdots\circ f_k$ is $f_k^{-1}\circ\cdots\circ f_2^{-1}\circ f_1^{-1}$, we only need to prove that the basic building blocks $\operatorname{pSh}_G$, $\operatorname{qSh}_F$ and $\operatorname{St}_\Phi$ are explicitly invertible. The inverses of $\operatorname{pSh}_G$ and $\operatorname{qSh}_F$ are obvious: (21) is equivalent to

$$\begin{cases}q=Q\\ p=P-\nabla F(Q),\end{cases}$$

and (22) is equivalent to

$$\begin{cases}q=Q-\nabla G(P)\\ p=P,\end{cases}$$

therefore the inverses of $\operatorname{pSh}_G$ and $\operatorname{qSh}_F$ are $\operatorname{pSh}_{-G}$ and $\operatorname{qSh}_{-F}$, respectively. Finally we look at $\operatorname{St}_\Phi$. Notice that from (23), we have

$$Q\odot P=\mathrm{e}^{\nabla\Phi(q\odot p)}\odot q\odot\mathrm{e}^{-\nabla\Phi(q\odot p)}\odot p=q\odot p,$$

therefore

$$\begin{cases}q=\mathrm{e}^{-\nabla\Phi(q\odot p)}\odot Q=\mathrm{e}^{-\nabla\Phi(Q\odot P)}\odot Q\\ p=\mathrm{e}^{\nabla\Phi(q\odot p)}\odot P=\mathrm{e}^{\nabla\Phi(Q\odot P)}\odot P,\end{cases} \tag{24}$$

which shows that the inverse of $\operatorname{St}_\Phi$ is exactly $\operatorname{St}_{-\Phi}$. In conclusion, we have

$$\begin{cases}\left(\operatorname{pSh}_G\right)^{-1}=\operatorname{pSh}_{-G},\\ \left(\operatorname{qSh}_F\right)^{-1}=\operatorname{qSh}_{-F},\\ \left(\operatorname{St}_\Phi\right)^{-1}=\operatorname{St}_{-\Phi}.\end{cases} \tag{25}$$

These results give a neat expression for inverting the SymplectoNet. For example, the inverse of the SymplectoNet $\operatorname{pSh}_G\circ\operatorname{St}_\Phi\circ\operatorname{qSh}_F$ is

$$\left(\operatorname{pSh}_G\circ\operatorname{St}_\Phi\circ\operatorname{qSh}_F\right)^{-1}=\operatorname{qSh}_{-F}\circ\operatorname{St}_{-\Phi}\circ\operatorname{pSh}_{-G}. \tag{26}$$

This shows that the inverse of SymplectoNet is explicitly available.
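The inversion rule (26) can be exercised numerically. The sketch below assumes $n=2$ and uses hypothetical closed-form potentials $F$, $G$, $\Phi$ (chosen only so that their gradients are easy to write by hand) in place of trained networks; the `sign` argument switches a block to its inverse by negating the potential.

```python
import math

# Hypothetical closed-form potentials for n = 2 (stand-ins for trained networks).
def grad_F(q):      # F(q) = sin(q0) + sin(q1) + q0 * q1
    return [math.cos(q[0]) + q[1], math.cos(q[1]) + q[0]]

def grad_G(p):      # G(p) = log cosh(p0) + log cosh(p1)
    return [math.tanh(p[0]), math.tanh(p[1])]

def grad_Phi(z):    # Phi(z) = 0.3 * sin(z0 + z1)
    c = 0.3 * math.cos(z[0] + z[1])
    return [c, c]

def q_shear(q, p, sign=1.0):    # qSh_F for sign = +1, qSh_{-F} for sign = -1
    return list(q), [pi + sign * gi for pi, gi in zip(p, grad_F(q))]

def p_shear(q, p, sign=1.0):    # pSh_G for sign = +1, pSh_{-G} for sign = -1
    return [qi + sign * gi for qi, gi in zip(q, grad_G(p))], list(p)

def stretch(q, p, sign=1.0):    # St_Phi for sign = +1, St_{-Phi} for sign = -1
    g = grad_Phi([qi * pi for qi, pi in zip(q, p)])
    return ([math.exp(sign * gi) * qi for gi, qi in zip(g, q)],
            [math.exp(-sign * gi) * pi for gi, pi in zip(g, p)])

def forward(q, p):
    """pSh_G o St_Phi o qSh_F: the rightmost block qSh_F is applied first."""
    q, p = q_shear(q, p)
    q, p = stretch(q, p)
    return p_shear(q, p)

def inverse(Q, P):
    """Equation (26): qSh_{-F} o St_{-Phi} o pSh_{-G}, with pSh_{-G} applied first."""
    Q, P = p_shear(Q, P, sign=-1.0)
    Q, P = stretch(Q, P, sign=-1.0)
    return q_shear(Q, P, sign=-1.0)

q0, p0 = [0.4, -0.7], [1.1, 0.3]
Q, P = forward(q0, p0)
q1, p1 = inverse(Q, P)
# Largest componentwise difference between the input and its reconstruction.
round_trip_error = max(abs(u - v) for u, v in zip(q0 + p0, q1 + p1))
```

The inverse of the stretching block works because $Q\odot P=q\odot p$, so `grad_Phi` is evaluated at the same point in both directions and the exponential factors cancel up to rounding.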

4 Extension to Families of Symplectomorphisms

A natural extension of the symplectomorphism neural network is to include some parameters $\tau_1,\tau_2,\cdots,\tau_K$ other than the canonical variables as inputs. This can be easily achieved by changing the $F(q)$, $G(p)$, $\Phi(z)$ in the basic building blocks $\operatorname{qSh}_F$, $\operatorname{pSh}_G$ and $\operatorname{St}_\Phi$ into $(n+K)$-variable functions $F(q;\tau)$, $G(p;\tau)$, $\Phi(z;\tau)$, where $\tau=(\tau_1,\cdots,\tau_K)$, and modifying the blocks given by (21)-(23) into

$$\begin{cases}Q=q\\ P=p+\nabla_q F(q,\tau),\end{cases} \tag{27}$$
$$\begin{cases}Q=q+\nabla_p G(p,\tau)\\ P=p,\end{cases} \tag{28}$$
$$\begin{cases}Q=\mathrm{e}^{\nabla_z\Phi(q\odot p,\tau)}\odot q\\ P=\mathrm{e}^{-\nabla_z\Phi(q\odot p,\tau)}\odot p,\end{cases} \tag{29}$$

With this modification, the network receives $(2n+K)$-dimensional vectors

$$(q_1,\cdots,q_n,p_1,\cdots,p_n,\tau_1,\cdots,\tau_K)$$

as inputs and the output dimension is still $2n$. For each fixed $\tau_1,\cdots,\tau_K$, the output vector is a symplectomorphism of the canonical part of the input vector, i.e. $(q_1,\cdots,q_n,p_1,\cdots,p_n)$. Thus, each choice of the parameters $\tau_1,\cdots,\tau_K$ defines a symplectomorphism, or we can say that the network defines a continuous family of symplectomorphisms parameterized by $\tau_1,\cdots,\tau_K$. A particularly common situation is when $K=1$ and $\tau_1=t$ represents the time variable. In this case, the network function can represent the solution of some Hamiltonian equation, and thanks to the symplectic property of the network, there exists a Hamiltonian function

$$H=H(q_1,\cdots,q_n,p_1,\cdots,p_n,t)$$

such that the network function represents exactly the solution of its corresponding Hamiltonian system (7). Nevertheless, it is not guaranteed that the symplectomorphism family parameterized by $t$ forms a one-parameter group of symplectomorphisms, i.e. the corresponding Hamiltonian $H$ may depend explicitly on time, and we do not have a method to cancel this dependency exactly.
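A small worked example illustrates the failure of the group property. Assume, purely for illustration, $n=1$, $K=1$, and the quadratic potentials $F(q;t)=tq^{2}/2$ and $G(p;t)=tp^{2}/2$ (these choices are not from the experiments). The network $\operatorname{pSh}_{G}\circ\operatorname{qSh}_{F}$ is then linear, with matrix $M(t)$:

```latex
M(t)=\begin{pmatrix}1&t\\0&1\end{pmatrix}\begin{pmatrix}1&0\\t&1\end{pmatrix}
=\begin{pmatrix}1+t^{2}&t\\t&1\end{pmatrix},
\qquad \det M(t)=(1+t^{2})-t^{2}=1,
\\[4pt]
M(t)\,M(s)=\begin{pmatrix}(1+t^{2})(1+s^{2})+ts&(1+t^{2})s+t\\ t(1+s^{2})+s&ts+1\end{pmatrix}
\neq
\begin{pmatrix}1+(t+s)^{2}&t+s\\t+s&1\end{pmatrix}=M(t+s)
\quad\text{whenever } ts\neq 0 .
```

Every $M(t)$ is symplectic (for $n=1$ this is equivalent to unit determinant), yet the lower-right entries $ts+1$ and $1$ already disagree, so the family is not a one-parameter group and cannot be the flow of a time-independent Hamiltonian.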

By including more parameters (i.e. $K>1$), it is also possible to apply this network to optimal control problems involving Hamiltonian dynamics.

5 Some Preliminary Results

5.1 A Polar Nonlinear Mapping

In this example, the network learns the symplectic map

$$(q,\ p)\rightarrow\left(\sqrt{2q}\cos p,\sqrt{2q}\sin p\right)=:(Q,P) \tag{30}$$
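That (30) is symplectic can be verified by hand. For $n=1$, a map is symplectic exactly when its Jacobian has unit determinant, and for $q>0$ a direct computation gives:

```latex
\frac{\partial(Q,P)}{\partial(q,p)}
=\begin{pmatrix}
\dfrac{\cos p}{\sqrt{2q}} & -\sqrt{2q}\,\sin p\\[8pt]
\dfrac{\sin p}{\sqrt{2q}} & \sqrt{2q}\,\cos p
\end{pmatrix},
\qquad
\det\frac{\partial(Q,P)}{\partial(q,p)}
=\cos^{2}p+\sin^{2}p=1 .
```

The entries $\cos p/\sqrt{2q}$ and $\sin p/\sqrt{2q}$ are unbounded as $q\to 0$, which is the singularity at $q=0$ referred to in the caption of Figure 4.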

We use a network with structure

$$\operatorname{qSh}_{F_1}\circ\operatorname{pSh}_{G_1}\circ\operatorname{St}_\Phi\circ\operatorname{qSh}_{F_2}\circ\operatorname{pSh}_{G_2},$$

where $F_1,G_1,F_2,G_2$ are $(2,20,10,1)$ dense neural networks, and $\Phi$ is a $(2,10,1)$ dense neural network. The loss is the ordinary MSE loss. The Adamax optimizer with learning rate 0.25 is applied, and the learning rate decays by a factor of 0.99 every 100 epochs.
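The learning-rate schedule described above can be written as a one-line function. This sketch assumes a staircase schedule in which the factor 0.99 is applied once at epochs 100, 200, and so on; the exact scheduler used in the experiments is not specified, and the function name and argument names are introduced here for illustration.

```python
def learning_rate(epoch, base_lr=0.25, gamma=0.99, step=100):
    """Step decay: multiply base_lr by gamma once every `step` epochs."""
    return base_lr * gamma ** (epoch // step)

# Learning rate at the start, just before and at the first decay, and after 40,000 epochs.
schedule = [learning_rate(e) for e in (0, 99, 100, 40000)]
```

Under this assumption the rate has been multiplied by $0.99^{400}$ after 40,000 epochs, so it ends roughly two orders of magnitude below its initial value.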

First, points

$$(q,p)\in[0,1]\times[0,1]$$

are sampled uniformly at random. The training ran for 40,000 epochs, and the loss dropped from $0.3$ to about $10^{-5}$. The fit is shown in Figure 4(a), and the loss decay is shown in Figure 4(b).

Figure 4: Numerical experiment results of the symplectomorphism neural network fitting the symplectomorphism (30). (a): The result of (30) with $(q,p)\in[0,1]\times[0,1]$. Blue dots: true data; orange stars: predicted results. Note that most of the error comes from data near $q=0$ because there is a singularity there; (b): The loss decay of (a); (c): The result of (30) with $(q,p)\in[1/2,3/2]\times[0,3\pi/2]$. Blue dots: true data; orange stars: predicted results. Note that most of the error comes from data near $q=0$ because there is a singularity there; (d): The loss decay of (c).

Another numerical experiment, also concerning (30) but with the domain changed to

$$(q,p)\in\left[\frac{1}{2},\frac{3}{2}\right]\times\left[0,\frac{3\pi}{2}\right],$$

was also conducted. This time, the geometry of the transformation is more complicated. Note that we cannot take $p\in[0,2\pi]$ because this would make the mapping (30) non-injective, while the model is invertible. Thus the model would have difficulty learning the data near the two lines $p=0$ and $p=2\pi$. The training ran for 40,000 epochs, and the loss dropped from $0.3$ to about $10^{-5}$. The fit is shown in Figure 4(c), and the loss decay is shown in Figure 4(d). The majority of the error comes from the $p=3\pi/2$ boundary. This is because the points here are close to the points with $p=0$.
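The non-injectivity on $p\in[0,2\pi]$ is easy to see numerically. The sketch below evaluates (30) at an arbitrarily chosen $q=0.8$ (an illustrative value, not one from the experiments) and compares the images of $p=0$, $p=2\pi$ and $p=3\pi/2$.

```python
import math

def polar_map(q, p):
    """The map (30): (q, p) -> (sqrt(2q) cos p, sqrt(2q) sin p)."""
    r = math.sqrt(2.0 * q)
    return r * math.cos(p), r * math.sin(p)

q = 0.8
Q0, P0 = polar_map(q, 0.0)
Q1, P1 = polar_map(q, 2.0 * math.pi)
# Distance between the images of p = 0 and p = 2*pi: zero up to rounding.
gap_full_turn = math.hypot(Q1 - Q0, P1 - P0)

Q2, P2 = polar_map(q, 1.5 * math.pi)
# Distance between the images of p = 0 and p = 3*pi/2: clearly separated.
gap_three_quarters = math.hypot(Q2 - Q0, P2 - P0)
```

Two distinct inputs with the same image cannot both be reproduced by an invertible network, which is why the domain is truncated at $p=3\pi/2$ in this experiment.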

References

  • [1] C. Bishop and H. Bishop, Deep Learning: Foundations and Concepts, Springer International Publishing, 2023.
  • [2] A. da Silva, Lectures on Symplectic Geometry, Lecture Notes in Mathematics, Springer Berlin Heidelberg, 2004.
  • [3] L. Dinh, J. N. Sohl-Dickstein, and S. Bengio, Density estimation using Real NVP, ArXiv, abs/1605.08803 (2016).
  • [4] H. Goldstein, Classical Mechanics, Addison-Wesley series in physics, Addison-Wesley Publishing Company, 1980.
  • [5] I. Ishikawa, T. Teshima, K. Tojo, K. Oono, M. Ikeda, and M. Sugiyama, Universal approximation property of invertible neural networks, Journal of Machine Learning Research, 24 (2023), pp. 1–68.