Deep Neural Networks with Symplectic Preservation Properties

Qing He, Wei Cai

Abstract

We propose a deep neural network architecture designed such that its output forms an invertible symplectomorphism of the input. This design draws an analogy to the real-valued non-volume-preserving (real NVP) method used in normalizing flow techniques. Utilizing this neural network type allows for learning tasks on unknown Hamiltonian systems without breaking the inherent symplectic structure of the phase space.

Key Words: Deep learning, Symplectomorphism, Structure-preserving

AMS Classifications: 37J11, 70H15, 68T07

1 Introduction

For an unknown Hamiltonian system, our objective is to learn the flow map over a fixed time period $T$. Specifically, we seek to determine the map $\Phi_{T}$ that computes $(q,p)_{t=T}$ given an initial condition $(q,p)_{t=0}=(q_{0},p_{0})$. Such problems arise, for instance, when analyzing a sequence of system snapshots taken at times $0,T,2T,3T,\ldots$. The key information we have about this map is that it is a symplectomorphism (or canonical transformation): the Jacobian of $\Phi_{T}$ belongs to the symplectic group $Sp(2n)$, where $n$ is the dimension of the system's configuration space [2, 4].

In this study, we propose a neural network structure designed to ensure that its output is precisely a symplectomorphism of the input. “Precisely” here means that the Jacobian of the map defined by the neural network is exactly a symplectic matrix, up to the rounding errors inherent in floating-point arithmetic. Importantly, this framework eliminates the need for an additional “deviation-from-symplecticity” penalty term in the learning objective, because the structure of the network guarantees that the symplectomorphism condition cannot be violated.

The approach draws inspiration from the real NVP method [3], which is primarily used for density estimation of probability measures and thus differs significantly in purpose from our intended application. Nonetheless, this work leverages real NVP's elegant methodology for constructing explicitly invertible neural networks. The method we propose is a “symplectic adaptation” of this technique: it employs building blocks akin to those of real NVP, but replaces every component that could compromise the symplectic property of the map, so that symplecticity is preserved throughout.

2 Preliminaries

2.1 Symplectic Structures and Symplectomorphism

On $\mathbb{R}^{2n}$, we denote the standard Cartesian coordinates by $q_{1},\cdots,q_{n},p_{1},\cdots,p_{n}$, corresponding to the “position” and “momentum” coordinates in Hamiltonian mechanics. The standard symplectic form on $\mathbb{R}^{2n}$ is the differential 2-form

\omega=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}p_{i},

(1)

and a transformation $\varphi:\mathbb{R}^{2n}\rightarrow\mathbb{R}^{2n}$ is called a symplectomorphism if $\varphi^{*}\omega=\omega$ . This means

\sum_{i=1}^{n}\mathrm{d}Q_{i}\land\mathrm{d}P_{i}=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}p_{i},

(2)

where

(Q_{1},\cdots,Q_{n},P_{1},\cdots,P_{n})=\varphi(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n}),

(3)

or equivalently,

J_{\varphi}^{\top}\Omega J_{\varphi}=\Omega,

(4)

where

J_{\varphi}=\begin{pmatrix}\frac{\partial Q_{1}}{\partial q_{1}}&\cdots&\frac{\partial Q_{1}}{\partial q_{n}}&\frac{\partial Q_{1}}{\partial p_{1}}&\cdots&\frac{\partial Q_{1}}{\partial p_{n}}\\ \vdots&\ddots&\vdots&\vdots&\ddots&\vdots\\ \frac{\partial Q_{n}}{\partial q_{1}}&\cdots&\frac{\partial Q_{n}}{\partial q_{n}}&\frac{\partial Q_{n}}{\partial p_{1}}&\cdots&\frac{\partial Q_{n}}{\partial p_{n}}\\ \frac{\partial P_{1}}{\partial q_{1}}&\cdots&\frac{\partial P_{1}}{\partial q_{n}}&\frac{\partial P_{1}}{\partial p_{1}}&\cdots&\frac{\partial P_{1}}{\partial p_{n}}\\ \vdots&\ddots&\vdots&\vdots&\ddots&\vdots\\ \frac{\partial P_{n}}{\partial q_{1}}&\cdots&\frac{\partial P_{n}}{\partial q_{n}}&\frac{\partial P_{n}}{\partial p_{1}}&\cdots&\frac{\partial P_{n}}{\partial p_{n}}\end{pmatrix}

(5)

is the Jacobian matrix of $\varphi$ , and

\Omega=\begin{pmatrix}0_{n\times n}&I_{n\times n}\\ -I_{n\times n}&0_{n\times n}\end{pmatrix}

(6)

is the matrix of the standard symplectic form $\omega$ .
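Condition (4) is easy to test numerically. The sketch below assumes plain Python floats and a central finite-difference Jacobian in place of automatic differentiation; the helper `symplectic_defect` is a hypothetical name that returns the largest entry of $|J_{\varphi}^{\top}\Omega J_{\varphi}-\Omega|$ at a point, with coordinates ordered as $(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n})$. Rotating $q$ and $p$ by the same angle passes the test, while rotating only $q$ fails it even though its Jacobian determinant is still 1.

```python
import math


def symplectic_defect(f, x, h=1e-6):
    """Largest entry of |J^T Omega J - Omega| for f: R^{2n} -> R^{2n} at x.

    x is ordered as (q_1, ..., q_n, p_1, ..., p_n); the Jacobian J is
    approximated by central finite differences with step h.
    """
    N, n = len(x), len(x) // 2
    cols = []
    for j in range(N):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(u - v) / (2 * h) for u, v in zip(f(xp), f(xm))])
    J = [[cols[j][i] for j in range(N)] for i in range(N)]  # J[i][j] = d f_i / d x_j

    def om(i, j):
        # Entries of Omega = [[0, I], [-I, 0]] from (6)
        return 1.0 if j == i + n else (-1.0 if i == j + n else 0.0)

    return max(
        abs(sum(J[i][a] * om(i, j) * J[j][b] for i in range(N) for j in range(N)) - om(a, b))
        for a in range(N)
        for b in range(N)
    )


THETA = 0.5  # arbitrary rotation angle for the illustration


def rotate(v):
    c, s = math.cos(THETA), math.sin(THETA)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]


def rot_both(x):
    # Q = R q, P = R p with the same rotation R: symplectic
    return rotate(x[:2]) + rotate(x[2:])


def rot_q_only(x):
    # Q = R q, P = p: Jacobian determinant is 1, but (4) fails
    return rotate(x[:2]) + list(x[2:])


x0 = [0.3, 0.5, 0.7, -0.4]
print(symplectic_defect(rot_both, x0))    # rounding-level only
print(symplectic_defect(rot_q_only, x0))  # about sin(0.5) = 0.479
```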

The most essential property of a Hamiltonian system

\begin{cases}\displaystyle\frac{\mathrm{d}q_{i}}{\mathrm{d}t}=\frac{\partial H}{\partial p_{i}},&\\[10.0pt] \displaystyle\frac{\mathrm{d}p_{i}}{\mathrm{d}t}=-\frac{\partial H}{\partial q_{i}},&\end{cases}\quad i=1,2,\cdots,n,

(7)

where

H=H(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n},t)\in C^{2}(\mathbb{R}^{2n+1})

is that its flow map defines a family of symplectomorphisms. This means that if we solve (7) from time $t_{0}$ to time $t_{1}$, then the map defined by $(q(t_{0}),p(t_{0}))\mapsto(q(t_{1}),p(t_{1}))$ is a symplectomorphism of $\mathbb{R}^{2n}$. The converse is also true: if the flow maps of a system of differential equations on $\mathbb{R}^{2n}$ are symplectomorphisms, then there exists a function $H\in C^{2}(\mathbb{R}^{2n+1})$ such that the system can be written as the Hamiltonian system (7).

2.1.1 Example: Shearing

One of the simplest examples of a symplectomorphism comes from the symplectic Euler method for separable Hamiltonians. Suppose $F:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is a smooth function; then

\begin{cases}Q_{i}=q_{i}&\\ P_{i}=p_{i}+\frac{\partial F}{\partial q_{i}}(q_{1},\cdots,q_{n})&\end{cases}

(8)

is a symplectic transformation, because

\begin{aligned}\sum_{i=1}^{n}\mathrm{d}Q_{i}\land\mathrm{d}P_{i}&=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}\left(p_{i}+\frac{\partial F}{\partial q_{i}}(q_{1},\cdots,q_{n})\right)\\ &=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}p_{i}+\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}\frac{\partial F}{\partial q_{i}}(q_{1},\cdots,q_{n})\\ &=\sum_{i=1}^{n}\mathrm{d}q_{i}\land\mathrm{d}p_{i}-\mathrm{d}(\mathrm{d}F(q_{1},\cdots,q_{n})),\end{aligned}

and the result follows from the identity $\mathrm{d}(\mathrm{d}F)=0$. Similarly,

\begin{cases}Q_{i}=q_{i}+\frac{\partial G}{\partial p_{i}}(p_{1},\cdots,p_{n})&\\ P_{i}=p_{i}&\end{cases}

(9)

is also a symplectomorphism, where $G:\mathbb{R}^{n}\rightarrow\mathbb{R}$ is a smooth function. We call the symplectomorphism given by (8) or (9) a symplectic shearing.
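The shearings (8) and (9) are exactly the two substeps of the symplectic Euler method mentioned above. As a minimal sketch, assume a one-degree-of-freedom pendulum $H=p^{2}/2-\cos q$ and an illustrative step size `H_STEP`: the $p$-update is (8) with $F=h\cos q$, the $q$-update is (9) with $G=hp^{2}/2$, and the composed step has Jacobian determinant 1 (for $n=1$ this is equivalent to (4)). For contrast, the explicit Euler step, which updates both variables from the old state, has determinant $1+h^{2}\cos q$.

```python
import math

H_STEP = 0.1  # step size h, an arbitrary illustrative value


def shear_p(q, p):
    # (8) with F(q) = h*cos(q): P = p + dF/dq = p - h*sin(q)
    return q, p - H_STEP * math.sin(q)


def shear_q(q, p):
    # (9) with G(p) = h*p**2/2: Q = q + dG/dp = q + h*p
    return q + H_STEP * p, p


def symplectic_euler(q, p):
    # One symplectic Euler step: shear_p first, then shear_q
    return shear_q(*shear_p(q, p))


def explicit_euler(q, p):
    # Both updates use the old state; not symplectic
    return q + H_STEP * p, p - H_STEP * math.sin(q)


def jacobian_det(f, q, p, eps=1e-6):
    """Determinant of the 2x2 Jacobian of f at (q, p), by central differences."""
    fq1, fq0 = f(q + eps, p), f(q - eps, p)
    fp1, fp0 = f(q, p + eps), f(q, p - eps)
    dQdq = (fq1[0] - fq0[0]) / (2 * eps)
    dQdp = (fp1[0] - fp0[0]) / (2 * eps)
    dPdq = (fq1[1] - fq0[1]) / (2 * eps)
    dPdp = (fp1[1] - fp0[1]) / (2 * eps)
    return dQdq * dPdp - dQdp * dPdq


print(jacobian_det(symplectic_euler, 0.3, 0.8))  # 1 up to rounding
print(jacobian_det(explicit_euler, 0.3, 0.8))    # 1 + h**2*cos(0.3), about 1.00955
```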

2.1.2 Example: Stretching

Another example is the “coordinate stretching” transformation. A diagonal linear transformation on $\mathbb{R}^{2n}$ is symplectic if and only if it has the form

(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n})\mapsto\left(k_{1}q_{1},\cdots,k_{n}q_{n},\frac{p_{1}}{k_{1}},\cdots,\frac{p_{n}}{k_{n}}\right),

(10)

where $k_{1},\cdots,k_{n}$ are nonzero constants. We now generalize this by allowing each $k_{i}$ to be a function of the coordinates $q_{1},\cdots,q_{n},p_{1},\cdots,p_{n}$. Then

\begin{aligned}\sum_{i=1}^{n}\mathrm{d}(k_{i}q_{i})\land\mathrm{d}\frac{p_{i}}{k_{i}}&=\sum_{i=1}^{n}(k_{i}\mathrm{d}q_{i}+q_{i}\mathrm{d}k_{i})\land\left(\frac{\mathrm{d}p_{i}}{k_{i}}-\frac{p_{i}\mathrm{d}k_{i}}{k_{i}^{2}}\right)\\ &=\sum_{i=1}^{n}\left(\mathrm{d}q_{i}\land\mathrm{d}p_{i}+\frac{q_{i}}{k_{i}}\mathrm{d}k_{i}\land\mathrm{d}p_{i}-\frac{p_{i}}{k_{i}}\mathrm{d}q_{i}\land\mathrm{d}k_{i}+0\right)\\ &=\sum_{i=1}^{n}\left(\mathrm{d}q_{i}\land\mathrm{d}p_{i}-\frac{q_{i}\mathrm{d}p_{i}+p_{i}\mathrm{d}q_{i}}{k_{i}}\land\mathrm{d}k_{i}\right)\\ &=\sum_{i=1}^{n}\left(\mathrm{d}q_{i}\land\mathrm{d}p_{i}-\frac{\mathrm{d}(p_{i}q_{i})}{k_{i}}\land\mathrm{d}k_{i}\right),\end{aligned}

(11)

Therefore, a transformation of the form (10) with coordinate-dependent $k_{i}$ is symplectic if and only if the condition

\sum_{i=1}^{n}\frac{\mathrm{d}(p_{i}q_{i})}{k_{i}}\land\mathrm{d}k_{i}=0

(12)

is satisfied. Note that (12) can be written as

\sum_{i=1}^{n}\mathrm{d}(p_{i}q_{i})\land\mathrm{d}\ln|k_{i}|=0,

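and since $\mathrm{d}\left(\ln|k_{i}|\,\mathrm{d}(p_{i}q_{i})\right)=-\mathrm{d}(p_{i}q_{i})\land\mathrm{d}\ln|k_{i}|$, condition (12) states exactly that the 1-form $\sum_{i=1}^{n}\ln|k_{i}|\,\mathrm{d}(p_{i}q_{i})$ is closed. According to Poincaré's lemma on $\mathbb{R}^{2n}$, (12) is therefore satisfied if and only if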

\sum_{i=1}^{n}\ln|k_{i}|\mathrm{d}(p_{i}q_{i})=\mathrm{d}\varphi

(13)

for some smooth function $\varphi:\mathbb{R}^{2n}\rightarrow\mathbb{R}$ . The condition (13) is satisfied when $\varphi$ can be expressed as

\varphi(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n})=\Phi(p_{1}q_{1},p_{2}q_{2},\cdots,p_{n}q_{n})

for some $\Phi:\mathbb{R}^{n}\rightarrow\mathbb{R}$ , and

k_{i}=\pm\mathrm{e}^{\Phi_{i}(p_{1}q_{1},p_{2}q_{2},\cdots,p_{n}q_{n})}

(14)

holds, where $\Phi_{i}$ is the partial derivative of $\Phi$ with respect to its $i$-th argument:

\Phi_{i}(x_{1},\cdots,x_{n})=\frac{\partial\Phi}{\partial x_{i}}(x_{1},\cdots,x_{n}).

(15)

We call the symplectomorphism given by (10) and (14) a symplectic stretching.
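The role of the gradient condition (14) can be checked numerically. In this sketch the finite-difference test of (4) is implemented by a hypothetical helper `symplectic_defect`, and a hand-picked $\Phi(x_{1},x_{2})=\tfrac12\sin x_{1}+0.3x_{1}x_{2}$ stands in for an arbitrary smooth function. Taking $\ln k_{i}=\Phi_{i}$ gives a symplectic map, whereas $\ln k=(x_{2},0)$, which is not the gradient of any function, violates (12); at the chosen point the defect equals the coefficient $p_{1}q_{2}=0.35$ of $-\mathrm{d}(p_{1}q_{1})\land\mathrm{d}(p_{2}q_{2})$.

```python
import math


def symplectic_defect(f, x, h=1e-6):
    """Largest entry of |J^T Omega J - Omega| for f: R^{2n} -> R^{2n} at x.

    x is ordered as (q_1, ..., q_n, p_1, ..., p_n); the Jacobian J is
    approximated by central finite differences with step h.
    """
    N, n = len(x), len(x) // 2
    cols = []
    for j in range(N):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(u - v) / (2 * h) for u, v in zip(f(xp), f(xm))])
    J = [[cols[j][i] for j in range(N)] for i in range(N)]  # J[i][j] = d f_i / d x_j

    def om(i, j):
        # Entries of Omega = [[0, I], [-I, 0]] from (6)
        return 1.0 if j == i + n else (-1.0 if i == j + n else 0.0)

    return max(
        abs(sum(J[i][a] * om(i, j) * J[j][b] for i in range(N) for j in range(N)) - om(a, b))
        for a in range(N)
        for b in range(N)
    )


def grad_Phi(x):
    # Phi(x1, x2) = 0.5*sin(x1) + 0.3*x1*x2 (hypothetical); returns (Phi_1, Phi_2) as in (15)
    return [0.5 * math.cos(x[0]) + 0.3 * x[1], 0.3 * x[0]]


def not_a_gradient(x):
    # (x2, 0): its mixed partials differ (1 vs 0), so no Phi has this gradient
    return [x[1], 0.0]


def stretching(log_k):
    """Map (10) with ln k_i = log_k(q_1 p_1, ..., q_n p_n)_i, i.e. (14) with the + sign."""
    def f(x):
        n = len(x) // 2
        q, p = x[:n], x[n:]
        k = [math.exp(s) for s in log_k([qi * pi for qi, pi in zip(q, p)])]
        return [ki * qi for ki, qi in zip(k, q)] + [pi / ki for ki, pi in zip(k, p)]
    return f


x0 = [0.3, 0.5, 0.7, -0.4]
print(symplectic_defect(stretching(grad_Phi), x0))        # rounding-level: symplectic
print(symplectic_defect(stretching(not_a_gradient), x0))  # about 0.35: (12) is violated
```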

2.2 Real NVP

Real NVP (Real-valued Non-Volume Preserving) [3, 1] is a generative model used for density estimation. Real NVP networks use invertible transformations, allowing us to go back and forth between the original and transformed spaces. The structure of real NVP is as follows: The input and output of the network are both $N$ -dimensional vectors. An $N$ -dimensional vector

z=(z_{1},z_{2},\cdots,z_{N})

received as the input is partitioned into two parts

z=(\underbrace{z_{1},\cdots,z_{n}}_{A},\underbrace{z_{n+1},\cdots,z_{N}}_{B}):=(z_{A},z_{B}).

A real NVP transformation keeps one of the parts unchanged and performs an entry-wise affine transformation (a scaling followed by a shift) on the other part, whose coefficients are determined by the unchanged part. Specifically, the input $z$ undergoes the following transformation:

\begin{cases}x_{A}=z_{A}&\\ x_{B}=\mathrm{e}^{s(z_{A})}\odot z_{B}+b(z_{A})&\end{cases}

(16)

where $s,b:\mathbb{R}^{n}\rightarrow\mathbb{R}^{N-n}$ are two functions, given by neural networks in practice, and $\odot$ denotes the Hadamard (entry-wise) product:

(x_{1},\cdots,x_{n})\odot(y_{1},\cdots,y_{n})=(x_{1}y_{1},\cdots,x_{n}y_{n}).

The inverse of the map (16) is explicit:

\begin{cases}z_{A}=x_{A}&\\ z_{B}=\mathrm{e}^{-s(x_{A})}\odot(x_{B}-b(x_{A})).&\end{cases}

(17)

The transformation (16) is often depicted as a diagram like Figure 1.

Figure 1: A diagram of the transformation (16)

The apparent limitation of transformation (16) is that it does not change the part $z_{A}$ . This can be quickly fixed by appending another real NVP block that keeps the $x_{B}$ part unchanged:

\begin{cases}y_{A}=\mathrm{e}^{\tilde{s}(x_{B})}\odot x_{A}+\tilde{b}(x_{B})&\\ y_{B}=x_{B}&\end{cases}

(18)

where $\tilde{s},\tilde{b}:\mathbb{R}^{N-n}\rightarrow\mathbb{R}^{n}$ are two further neural network functions, so the composed transformation from $z$ to $y$ given by (16) and (18) leaves no component unchanged. This composition can be depicted as a diagram like Figure 2.

Of course, we can stack more layers like this to improve the expressivity of the network.
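The two coupling layers (16) and (18) and their inverse can be sketched in a few lines. The functions `s`, `b`, `s_t`, `b_t` below are hypothetical hand-written stand-ins for the networks $s$, $b$, $\tilde{s}$, $\tilde{b}$, with $N=4$ and the split after $n=2$ entries. The point is only that inversion requires no equation solving, and that the log-determinant $\sum s+\sum\tilde{s}$ is in general nonzero, i.e. the map is non-volume-preserving.

```python
import math

# Hypothetical hand-written stand-ins for the networks s, b (acting on z_A)
# and s~, b~ (acting on x_B); N = 4, split after n = 2 entries.


def s(zA):
    return [math.tanh(zA[0] + zA[1]), 0.5 * zA[0]]


def b(zA):
    return [math.sin(zA[1]), zA[0] ** 2]


def s_t(xB):
    return [0.3 * xB[1], math.tanh(xB[0])]


def b_t(xB):
    return [xB[0] * xB[1], math.cos(xB[0])]


def forward(z):
    zA, zB = z[:2], z[2:]
    # (16): keep z_A, scale and shift z_B
    xA = zA
    xB = [math.exp(si) * zi + bi for si, zi, bi in zip(s(zA), zB, b(zA))]
    # (18): keep x_B, scale and shift x_A
    yA = [math.exp(si) * xi + bi for si, xi, bi in zip(s_t(xB), xA, b_t(xB))]
    return yA + xB


def inverse(y):
    yA, xB = y[:2], y[2:]
    # Undo (18): x_B is available unchanged, so x_A follows directly
    xA = [math.exp(-si) * (yi - bi) for si, yi, bi in zip(s_t(xB), yA, b_t(xB))]
    # Undo (16) using (17)
    zB = [math.exp(-si) * (xi - bi) for si, xi, bi in zip(s(xA), xB, b(xA))]
    return xA + zB


def log_abs_det(z):
    # Each coupling has a triangular Jacobian, so log|det| is the sum of its scales
    xB = forward(z)[2:]
    return sum(s(z[:2])) + sum(s_t(xB))


z = [0.2, -0.7, 1.1, 0.4]
print(max(abs(u - v) for u, v in zip(inverse(forward(z)), z)))  # rounding-level
print(log_abs_det(z))  # about -0.17 for this z: volume is not preserved
```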

3 Symplectomorphism Neural Network (SymplectoNet, SpNN)

3.1 Structure

For our goal of building a symplectomorphism neural network, the problem with real NVP is evident from its name: “NVP” means “non-volume-preserving”, while a symplectomorphism must be volume preserving. To make real NVP volume preserving (from “real NVP” to “real VP”), there is a quick fix: one only needs to add an extra layer

(s_{1},\cdots,s_{N-n})\rightarrow(s_{1},\cdots,s_{N-n})-\overline{s}(1,\cdots,1),\quad\overline{s}=\frac{1}{N-n}\sum_{i=1}^{N-n}s_{i}

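after the output layer of the network $s$, which subtracts the mean of its outputs. The Jacobian of (16) is block triangular with diagonal blocks $I$ and $\operatorname{diag}(\mathrm{e}^{s(z_{A})})$, so its determinant is $\mathrm{e}^{\sum_{i}s_{i}}$, which equals 1 once the outputs have zero mean. Unfortunately, volume preservation alone does not guarantee symplecticity, so further adjustments are needed.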
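This claim can be checked numerically for $N=4$ with the split $z_{A}=q$, $z_{B}=p$. In the sketch below, the raw scale output and the shift are hypothetical stand-ins for networks; after the mean-subtraction layer the scales sum to zero, so the Jacobian determinant of (16) is 1, yet the finite-difference test of (4) (implemented by a hypothetical helper `symplectic_defect`) still fails by a wide margin.

```python
import math


def symplectic_defect(f, x, h=1e-6):
    """Largest entry of |J^T Omega J - Omega| for f: R^{2n} -> R^{2n} at x.

    x is ordered as (q_1, ..., q_n, p_1, ..., p_n); the Jacobian J is
    approximated by central finite differences with step h.
    """
    N, n = len(x), len(x) // 2
    cols = []
    for j in range(N):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(u - v) / (2 * h) for u, v in zip(f(xp), f(xm))])
    J = [[cols[j][i] for j in range(N)] for i in range(N)]  # J[i][j] = d f_i / d x_j

    def om(i, j):
        # Entries of Omega = [[0, I], [-I, 0]] from (6)
        return 1.0 if j == i + n else (-1.0 if i == j + n else 0.0)

    return max(
        abs(sum(J[i][a] * om(i, j) * J[j][b] for i in range(N) for j in range(N)) - om(a, b))
        for a in range(N)
        for b in range(N)
    )


def centered_scale(zA):
    # Hypothetical raw scale output, followed by the extra mean-subtraction layer
    raw = [zA[0] + zA[1], zA[0] - zA[1]]
    mean = sum(raw) / len(raw)
    return [r - mean for r in raw]


def shift(zA):
    # Hypothetical shift network b
    return [math.sin(zA[0]), zA[0] * zA[1]]


def coupling(z):
    # (16) with z_A = (q_1, q_2) and z_B = (p_1, p_2)
    zA, zB = z[:2], z[2:]
    xB = [math.exp(si) * zi + bi for si, zi, bi in zip(centered_scale(zA), zB, shift(zA))]
    return list(zA) + xB


z0 = [0.3, 0.5, 0.7, -0.4]
print(sum(centered_scale(z0[:2])))      # zero up to rounding, so det J = 1
print(symplectic_defect(coupling, z0))  # about 0.65: not symplectic
```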

Indeed, we can decompose (16) into two transformations: a “stretching”

\begin{cases}\xi_{A}=z_{A}&\\ \xi_{B}=\mathrm{e}^{s(z_{A})}\odot z_{B},&\end{cases}

(19)

and a “shearing”

\begin{cases}x_{A}=\xi_{A}&\\ x_{B}=\xi_{B}+b(\xi_{A}).&\end{cases}

(20)

Neither of these two transformations is guaranteed to be symplectic. Nevertheless, we introduced their symplectic counterparts in the previous section: we can write (8), (9), and (10) (with $k_{i}$ given by (14) and (15)) in the more compact forms

\begin{cases}Q=q&\\ P=p+\nabla F(q),&\end{cases}

(21)

\begin{cases}Q=q+\nabla G(p)&\\ P=p,&\end{cases}

(22)

\begin{cases}Q=\mathrm{e}^{\nabla\Phi(q\odot p)}\odot q&\\ P=\mathrm{e}^{-\nabla\Phi(q\odot p)}\odot p,&\end{cases}

(23)

where $q=(q_{1},\cdots,q_{n})$, $p=(p_{1},\cdots,p_{n})$, $Q=(Q_{1},\cdots,Q_{n})$, $P=(P_{1},\cdots,P_{n})$, and $\odot$ is the Hadamard product as before. Their correspondence with (19) and (20) is now clear: (21) is exactly (20) when $x_{A}$ and $x_{B}$ have the same dimension and $b$ is the gradient of a function (and (22) is the same with the roles of the two parts exchanged), while (23) is a symmetrized version of (19):

\begin{cases}\xi_{A}=\mathrm{e}^{-s(z_{A}\odot z_{B})}\odot z_{A}&\\ \xi_{B}=\mathrm{e}^{s(z_{A}\odot z_{B})}\odot z_{B},&\end{cases}

with $s$ being the gradient of a function. We denote the transformations defined by (21), (22), and (23) by $\operatorname{qSh}_{F}$, $\operatorname{pSh}_{G}$, and $\operatorname{St}_{\Phi}$, shorthand for “q-shearing”, “p-shearing”, and “stretching”, respectively. These become the basic building blocks of the “symplectic version of real NVP” once we take $F$, $G$, and $\Phi$ in these transformations to be trainable neural networks.

We have now introduced all the basic symplectomorphism building blocks. A symplectomorphism neural network (SymplectoNet, or even shorter, SpNN) is a neural network defined as an arbitrary finite composition of $\operatorname{qSh}_{F}$, $\operatorname{pSh}_{G}$, and $\operatorname{St}_{\Phi}$, where $F$, $G$, and $\Phi$ are arbitrary neural networks with $n$-dimensional input and one-dimensional output.
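The three blocks and their composition are short to write down. The sketch below is a plain-Python illustration, not the authors' implementation: it assumes the gradients $\nabla F$, $\nabla G$, $\nabla\Phi$ are supplied directly (in a trained SymplectoNet they would come from automatic differentiation of small networks; here `grad_F`, `grad_G`, `grad_Phi` are hypothetical hand-written gradients of simple functions), builds $\operatorname{pSh}_{G}\circ\operatorname{St}_{\Phi}\circ\operatorname{qSh}_{F}$ for $n=2$, and checks (4) with a finite-difference helper `symplectic_defect` (also a hypothetical name).

```python
import math


def symplectic_defect(f, x, h=1e-6):
    """Largest entry of |J^T Omega J - Omega| for f: R^{2n} -> R^{2n} at x.

    x is ordered as (q_1, ..., q_n, p_1, ..., p_n); the Jacobian J is
    approximated by central finite differences with step h.
    """
    N, n = len(x), len(x) // 2
    cols = []
    for j in range(N):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(u - v) / (2 * h) for u, v in zip(f(xp), f(xm))])
    J = [[cols[j][i] for j in range(N)] for i in range(N)]  # J[i][j] = d f_i / d x_j

    def om(i, j):
        # Entries of Omega = [[0, I], [-I, 0]] from (6)
        return 1.0 if j == i + n else (-1.0 if i == j + n else 0.0)

    return max(
        abs(sum(J[i][a] * om(i, j) * J[j][b] for i in range(N) for j in range(N)) - om(a, b))
        for a in range(N)
        for b in range(N)
    )


def q_shear(grad_F):
    # qSh_F, (21): Q = q, P = p + grad F(q)
    def f(x):
        n = len(x) // 2
        q, p = x[:n], x[n:]
        return list(q) + [pi + gi for pi, gi in zip(p, grad_F(q))]
    return f


def p_shear(grad_G):
    # pSh_G, (22): Q = q + grad G(p), P = p
    def f(x):
        n = len(x) // 2
        q, p = x[:n], x[n:]
        return [qi + gi for qi, gi in zip(q, grad_G(p))] + list(p)
    return f


def stretch(grad_Phi):
    # St_Phi, (23): Q = exp(grad Phi(q*p)) * q, P = exp(-grad Phi(q*p)) * p
    def f(x):
        n = len(x) // 2
        q, p = x[:n], x[n:]
        s = grad_Phi([qi * pi for qi, pi in zip(q, p)])
        return ([math.exp(si) * qi for si, qi in zip(s, q)]
                + [math.exp(-si) * pi for si, pi in zip(s, p)])
    return f


def compose(*maps):
    # compose(f1, f2, f3) = f1 o f2 o f3, i.e. f3 is applied first
    def f(x):
        for m in reversed(maps):
            x = m(x)
        return x
    return f


def grad_F(q):  # F = sin(q1) * q2 (hypothetical)
    return [math.cos(q[0]) * q[1], math.sin(q[0])]


def grad_G(p):  # G = p1**4 / 4 + p2**2 / 4 (hypothetical)
    return [p[0] ** 3, 0.5 * p[1]]


def grad_Phi(z):  # Phi = 0.2 * z1 * z2 + 0.1 * z2 (hypothetical)
    return [0.2 * z[1], 0.2 * z[0] + 0.1]


net = compose(p_shear(grad_G), stretch(grad_Phi), q_shear(grad_F))  # pSh_G o St_Phi o qSh_F
x0 = [0.3, 0.5, 0.7, -0.4]
print(symplectic_defect(net, x0))  # rounding-level: the composition is symplectic
```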

Of course, the expressivity of this network depends on the complexity of the underlying neural networks $F$, $G$, and $\Phi$, and also on the number of building blocks we stack. The latter can be even more essential: e.g., if we use fewer than four symplectic shearing blocks, we cannot even cover all linear symplectomorphisms, no matter how complicated the underlying networks $F$ and $G$ are, because the Jacobian of a shearing transformation has the form

\begin{pmatrix}I&\\ B&I\end{pmatrix}\quad\text{or}\quad\begin{pmatrix}I&C\\ &I\end{pmatrix},

where $B$, $C$ are symmetric $n\times n$ matrices. Each of these matrices has $n(n+1)/2$ degrees of freedom, while $\dim Sp(2n)=n(2n+1)$, which is greater than $3n(n+1)/2$ for $n>1$. This is why we also designed the symplectic stretching layer $\operatorname{St}_{\Phi}$. A good practice is to include both the $p$- and $q$-shearing layers and the symplectic stretching layer in the network at least once. The simplest example is a network with structure $\operatorname{pSh}_{G}\circ\operatorname{St}_{\Phi}\circ\operatorname{qSh}_{F}$ (see Figure 3), which is similar to the structure of a real NVP.
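The dimension count behind this claim can be written out explicitly. Each shearing Jacobian contributes at most $n(n+1)/2$ free parameters, so three shearings supply at most $3n(n+1)/2$, and the shortfall is

```latex
\dim Sp(2n)-\frac{3n(n+1)}{2}
  =\frac{2n(2n+1)-3n(n+1)}{2}
  =\frac{n^{2}-n}{2}
  =\frac{n(n-1)}{2}>0\qquad\text{for } n>1.
```

For example, for $n=2$ we have $\dim Sp(4)=10$, whereas three shearings supply at most $9$ parameters; for $n=1$ the two counts coincide ($3=3$), so this argument gives no obstruction in that case.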

Figure 3: A diagram of $\operatorname{pSh}_{G}\circ\operatorname{St}_{\Phi}\circ\operatorname{qSh}_{F}$

3.2 SymplectoNet as Invertible Neural Network (INN)

One of the most important features of real NVP is that it is explicitly invertible: one can write out (or, in more technical terms, build the computation graph of) the explicit expression of the inverse of the neural network function [5]. Since SymplectoNet is inspired by real NVP, a natural question is whether it is explicitly invertible as well. We now show that the answer is yes.

Indeed, since the inverse of a composed function $f_{1}\circ f_{2}\circ\cdots\circ f_{k}$ is $f_{k}^{-1}\circ\cdots\circ f_{2}^{-1}\circ f_{1}^{-1}$, we only need to show that the basic building blocks $\operatorname{pSh}_{G}$, $\operatorname{qSh}_{F}$, and $\operatorname{St}_{\Phi}$ are explicitly invertible. The inverses of $\operatorname{pSh}_{G}$ and $\operatorname{qSh}_{F}$ are straightforward: (21) is equivalent to

\begin{cases}q=Q&\\ p=P-\nabla F(Q),&\end{cases}

(22) is equivalent to

\begin{cases}q=Q-\nabla G(P)&\\ p=P,&\end{cases}

therefore the inverses of $\operatorname{pSh}_{G}$ and $\operatorname{qSh}_{F}$ are $\operatorname{pSh}_{-G}$ and $\operatorname{qSh}_{-F}$, respectively. Finally, we consider $\operatorname{St}_{\Phi}$. From (23), we have

Q\odot P=\mathrm{e}^{\nabla\Phi(q\odot p)}\odot q\odot\mathrm{e}^{-\nabla\Phi(q\odot p)}\odot p=q\odot p,

therefore

\begin{cases}q=\mathrm{e}^{-\nabla\Phi(q\odot p)}\odot Q=\mathrm{e}^{-\nabla\Phi(Q\odot P)}\odot Q&\\ p=\mathrm{e}^{\nabla\Phi(q\odot p)}\odot P=\mathrm{e}^{\nabla\Phi(Q\odot P)}\odot P,&\end{cases}

(24)

this shows that the inverse of $\operatorname{St}_{\Phi}$ is exactly $\operatorname{St}_{-\Phi}$ . In conclusion, we have

\begin{cases}\left(\operatorname{pSh}_{G}\right)^{-1}=\operatorname{pSh}_{-G},&\\ \left(\operatorname{qSh}_{F}\right)^{-1}=\operatorname{qSh}_{-F},&\\ \left(\operatorname{St}_{\Phi}\right)^{-1}=\operatorname{St}_{-\Phi}.&\end{cases}

(25)

These results give a neat expression for the inverse of any SymplectoNet. For example,

\left(\operatorname{pSh}_{G}\circ\operatorname{St}_{\Phi}\circ\operatorname{qSh}_{F}\right)^{-1}=\operatorname{qSh}_{-F}\circ\operatorname{St}_{-\Phi}\circ\operatorname{pSh}_{-G}.

(26)

This shows that the inverse of SymplectoNet is explicitly available.
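The inversion rules (25) and (26) can be exercised directly. In this sketch each block takes a scalar multiplier `c` (a hypothetical parameter) so that `c = -1.0` realizes $\operatorname{qSh}_{-F}$, $\operatorname{pSh}_{-G}$, and $\operatorname{St}_{-\Phi}$; the gradients are hypothetical hand-written stand-ins for networks. Applying the inverse network after the forward one recovers the input up to rounding error, and the stretching block leaves every product $q_{i}p_{i}$ unchanged, which is what makes (24) explicit.

```python
import math


def q_shear(grad_F, c=1.0):
    """qSh_{cF}, see (21); c = -1 gives the inverse by (25)."""
    def f(x):
        n = len(x) // 2
        q, p = x[:n], x[n:]
        return list(q) + [pi + c * gi for pi, gi in zip(p, grad_F(q))]
    return f


def p_shear(grad_G, c=1.0):
    """pSh_{cG}, see (22)."""
    def f(x):
        n = len(x) // 2
        q, p = x[:n], x[n:]
        return [qi + c * gi for qi, gi in zip(q, grad_G(p))] + list(p)
    return f


def stretch(grad_Phi, c=1.0):
    """St_{c Phi}, see (23)."""
    def f(x):
        n = len(x) // 2
        q, p = x[:n], x[n:]
        s = [c * si for si in grad_Phi([qi * pi for qi, pi in zip(q, p)])]
        return ([math.exp(si) * qi for si, qi in zip(s, q)]
                + [math.exp(-si) * pi for si, pi in zip(s, p)])
    return f


def grad_F(q):  # F = sin(q1) * q2 (hypothetical)
    return [math.cos(q[0]) * q[1], math.sin(q[0])]


def grad_G(p):  # G = p1**4 / 4 + p2**2 / 4 (hypothetical)
    return [p[0] ** 3, 0.5 * p[1]]


def grad_Phi(z):  # Phi = 0.2 * z1 * z2 + 0.1 * z2 (hypothetical)
    return [0.2 * z[1], 0.2 * z[0] + 0.1]


def net(x):
    # pSh_G o St_Phi o qSh_F: qSh_F is applied first
    x = q_shear(grad_F)(x)
    x = stretch(grad_Phi)(x)
    return p_shear(grad_G)(x)


def net_inverse(y):
    # (26): qSh_{-F} o St_{-Phi} o pSh_{-G}: pSh_{-G} is applied first
    y = p_shear(grad_G, -1.0)(y)
    y = stretch(grad_Phi, -1.0)(y)
    return q_shear(grad_F, -1.0)(y)


x0 = [0.3, 0.5, 0.7, -0.4]
print(max(abs(u - v) for u, v in zip(net_inverse(net(x0)), x0)))  # rounding-level

# St_Phi preserves each product q_i * p_i, which is why (24) is explicit
X = stretch(grad_Phi)(x0)
print([X[i] * X[i + 2] for i in range(2)], [x0[i] * x0[i + 2] for i in range(2)])
```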

4 Extension to Family of Symplectomorphism

A natural extension of the symplectomorphism neural network is to include parameters $\tau_{1},\tau_{2},\cdots,\tau_{K}$ other than the canonical variables as inputs. This can easily be achieved by replacing $F(q)$, $G(p)$, $\Phi(z)$ in the basic building blocks $\operatorname{qSh}_{F}$, $\operatorname{pSh}_{G}$, and $\operatorname{St}_{\Phi}$ with $(n+K)$-variable functions $F(q;\tau)$, $G(p;\tau)$, $\Phi(z;\tau)$, where $\tau=(\tau_{1},\cdots,\tau_{K})$, and modifying the blocks (21) through (23) into

\begin{cases}Q=q&\\ P=p+\nabla_{q}F(q,\tau),&\end{cases}

(27)

\begin{cases}Q=q+\nabla_{p}G(p,\tau)&\\ P=p,&\end{cases}

(28)

\begin{cases}Q=\mathrm{e}^{\nabla_{z}\Phi(q\odot p,\tau)}\odot q&\\ P=\mathrm{e}^{-\nabla_{z}\Phi(q\odot p,\tau)}\odot p,&\end{cases}

(29)

With this modification, the network receives ( $2n+K$ )-dimensional vectors

(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n},\tau_{1},\cdots,\tau_{K})

as inputs, and the output dimension is still $2n$. For each fixed $\tau_{1},\cdots,\tau_{K}$, the map from the canonical part of the input, $(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n})$, to the output is a symplectomorphism. Thus each choice of the parameters $\tau_{1},\cdots,\tau_{K}$ defines a symplectomorphism; in other words, the network defines a continuous family of symplectomorphisms parameterized by $\tau_{1},\cdots,\tau_{K}$. A particularly common case is $K=1$ with $\tau_{1}=t$ representing time. In this case the network function can represent the solution of a Hamiltonian system: thanks to the symplectic property of the network, there exists a Hamiltonian function

H=H(q_{1},\cdots,q_{n},p_{1},\cdots,p_{n},t)

such that the network function represents exactly the solution of the corresponding Hamiltonian system (7). However, the family parameterized by $t$ is not guaranteed to form a one-parameter group of symplectomorphisms; that is, the corresponding Hamiltonian $H$ may depend explicitly on time, and we do not have a method to exactly eliminate this dependence.
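A sketch of one $\tau$-dependent block pair for $K=1$, assuming hypothetical hand-written stand-ins $F(q,\tau)=\tau q_{1}^{2}q_{2}+\sin(\tau)\,q_{2}$ and $\Phi(z,\tau)=\tfrac12\tau z_{1}z_{2}+\tfrac15\cos(\tau)\,z_{1}$ for the networks. For each fixed $\tau$, the map $(q,p)\mapsto(Q,P)$ built from (27) and (29) passes the finite-difference test of (4) (via a hypothetical helper `symplectic_defect`). Only derivatives with respect to $(q,p)$ enter the test; $\tau$ is held fixed, exactly as in the statement above.

```python
import math


def symplectic_defect(f, x, h=1e-6):
    """Largest entry of |J^T Omega J - Omega| for f: R^{2n} -> R^{2n} at x.

    x is ordered as (q_1, ..., q_n, p_1, ..., p_n); the Jacobian J is
    approximated by central finite differences with step h.
    """
    N, n = len(x), len(x) // 2
    cols = []
    for j in range(N):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        cols.append([(u - v) / (2 * h) for u, v in zip(f(xp), f(xm))])
    J = [[cols[j][i] for j in range(N)] for i in range(N)]  # J[i][j] = d f_i / d x_j

    def om(i, j):
        # Entries of Omega = [[0, I], [-I, 0]] from (6)
        return 1.0 if j == i + n else (-1.0 if i == j + n else 0.0)

    return max(
        abs(sum(J[i][a] * om(i, j) * J[j][b] for i in range(N) for j in range(N)) - om(a, b))
        for a in range(N)
        for b in range(N)
    )


def grad_q_F(q, tau):
    # F(q, tau) = tau*q1**2*q2 + sin(tau)*q2 (hypothetical); gradient in q only
    return [2 * tau * q[0] * q[1], tau * q[0] ** 2 + math.sin(tau)]


def grad_z_Phi(z, tau):
    # Phi(z, tau) = tau*z1*z2/2 + cos(tau)*z1/5 (hypothetical); gradient in z only
    return [0.5 * tau * z[1] + 0.2 * math.cos(tau), 0.5 * tau * z[0]]


def family(x, tau):
    """St_Phi(.; tau) o qSh_F(.; tau): (27) followed by (29)."""
    n = len(x) // 2
    q, p = x[:n], x[n:]
    p = [pi + gi for pi, gi in zip(p, grad_q_F(q, tau))]   # (27)
    s = grad_z_Phi([qi * pi for qi, pi in zip(q, p)], tau)  # (29)
    return ([math.exp(si) * qi for si, qi in zip(s, q)]
            + [math.exp(-si) * pi for si, pi in zip(s, p)])


x0 = [0.3, 0.5, 0.7, -0.4]
for tau in (0.0, 0.7, 2.0):
    # tau is frozen; only the (q, p) Jacobian is tested
    print(tau, symplectic_defect(lambda x: family(x, tau), x0))
```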

By including more parameters (i.e., $K>1$), it is also possible to apply this network to optimal control problems involving Hamiltonian dynamics.

5 Some Preliminary Results

5.1 A Polar Nonlinear Map

In this example, we learn the symplectic map

(q,\ p)\rightarrow\left(\sqrt{2q}\cos p,\sqrt{2q}\sin p\right)=:(Q,P)

(30)
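A quick sanity check, assuming $q>0$ so that (30) is smooth: for $n=1$, condition (4) reduces to $\det J=1$, and the Jacobian of (30) has determinant $\cos^{2}p+\sin^{2}p=1$. The sketch below confirms this numerically at a few arbitrarily chosen points.

```python
import math


def polar_map(q, p):
    # (30); smooth for q > 0
    r = math.sqrt(2 * q)
    return r * math.cos(p), r * math.sin(p)


def jacobian_det(f, q, p, eps=1e-6):
    """Determinant of the 2x2 Jacobian of f at (q, p), by central differences."""
    fq1, fq0 = f(q + eps, p), f(q - eps, p)
    fp1, fp0 = f(q, p + eps), f(q, p - eps)
    dQdq = (fq1[0] - fq0[0]) / (2 * eps)
    dQdp = (fp1[0] - fp0[0]) / (2 * eps)
    dPdq = (fq1[1] - fq0[1]) / (2 * eps)
    dPdp = (fp1[1] - fp0[1]) / (2 * eps)
    return dQdq * dPdp - dQdp * dPdq


points = [(0.1, 0.2), (0.5, 1.0), (1.2, 4.0)]
for q, p in points:
    print(q, p, jacobian_det(polar_map, q, p))  # each close to 1
```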

We use a network with structure

\operatorname{qSh}_{F_{1}}\circ\operatorname{pSh}_{G_{1}}\circ\operatorname{St}_{\Phi}\circ\operatorname{qSh}_{F_{2}}\circ\operatorname{pSh}_{G_{2}},

where $F_{1},G_{1},F_{2},G_{2}$ are $(2,20,10,1)$ dense neural networks and $\Phi$ is a $(2,10,1)$ dense neural network. The loss is the ordinary MSE loss. We use the Adamax optimizer with learning rate 0.25, decayed by a factor of 0.99 every 100 epochs.

First, we sample points uniformly at random from

(q,p)\in[0,1]\times[0,1]

Training ran for 40,000 epochs, during which the loss dropped from $0.3$ to about $10^{-5}$. The learned map is plotted in Figure 4(a), and the loss decay is shown in Figure 4(b).
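The training data for this experiment can be generated as follows. This is a sketch under stated assumptions: the sample size (`n_points = 1000` here) and the random seed are hypothetical, since the text does not give them, and only the data and the ordinary MSE loss are shown, not the network or the optimizer schedule. As a sanity check, every target satisfies $Q^{2}+P^{2}=2q$.

```python
import math
import random


def polar_map(q, p):
    # (30); defined for q >= 0
    r = math.sqrt(2 * q)
    return r * math.cos(p), r * math.sin(p)


def make_dataset(n_points, q_range, p_range, seed=0):
    """Sample (q, p) uniformly from a box and pair each point with its image under (30).

    n_points and seed are hypothetical; the text does not state them.
    """
    rng = random.Random(seed)
    inputs = [(rng.uniform(*q_range), rng.uniform(*p_range)) for _ in range(n_points)]
    targets = [polar_map(q, p) for q, p in inputs]
    return inputs, targets


def mse(pred, target):
    # Ordinary mean squared error over all output coordinates
    total = sum((a - b) ** 2 for u, v in zip(pred, target) for a, b in zip(u, v))
    return total / (2 * len(pred))


inputs, targets = make_dataset(1000, (0.0, 1.0), (0.0, 1.0))
print(mse([polar_map(q, p) for q, p in inputs], targets))  # 0.0 for the exact map
```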

We also conducted another numerical experiment on (30), with the domain changed to

(q,p)\in\left[\frac{1}{2},\frac{3}{2}\right]\times\left[0,\frac{3\pi}{2}\right]

This time, the geometry of the transformation is more complicated. Note that we cannot take $p\in[0,2\pi]$, because this would make the map (30) non-injective, whereas the model is invertible; the model would then have difficulty learning the data near the two lines $p=0$ and $p=2\pi$. Training ran for 40,000 epochs, during which the loss dropped from $0.3$ to about $10^{-5}$. The learned map is plotted in Figure 4(c), and the loss decay is shown in Figure 4(d). Most of the error comes from the boundary $p=3\pi/2$, because the points there are close to the points with $p=0$.
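The injectivity problem can be seen directly. In this sketch ($q=0.8$ is an arbitrary choice), the inputs $(q,0)$ and $(q,2\pi)$ have the same image under (30), and an input just below $p=2\pi$ lands next to the image of $(q,0)$ even though the two inputs are far apart in $p$.

```python
import math


def polar_map(q, p):
    # (30); defined for q >= 0
    r = math.sqrt(2 * q)
    return (r * math.cos(p), r * math.sin(p))


q = 0.8
# On p in [0, 2*pi] the map is not injective: both edges land on the same point
gap_closed = math.dist(polar_map(q, 0.0), polar_map(q, 2 * math.pi))
# Points just below p = 2*pi land next to the image of p = 0, so an invertible
# model must separate nearly coincident targets that come from distant inputs
gap_near = math.dist(polar_map(q, 0.0), polar_map(q, 2 * math.pi - 0.01))
print(gap_closed, gap_near)
```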

References

[1] C. Bishop and H. Bishop, Deep Learning: Foundations and Concepts, Springer International Publishing, 2023.
[2] A. da Silva, Lectures on Symplectic Geometry, Lecture Notes in Mathematics, Springer Berlin Heidelberg, 2004.
[3] L. Dinh, J. N. Sohl-Dickstein, and S. Bengio, Density estimation using Real NVP, arXiv preprint arXiv:1605.08803 (2016).
[4] H. Goldstein, Classical Mechanics, Addison-Wesley series in physics, Addison-Wesley Publishing Company, 1980.
[5] I. Ishikawa, T. Teshima, K. Tojo, K. Oono, M. Ikeda, and M. Sugiyama, Universal approximation property of invertible neural networks, Journal of Machine Learning Research, 24 (2023), pp. 1–68.