Robust DeepFake Video Detection via Sparse Feature Entanglement Learning and Graph Laplacian Regularization

† This study was supported in part by the Ministry of Science and Technology (MOST), Taiwan, under grants MOST XXX, and in part by the Higher Education Sprout Project of the Ministry of Education (MOE) to the Headquarters of University Advancement at National Cheng Kung University (NCKU).
† (Corresponding author: Chih-Chung Hsu.)
† C.-C. Hsu, S.-N. Chen, M.-H. Wu, Y.-F. Wang, C.-M. Lee and Y.-S. Chou are with the Institute of Data Science and the Department of Statistics, National Cheng Kung University, Tainan, Taiwan (R.O.C.) (e-mail: [email protected], [email protected]). Y.-S. Du is with the Department of Computer Science, National Cheng Kung University, Tainan, Taiwan (R.O.C.) (e-mail: xx).

Chih-Chung Hsu, Shao-Ning Chen, Mei-Hsuan Wu,
Yi-Fang Wang, Chia-Ming Lee, Yi-Shiuan Chou

Abstract

Index Terms—

I some contents

We first define the notation:

• $\mathbf{Z}\in\mathbb{R}^{N\times D}$ is the entangled feature matrix, where $N$ is the number of nodes (e.g., the number of frame features) and $D$ is the feature dimension.
• $\mathbf{A}\in\mathbb{R}^{N\times N}$ is the adjacency matrix of the graph, where $\mathbf{A}_{ij}$ is the weight of the edge between nodes $i$ and $j$.
• $\mathbf{D}\in\mathbb{R}^{N\times N}$ is the degree matrix, with $\mathbf{D}_{ii}=\sum_{j=1}^{N}\mathbf{A}_{ij}$.
• $\mathbf{L}=\mathbf{D}-\mathbf{A}$ is the graph Laplacian matrix.
• $\mathbf{X}^{(0)}=\mathbf{Z}$ is the initial node feature matrix, i.e., the input to the first GCN layer.
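To make the notation concrete, the following minimal NumPy sketch builds $\mathbf{D}$ and $\mathbf{L}$ from a small adjacency matrix. The helper names `degree_matrix` and `graph_laplacian`, the 4-node chain graph, and the random placeholder features are all illustrative assumptions and are not taken from the method itself.

```python
import numpy as np

def degree_matrix(A):
    """Diagonal degree matrix with D_ii = sum_j A_ij."""
    return np.diag(A.sum(axis=1))

def graph_laplacian(A):
    """Combinatorial graph Laplacian L = D - A."""
    return degree_matrix(A) - A

# Hypothetical graph: 4 nodes (e.g. 4 consecutive frames) linked in a chain.
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
D = degree_matrix(A)
L = graph_laplacian(A)

# Placeholder entangled features Z (N = 4 nodes, feature dimension 3).
rng = np.random.default_rng(0)
Z = rng.standard_normal((4, 3))
X0 = Z  # X^(0) = Z

# Every row of L sums to zero, so L is singular by construction.
print(L.sum(axis=1))
```

The zero row sums follow directly from $\mathbf{D}_{ii}=\sum_{j}\mathbf{A}_{ij}$ and hold for any adjacency matrix.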

The GCN propagation rule is then

\mathbf{X}^{(l+1)}=\sigma(\hat{\mathbf{D}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-\frac{1}{2}}\mathbf{X}^{(l)}\mathbf{W}^{(l)})

where $\hat{\mathbf{A}}=\mathbf{A}+\mathbf{I}_{N}$ is the adjacency matrix with self-loops added, $\hat{\mathbf{D}}_{ii}=\sum_{j=1}^{N}\hat{\mathbf{A}}_{ij}$ is the corresponding degree matrix, $\mathbf{W}^{(l)}$ is the weight matrix of the $l$-th layer, and $\sigma$ is the activation function. To incorporate the graph Laplacian into the GCN, we have

\mathbf{X}^{(l+1)}=\sigma(\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}\mathbf{X}^{(l)}\mathbf{W}^{(l)})

where $\hat{\mathbf{L}}=\hat{\mathbf{D}}-\hat{\mathbf{A}}$ is the graph Laplacian matrix of the graph with self-loops. Unrolling this recursion, the output of the $L$-th GCN layer can be written as

\mathbf{X}^{(L)}=\sigma(\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}\cdot\sigma(\ldots\sigma(\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}\mathbf{X}^{(0)}\mathbf{W}^{(0)})\ldots)\mathbf{W}^{(L-1)})
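The two propagation operators above can be sketched numerically as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: all function names are hypothetical, $\mathbf{A}$ is assumed symmetric with nonnegative weights, and ReLU is assumed for $\sigma$. One caveat applies to the Laplacian variant: the self-loops cancel, so $\hat{\mathbf{D}}-\hat{\mathbf{A}}=\mathbf{D}-\mathbf{A}$, and this matrix is singular because its rows sum to zero. The sketch therefore interprets $\hat{\mathbf{L}}^{-\frac{1}{2}}$ as the square root of the Moore-Penrose pseudo-inverse, which is an assumption made here and is not stated in the text.

```python
import numpy as np

def normalized_adjacency(A):
    """D_hat^{-1/2} A_hat D_hat^{-1/2} with A_hat = A + I (standard GCN operator)."""
    A_hat = A + np.eye(A.shape[0])
    d_hat = A_hat.sum(axis=1)            # >= 1 for nonnegative weights (self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(d_hat)
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def pinv_sqrt(M, tol=1e-10):
    """Square root of the pseudo-inverse of a symmetric PSD matrix M.

    ASSUMPTION: eigenvalues below `tol` are treated as zero and left at zero,
    because the Laplacian is singular and has no ordinary inverse.
    """
    w, V = np.linalg.eigh(M)
    inv_sqrt = np.zeros_like(w)
    keep = w > tol
    inv_sqrt[keep] = 1.0 / np.sqrt(w[keep])
    return (V * inv_sqrt) @ V.T          # V diag(inv_sqrt) V^T

def laplacian_propagation_operator(A):
    """L_hat^{-1/2} A_hat L_hat^{-1/2}, with L_hat^{-1/2} taken as pinv_sqrt(L_hat)."""
    A_hat = A + np.eye(A.shape[0])
    L_hat = np.diag(A_hat.sum(axis=1)) - A_hat
    R = pinv_sqrt(L_hat)
    return R @ A_hat @ R

def gcn_layer(S, X, W):
    """One propagation step X_next = ReLU(S X W); ReLU is an assumed choice of sigma."""
    return np.maximum(S @ X @ W, 0.0)

# Hypothetical 4-node chain graph.
A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
S_gcn = normalized_adjacency(A)
S_lap = laplacian_propagation_operator(A)
```

Either `S_gcn` or `S_lap` can be passed to `gcn_layer` as the operator `S`; the rest of the layer is identical.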

Equivalently, in recursive form:

\begin{aligned}
\mathbf{X}^{(0)} &= \mathbf{Z} \\
\mathbf{X}^{(1)} &= \sigma(\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}\mathbf{X}^{(0)}\mathbf{W}^{(0)}) \\
\mathbf{X}^{(2)} &= \sigma(\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}\mathbf{X}^{(1)}\mathbf{W}^{(1)}) \\
&\;\;\vdots \\
\mathbf{X}^{(l)} &= \sigma(\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}\mathbf{X}^{(l-1)}\mathbf{W}^{(l-1)}) \\
&\;\;\vdots \\
\mathbf{X}^{(L)} &= \sigma(\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}\mathbf{X}^{(L-1)}\mathbf{W}^{(L-1)})
\end{aligned}
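The layer-by-layer recursion maps directly onto a loop. The sketch below is illustrative only: `relu` and `gcn_forward` are hypothetical names, `S` stands for whichever $N\times N$ propagation operator is chosen (for example $\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}$, computed separately), and ReLU is an assumed choice for $\sigma$.

```python
import numpy as np

def relu(t):
    """Assumed activation sigma."""
    return np.maximum(t, 0.0)

def gcn_forward(S, Z, weights, activation=relu):
    """Apply X^(l+1) = sigma(S X^(l) W^(l)) for W^(0), ..., W^(L-1).

    Returns the list [X^(0), X^(1), ..., X^(L)], with X^(0) = Z.
    """
    X = Z
    states = [X]
    for W in weights:
        X = activation(S @ X @ W)
        states.append(X)
    return states

# Toy usage: identity operator and identity weights, so each layer reduces to ReLU.
S = np.eye(3)
Z = np.array([[1.0, -1.0], [2.0, 0.0], [0.0, 3.0]])
states = gcn_forward(S, Z, [np.eye(2), np.eye(2)])
X_L = states[-1]
```

Returning every intermediate $\mathbf{X}^{(l)}$ rather than only $\mathbf{X}^{(L)}$ is a design choice made here so that individual layers can be inspected.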

Finally, the prediction is obtained by applying a fully connected layer to $\mathbf{X}^{(L)}$:

\mathbf{Y}=\text{softmax}(\mathbf{X}^{(L)}\mathbf{W}^{(L)})

where $\mathbf{Y}\in\mathbb{R}^{N\times C}$ contains the predicted class probabilities, $C$ is the number of classes, and the softmax is applied to each row. The cross-entropy loss is used for training:

\mathcal{L}=-\sum_{i=1}^{N}\sum_{c=1}^{C}\mathbf{Y}_{ic}^{*}\log(\mathbf{Y}_{ic})

where $\mathbf{Y}^{*}\in\{0,1\}^{N\times C}$ is the one-hot label matrix.
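The output layer and loss can be sketched in a few lines of NumPy. The function names are hypothetical, and two details are assumptions added for numerical safety rather than part of the formulation: the row-wise maximum is subtracted before exponentiation, and a small `eps` is added inside the logarithm.

```python
import numpy as np

def softmax_rows(logits):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def predict(X_L, W_L):
    """Y = softmax(X^(L) W^(L)), shape (N, C)."""
    return softmax_rows(X_L @ W_L)

def cross_entropy(Y_pred, Y_true, eps=1e-12):
    """L = -sum_i sum_c Y*_ic log(Y_ic); eps (an assumption) guards against log(0)."""
    return -np.sum(Y_true * np.log(Y_pred + eps))

# Toy usage: N = 2 nodes, feature dimension 3, C = 2 classes, placeholder values.
X_L = np.array([[1.0, 0.0, 2.0], [0.5, 1.5, 0.0]])
W_L = np.array([[1.0, -1.0], [0.0, 1.0], [0.5, 0.5]])
Y = predict(X_L, W_L)
Y_star = np.array([[1.0, 0.0], [0.0, 1.0]])  # one-hot labels
loss = cross_entropy(Y, Y_star)
```

Because $\mathbf{Y}^{*}$ is one-hot, only the log-probability of the true class of each node contributes to the double sum.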