Robust DeepFake Video Detection via Sparse Feature Entanglement Learning and Graph Laplacian Regularization

This study was supported in part by the Ministry of Science and Technology (MOST), Taiwan, under grants MOST XXX, and in part by the Higher Education Sprout Project of the Ministry of Education (MOE) to the Headquarters of University Advancement at National Cheng Kung University (NCKU).
(Corresponding author: Chih-Chung Hsu.)
C.-C. Hsu, S.-N. Chen, M.-H. Wu, Y.-F. Wang, C.-M. Lee and Y.-S. Chou are with the Institute of Data Science and the Department of Statistics, National Cheng Kung University, Tainan, Taiwan (R.O.C.) (e-mail: [email protected], [email protected]). Y.-S. Du is with the Department of Computer Science, National Cheng Kung University, Tainan, Taiwan (R.O.C.) (e-mail: xx).

Chih-Chung Hsu, Shao-Ning Chen, Mei-Hsuan Wu,
Yi-Fang Wang, Chia-Ming Lee, Yi-Shiuan Chou
Abstract

Index Terms—

I some contents

We use the following definitions:

  • $\mathbf{Z}\in\mathbb{R}^{N\times D}$ is the entangled feature matrix, where $N$ is the number of nodes (e.g., the number of frame features) and $D$ is the feature dimension.

  • $\mathbf{A}\in\mathbb{R}^{N\times N}$ denotes the adjacency matrix of the graph, where $\mathbf{A}_{ij}$ is the weight of the edge between nodes $i$ and $j$.

  • $\mathbf{D}\in\mathbb{R}^{N\times N}$ is the diagonal degree matrix, where $\mathbf{D}_{ii}=\sum_{j=1}^{N}\mathbf{A}_{ij}$.

  • $\mathbf{L}=\mathbf{D}-\mathbf{A}$ is the graph Laplacian matrix.

  • $\mathbf{X}^{(0)}=\mathbf{Z}$ is the initial node feature matrix.
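To make these definitions concrete, the following NumPy sketch builds $\mathbf{D}$ and $\mathbf{L}$ from a small adjacency matrix. The chain-shaped graph, the random features, and the helper name `degree_and_laplacian` are illustrative assumptions; the text does not specify how $\mathbf{A}$ is constructed from the frame features.

```python
import numpy as np

def degree_and_laplacian(A):
    """Return the degree matrix D (D_ii = sum_j A_ij) and the graph Laplacian L = D - A."""
    D = np.diag(A.sum(axis=1))
    return D, D - A

# Hypothetical toy graph (assumption): N = 4 frame nodes linked in temporal order
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Z = np.random.default_rng(0).normal(size=(4, 8))  # hypothetical entangled features, D = 8
X0 = Z                                             # X^(0) = Z

D, L = degree_and_laplacian(A)
print(np.diag(D))      # [1. 2. 2. 1.]
print(L.sum(axis=1))   # [0. 0. 0. 0.]: each row of L sums to zero
```

Because every row of $\mathbf{L}$ sums to zero, the all-ones vector lies in its null space, which matters whenever $\mathbf{L}$ has to be inverted.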

The standard GCN propagation rule is then

$$\mathbf{X}^{(l+1)}=\sigma\left(\hat{\mathbf{D}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-\frac{1}{2}}\mathbf{X}^{(l)}\mathbf{W}^{(l)}\right)$$

where $\hat{\mathbf{A}}=\mathbf{A}+\mathbf{I}_N$ is the loop-aware adjacency matrix (the adjacency matrix with self-loops added), $\hat{\mathbf{D}}$ with $\hat{\mathbf{D}}_{ii}=\sum_{j=1}^{N}\hat{\mathbf{A}}_{ij}$ is the corresponding degree matrix, $\mathbf{W}^{(l)}$ is the weight matrix of the $l$-th layer, and $\sigma$ is the activation function. To incorporate the graph Laplacian into the GCN, we use

$$\mathbf{X}^{(l+1)}=\sigma\left(\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}\mathbf{X}^{(l)}\mathbf{W}^{(l)}\right)$$

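where $\hat{\mathbf{L}}=\hat{\mathbf{D}}-\hat{\mathbf{A}}$ is the loop-aware graph Laplacian matrix. Because each row of $\hat{\mathbf{A}}$ sums to $\mathbf{D}_{ii}+1$, we have $\hat{\mathbf{D}}=\mathbf{D}+\mathbf{I}_N$ and therefore $\hat{\mathbf{L}}=\mathbf{D}-\mathbf{A}=\mathbf{L}$; since its rows sum to zero, $\hat{\mathbf{L}}$ is singular, and $\hat{\mathbf{L}}^{-\frac{1}{2}}$ has to be interpreted on its nonzero spectrum (e.g., via the Moore–Penrose pseudo-inverse). Unrolling the recursion, the output of the $L$-th GCN layer can be written as follows: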

$$\mathbf{X}^{(L)}=\sigma\left(\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}\cdot\sigma\left(\ldots\sigma\left(\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}\mathbf{X}^{(0)}\mathbf{W}^{(0)}\right)\ldots\right)\mathbf{W}^{(L-1)}\right)$$
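The operator $\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}$ is shared by every level of this nesting. The minimal NumPy sketch below computes it, alongside the standard operator $\hat{\mathbf{D}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{D}}^{-\frac{1}{2}}$ for comparison, and applies one layer. It also checks numerically that $\hat{\mathbf{L}}$ coincides with $\mathbf{L}$ and is singular. Evaluating $\hat{\mathbf{L}}^{-\frac{1}{2}}$ on the nonzero eigenvalues only (a pseudo-inverse interpretation), using ReLU for $\sigma$, the toy graph, and the helper names are assumptions of this sketch, not details taken from the method.

```python
import numpy as np

def normalized_adjacency(A):
    """Standard GCN operator D_hat^{-1/2} A_hat D_hat^{-1/2}, with A_hat = A + I_N."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))   # D_hat_ii >= 1, so this is safe
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def laplacian_operator(A, tol=1e-10):
    """Laplacian-based operator L_hat^{-1/2} A_hat L_hat^{-1/2}.

    L_hat = D_hat - A_hat is singular, so its inverse square root is taken on the
    nonzero eigenvalues only (pseudo-inverse; an assumed interpretation).
    """
    A_hat = A + np.eye(A.shape[0])
    L_hat = np.diag(A_hat.sum(axis=1)) - A_hat
    eigvals, eigvecs = np.linalg.eigh(L_hat)        # L_hat is symmetric PSD
    keep = eigvals > tol
    inv_sqrt = np.zeros_like(eigvals)
    inv_sqrt[keep] = 1.0 / np.sqrt(eigvals[keep])
    L_inv_sqrt = (eigvecs * inv_sqrt) @ eigvecs.T   # V diag(lambda^{-1/2}) V^T
    return L_inv_sqrt @ A_hat @ L_inv_sqrt, L_hat, L_inv_sqrt

# Hypothetical 4-node chain graph over frame features (assumption)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

P_std = normalized_adjacency(A)
P_lap, L_hat, L_inv_sqrt = laplacian_operator(A)
print(np.allclose(L_hat, L))          # True: the self-loops cancel, so L_hat = L
print(np.linalg.matrix_rank(L_hat))   # 3 (< N = 4): L_hat is singular

rng = np.random.default_rng(0)
X0 = rng.normal(size=(4, 8))          # hypothetical X^(0) = Z with D = 8
W0 = rng.normal(size=(8, 16))         # hypothetical W^(0)
X1 = np.maximum(P_lap @ X0 @ W0, 0.0) # one layer, sigma = ReLU (assumed)
```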

Equivalently, in layer-by-layer form:

$$
\begin{aligned}
\mathbf{X}^{(0)} &= \mathbf{Z}\\
\mathbf{X}^{(1)} &= \sigma\left(\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}\mathbf{X}^{(0)}\mathbf{W}^{(0)}\right)\\
\mathbf{X}^{(2)} &= \sigma\left(\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}\mathbf{X}^{(1)}\mathbf{W}^{(1)}\right)\\
&\;\;\vdots\\
\mathbf{X}^{(l)} &= \sigma\left(\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}\mathbf{X}^{(l-1)}\mathbf{W}^{(l-1)}\right)\\
&\;\;\vdots\\
\mathbf{X}^{(L)} &= \sigma\left(\hat{\mathbf{L}}^{-\frac{1}{2}}\hat{\mathbf{A}}\hat{\mathbf{L}}^{-\frac{1}{2}}\mathbf{X}^{(L-1)}\mathbf{W}^{(L-1)}\right)
\end{aligned}
$$
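The layer-by-layer recursion maps directly onto a loop. The sketch below is a hypothetical NumPy forward pass: the helper names, layer widths, weight initialization, and ReLU for $\sigma$ are assumptions, and $\hat{\mathbf{L}}^{-\frac{1}{2}}$ is evaluated through the pseudo-inverse because $\hat{\mathbf{L}}$ is singular.

```python
import numpy as np

def laplacian_operator(A, tol=1e-10):
    """P = L_hat^{-1/2} A_hat L_hat^{-1/2}; L_hat^{-1/2} is taken on the nonzero
    eigenvalues of the singular Laplacian (pseudo-inverse; an assumption)."""
    A_hat = A + np.eye(A.shape[0])
    L_hat = np.diag(A_hat.sum(axis=1)) - A_hat
    eigvals, eigvecs = np.linalg.eigh(L_hat)
    keep = eigvals > tol
    inv_sqrt = np.zeros_like(eigvals)
    inv_sqrt[keep] = 1.0 / np.sqrt(eigvals[keep])
    L_inv_sqrt = (eigvecs * inv_sqrt) @ eigvecs.T
    return L_inv_sqrt @ A_hat @ L_inv_sqrt

def relu(t):
    return np.maximum(t, 0.0)

def gcn_forward(P, Z, weights, sigma=relu):
    """Return [X^(0), ..., X^(L)] with X^(0) = Z and X^(l) = sigma(P X^(l-1) W^(l-1))."""
    outputs = [Z]
    for W in weights:                    # weights = [W^(0), ..., W^(L-1)]
        outputs.append(sigma(P @ outputs[-1] @ W))
    return outputs

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],              # hypothetical 4-node chain graph
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Z = rng.normal(size=(4, 8))              # hypothetical N = 4 frames, D = 8
widths = [8, 16, 16, 4]                  # hypothetical layer widths, so L = 3
weights = [rng.normal(scale=0.1, size=(d_in, d_out))
           for d_in, d_out in zip(widths[:-1], widths[1:])]
P = laplacian_operator(A)
X = gcn_forward(P, Z, weights)
print([x.shape for x in X])              # [(4, 8), (4, 16), (4, 16), (4, 4)]
```

Since the operator does not depend on $l$, it is computed once and reused; the last entry of the returned list matches the nested expression for $\mathbf{X}^{(L)}$.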

Finally, the output is obtained by applying a fully connected layer to $\mathbf{X}^{(L)}$:

$$\mathbf{Y}=\operatorname{softmax}\left(\mathbf{X}^{(L)}\mathbf{W}^{(L)}\right)$$

where $\mathbf{Y}\in\mathbb{R}^{N\times C}$ contains the predicted class probabilities, $C$ is the number of classes, and softmax is applied row-wise. The cross-entropy loss is used during training:

$$\mathcal{L}=-\sum_{i=1}^{N}\sum_{c=1}^{C}\mathbf{Y}^{*}_{ic}\log\left(\mathbf{Y}_{ic}\right)$$

where $\mathbf{Y}^{*}$ is the one-hot ground-truth label matrix.
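The classifier head and the loss can be sketched in a few lines of NumPy. The two-class setting (real vs. fake), the random inputs, and the function names are assumptions for illustration. The loss is summed over nodes, as in the equation above, rather than averaged.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over the C classes, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(Y, labels, eps=1e-12):
    """Summed loss -sum_i sum_c Y*_ic log(Y_ic), with Y* the one-hot encoding of labels."""
    Y_star = np.eye(Y.shape[1])[labels]   # one-hot matrix of shape (N, C)
    return -np.sum(Y_star * np.log(Y + eps))

rng = np.random.default_rng(0)
X_L = rng.normal(size=(4, 16))            # hypothetical X^(L): N = 4 nodes, 16 features
W_L = rng.normal(size=(16, 2))            # hypothetical W^(L) with C = 2 (real vs. fake, assumed)
Y = softmax(X_L @ W_L)                    # Y in R^{N x C}; each row sums to 1
labels = np.array([0, 1, 1, 0])           # hypothetical ground-truth class per node
loss = cross_entropy(Y, labels)

# Hand-checkable case: -(log 0.9 + log 0.8 + log 0.5)
Y_toy = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
print(round(cross_entropy(Y_toy, np.array([0, 1, 0])), 4))  # 1.0217
```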