VDSM: Unsupervised Video Disentanglement with State-Space Modeling and Deep Mixtures of Experts

Vowels, Matthew J.; Camgoz, Necati Cihan; Bowden, Richard

arXiv:2103.07292 (cs)

[Submitted on 12 Mar 2021 (v1), last revised 15 Dec 2021 (this version, v3)]

Abstract: Disentangled representations support a range of downstream tasks including causal reasoning, generative modeling, and fair machine learning. Unfortunately, disentanglement has been shown to be impossible without the incorporation of supervision or inductive bias. Given that supervision is often expensive or infeasible to acquire, we choose to incorporate structural inductive bias and present an unsupervised, deep State-Space-Model for Video Disentanglement (VDSM). The model disentangles latent time-invariant and dynamic factors via the incorporation of hierarchical structure with a dynamic prior and a Mixture of Experts decoder. VDSM learns separate disentangled representations for the identity of the object or person in the video, and for the action being performed. We evaluate VDSM across a range of qualitative and quantitative tasks including identity and dynamics transfer, sequence generation, Fréchet Inception Distance, and factor classification. VDSM provides state-of-the-art performance and exceeds adversarial methods, even when the methods use additional supervision.
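
As a point of reference for the Fréchet Inception Distance mentioned in the abstract, the following is a minimal sketch of the standard Fréchet distance between two Gaussians fitted to feature vectors, d^2 = ||mu_1 - mu_2||^2 + Tr(Sigma_1 + Sigma_2 - 2 (Sigma_1 Sigma_2)^(1/2)). It is not the authors' evaluation code. The function names `frechet_distance` and `fid_from_features` are hypothetical, and the feature arrays are assumed to have been extracted beforehand (in FID, from an Inception network) with shape (num_samples, feature_dim).

```python
import numpy as np


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between the Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Tr((sigma1 sigma2)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma1 @ sigma2, which are real and non-negative for
    # positive semi-definite inputs. Small negative or imaginary parts caused
    # by floating-point error are discarded.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)


def fid_from_features(feats_a, feats_b):
    """Fit a Gaussian to each feature set (rows are samples) and compare them.

    Assumes feature_dim >= 2 so that np.cov returns a square matrix.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    sigma_a = np.cov(feats_a, rowvar=False)
    sigma_b = np.cov(feats_b, rowvar=False)
    return frechet_distance(mu_a, sigma_a, mu_b, sigma_b)
```

Lower values indicate that the two feature distributions are closer; two identical feature sets give a distance of approximately zero.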

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2103.07292 [cs.CV]
	(or arXiv:2103.07292v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2103.07292

Submission history

From: Matthew Vowels
[v1] Fri, 12 Mar 2021 14:05:35 UTC (41,492 KB)
[v2] Sun, 28 Mar 2021 18:55:40 UTC (28,409 KB)
[v3] Wed, 15 Dec 2021 09:25:17 UTC (28,409 KB)
