Fine-Grained Action Retrieval Through Multiple Parts-of-Speech Embeddings

Wray, Michael; Larlus, Diane; Csurka, Gabriela; Damen, Dima

arXiv:1908.03477 (cs)

[Submitted on 9 Aug 2019]

Abstract: We address the problem of cross-modal fine-grained action retrieval between text and video. Cross-modal retrieval is commonly achieved by learning a shared embedding space into which either modality can be embedded. In this paper, we propose to enrich the embedding by disentangling the parts-of-speech (PoS) in the accompanying captions. We build a separate multi-modal embedding space for each PoS tag. The outputs of the multiple PoS embeddings are then used as input to an integrated multi-modal space, in which we perform action retrieval. All embeddings are trained jointly through a combination of PoS-aware and PoS-agnostic losses. Our proposal enables learning specialised embedding spaces that offer multiple views of the same embedded entities.
We report the first retrieval results on fine-grained actions for the large-scale EPIC dataset, in a generalised zero-shot setting. Results show the advantage of our approach for both video-to-text and text-to-video action retrieval. We also demonstrate the benefit of disentangling the PoS for the generic task of cross-modal video retrieval on the MSR-VTT dataset.
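The abstract describes the architecture only at a high level. The following is a minimal, hypothetical sketch in plain Python of how per-PoS embedding spaces could feed an integrated space, and how PoS-aware and PoS-agnostic losses could be combined. It is not the authors' implementation: the linear projections, the concatenate-then-project fusion, the bidirectional triplet (hinge) ranking loss with margin 0.2, the equal weighting of the loss terms, and every function name (`encode`, `joint_loss`, `retrieve`, and so on) are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row per output dimension
Encoded = Tuple[Dict[str, Vector], Vector]  # (per-PoS embeddings, integrated embedding)


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Linear projection y = W x (bias and non-linearity omitted for brevity)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def l2_normalise(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Both inputs are L2-normalised, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def encode(pos_feats: Dict[str, Vector], w_pos: Dict[str, Matrix], w_fuse: Matrix) -> Encoded:
    """Embed one video or caption into every PoS space, then into the integrated space.

    Hypothetical input convention: pos_feats maps each PoS tag to the feature
    used for that space (e.g. the caption's verbs for "verb"; a video may reuse
    one clip feature for every tag).
    """
    per_pos = {tag: l2_normalise(matvec(w_pos[tag], pos_feats[tag])) for tag in w_pos}
    # Concatenate in a fixed (sorted) tag order so the fusion weights line up.
    concat = [x for tag in sorted(per_pos) for x in per_pos[tag]]
    return per_pos, l2_normalise(matvec(w_fuse, concat))


def triplet(anchor: Vector, positive: Vector, negative: Vector, margin: float) -> float:
    """Hinge ranking loss: the positive must beat the negative by at least `margin`."""
    return max(0.0, margin - cosine(anchor, positive) + cosine(anchor, negative))


def cross_modal(v: Vector, t: Vector, v_neg: Vector, t_neg: Vector, margin: float) -> float:
    """Video-to-text plus text-to-video terms for one matching pair."""
    return triplet(v, t, t_neg, margin) + triplet(t, v, v_neg, margin)


def joint_loss(video: Encoded, text: Encoded, neg_video: Encoded, neg_text: Encoded,
               margin: float = 0.2) -> float:
    """PoS-aware terms (one per PoS space) plus a PoS-agnostic term (integrated space)."""
    (v_pos, v_int), (t_pos, t_int) = video, text
    (nv_pos, nv_int), (nt_pos, nt_int) = neg_video, neg_text
    pos_aware = sum(cross_modal(v_pos[tag], t_pos[tag], nv_pos[tag], nt_pos[tag], margin)
                    for tag in v_pos)
    pos_agnostic = cross_modal(v_int, t_int, nv_int, nt_int, margin)
    return pos_aware + pos_agnostic  # equal weighting is an assumption


def retrieve(query: Vector, gallery: List[Vector]) -> List[int]:
    """Gallery indices ranked by cosine similarity to the query in the integrated space."""
    return sorted(range(len(gallery)), key=lambda i: -cosine(query, gallery[i]))


# Toy usage: 2-d inputs, two PoS tags, and a fusion that sums the noun and verb embeddings.
eye = [[1.0, 0.0], [0.0, 1.0]]
w_pos = {"noun": eye, "verb": eye}  # hypothetical weights, shared across modalities here
w_fuse = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
video_a = encode({"noun": [1.0, 0.0], "verb": [1.0, 0.0]}, w_pos, w_fuse)
text_a = encode({"noun": [1.0, 0.0], "verb": [1.0, 0.0]}, w_pos, w_fuse)
video_b = encode({"noun": [0.0, 1.0], "verb": [0.0, 1.0]}, w_pos, w_fuse)
text_b = encode({"noun": [0.0, 1.0], "verb": [0.0, 1.0]}, w_pos, w_fuse)
print(joint_loss(video_a, text_a, video_b, text_b))  # 0.0: matches already outrank negatives
print(retrieve(video_a[1], [text_b[1], text_a[1]]))  # [1, 0]: caption A ranked first
```

Because every embedding is L2-normalised, retrieval in the integrated space reduces to ranking by dot product. In a real system the projection weights would be learned by minimising a loss of this kind over many mismatched pairs, which this sketch does not attempt.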

Comments:	Accepted for presentation at ICCV. Project Page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1908.03477 [cs.CV]
	(or arXiv:1908.03477v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1908.03477

Submission history

From: Michael Wray
[v1] Fri, 9 Aug 2019 14:41:06 UTC (3,407 KB)