
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization

Authors: Alexander Kunitsyn, Maksim Kalashnikov, Maksim Dzabraev, Andrei Ivaniuta

Abstract: In this work we present a new state of the art on the text-to-video retrieval task on MSR-VTT, LSMDC, MSVD, YouCook2 and TGIF, obtained by a single model. Three different data sources are combined: weakly supervised videos, crowd-labeled text-image pairs and text-video pairs. A careful analysis of available pre-trained networks helps to choose the ones with the best prior knowledge. We introduce a three-stage training procedure that provides high knowledge-transfer efficiency and allows noisy datasets to be used during training without degrading prior knowledge. Additionally, double positional encoding is used for better fusion of different modalities, and a simple method for processing non-square inputs is suggested.

Submitted 14 March, 2022; originally announced March 2022.

arXiv:2103.10699

doi 10.1109/CVPRW53098.2021.00374

MDMMT: Multidomain Multimodal Transformer for Video Retrieval

Authors: Maksim Dzabraev, Maksim Kalashnikov, Stepan Komkov, Aleksandr Petiushko

Abstract: We present a new state of the art on the text-to-video retrieval task on the MSRVTT and LSMDC benchmarks, where our model outperforms all previous solutions by a large margin. Moreover, state-of-the-art results are achieved with a single model on two datasets without fine-tuning. This multidomain generalisation is achieved by a proper combination of different video caption datasets. We show that training on different datasets can improve each other's test results. Additionally, we check the overlap between many popular datasets and find that MSRVTT has a significant overlap between its test and train parts; the same situation is observed for ActivityNet.

Submitted 19 March, 2021; originally announced March 2021.

Journal ref: CVPR Workshops 2021: 3354-3363

arXiv:0801.4003

doi 10.1063/1.2890057

Transient electric fields in laser plasmas observed by proton streak deflectometry

Authors: T. Sokollik, M. Schnuerer, S. Ter-Avetisyan, P. V. Nickles, E. Risse, M. Kalashnikov, W. Sandner, G. Priebe, M. Amin, T. Toncian, O. Willi, A. A. Andreev

Abstract: A novel proton imaging technique was applied that allows a continuous temporal record of electric fields within a time window of several nanoseconds. This "proton streak deflectometry" was used to investigate transient electric fields of intense (~10^17 W/cm^2) laser-irradiated foils. We found that these fields, with an absolute peak of up to 10^8 V/m, extend over millimeter lateral distances and decay on a nanosecond timescale. Hence, they last much longer than the (~ps) laser excitation and extend far beyond the laser irradiation focus.

Submitted 25 January, 2008; originally announced January 2008.

Showing 1–3 of 3 results for author: Kalashnikov, M