-
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization
Authors:
Alexander Kunitsyn,
Maksim Kalashnikov,
Maksim Dzabraev,
Andrei Ivaniuta
Abstract:
In this work we present a new state of the art on the text-to-video retrieval task on MSR-VTT, LSMDC, MSVD, YouCook2 and TGIF, obtained by a single model. Three different data sources are combined: weakly-supervised videos, crowd-labeled text-image pairs and text-video pairs. A careful analysis of available pre-trained networks helps select the ones that provide the best prior knowledge. We introduce a three-stage training procedure that provides high knowledge-transfer efficiency and allows noisy datasets to be used during training without degrading prior knowledge. Additionally, double positional encoding is used for better fusion of different modalities, and a simple method for processing non-square inputs is suggested.
Submitted 14 March, 2022;
originally announced March 2022.
-
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
Authors:
Maksim Dzabraev,
Maksim Kalashnikov,
Stepan Komkov,
Aleksandr Petiushko
Abstract:
We present a new state of the art on the text-to-video retrieval task on the MSRVTT and LSMDC benchmarks, where our model outperforms all previous solutions by a large margin. Moreover, state-of-the-art results are achieved with a single model on two datasets without fine-tuning. This multidomain generalisation is achieved by a proper combination of different video caption datasets. We show that training on different datasets can improve the test results on each of them. Additionally, we check the intersection between many popular datasets and find that MSRVTT has a significant overlap between its test and train parts; the same situation is observed for ActivityNet.
Submitted 19 March, 2021;
originally announced March 2021.
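The train/test overlap check mentioned in the MDMMT abstract can be illustrated as a near-duplicate search between dataset splits. This is a minimal sketch under stated assumptions, not the authors' procedure: it assumes each video is already represented by a feature vector, and the helper names `cosine` and `find_overlaps`, the toy embeddings, and the 0.95 similarity threshold are all hypothetical. Each test item is compared against every train item, and its best match is flagged if the cosine similarity reaches the threshold.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def find_overlaps(train, test, threshold=0.95):
    """Hypothetical overlap check: for each test item, find its most similar
    train item and report it when the similarity is at least `threshold`.

    `train` and `test` map item ids to (assumed precomputed) feature vectors.
    Returns a list of (test_id, train_id, similarity) tuples.
    """
    hits = []
    for test_id, t_vec in test.items():
        best_id, best_sim = None, -1.0
        for train_id, r_vec in train.items():
            s = cosine(t_vec, r_vec)
            if s > best_sim:
                best_id, best_sim = train_id, s
        if best_id is not None and best_sim >= threshold:
            hits.append((test_id, best_id, best_sim))
    return hits


# Toy embeddings (hypothetical): "test_x" is almost a copy of "train_a".
train = {"train_a": [1.0, 0.0, 0.0], "train_b": [0.0, 1.0, 0.0]}
test = {"test_x": [0.99, 0.01, 0.0], "test_y": [0.0, 0.0, 1.0]}
print(find_overlaps(train, test))
```

With these toy vectors only "test_x" is flagged, as a near-duplicate of "train_a". The brute-force double loop is quadratic in dataset size; for real datasets an approximate nearest-neighbour index would be the usual substitute.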
-
Transient electric fields in laser plasmas observed by proton streak deflectometry
Authors:
T. Sokollik,
M. Schnuerer,
S. Ter-Avetisyan,
P. V. Nickles,
E. Risse,
M. Kalashnikov,
W. Sandner,
G. Priebe,
M. Amin,
T. Toncian,
O. Willi,
A. A. Andreev
Abstract:
A novel proton imaging technique was applied that allows a continuous temporal record of electric fields within a time window of several nanoseconds. This "proton streak deflectometry" was used to investigate transient electric fields of foils irradiated by an intense (~10^17 W/cm^2) laser. We found that these fields, with an absolute peak of up to 10^8 V/m, extend laterally over millimeters and decay on a nanosecond timescale. Hence, they last much longer than the (~ps) laser excitation and extend far beyond the laser irradiation focus.
Submitted 25 January, 2008;
originally announced January 2008.