-
Kandinsky 3.0 Technical Report
Authors:
Vladimir Arkhipkin,
Andrei Filatov,
Viacheslav Vasilev,
Anastasia Maltseva,
Said Azizov,
Igor Pavlov,
Julia Agafonova,
Andrey Kuznetsov,
Denis Dimitrov
Abstract:
We present Kandinsky 3.0, a large-scale text-to-image generation model based on latent diffusion, continuing the series of text-to-image Kandinsky models and reflecting our progress toward higher quality and realism of image generation. In this report we describe the architecture of the model, the data collection procedure, the training technique, and the production system for user interaction. We focus on the key components that, across a large number of experiments, had the most significant impact on improving the quality of our model compared to the others. We also describe extensions and applications of our model, including super resolution, inpainting, image editing, image-to-video generation, and Kandinsky 3.1, a distilled version of Kandinsky 3.0 that runs inference in 4 steps of the reverse process and is 20 times faster without a decrease in visual quality. In side-by-side human preference comparisons, Kandinsky shows better text understanding and works better on specific domains. The code is available at https://github.com/ai-forever/Kandinsky-3
Submitted 28 June, 2024; v1 submitted 6 December, 2023;
originally announced December 2023.
-
Task Discovery: Finding the Tasks that Neural Networks Generalize on
Authors:
Andrei Atanov,
Andrei Filatov,
Teresa Yeo,
Ajay Sohmshetty,
Amir Zamir
Abstract:
When developing deep learning models, we usually decide what task we want to solve and then search for a model that generalizes well on the task. An intriguing question would be: what if, instead of fixing the task and searching in the model space, we fix the model and search in the task space? Can we find tasks that the model generalizes on? How do they look, or do they indicate anything? These are the questions we address in this paper.
We propose a task discovery framework that automatically finds examples of such tasks via optimizing a generalization-based quantity called agreement score. We demonstrate that one set of images can give rise to many tasks on which neural networks generalize well. These tasks are a reflection of the inductive biases of the learning framework and the statistical patterns present in the data, thus they can make a useful tool for analysing the neural networks and their biases. As an example, we show that the discovered tasks can be used to automatically create adversarial train-test splits which make a model fail at test time, without changing the pixels or labels, but by only selecting how the datapoints should be split between the train and test sets. We end with a discussion on human-interpretability of the discovered tasks.
Submitted 30 November, 2022;
originally announced December 2022.
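The abstract above names the agreement score but does not define it. As a rough illustration only, the sketch below assumes it is the fraction of held-out points on which two independently trained models predict the same label; the function name `agreement_score` and the toy predictions are hypothetical and are not taken from the paper.

```python
def agreement_score(preds_a, preds_b):
    """Fraction of examples on which two prediction lists agree.

    Assumption: preds_a and preds_b are labels predicted on the same
    held-out examples by two models trained on the same task.
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        raise ValueError("prediction lists must be non-empty")
    matches = sum(a == b for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)


# Toy held-out predictions from two hypothetical training runs.
run_1 = [0, 1, 1, 0]
run_2 = [0, 1, 0, 0]
score = agreement_score(run_1, run_2)
```

Under this reading, a score near 1 would suggest that the task labels are learned consistently across runs, while a score near chance would suggest the opposite.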
-
Simple Control Baselines for Evaluating Transfer Learning
Authors:
Andrei Atanov,
Shijian Xu,
Onur Beker,
Andrei Filatov,
Amir Zamir
Abstract:
Transfer learning has witnessed remarkable progress in recent years, for example, with the introduction of augmentation-based contrastive self-supervised learning methods. While a number of large-scale empirical studies on the transfer performance of such models have been conducted, there is not yet an agreed-upon set of control baselines, evaluation practices, and metrics to report, which often hinders a nuanced and calibrated understanding of the real efficacy of the methods. We share an evaluation standard that aims to quantify and communicate transfer learning performance in an informative and accessible setup. This is done by building a number of simple yet critical control baselines into the evaluation method, particularly the blind-guess (quantifying the dataset bias), scratch-model (quantifying the architectural contribution), and maximal-supervision (quantifying the upper bound). To demonstrate how the evaluation standard can be employed, we provide an example empirical study investigating a few basic questions about self-supervised learning. For example, using this standard, the study shows that the effectiveness of existing self-supervised pre-training methods is skewed towards image classification tasks rather than dense pixel-wise predictions. In general, we encourage using and reporting the suggested control baselines when evaluating transfer learning in order to gain a more meaningful and informative understanding.
Submitted 7 February, 2022;
originally announced February 2022.
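To make the blind-guess baseline from the abstract above concrete, here is a minimal sketch. It assumes a classification setting in which the blind guess is a constant prediction of the most frequent training label, ignoring the input entirely; the function name `blind_guess_accuracy` and the toy labels are hypothetical, and the paper may define the baseline differently for other task types.

```python
from collections import Counter


def blind_guess_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label.

    Assumption: a constant, input-independent prediction is a reasonable
    stand-in for a blind-guess baseline in classification.
    """
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for y in test_labels if y == majority_label)
    return majority_label, hits / len(test_labels)


# Toy labels: "cat" dominates the training set.
train = ["cat", "cat", "dog", "cat", "bird"]
test = ["cat", "dog", "cat", "bird"]
label, acc = blind_guess_accuracy(train, test)
```

A transferred model that scores only slightly above this number would owe most of its apparent performance to label imbalance in the dataset rather than to the representation.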
-
Fast Line Search for Multi-Task Learning
Authors:
Andrey Filatov,
Daniil Merkulov
Abstract:
Multi-task learning is a powerful method for solving several tasks jointly by learning a robust representation. Optimizing a multi-task learning model is more complex than optimizing a single-task model because of task conflict. Based on theoretical results, convergence to the optimal point is guaranteed when the step size is chosen through line search. Usually, however, line search for the step size is not the best choice due to its large computational time overhead. We propose a novel idea for line search algorithms in multi-task learning: use the latent representation space instead of the parameter space to find the step size. We examine this idea with backtracking line search. We compare this fast backtracking algorithm with classical backtracking and with gradient methods with a constant learning rate on MNIST, CIFAR-10, and Cityscapes tasks. The systematic empirical study shows that the proposed method leads to a more accurate and faster solution than the traditional backtracking approach and keeps computational time and performance competitive with the constant learning rate method.
Submitted 2 October, 2021;
originally announced October 2021.
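For readers unfamiliar with the classical backtracking baseline mentioned in the abstract above, the following is a minimal sketch of backtracking line search with the Armijo sufficient-decrease condition, carried out in parameter space. It does not implement the paper's latent-space variant; the function name `backtracking_step`, the default constants, and the quadratic test objective are all illustrative assumptions.

```python
import numpy as np


def backtracking_step(f, grad_f, x, alpha0=1.0, shrink=0.5, c=1e-4, max_iter=50):
    """Return a step size along -grad_f(x) satisfying the Armijo condition.

    Starts from alpha0 and multiplies by `shrink` until
    f(x - alpha * g) <= f(x) - c * alpha * ||g||^2 holds.
    If max_iter is exhausted, the last (smallest) alpha tried is returned.
    """
    g = grad_f(x)
    fx = f(x)
    alpha = alpha0
    for _ in range(max_iter):
        if f(x - alpha * g) <= fx - c * alpha * np.dot(g, g):
            break
        alpha *= shrink
    return alpha


# Illustrative objective: f(x) = 5 * ||x||^2, with gradient 10 * x.
def f(x):
    return 5.0 * np.dot(x, x)


def grad_f(x):
    return 10.0 * x


x0 = np.array([1.0, 1.0])
alpha = backtracking_step(f, grad_f, x0)
x1 = x0 - alpha * grad_f(x0)
```

Each rejected trial costs one extra evaluation of the objective, which is the kind of overhead the abstract refers to when it says line search has a large computational cost.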
-
Correlation Filter of 2D Laser Scans For Indoor Environment
Authors:
Kirill Krinkin,
Anton Filatov
Abstract:
Modern laser SLAM (simultaneous localization and mapping) and structure from motion algorithms face the problem of processing redundant data. Even if a sensor does not move, it still continues to capture scans that should be processed. This paper presents a novel filter that allows dropping 2D scans that bring no new information to the system. Experiments on MIT and TUM datasets show that it is possible to drop more than half of the scans. Moreover, the paper describes the formulas that enable filter adaptation to a particular robot with known speed and lidar characteristics. In addition, an indoor corridor detector is introduced that can also be applied to any specific shape of corridor and sensor.
Submitted 27 May, 2021;
originally announced May 2021.
-
Any Motion Detector: Learning Class-agnostic Scene Dynamics from a Sequence of LiDAR Point Clouds
Authors:
Artem Filatov,
Andrey Rykov,
Viacheslav Murashkin
Abstract:
Object detection and motion parameter estimation are crucial tasks for the safe navigation of self-driving vehicles in a complex urban environment. In this work we propose a novel real-time approach to temporal context aggregation for motion detection and motion parameter estimation based on a 3D point cloud sequence. We introduce an ego-motion compensation layer to achieve real-time inference with performance comparable to a naive odometric transform of the original point cloud sequence. Not only is the proposed architecture capable of estimating the motion of common road participants like vehicles or pedestrians, but it also generalizes to other object categories that are not present in the training data. We also conduct an in-depth analysis of different temporal context aggregation strategies such as recurrent cells and 3D convolutions. Finally, we provide comparison results of our state-of-the-art model with existing solutions on the KITTI Scene Flow dataset.
Submitted 24 April, 2020;
originally announced April 2020.
-
2D SLAM Quality Evaluation Methods
Authors:
Anton Filatov,
Artyom Filatov,
Kirill Krinkin,
Baian Chen,
Diana Molodan
Abstract:
SLAM (Simultaneous Localization and Mapping) is one of the most challenging problems for mobile platforms, and there is a huge number of modern SLAM algorithms. Choosing the algorithm for a particular problem requires prior knowledge of the advantages and disadvantages of each algorithm. This paper presents an approach for comparing SLAM algorithms that makes it possible to find the most accurate one. The research emphasizes 2D SLAM algorithms, and the analysis focuses on the 2D map built by the algorithm. Three metrics for evaluating maps are presented in this paper.
Submitted 7 August, 2017;
originally announced August 2017.