
Kandinsky 3.0 Technical Report

Authors: Vladimir Arkhipkin, Andrei Filatov, Viacheslav Vasilev, Anastasia Maltseva, Said Azizov, Igor Pavlov, Julia Agafonova, Andrey Kuznetsov, Denis Dimitrov

Abstract: We present Kandinsky 3.0, a large-scale text-to-image generation model based on latent diffusion, continuing the series of text-to-image Kandinsky models and reflecting our progress to achieve higher quality and realism of image generation. In this report we describe the architecture of the model, the data collection procedure, the training technique, and the production system for user interaction. We focus on the key components that, as we have identified as a result of a large number of experiments, had the most significant impact on improving the quality of our model compared to the others. We also describe extensions and applications of our model, including super resolution, inpainting, image editing, image-to-video generation, and a distilled version of Kandinsky 3.0 - Kandinsky 3.1, which does inference in 4 steps of the reverse process and 20 times faster without visual quality decrease. By side-by-side human preferences comparison, Kandinsky becomes better in text understanding and works better on specific domains. The code is available at https://github.com/ai-forever/Kandinsky-3

Submitted 28 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Project page: https://ai-forever.github.io/Kandinsky-3

arXiv:2310.03502

Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

Authors: Anton Razzhigaev, Arseniy Shakhmatov, Anastasia Maltseva, Vladimir Arkhipkin, Igor Pavlov, Ilya Ryabov, Angelina Kuts, Alexander Panchenko, Andrey Kuznetsov, Denis Dimitrov

Abstract: Text-to-image generation is a significant domain in modern computer vision and has achieved substantial improvements through the evolution of generative architectures. Among these, there are diffusion-based models that have demonstrated essential quality enhancements. These models are generally split into two categories: pixel-level and latent-level approaches. We present Kandinsky, a novel exploration of latent diffusion architecture, combining the principles of the image prior models with latent diffusion techniques. The image prior model is trained separately to map text embeddings to image embeddings of CLIP. Another distinct feature of the proposed model is the modified MoVQ implementation, which serves as the image autoencoder component. Overall, the designed model contains 3.3B parameters. We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting. Additionally, we released the source code and checkpoints for the Kandinsky models. Experimental evaluations demonstrate a FID score of 8.03 on the COCO-30K dataset, marking our model as the top open-source performer in terms of measurable image generation quality.

Submitted 5 October, 2023; originally announced October 2023.

arXiv:2207.02811

doi 10.1109/LRA.2021.3062350

Multi-View Object Pose Refinement With Differentiable Renderer

Authors: Ivan Shugurov, Ivan Pavlov, Sergey Zakharov, Slobodan Ilic

Abstract: This paper introduces a novel multi-view 6 DoF object pose refinement approach focusing on improving methods trained on synthetic data. It is based on the DPOD detector, which produces dense 2D-3D correspondences between the model vertices and the image pixels in each frame. We have opted for the use of multiple frames with known relative camera transformations, as it allows introduction of geometrical constraints via an interpretable ICP-like loss function. The loss function is implemented with a differentiable renderer and is optimized iteratively. We also demonstrate that a full detection and refinement pipeline, which is trained solely on synthetic data, can be used for auto-labeling real data. We perform quantitative evaluation on LineMOD, Occlusion, Homebrewed and YCB-V datasets and report excellent performance in comparison to the state-of-the-art methods trained on the synthetic and real data. We demonstrate empirically that our approach requires only a few frames and is robust to close camera locations and noise in extrinsic camera calibration, making its practical usage easier and more ubiquitous.

Submitted 6 July, 2022; originally announced July 2022.

Journal ref: IEEE Robotics and Automation Letters, 2021

arXiv:2202.10784

RuCLIP -- new models and experiments: a technical report

Authors: Alex Shonenkov, Andrey Kuznetsov, Denis Dimitrov, Tatyana Shavrina, Daniil Chesakov, Anastasia Maltseva, Alena Fenogenova, Igor Pavlov, Anton Emelyanov, Sergey Markov, Daria Bakshandaeva, Vera Shybaeva, Andrey Chertok

Abstract: In the report we propose six new implementations of the ruCLIP model trained on our 240M pairs. The accuracy results are compared with the original CLIP model with Ru-En translation (OPUS-MT) on 16 datasets from different domains. Our best implementations outperform the CLIP + OPUS-MT solution on most of the datasets in few-shot and zero-shot tasks. In the report we briefly describe the implementations and concentrate on the conducted experiments. Inference execution time comparison is also presented in the report.

Submitted 22 February, 2022; originally announced February 2022.

arXiv:2007.00966

Gravity: a blockchain-agnostic cross-chain communication and data oracles protocol

Authors: Aleksei Pupyshev, Dmitry Gubanov, Elshan Dzhafarov, Ilya Sapranidi, Inal Kardanov, Vladimir Zhuravlev, Shamil Khalilov, Marc Jansen, Sten Laureyssens, Igor Pavlov, Sasha Ivanov

Abstract: This paper intends to propose the architecture of a blockchain-agnostic protocol designed for communication of blockchains amongst each other (i.e. cross-chain), and for blockchains with the outside world (i.e. data oracles). The expansive growth of cutting-edge technology in the blockchain industry outlines the need and opportunity for addressing oracle consensus in a manner both technologically and economically efficient as well as futureproof. Blockchain-agnosticism is inherently limited if proposing a technological solution involves adding one more architectural layer. As such, Gravity protocol is designed to be a truly blockchain-agnostic protocol. By ensuring parity through direct integration and by leveraging the stability and security of the respective interconnected ecosystems, Gravity circumvents the need for a dedicated, public blockchain and a native token. Ultimately, Gravity protocol intends to address scalability challenges by providing a solid infrastructure for the creation of gateways, cross-chain applications, and sidechains. This paper introduces and defines the concept of Oracle Consensus and its implementation in the Gravity protocol named the Pulse Consensus algorithm. The proposed consensus architecture allows Gravity to be considered a singular decentralized blockchain-agnostic oracle.

Submitted 31 August, 2020; v1 submitted 2 July, 2020; originally announced July 2020.

arXiv:1911.03446

doi 10.1038/s41467-021-20901-5

Scaling advantage in quantum simulation of geometrically frustrated magnets

Authors: Andrew D. King, Jack Raymond, Trevor Lanting, Sergei V. Isakov, Masoud Mohseni, Gabriel Poulin-Lamarre, Sara Ejtemaee, William Bernoudy, Isil Ozfidan, Anatoly Yu. Smirnov, Mauricio Reis, Fabio Altomare, Michael Babcock, Catia Baron, Andrew J. Berkley, Kelly Boothby, Paul I. Bunyk, Holly Christiani, Colin Enderud, Bram Evert, Richard Harris, Emile Hoskinson, Shuiyuan Huang, Kais Jooya, Ali Khodabandelou, et al. (29 additional authors not shown)

Abstract: The promise of quantum computing lies in harnessing programmable quantum devices for practical applications such as efficient simulation of quantum materials and condensed matter systems. One important task is the simulation of geometrically frustrated magnets in which topological phenomena can emerge from competition between quantum and thermal fluctuations. Here we report on experimental observations of relaxation in such simulations, measured on up to 1440 qubits with microsecond resolution. By initializing the system in a state with topological obstruction, we observe quantum annealing (QA) relaxation timescales in excess of one microsecond. Measurements indicate a dynamical advantage in the quantum simulation over the classical approach of path-integral Monte Carlo (PIMC) fixed-Hamiltonian relaxation with multiqubit cluster updates. The advantage increases with both system size and inverse temperature, exceeding a million-fold speedup over a CPU. This is an important piece of experimental evidence that in general, PIMC does not mimic QA dynamics for stoquastic Hamiltonians. The observed scaling advantage, for simulation of frustrated magnetism in quantum condensed matter, demonstrates that near-term quantum devices can be used to accelerate computational tasks of practical relevance.

Submitted 8 November, 2019; originally announced November 2019.

Comments: 7 pages, 4 figures, 22 pages of supplemental material with 18 figures

Showing 1–6 of 6 results for author: Pavlov, I