YaART: Yet Another ART Rendering Technology

Authors: Sergey Kastryulin, Artem Konev, Alexander Shishenya, Eugene Lyapustin, Artem Khurshudov, Alexander Tselousov, Nikita Vinokurov, Denis Kuznedelev, Alexander Markovich, Grigoriy Livshits, Alexey Kirillov, Anastasiia Tabisheva, Liubov Chubarova, Marina Kaminskaia, Alexander Ustyuzhanin, Artemii Shvetsov, Daniil Shlenskii, Valerii Startsev, Dmitrii Kornilov, Mikhail Romanov, Artem Babenko, Sergei Ovcharenko, Valentin Khrulkov

Abstract: In the rapidly progressing field of generative models, the development of efficient and high-fidelity text-to-image diffusion systems represents a significant frontier. This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences using Reinforcement Learning from Human Feedback (RLHF). During the development of YaART, we especially focus on the choices of the model and training dataset sizes, the aspects that were not systematically investigated for text-to-image cascaded diffusion models before. In particular, we comprehensively analyze how these choices affect both the efficiency of the training process and the quality of the generated images, which are highly important in practice. Furthermore, we demonstrate that models trained on smaller datasets of higher-quality images can successfully compete with those trained on larger datasets, establishing a more efficient scenario of diffusion models training. From the quality perspective, YaART is consistently preferred by users over many existing state-of-the-art models.

Submitted 8 April, 2024; originally announced April 2024.

Comments: Prompts and additional information are available on the project page, see https://ya.ru/ai/art/paper-yaart-v1

arXiv:2312.15937 [pdf, ps, other]

Perfect mixed codes from generalized Reed-Muller codes

Authors: Alexander M. Romanov

Abstract: In this paper, we propose a new method for constructing $1$-perfect mixed codes in the Cartesian product $\mathbb{F}_{n} \times \mathbb{F}_{q}^n$, where $\mathbb{F}_{n}$ and $\mathbb{F}_{q}$ are finite fields of orders $n = q^m$ and $q$. We consider generalized Reed-Muller codes of length $n = q^m$ and order $(q - 1)m - 2$. Codes whose parameters are the same as the parameters of generalized Reed-Muller codes are called Reed-Muller-like codes. The construction we propose is based on partitions of distance-2 MDS codes into Reed-Muller-like codes of order $(q - 1)m - 2$. We construct a set of $q^{q^{cn}}$ nonequivalent 1-perfect mixed codes in the Cartesian product $\mathbb{F}_{n} \times \mathbb{F}_{q}^{n}$, where the constant $c$ satisfies $c < 1$, $n = q^m$ and $m$ is a sufficiently large positive integer. We also prove that each $1$-perfect mixed code in the Cartesian product $\mathbb{F}_{n} \times \mathbb{F}_{q}^n$ corresponds to a certain partition of a distance-2 MDS code into Reed-Muller-like codes of order $(q - 1)m - 2$.

Submitted 26 December, 2023; originally announced December 2023.

arXiv:2306.02878 [pdf, other]

Single-Stage 3D Geometry-Preserving Depth Estimation Model Training on Dataset Mixtures with Uncalibrated Stereo Data

Authors: Nikolay Patakin, Mikhail Romanov, Anna Vorontsova, Mikhail Artemyev, Anton Konushin

Abstract: Nowadays, robotics, AR, and 3D modeling applications attract considerable attention to single-view depth estimation (SVDE) as it allows estimating scene geometry from a single RGB image. Recent works have demonstrated that the accuracy of an SVDE method hugely depends on the diversity and volume of the training data. However, RGB-D datasets obtained via depth capturing or 3D reconstruction are typically small, synthetic datasets are not photorealistic enough, and all these datasets lack diversity. The large-scale and diverse data can be sourced from stereo images or stereo videos from the web. Typically being uncalibrated, stereo data provides disparities up to an unknown shift (geometrically incomplete data), so stereo-trained SVDE methods cannot recover 3D geometry. It was recently shown that the distorted point clouds obtained with a stereo-trained SVDE method can be corrected with additional point cloud modules (PCM) separately trained on the geometrically complete data. On the contrary, we propose GP$^{2}$, a General-Purpose and Geometry-Preserving training scheme, and show that conventional SVDE models can learn correct shifts themselves without any post-processing, benefiting from using stereo data even in the geometry-preserving setting. Through experiments on different dataset mixtures, we prove that GP$^{2}$-trained models outperform methods relying on PCM in both accuracy and speed, and report the state-of-the-art results in the general-purpose geometry-preserving SVDE. Moreover, we show that SVDE models can learn to predict geometrically correct depth even when geometrically complete data comprises the minor part of the training set.

Submitted 5 June, 2023; originally announced June 2023.

Journal ref: CVPR 2022

arXiv:2111.00774 [pdf, ps, other]

On the number of $q$-ary quasi-perfect codes with covering radius 2

Authors: Alexander M. Romanov

Abstract: In this paper we present a family of $q$-ary nonlinear quasi-perfect codes with covering radius 2. The codes have length $n = q^m$ and size $M = q^{n - m - 1}$ where $q$ is a prime power, $q \geq 3$, $m$ is an integer, $m \geq 2$. We prove that there are more than $q^{q^{cn}}$ nonequivalent such codes of length $n$, for all sufficiently large $n$ and a constant $c = \frac{1}{q} - \varepsilon$.

Submitted 1 November, 2021; originally announced November 2021.

arXiv:2009.12419 [pdf, other]

Towards General Purpose Geometry-Preserving Single-View Depth Estimation

Authors: Mikhail Romanov, Nikolay Patatkin, Anna Vorontsova, Sergey Nikolenko, Anton Konushin, Dmitry Senyushkin

Abstract: Single-view depth estimation (SVDE) plays a crucial role in scene understanding for AR applications, 3D modeling, and robotics, providing the geometry of a scene based on a single image. Recent works have shown that a successful solution strongly relies on the diversity and volume of training data. This data can be sourced from stereo movies and photos. However, they do not provide geometrically complete depth maps (as disparities contain unknown shift value). Therefore, existing models trained on this data are not able to recover correct 3D representations. Our work shows that a model trained on this data along with conventional datasets can gain accuracy while predicting correct scene geometry. Surprisingly, only a small portion of geometrically correct depth maps are required to train a model that performs equally to a model trained on the full geometrically correct dataset. After that, we train computationally efficient models on a mixture of datasets using the proposed method. Through quantitative comparison on completely unseen datasets and qualitative comparison of 3D point clouds, we show that our model defines the new state of the art in general-purpose SVDE.

Submitted 9 February, 2021; v1 submitted 25 September, 2020; originally announced September 2020.

arXiv:2006.10451 [pdf, other]

Learning High-Resolution Domain-Specific Representations with a GAN Generator

Authors: Danil Galeev, Konstantin Sofiiuk, Danila Rukhovich, Mikhail Romanov, Olga Barinova, Anton Konushin

Abstract: In recent years generative models of visual data have made great progress, and now they are able to produce images of high quality and diversity. In this work we study representations learnt by a GAN generator. First, we show that these representations can be easily projected onto a semantic segmentation map using a lightweight decoder. We find that such semantic projection can be learnt from just a few annotated images. Based on this finding, we propose the LayerMatch scheme for approximating the representation of a GAN generator that can be used for unsupervised domain-specific pretraining. We consider the semi-supervised learning scenario when a small amount of labeled data is available along with a large unlabeled dataset from the same domain. We find that the use of a LayerMatch-pretrained backbone leads to superior accuracy compared to standard supervised pretraining on ImageNet. Moreover, this simple approach also outperforms recent semi-supervised semantic segmentation methods that use both labeled and unlabeled data during training. Source code for reproducing our experiments will be available at the time of publication.

Submitted 18 June, 2020; originally announced June 2020.

arXiv:2005.08607 [pdf, other]

Decoder Modulation for Indoor Depth Completion

Authors: Dmitry Senushkin, Mikhail Romanov, Ilia Belikov, Anton Konushin, Nikolay Patakin

Abstract: Depth completion recovers a dense depth map from sensor measurements. Current methods are mostly tailored for very sparse depth measurements from LiDARs in outdoor settings, while for indoor scenes Time-of-Flight (ToF) or structured light sensors are mostly used. These sensors provide semi-dense maps, with dense measurements in some regions and almost empty in others. We propose a new model that takes into account the statistical difference between such regions. Our main contribution is a new decoder modulation branch added to the encoder-decoder architecture. The encoder extracts features from the concatenated RGB image and raw depth. Given the mask of missing values as input, the proposed modulation branch controls the decoding of a dense depth map from these features differently for different regions. This is implemented by modifying the spatial distribution of output signals inside the decoder via Spatially-Adaptive Denormalization (SPADE) blocks. Our second contribution is a novel training strategy that allows us to train on semi-dense sensor data when the ground truth depth map is not available. Our model achieves state-of-the-art results on the indoor Matterport3D dataset. Being designed for semi-dense input depth, our model is still competitive with LiDAR-oriented approaches on the KITTI dataset. Our training strategy significantly improves prediction quality with no dense ground truth available, as validated on the NYUv2 dataset.

Submitted 8 February, 2021; v1 submitted 18 May, 2020; originally announced May 2020.

arXiv:1811.08466 [pdf, other]

doi 10.1109/IROS40897.2019.8968227

Double Refinement Network for Efficient Indoor Monocular Depth Estimation

Authors: Nikita Durasov, Mikhail Romanov, Valeriya Bubnova, Pavel Bogomolov, Anton Konushin

Abstract: Monocular depth estimation is the task of obtaining a measure of distance for each pixel using a single image. It is an important problem in computer vision and is usually solved using neural networks. Though recent works in this area have shown significant improvement in accuracy, the state-of-the-art methods tend to require massive amounts of memory and time to process an image. The main purpose of this work is to improve the performance of the latest solutions with no decrease in accuracy. To this end, we introduce the Double Refinement Network architecture. The proposed method achieves state-of-the-art results on the standard benchmark RGB-D dataset NYU Depth v2, while its frames per second rate is significantly higher (up to 18 times speedup per image at batch size 1) and the RAM usage per image is lower.

Submitted 4 April, 2019; v1 submitted 20 November, 2018; originally announced November 2018.

Journal ref: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:1809.03891 [pdf, other]

Studying the History of the Arabic Language: Language Technology and a Large-Scale Historical Corpus

Authors: Yonatan Belinkov, Alexander Magidow, Alberto Barrón-Cedeño, Avi Shmidman, Maxim Romanov

Abstract: Arabic is a widely-spoken language with a long and rich history, but existing corpora and language technology focus mostly on modern Arabic and its varieties. Therefore, studying the history of the language has so far been mostly limited to manual analyses on a small scale. In this work, we present a large-scale historical corpus of the written Arabic language, spanning 1400 years. We describe our efforts to clean and process this corpus using Arabic NLP tools, including the identification of reused text. We study the history of the Arabic language using a novel automatic periodization algorithm, as well as other techniques. Our findings confirm the established division of written Arabic into Modern Standard and Classical Arabic, and confirm other established periodizations, while suggesting that written Arabic may be divisible into still further periods of development.

Submitted 11 September, 2018; originally announced September 2018.

ACM Class: I.2.7

arXiv:1711.00189 [pdf, ps, other]

A generalized concatenation construction for q-ary 1-perfect codes

Authors: Alexander M. Romanov

Abstract: We consider perfect 1-error correcting codes over a finite field with $q$ elements (briefly $q$-ary 1-perfect codes). In this paper, a generalized concatenation construction for $q$-ary 1-perfect codes is presented that allows us to construct $q$-ary 1-perfect codes of length $(q - 1)nm + n + m$ from the given $q$-ary 1-perfect codes of length $n = (q^{s_1} - 1) / (q - 1)$ and $m = (q^{s_2} - 1) / (q - 1)$, where $s_1, s_2$ are natural numbers not less than two. This construction allows us to also construct $q$-ary codes with parameters $(q^{s_1 + s_2}, q^{q^{s_1 + s_2} - (s_1 + s_2) - 1}, 3)_q$ and can be regarded as a $q$-ary analogue of the well-known Phelps construction.
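
Note: the length $(q - 1)nm + n + m$ quoted in the abstract above can be checked against the usual length formula for $q$-ary 1-perfect codes. The short derivation below is a reader's sanity check using only the symbols $q$, $n$, $m$, $s_1$, $s_2$ from the abstract; it is not taken from the paper itself.

```latex
% Given: n = (q^{s_1} - 1)/(q - 1) and m = (q^{s_2} - 1)/(q - 1), so that
%   (q - 1) n + 1 = q^{s_1},   (q - 1) m + 1 = q^{s_2}.
\begin{align*}
(q - 1)\bigl[(q - 1)nm + n + m\bigr] + 1
  &= (q - 1)^2 nm + (q - 1)n + (q - 1)m + 1 \\
  &= \bigl((q - 1)n + 1\bigr)\bigl((q - 1)m + 1\bigr) \\
  &= q^{s_1} \cdot q^{s_2} = q^{s_1 + s_2}, \\
\text{hence}\quad (q - 1)nm + n + m
  &= \frac{q^{s_1 + s_2} - 1}{q - 1}.
\end{align*}
```

So the constructed length has the same form $(q^{s} - 1)/(q - 1)$ as the input lengths, with $s = s_1 + s_2$.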

Submitted 31 October, 2017; originally announced November 2017.

arXiv:1704.02627 [pdf, ps, other]

On non-full-rank perfect codes over finite fields

Authors: Alexander M. Romanov

Abstract: The paper deals with the perfect 1-error correcting codes over a finite field with $q$ elements (briefly $q$-ary 1-perfect codes). We show that the orthogonal code to the $q$-ary non-full-rank 1-perfect code of length $n = (q^{m}-1)/(q-1)$ is a $q$-ary constant-weight code with Hamming weight equal to $q^{m - 1}$ where $m$ is any natural number not less than two. We derive necessary and sufficient conditions for $q$-ary 1-perfect codes of non-full rank. We suggest a generalization of the concatenation construction to the $q$-ary case and construct the ternary 1-perfect codes of length 13 and rank 12.
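
Note: the parameters in the abstract above can be illustrated with a small sphere-packing check. The sketch below is not from the paper; the function names (`hamming_length`, `ball_size`, `perfect_code_parameters`) are hypothetical, and it relies only on the standard fact that a 1-perfect code of length $n$ over $q$ symbols must tile the space with radius-1 balls of size $1 + n(q - 1)$.

```python
def hamming_length(q: int, m: int) -> int:
    """Length n = (q^m - 1)/(q - 1) of a q-ary 1-perfect code (hypothetical helper)."""
    return (q**m - 1) // (q - 1)


def ball_size(q: int, n: int) -> int:
    """Number of words within Hamming distance 1 of a fixed word of length n."""
    return 1 + n * (q - 1)


def perfect_code_parameters(q: int, m: int) -> tuple[int, int]:
    """Return (length, number of codewords) implied by the sphere-packing equality."""
    n = hamming_length(q, m)
    size, remainder = divmod(q**n, ball_size(q, n))
    # For these lengths the ball size is exactly q^m, so the division is exact.
    assert remainder == 0
    return n, size


# Ternary case with m = 3, matching the length-13 codes mentioned in the abstract.
n, size = perfect_code_parameters(3, 3)
print(n, size == 3 ** (n - 3))
```

For $q = 3$ and $m = 3$ this gives length 13, the ternary case named in the abstract; the rank statement in the abstract is a separate property that this check does not address.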

Submitted 9 April, 2017; originally announced April 2017.

MSC Class: 94B25

arXiv:1703.09550 [pdf]

Important New Developments in Arabographic Optical Character Recognition (OCR)

Authors: Maxim Romanov, Matthew Thomas Miller, Sarah Bowen Savant, Benjamin Kiessling

Abstract: The OpenITI team has achieved Optical Character Recognition (OCR) accuracy rates for classical Arabic-script texts in the high nineties. These numbers are based on our tests of seven different Arabic-script texts of varying quality and typefaces, totaling over 7,000 lines. These accuracy rates not only represent a distinct improvement over the actual accuracy rates of the various proprietary OCR options for classical Arabic-script texts, but, equally important, they are produced using an open-source OCR software, thus enabling us to make this Arabic-script OCR technology freely available to the broader Islamic, Persian, and Arabic Studies communities.

Submitted 28 March, 2017; originally announced March 2017.

arXiv:1612.08989 [pdf, other]

Shamela: A Large-Scale Historical Arabic Corpus

Authors: Yonatan Belinkov, Alexander Magidow, Maxim Romanov, Avi Shmidman, Moshe Koppel

Abstract: Arabic is a widely-spoken language with a rich and long history spanning more than fourteen centuries. Yet existing Arabic corpora largely focus on the modern period or lack sufficient diachronic information. We develop a large-scale, historical corpus of Arabic of about 1 billion words from diverse periods of time. We clean this corpus, process it with a morphological analyzer, and enhance it by detecting parallel passages and automatically dating undated texts. We demonstrate its utility with selected case-studies in which we show its application to the digital humanities.

Submitted 28 December, 2016; originally announced December 2016.

Comments: Slightly expanded version of Coling LT4DH workshop paper

ACM Class: I.2.7

arXiv:1310.1174 [pdf, ps, other]

doi 10.1134/S1990478916030157

Full-Rank Perfect Codes over Finite Fields

Authors: Alexander M. Romanov

Abstract: In this paper, we propose a construction of full-rank q-ary 1-perfect codes over finite fields. This construction is a generalization of the Etzion and Vardy construction of full-rank binary 1-perfect codes (1994). Properties of i-components of q-ary Hamming codes are investigated and the construction of full-rank q-ary 1-perfect codes is based on these properties. The switching construction of 1-perfect codes is generalized for the q-ary case. We give a generalization of the concept of i-component of 1-perfect codes and introduce the concept of (i,σ)-components of q-ary 1-perfect codes. We also present a generalization of the Lindström and Schönheim construction of q-ary 1-perfect codes and provide a lower bound on the number of pairwise distinct q-ary 1-perfect codes of length n.

Submitted 4 October, 2013; originally announced October 2013.

Comments: 8 pages; submitted to IEEE Transactions on Information Theory

arXiv:1202.0349 [pdf, ps, other]

On the admissible families of components of Hamming codes

Authors: Alexander M. Romanov

Abstract: In this paper, we describe the properties of the $i$-components of Hamming codes. We suggest constructions of the admissible families of components of Hamming codes. It is shown that every $q$-ary code of length $m$ and minimum distance 5 (for $q = 3$ the minimum distance is 3) can be embedded in a $q$-ary 1-perfect code of length $n = (q^{m}-1)/(q-1)$. It is also shown that every binary code of length $m + k$ and minimum distance $3k + 3$ can be embedded in a binary 1-perfect code of length $n = 2^{m}-1$.

Submitted 1 February, 2012; originally announced February 2012.

Comments: 7 pages

Showing 1–15 of 15 results for author: Romanov, M