-
YaART: Yet Another ART Rendering Technology
Authors:
Sergey Kastryulin,
Artem Konev,
Alexander Shishenya,
Eugene Lyapustin,
Artem Khurshudov,
Alexander Tselousov,
Nikita Vinokurov,
Denis Kuznedelev,
Alexander Markovich,
Grigoriy Livshits,
Alexey Kirillov,
Anastasiia Tabisheva,
Liubov Chubarova,
Marina Kaminskaia,
Alexander Ustyuzhanin,
Artemii Shvetsov,
Daniil Shlenskii,
Valerii Startsev,
Dmitrii Kornilov,
Mikhail Romanov,
Artem Babenko,
Sergei Ovcharenko,
Valentin Khrulkov
Abstract:
In the rapidly progressing field of generative models, the development of efficient and high-fidelity text-to-image diffusion systems represents a significant frontier. This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences using Reinforcement Learning from Human Feedback (RLHF). During the development of YaART, we especially focus on the choices of the model and training dataset sizes, aspects that have not previously been systematically investigated for text-to-image cascaded diffusion models. In particular, we comprehensively analyze how these choices affect both the efficiency of the training process and the quality of the generated images, which are highly important in practice. Furthermore, we demonstrate that models trained on smaller datasets of higher-quality images can successfully compete with those trained on larger datasets, establishing a more efficient scenario for training diffusion models. From the quality perspective, YaART is consistently preferred by users over many existing state-of-the-art models.
Submitted 8 April, 2024;
originally announced April 2024.
-
Perfect mixed codes from generalized Reed-Muller codes
Authors:
Alexander M. Romanov
Abstract:
In this paper, we propose a new method for constructing $1$-perfect mixed codes in the Cartesian product $\mathbb{F}_{n} \times \mathbb{F}_{q}^n$, where $\mathbb{F}_{n}$ and $\mathbb{F}_{q}$ are finite fields of orders $n = q^m$ and $q$. We consider generalized Reed-Muller codes of length $n = q^m$ and order $(q - 1)m - 2$. Codes whose parameters are the same as the parameters of generalized Reed-Muller codes are called Reed-Muller-like codes. The construction we propose is based on partitions of distance-2 MDS codes into Reed-Muller-like codes of order $(q - 1)m - 2$. We construct a set of $q^{q^{cn}}$ nonequivalent 1-perfect mixed codes in the Cartesian product $\mathbb{F}_{n} \times \mathbb{F}_{q}^{n}$, where the constant $c$ satisfies $c < 1$, $n = q^m$ and $m$ is a sufficiently large positive integer. We also prove that each $1$-perfect mixed code in the Cartesian product $\mathbb{F}_{n} \times \mathbb{F}_{q}^n$ corresponds to a certain partition of a distance-2 MDS code into Reed-Muller-like codes of order $(q - 1)m - 2$.
Submitted 26 December, 2023;
originally announced December 2023.
-
Single-Stage 3D Geometry-Preserving Depth Estimation Model Training on Dataset Mixtures with Uncalibrated Stereo Data
Authors:
Nikolay Patakin,
Mikhail Romanov,
Anna Vorontsova,
Mikhail Artemyev,
Anton Konushin
Abstract:
Nowadays, robotics, AR, and 3D modeling applications attract considerable attention to single-view depth estimation (SVDE) as it allows estimating scene geometry from a single RGB image. Recent works have demonstrated that the accuracy of an SVDE method hugely depends on the diversity and volume of the training data. However, RGB-D datasets obtained via depth capturing or 3D reconstruction are typically small, synthetic datasets are not photorealistic enough, and all these datasets lack diversity. Large-scale and diverse data can be sourced from stereo images or stereo videos from the web. Typically being uncalibrated, stereo data provides disparities up to an unknown shift (geometrically incomplete data), so stereo-trained SVDE methods cannot recover 3D geometry. It was recently shown that the distorted point clouds obtained with a stereo-trained SVDE method can be corrected with additional point cloud modules (PCM) separately trained on the geometrically complete data. In contrast, we propose GP$^{2}$, a General-Purpose and Geometry-Preserving training scheme, and show that conventional SVDE models can learn correct shifts themselves without any post-processing, benefiting from using stereo data even in the geometry-preserving setting. Through experiments on different dataset mixtures, we prove that GP$^{2}$-trained models outperform methods relying on PCM in both accuracy and speed, and report state-of-the-art results in general-purpose geometry-preserving SVDE. Moreover, we show that SVDE models can learn to predict geometrically correct depth even when geometrically complete data makes up only a minor part of the training set.
Submitted 5 June, 2023;
originally announced June 2023.
-
On the number of $q$-ary quasi-perfect codes with covering radius 2
Authors:
Alexander M. Romanov
Abstract:
In this paper we present a family of $q$-ary nonlinear quasi-perfect codes with covering radius 2. The codes have length $n = q^m$ and size $ M = q^{n - m - 1}$ where $q$ is a prime power, $q \geq 3$, $m$ is an integer, $m \geq 2$. We prove that there are more than $q^{q^{cn}}$ nonequivalent such codes of length $n$, for all sufficiently large $n$ and a constant $c = \frac{1}{q} - \varepsilon$.
Submitted 1 November, 2021;
originally announced November 2021.
-
Towards General Purpose Geometry-Preserving Single-View Depth Estimation
Authors:
Mikhail Romanov,
Nikolay Patatkin,
Anna Vorontsova,
Sergey Nikolenko,
Anton Konushin,
Dmitry Senyushkin
Abstract:
Single-view depth estimation (SVDE) plays a crucial role in scene understanding for AR applications, 3D modeling, and robotics, providing the geometry of a scene based on a single image. Recent works have shown that a successful solution strongly relies on the diversity and volume of training data. This data can be sourced from stereo movies and photos. However, they do not provide geometrically complete depth maps (as disparities contain an unknown shift value). Therefore, existing models trained on this data are not able to recover correct 3D representations. Our work shows that a model trained on this data along with conventional datasets can gain accuracy while predicting correct scene geometry. Surprisingly, only a small portion of geometrically correct depth maps is required to train a model that performs on par with a model trained on the full geometrically correct dataset. After that, we train computationally efficient models on a mixture of datasets using the proposed method. Through quantitative comparison on completely unseen datasets and qualitative comparison of 3D point clouds, we show that our model defines the new state of the art in general-purpose SVDE.
Submitted 9 February, 2021; v1 submitted 25 September, 2020;
originally announced September 2020.
-
Learning High-Resolution Domain-Specific Representations with a GAN Generator
Authors:
Danil Galeev,
Konstantin Sofiiuk,
Danila Rukhovich,
Mikhail Romanov,
Olga Barinova,
Anton Konushin
Abstract:
In recent years, generative models of visual data have made great progress, and now they are able to produce images of high quality and diversity. In this work we study representations learnt by a GAN generator. First, we show that these representations can be easily projected onto a semantic segmentation map using a lightweight decoder. We find that such a semantic projection can be learnt from just a few annotated images. Based on this finding, we propose the LayerMatch scheme for approximating the representation of a GAN generator that can be used for unsupervised domain-specific pretraining. We consider the semi-supervised learning scenario when a small amount of labeled data is available along with a large unlabeled dataset from the same domain. We find that the use of a LayerMatch-pretrained backbone leads to superior accuracy compared to standard supervised pretraining on ImageNet. Moreover, this simple approach also outperforms recent semi-supervised semantic segmentation methods that use both labeled and unlabeled data during training. Source code for reproducing our experiments will be available at the time of publication.
Submitted 18 June, 2020;
originally announced June 2020.
-
Decoder Modulation for Indoor Depth Completion
Authors:
Dmitry Senushkin,
Mikhail Romanov,
Ilia Belikov,
Anton Konushin,
Nikolay Patakin
Abstract:
Depth completion recovers a dense depth map from sensor measurements. Current methods are mostly tailored for very sparse depth measurements from LiDARs in outdoor settings, while for indoor scenes Time-of-Flight (ToF) or structured light sensors are mostly used. These sensors provide semi-dense maps, with dense measurements in some regions and almost empty in others. We propose a new model that takes into account the statistical difference between such regions. Our main contribution is a new decoder modulation branch added to the encoder-decoder architecture. The encoder extracts features from the concatenated RGB image and raw depth. Given the mask of missing values as input, the proposed modulation branch controls the decoding of a dense depth map from these features differently for different regions. This is implemented by modifying the spatial distribution of output signals inside the decoder via Spatially-Adaptive Denormalization (SPADE) blocks. Our second contribution is a novel training strategy that allows us to train on semi-dense sensor data when the ground truth depth map is not available. Our model achieves state-of-the-art results on the indoor Matterport3D dataset. Being designed for semi-dense input depth, our model is still competitive with LiDAR-oriented approaches on the KITTI dataset. Our training strategy significantly improves prediction quality with no dense ground truth available, as validated on the NYUv2 dataset.
Submitted 8 February, 2021; v1 submitted 18 May, 2020;
originally announced May 2020.
-
Double Refinement Network for Efficient Indoor Monocular Depth Estimation
Authors:
Nikita Durasov,
Mikhail Romanov,
Valeriya Bubnova,
Pavel Bogomolov,
Anton Konushin
Abstract:
Monocular depth estimation is the task of obtaining a measure of distance for each pixel using a single image. It is an important problem in computer vision and is usually solved using neural networks. Though recent works in this area have shown significant improvement in accuracy, the state-of-the-art methods tend to require massive amounts of memory and time to process an image. The main purpose of this work is to improve the performance of the latest solutions with no decrease in accuracy. To this end, we introduce the Double Refinement Network architecture. The proposed method achieves state-of-the-art results on the standard benchmark RGB-D dataset NYU Depth v2, while its frames per second rate is significantly higher (up to 18 times speedup per image at batch size 1) and the RAM usage per image is lower.
Submitted 4 April, 2019; v1 submitted 20 November, 2018;
originally announced November 2018.
-
Studying the History of the Arabic Language: Language Technology and a Large-Scale Historical Corpus
Authors:
Yonatan Belinkov,
Alexander Magidow,
Alberto Barrón-Cedeño,
Avi Shmidman,
Maxim Romanov
Abstract:
Arabic is a widely-spoken language with a long and rich history, but existing corpora and language technology focus mostly on modern Arabic and its varieties. Therefore, studying the history of the language has so far been mostly limited to manual analyses on a small scale. In this work, we present a large-scale historical corpus of the written Arabic language, spanning 1400 years. We describe our efforts to clean and process this corpus using Arabic NLP tools, including the identification of reused text. We study the history of the Arabic language using a novel automatic periodization algorithm, as well as other techniques. Our findings confirm the established division of written Arabic into Modern Standard and Classical Arabic, and confirm other established periodizations, while suggesting that written Arabic may be divisible into still further periods of development.
Submitted 11 September, 2018;
originally announced September 2018.
-
A generalized concatenation construction for q-ary 1-perfect codes
Authors:
Alexander M. Romanov
Abstract:
We consider perfect 1-error correcting codes over a finite field with $q$ elements (briefly $q$-ary 1-perfect codes). In this paper, a generalized concatenation construction for $q$-ary 1-perfect codes is presented that allows us to construct $q$-ary 1-perfect codes of length $(q - 1)nm + n + m$ from the given $q$-ary 1-perfect codes of length $n =(q^{s_1} - 1) / (q - 1)$ and $m = (q^{s_2} - 1) / (q - 1)$, where $s_1, s_2$ are natural numbers not less than two. This construction allows us to also construct $q$-ary codes with parameters $(q^{s_1 + s_2}, q^{q^{s_1 + s_2} - (s_1 + s_2) - 1}, 3)_q$ and can be regarded as a $q$-ary analogue of the well-known Phelps construction.
Submitted 31 October, 2017;
originally announced November 2017.
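A short worked check of the length stated in the abstract above, using only the formulas given there: substituting $n = (q^{s_1} - 1)/(q - 1)$ and $m = (q^{s_2} - 1)/(q - 1)$ into $(q - 1)nm + n + m$ gives the 1-perfect code length for the exponent $s_1 + s_2$. This is an illustrative derivation, not taken from the paper itself.

```latex
\begin{align*}
(q-1)nm + n + m
  &= \frac{(q^{s_1}-1)(q^{s_2}-1)}{q-1} + \frac{(q^{s_1}-1) + (q^{s_2}-1)}{q-1} \\
  &= \frac{q^{s_1+s_2} - q^{s_1} - q^{s_2} + 1 + q^{s_1} + q^{s_2} - 2}{q-1} \\
  &= \frac{q^{s_1+s_2} - 1}{q-1}.
\end{align*}
```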
-
On non-full-rank perfect codes over finite fields
Authors:
Alexander M. Romanov
Abstract:
The paper deals with the perfect 1-error correcting codes over a finite field with $q$ elements (briefly $q$-ary 1-perfect codes). We show that the orthogonal code to the $q$-ary non-full-rank 1-perfect code of length $n = (q^{m}-1)/(q-1)$ is a $q$-ary constant-weight code with Hamming weight equal to $q^{m - 1}$, where $m$ is any natural number not less than two. We derive necessary and sufficient conditions for $q$-ary 1-perfect codes of non-full rank. We suggest a generalization of the concatenation construction to the $q$-ary case and construct the ternary 1-perfect codes of length 13 and rank 12.
Submitted 9 April, 2017;
originally announced April 2017.
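As a minimal sketch of the parameters mentioned in the abstract above, the following Python snippet computes the length $n = (q^{m}-1)/(q-1)$ and checks the sphere-packing condition that any 1-perfect code must satisfy. The function names are hypothetical and chosen for illustration; the code size $q^{n-m}$ is the standard size of a 1-perfect code of this length and is not quoted from the abstract.

```python
def perfect_code_length(q: int, m: int) -> int:
    """Length n = (q^m - 1) / (q - 1) of a q-ary 1-perfect code."""
    return (q**m - 1) // (q - 1)


def satisfies_sphere_packing(q: int, m: int) -> bool:
    """Check that radius-1 Hamming balls around the codewords tile the space."""
    n = perfect_code_length(q, m)
    ball_size = 1 + n * (q - 1)   # words within Hamming distance 1 of a codeword
    num_codewords = q ** (n - m)  # assumed standard size of a 1-perfect code
    return num_codewords * ball_size == q**n


# Ternary case (q = 3, m = 3) gives the length 13 mentioned in the abstract.
print(perfect_code_length(3, 3))        # 13
print(satisfies_sphere_packing(3, 3))   # True
```

The check only confirms that the parameters are consistent with perfectness; it does not construct a code or say anything about its rank.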
-
Important New Developments in Arabographic Optical Character Recognition (OCR)
Authors:
Maxim Romanov,
Matthew Thomas Miller,
Sarah Bowen Savant,
Benjamin Kiessling
Abstract:
The OpenITI team has achieved Optical Character Recognition (OCR) accuracy rates for classical Arabic-script texts in the high nineties. These numbers are based on our tests of seven different Arabic-script texts of varying quality and typefaces, totaling over 7,000 lines. These accuracy rates not only represent a distinct improvement over the actual accuracy rates of the various proprietary OCR options for classical Arabic-script texts, but, equally important, they are produced using open-source OCR software, thus enabling us to make this Arabic-script OCR technology freely available to the broader Islamic, Persian, and Arabic Studies communities.
Submitted 28 March, 2017;
originally announced March 2017.
-
Shamela: A Large-Scale Historical Arabic Corpus
Authors:
Yonatan Belinkov,
Alexander Magidow,
Maxim Romanov,
Avi Shmidman,
Moshe Koppel
Abstract:
Arabic is a widely-spoken language with a rich and long history spanning more than fourteen centuries. Yet existing Arabic corpora largely focus on the modern period or lack sufficient diachronic information. We develop a large-scale, historical corpus of Arabic of about 1 billion words from diverse periods of time. We clean this corpus, process it with a morphological analyzer, and enhance it by detecting parallel passages and automatically dating undated texts. We demonstrate its utility with selected case-studies in which we show its application to the digital humanities.
Submitted 28 December, 2016;
originally announced December 2016.
-
Full-Rank Perfect Codes over Finite Fields
Authors:
Alexander M. Romanov
Abstract:
In this paper, we propose a construction of full-rank q-ary 1-perfect codes over finite fields. This construction is a generalization of the Etzion and Vardy construction of full-rank binary 1-perfect codes (1994). Properties of i-components of q-ary Hamming codes are investigated and the construction of full-rank q-ary 1-perfect codes is based on these properties. The switching construction of 1-perfect codes is generalized to the q-ary case. We give a generalization of the concept of i-component of 1-perfect codes and introduce the concept of (i,σ)-components of q-ary 1-perfect codes. We also present a generalization of the Lindström and Schönheim construction of q-ary 1-perfect codes and provide a lower bound on the number of pairwise distinct q-ary 1-perfect codes of length n.
Submitted 4 October, 2013;
originally announced October 2013.
-
On the admissible families of components of Hamming codes
Authors:
Alexander M. Romanov
Abstract:
In this paper, we describe the properties of the $i$-components of Hamming codes. We suggest constructions of the admissible families of components of Hamming codes. It is shown that every $q$-ary code of length $m$ and minimum distance 5 (for $q = 3$ the minimum distance is 3) can be embedded in a $q$-ary 1-perfect code of length $n = (q^{m}-1)/(q-1)$. It is also shown that every binary code of length $m + k$ and minimum distance $3k + 3$ can be embedded in a binary 1-perfect code of length $n = 2^{m}-1$.
Submitted 1 February, 2012;
originally announced February 2012.