
Showing 1–28 of 28 results for author: Kuznetsov, A

Searching in archive cs.
  1. arXiv:2407.01866  [pdf, other]

    cs.CV cs.GR

    Image-GS: Content-Adaptive Image Representation via 2D Gaussians

    Authors: Yunxiang Zhang, Alexandr Kuznetsov, Akshay Jindal, Kenneth Chen, Anton Sochenov, Anton Kaplanyan, Qi Sun

    Abstract: Neural image representations have recently emerged as a promising technique for storing, streaming, and rendering visual data. Coupled with learning-based workflows, these novel representations have demonstrated remarkable visual fidelity and memory efficiency. However, existing neural image representations often rely on explicit uniform data structures without content adaptivity or computation-in…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.07906  [pdf, other]

    cs.GR

    Hybrid Rendering for Dynamic Scenes

    Authors: Alexandr Kuznetsov, Stavros Diolatzis, Anton Sochenov, Anton Kaplanyan

    Abstract: Despite significant advances in algorithms and hardware, global illumination continues to be a challenge in the real-time domain. Time constraints often force developers to either compromise on the quality of global illumination or disregard it altogether. We take advantage of a common setup in modern games: a level, which is a static scene with dynamic characters and lighting. We…

    Submitted 12 June, 2024; originally announced June 2024.

  3. N-Dimensional Gaussians for Fitting of High Dimensional Functions

    Authors: Stavros Diolatzis, Tobias Zirr, Alexandr Kuznetsov, Georgios Kopanas, Anton Kaplanyan

    Abstract: In the wake of many new ML-inspired approaches for reconstructing and representing high-quality 3D content, recent hybrid and explicitly learned representations exhibit promising performance and quality characteristics. However, their scaling to higher dimensions is challenging, e.g. when accounting for dynamic content with respect to additional parameters such as material properties, illumination…

    Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: https://www.sdiolatz.info/ndg-fitting/

  4. arXiv:2405.12250  [pdf, other]

    cs.LG cs.AI cs.CL

    Your Transformer is Secretly Linear

    Authors: Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Nikolai Gerasimenko, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

    Abstract: This paper reveals a novel linear characteristic exclusive to transformer decoders, including models such as GPT, LLaMA, OPT, BLOOM and others. We analyze embedding transformations between sequential layers, uncovering a near-perfect linear relationship (Procrustes similarity score of 0.99). However, linearity decreases when the residual component is removed due to a consistently low output norm o…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 9 pages, 9 figures

  5. arXiv:2404.06212  [pdf, other]

    cs.CV cs.AI cs.LG

    OmniFusion Technical Report

    Authors: Elizaveta Goncharova, Anton Razzhigaev, Matvey Mikhalchuk, Maxim Kurkin, Irina Abdullaeva, Matvey Skripkin, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

    Abstract: Last year, multimodal architectures served up a revolution in AI-based approaches and solutions, extending the capabilities of large language models (LLM). We propose an OmniFusion model based on a pretrained LLM and adapters for visual modality. We evaluated and compared several architecture design principles for better text and visual data coupling: MLP and transformer adapters, various…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 17 pages, 4 figures, 9 tables, 2 appendices

    MSC Class: 68-04; 68T50 (Primary)

    ACM Class: I.2.7; I.2.10; I.4.9

  6. arXiv:2312.03511  [pdf, other]

    cs.CV cs.LG cs.MM

    Kandinsky 3.0 Technical Report

    Authors: Vladimir Arkhipkin, Andrei Filatov, Viacheslav Vasilev, Anastasia Maltseva, Said Azizov, Igor Pavlov, Julia Agafonova, Andrey Kuznetsov, Denis Dimitrov

    Abstract: We present Kandinsky 3.0, a large-scale text-to-image generation model based on latent diffusion, continuing the series of text-to-image Kandinsky models and reflecting our progress toward higher quality and realism in image generation. In this report we describe the architecture of the model, the data collection procedure, the training technique, and the production system for user interaction…

    Submitted 28 June, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: Project page: https://ai-forever.github.io/Kandinsky-3

  7. arXiv:2311.13073  [pdf, other]

    cs.CV cs.LG cs.MM

    FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline

    Authors: Vladimir Arkhipkin, Zein Shaheen, Viacheslav Vasilev, Elizaveta Dakhova, Andrey Kuznetsov, Denis Dimitrov

    Abstract: Multimedia generation approaches occupy a prominent place in artificial intelligence research. Text-to-image models have achieved high-quality results over the last few years, whereas video synthesis methods have only recently started to develop. This paper presents a new two-stage latent diffusion text-to-video generation architecture based on the text-to-image diffusion model. The first stage concerns keyfram…

    Submitted 20 December, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: Project page: https://ai-forever.github.io/kandinsky-video/

  8. arXiv:2311.05928  [pdf, other]

    cs.CL cs.AI cs.IT cs.LG math.GN

    The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models

    Authors: Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

    Abstract: In this study, we present an investigation into the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the dichotomy between encoders and decoders. Our findings reveal that the anisotropy profile in transformer decoders exhibits a distinct bell-shaped curve, with the highest anisotropy concentrations in the middle layers. This pattern diverges from…

    Submitted 26 February, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: Accepted to EACL-2024

  9. arXiv:2310.03502  [pdf, other]

    cs.CV

    Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

    Authors: Anton Razzhigaev, Arseniy Shakhmatov, Anastasia Maltseva, Vladimir Arkhipkin, Igor Pavlov, Ilya Ryabov, Angelina Kuts, Alexander Panchenko, Andrey Kuznetsov, Denis Dimitrov

    Abstract: Text-to-image generation is a significant domain in modern computer vision and has achieved substantial improvements through the evolution of generative architectures. Among these are diffusion-based models, which have demonstrated significant quality gains. These models are generally split into two categories: pixel-level and latent-level approaches. We present Kandinsky1, a novel explo…

    Submitted 5 October, 2023; originally announced October 2023.

  10. arXiv:2309.14471  [pdf, other]

    cs.LG cs.AI

    Adapting Double Q-Learning for Continuous Reinforcement Learning

    Authors: Arsenii Kuznetsov

    Abstract: The majority of off-policy reinforcement learning algorithms use overestimation bias control techniques. Most of these techniques are rooted in heuristics, primarily addressing the consequences of overestimation rather than its fundamental origins. In this work we present a novel approach to bias correction, similar in spirit to Double Q-Learning. We propose using a policy in the form of a mixture with tw…

    Submitted 25 September, 2023; originally announced September 2023.

  11. arXiv:2306.04288  [pdf, other]

    cs.LG cs.CV

    Revising deep learning methods in parking lot occupancy detection

    Authors: Anastasia Martynova, Mikhail Kuznetsov, Vadim Porvatov, Vladislav Tishin, Andrey Kuznetsov, Natalia Semenova, Ksenia Kuznetsova

    Abstract: Parking guidance systems have recently become a popular trend as a part of the smart cities' paradigm of development. The crucial part of such systems is the algorithm allowing drivers to search for available parking lots across regions of interest. The classic approach to this task is based on the application of neural network classifiers to camera records. However, existing systems demonstrate a…

    Submitted 12 February, 2024; v1 submitted 7 June, 2023; originally announced June 2023.

    Comments: 15 pages, 7 figures

  12. arXiv:2305.19000  [pdf, other]

    cs.CV cs.LG

    Independent Component Alignment for Multi-Task Learning

    Authors: Dmitry Senushkin, Nikolay Patakin, Arseny Kuznetsov, Anton Konushin

    Abstract: In a multi-task learning (MTL) setting, a single model is trained to tackle a diverse set of tasks jointly. Despite rapid progress in the field, MTL remains challenging due to optimization issues such as conflicting and dominating gradients. In this work, we propose using a condition number of a linear system of gradients as a stability criterion of an MTL optimization. We theoretically demonstrat…

    Submitted 30 May, 2023; originally announced May 2023.

    Journal ref: CVPR 2023

  13. arXiv:2303.16531  [pdf, other]

    cs.CV

    RusTitW: Russian Language Text Dataset for Visual Text in-the-Wild Recognition

    Authors: Igor Markov, Sergey Nesteruk, Andrey Kuznetsov, Denis Dimitrov

    Abstract: Information surrounds people in modern life. Text is a very efficient type of information that people have used for communication for centuries. However, automated text-in-the-wild recognition remains a challenging problem. The major limitation for a DL system is the lack of training data. For competitive performance, the training set must contain many samples that replicate the real-world cases. While…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: 5 pages, 6 figures, 2 tables

  14. Fuse: In-Situ Sensemaking Support in the Browser

    Authors: Andrew Kuznetsov, Joseph Chee Chang, Nathan Hahn, Napol Rachatasumrit, Bradley Breneisen, Julina Coupland, Aniket Kittur

    Abstract: People spend a significant amount of time trying to make sense of the internet, collecting content from a variety of sources and organizing it to make decisions and achieve their goals. While humans are able to fluidly iterate on collecting and organizing information in their minds, existing tools and approaches introduce significant friction into the process. We introduce Fuse, a browser extensio…

    Submitted 31 August, 2022; originally announced August 2022.

  15. Wigglite: Low-cost Information Collection and Triage

    Authors: Michael Xieyang Liu, Andrew Kuznetsov, Yongsung Kim, Joseph Chee Chang, Aniket Kittur, Brad A. Myers

    Abstract: Consumers conducting comparison shopping, researchers making sense of a competitive space, and developers looking for code snippets online all face the challenge of capturing the information they find for later use without interrupting their current flow. In addition, during many learning and exploration tasks, people need to externalize their mental context, such as estimating how urgent a topic is…

    Submitted 31 July, 2022; originally announced August 2022.

  16. arXiv:2202.10784  [pdf, other]

    cs.CV cs.AI

    RuCLIP -- new models and experiments: a technical report

    Authors: Alex Shonenkov, Andrey Kuznetsov, Denis Dimitrov, Tatyana Shavrina, Daniil Chesakov, Anastasia Maltseva, Alena Fenogenova, Igor Pavlov, Anton Emelyanov, Sergey Markov, Daria Bakshandaeva, Vera Shybaeva, Andrey Chertok

    Abstract: In the report we propose six new implementations of the ruCLIP model trained on our dataset of 240M pairs. The accuracy results are compared with the original CLIP model with Ru-En translation (OPUS-MT) on 16 datasets from different domains. Our best implementations outperform the CLIP + OPUS-MT solution on most of the datasets in few-shot and zero-shot tasks. In the report we briefly describe the implementations and co…

    Submitted 22 February, 2022; originally announced February 2022.

  17. arXiv:2202.03046  [pdf, other]

    cs.CV

    A new face swap method for image and video domains: a technical report

    Authors: Daniil Chesakov, Anastasia Maltseva, Alexander Groshev, Andrey Kuznetsov, Denis Dimitrov

    Abstract: Deep fake technology has become a hot field of research in the last few years. Researchers investigate sophisticated Generative Adversarial Networks (GAN), autoencoders, and other approaches to establish precise and robust algorithms for face swapping. The results achieved so far show that the unsupervised deep fake synthesis task has problems in terms of the visual quality of generated data. These problems usua…

    Submitted 7 February, 2022; originally announced February 2022.

  18. arXiv:2111.10974  [pdf, other]

    cs.CV cs.AI cs.CL

    Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture

    Authors: Daria Bakshandaeva, Denis Dimitrov, Vladimir Arkhipkin, Alex Shonenkov, Mark Potanin, Denis Karachev, Andrey Kuznetsov, Anton Voronov, Vera Davydova, Elena Tutubalina, Aleksandr Petiushko

    Abstract: Supporting the current trend in the AI community, we present the AI Journey 2021 Challenge called Fusion Brain, the first competition aimed at building a universal architecture that can process different modalities (in this case, images, texts, and code) and solve multiple tasks for vision and language. The Fusion Brain Challenge combines the following specific tasks: Code2code Transl…

    Submitted 28 December, 2022; v1 submitted 21 November, 2021; originally announced November 2021.

  19. arXiv:2110.13523  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Automating Control of Overestimation Bias for Reinforcement Learning

    Authors: Arsenii Kuznetsov, Alexander Grishin, Artem Tsypin, Arsenii Ashukha, Artur Kadurin, Dmitry Vetrov

    Abstract: Overestimation bias control techniques are used by the majority of high-performing off-policy reinforcement learning algorithms. However, most of these techniques rely on pre-defined bias correction policies that are either not flexible enough or require environment-specific tuning of hyperparameters. In this work, we present a general data-driven approach for the automatic selection of bias contr…

    Submitted 28 January, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

  20. arXiv:2105.05977  [pdf, other]

    cs.CL cs.AI cs.LG

    Spelling Correction with Denoising Transformer

    Authors: Alex Kuznetsov, Hector Urdiales

    Abstract: We present a novel method of performing spelling correction on short input strings, such as search queries or individual words. At its core lies a procedure for generating artificial typos which closely follow the error patterns manifested by humans. This procedure is used to train the production spelling correction model based on a transformer architecture. This model is currently served in the H…

    Submitted 12 May, 2021; originally announced May 2021.

    Comments: 9 pages, 3 figures

  21. arXiv:2104.02789  [pdf, other]

    cs.GR cs.LG eess.IV

    NeuMIP: Multi-Resolution Neural Materials

    Authors: Alexandr Kuznetsov, Krishna Mullia, Zexiang Xu, Miloš Hašan, Ravi Ramamoorthi

    Abstract: We propose NeuMIP, a neural method for representing and rendering a variety of material appearances at different scales. Classical prefiltering (mipmapping) methods work well on simple material properties such as diffuse color, but fail to generalize to normals, self-shadowing, fibers or more complex microstructures and reflectances. In this work, we generalize traditional mipmap pyramids to pyram…

    Submitted 6 April, 2021; originally announced April 2021.

  22. arXiv:2010.01775  [pdf, other]

    cs.GR cs.CV cs.LG

    Photon-Driven Neural Path Guiding

    Authors: Shilin Zhu, Zexiang Xu, Tiancheng Sun, Alexandr Kuznetsov, Mark Meyer, Henrik Wann Jensen, Hao Su, Ravi Ramamoorthi

    Abstract: Although Monte Carlo path tracing is a simple and effective algorithm to synthesize photo-realistic images, it is often very slow to converge to noise-free results when complex global illumination is involved. One of the most successful variance-reduction techniques is path guiding, which can learn better distributions for importance sampling to reduce pixel noise. However, previous methods require…

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: Keywords: computer graphics, rendering, path tracing, path guiding, machine learning, neural networks, denoising, reconstruction

  23. arXiv:2005.04269  [pdf, other]

    cs.LG cs.AI stat.ML

    Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics

    Authors: Arsenii Kuznetsov, Pavel Shvechikov, Alexander Grishin, Dmitry Vetrov

    Abstract: The overestimation bias is one of the major impediments to accurate off-policy learning. This paper investigates a novel way to alleviate the overestimation bias in a continuous control setting. Our method, Truncated Quantile Critics (TQC), blends three ideas: distributional representation of a critic, truncation of critics' predictions, and ensembling of multiple critics. Distributional represent…

    Submitted 8 May, 2020; originally announced May 2020.

    Comments: Under review by the International Conference on Machine Learning

  24. arXiv:2004.08606  [pdf, other]

    cs.CV

    Feathers dataset for Fine-Grained Visual Categorization

    Authors: Alina Belko, Konstantin Dobratulin, Andrey Kuznetsov

    Abstract: This paper introduces a novel dataset, FeatherV1, containing 28,272 images of feathers categorized by 595 bird species. It was created to perform taxonomic identification of bird species from a single feather, which can be applied in amateur and professional ornithology. FeatherV1 is the first publicly available bird's plumage dataset for machine learning, and it can raise interest in a new task in…

    Submitted 18 April, 2020; originally announced April 2020.

    Comments: 6 pages, 6 figures, 3 tables

  25. arXiv:1811.03346  [pdf]

    cs.DL

    System of Bibliometric Monitoring Sciences in Ukraine

    Authors: Leonid Kostenko, Alexander Zhabin, Alexander Kuznetsov, Elizabeth Kukharchuk, Tatiana Simonenko

    Abstract: The origins of scientometrics (research metrics) were analysed and the lack of attention to the elaboration of its methodology was emphasized. Approaches to evaluating the outcomes of scientific activity were considered, and a trend away from formal quantitative indicators toward expert conclusions based on bibliometric indicators was noted. The principles of the Leiden manifes…

    Submitted 8 November, 2018; originally announced November 2018.

    Comments: 13 pages

  26. arXiv:1808.06199  [pdf, other]

    math.CO cs.DM

    Lower bound for the cost of connecting tree with given vertex degree sequence

    Authors: Mikhail Goubko, Alexander Kuznetsov

    Abstract: The optimal connecting network problem generalizes many models of structure optimization known from the literature, including communication and transport network topology design, graph cut and graph clustering, structure identification from data, etc. For the case of connecting trees with the given sequence of vertex degrees, the cost of the optimal tree is shown to be bounded from below by the so…

    Submitted 20 August, 2018; v1 submitted 19 August, 2018; originally announced August 2018.

    Comments: 29 pages, 6 figures, 2 tables

    MSC Class: 05C05; 05C07; 05C12; 05C35; 05C50; 68R10; 90C06; 90C22; 90C35; 90C59; 94C15

  27. arXiv:1804.03635  [pdf, other]

    cs.CR stat.ML

    Semantic embeddings for program behavior patterns

    Authors: Alexander Chistyakov, Ekaterina Lobacheva, Arseny Kuznetsov, Alexey Romanenko

    Abstract: In this paper, we propose a new feature extraction technique for program execution logs. First, we automatically extract complex patterns from a program's behavior graph. Then, we embed these patterns into a continuous space by training an autoencoder. We evaluate the proposed features on a real-world malicious software detection task. We also find that the embedding space captures interpretable s…

    Submitted 10 April, 2018; originally announced April 2018.

    Comments: Published at the Workshop Track of ICLR 2017

  28. arXiv:1601.02034  [pdf, other]

    cs.DB cs.DS

    It's just a matter of perspective(s): Crowd-Powered Consensus Organization of Corpora

    Authors: Ayush Jain, Joon Young Seo, Karan Goel, Andrew Kuznetsov, Aditya Parameswaran, Hari Sundaram

    Abstract: We study the problem of organizing a collection of objects (images, videos) into clusters using crowdsourcing. This problem is notoriously hard for computers to solve automatically, and even with crowd workers it is challenging to orchestrate: (a) workers may cluster based on different latent hierarchies or perspectives; (b) workers may cluster at different granularities even when clustering using t…

    Submitted 8 January, 2016; originally announced January 2016.