Showing 1–8 of 8 results for author: Razzhigaev, A

Searching in archive cs.
  1. arXiv:2405.12250

    cs.LG cs.AI cs.CL

    Your Transformer is Secretly Linear

    Authors: Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Nikolai Gerasimenko, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

    Abstract: This paper reveals a novel linear characteristic exclusive to transformer decoders, including models such as GPT, LLaMA, OPT, BLOOM and others. We analyze embedding transformations between sequential layers, uncovering a near-perfect linear relationship (Procrustes similarity score of 0.99). However, linearity decreases when the residual component is removed due to a consistently low output norm o…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: 9 pages, 9 figures

  2. arXiv:2404.06212

    cs.CV cs.AI cs.LG

    OmniFusion Technical Report

    Authors: Elizaveta Goncharova, Anton Razzhigaev, Matvey Mikhalchuk, Maxim Kurkin, Irina Abdullaeva, Matvey Skripkin, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

    Abstract: Last year, multimodal architectures served up a revolution in AI-based approaches and solutions, extending the capabilities of large language models (LLM). We propose an OmniFusion model based on a pretrained LLM and adapters for visual modality. We evaluated and compared several architecture design principles for better text and visual data coupling: MLP and transformer adapters, various…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 17 pages, 4 figures, 9 tables, 2 appendices

    MSC Class: 6804; 68T50 (Primary)

    ACM Class: I.2.7; I.2.10; I.4.9

  3. arXiv:2311.05928

    cs.CL cs.AI cs.IT cs.LG math.GN

    The Shape of Learning: Anisotropy and Intrinsic Dimensions in Transformer-Based Models

    Authors: Anton Razzhigaev, Matvey Mikhalchuk, Elizaveta Goncharova, Ivan Oseledets, Denis Dimitrov, Andrey Kuznetsov

    Abstract: In this study, we present an investigation into the anisotropy dynamics and intrinsic dimension of embeddings in transformer architectures, focusing on the dichotomy between encoders and decoders. Our findings reveal that the anisotropy profile in transformer decoders exhibits a distinct bell-shaped curve, with the highest anisotropy concentrations in the middle layers. This pattern diverges from…

    Submitted 26 February, 2024; v1 submitted 10 November, 2023; originally announced November 2023.

    Comments: Accepted to EACL-2024

  4. arXiv:2310.07008

    cs.CL cs.AI cs.IR cs.LG

    Answer Candidate Type Selection: Text-to-Text Language Model for Closed Book Question Answering Meets Knowledge Graphs

    Authors: Mikhail Salnikov, Maria Lysyuk, Pavel Braslavski, Anton Razzhigaev, Valentin Malykh, Alexander Panchenko

    Abstract: Pre-trained Text-to-Text Language Models (LMs), such as T5 or BART, yield promising results in the Knowledge Graph Question Answering (KGQA) task. However, the capacity of the models is limited and the quality decreases for questions with less popular entities. In this paper, we present a novel approach which works on top of the pre-trained Text-to-Text QA system to address this issue. Our simple y…

    Submitted 10 October, 2023; originally announced October 2023.

  5. arXiv:2310.03502

    cs.CV

    Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

    Authors: Anton Razzhigaev, Arseniy Shakhmatov, Anastasia Maltseva, Vladimir Arkhipkin, Igor Pavlov, Ilya Ryabov, Angelina Kuts, Alexander Panchenko, Andrey Kuznetsov, Denis Dimitrov

    Abstract: Text-to-image generation is a significant domain in modern computer vision and has achieved substantial improvements through the evolution of generative architectures. Among these, there are diffusion-based models that have demonstrated essential quality enhancements. These models are generally split into two categories: pixel-level and latent-level approaches. We present Kandinsky, a novel explo…

    Submitted 5 October, 2023; originally announced October 2023.

  6. arXiv:2204.10629

    cs.CL cs.AI cs.IR cs.LG

    MEKER: Memory Efficient Knowledge Embedding Representation for Link Prediction and Question Answering

    Authors: Viktoriia Chekalina, Anton Razzhigaev, Albert Sayapin, Evgeny Frolov, Alexander Panchenko

    Abstract: Knowledge Graphs (KGs) are symbolically structured storages of facts. The KG embedding contains concise data used in NLP tasks requiring implicit information about the real world. Furthermore, the size of KGs that may be useful in actual NLP assignments is enormous, and creating embedding over it has memory cost issues. We represent KG as a 3rd-order binary tensor and move beyond the standard CP d…

    Submitted 24 May, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

  7. arXiv:2106.14290

    cs.CV

    Darker than Black-Box: Face Reconstruction from Similarity Queries

    Authors: Anton Razzhigaev, Klim Kireev, Igor Udovichenko, Aleksandr Petiushko

    Abstract: Several methods for inversion of face recognition models were recently presented, attempting to reconstruct a face from deep templates. Although some of these approaches work in a black-box setup using only face embeddings, usually, on the end-user side, only similarity scores are provided. Therefore, these algorithms are inapplicable in such scenarios. We propose a novel approach that allows reco…

    Submitted 2 July, 2021; v1 submitted 27 June, 2021; originally announced June 2021.

  8. Black-Box Face Recovery from Identity Features

    Authors: Anton Razzhigaev, Klim Kireev, Edgar Kaziakhmedov, Nurislam Tursynbek, Aleksandr Petiushko

    Abstract: In this work, we present a novel algorithm based on an iterative sampling of random Gaussian blobs for black-box face recovery, given only an output feature vector of deep face recognition systems. We attack the state-of-the-art face recognition system (ArcFace) to test our algorithm. Another network with different architecture (FaceNet) is used as an independent critic showing that the target pe…

    Submitted 30 July, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

    Journal ref: ECCV Workshops (5) 2020: 462-475