Showing 1–18 of 18 results for author: Zhmoginov, A

  1. arXiv:2301.04584  [pdf, other]

    cs.LG cs.CV

    Continual Few-Shot Learning Using HyperTransformers

    Authors: Max Vladymyrov, Andrey Zhmoginov, Mark Sandler

    Abstract: We focus on the problem of learning without forgetting from multiple tasks arriving sequentially, where each task is defined using a few-shot episode of novel or already seen classes. We approach this problem using the recently published HyperTransformer (HT), a Transformer-based hypernetwork that generates specialized task-specific CNN weights directly from the support set. In order to learn from…

    Submitted 12 January, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

  2. arXiv:2301.02312  [pdf, other]

    cs.LG

    Training trajectories, mini-batch losses and the curious role of the learning rate

    Authors: Mark Sandler, Andrey Zhmoginov, Max Vladymyrov, Nolan Miller

    Abstract: Stochastic gradient descent plays a fundamental role in nearly all applications of deep learning. However its ability to converge to a global minimum remains shrouded in mystery. In this paper we propose to study the behavior of the loss function on fixed mini-batches along SGD trajectories. We show that the loss function on a fixed batch appears to be remarkably convex-like. In particular for Res…

    Submitted 1 February, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

    Comments: 21 pages, 14 figures

  3. arXiv:2212.07677  [pdf, other]

    cs.LG cs.AI cs.CL

    Transformers learn in-context by gradient descent

    Authors: Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, Max Vladymyrov

    Abstract: At present, the mechanisms of in-context learning in Transformers are not well understood and remain mostly an intuition. In this paper, we suggest that training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations. We start by providing a simple weight construction that shows the equivalence of data transformations induced by 1) a single linea…

    Submitted 31 May, 2023; v1 submitted 15 December, 2022; originally announced December 2022.

  4. arXiv:2211.15774  [pdf, other]

    cs.LG cs.CV

    Decentralized Learning with Multi-Headed Distillation

    Authors: Andrey Zhmoginov, Mark Sandler, Nolan Miller, Gus Kristiansen, Max Vladymyrov

    Abstract: Decentralized learning with private data is a central problem in machine learning. We propose a novel distillation-based decentralized learning technique that allows multiple agents with private non-iid data to learn from each other, without having to share their data, weights or weight updates. Our approach is communication efficient, utilizes an unlabeled public dataset and uses multiple auxilia…

    Submitted 28 November, 2022; originally announced November 2022.

  5. arXiv:2203.15243  [pdf, other]

    cs.CV

    Fine-tuning Image Transformers using Learnable Memory

    Authors: Mark Sandler, Andrey Zhmoginov, Max Vladymyrov, Andrew Jackson

    Abstract: In this paper we propose augmenting Vision Transformer models with learnable memory tokens. Our approach allows the model to adapt to new tasks, using few parameters, while optionally preserving its capabilities on previously learned tasks. At each layer we introduce a set of learnable embedding vectors that provide contextual information useful for specific datasets. We call these "memory tokens"…

    Submitted 29 March, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

    Comments: CVPR 2022, to appear

  6. arXiv:2201.04182  [pdf, other]

    cs.LG cs.CV

    HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning

    Authors: Andrey Zhmoginov, Mark Sandler, Max Vladymyrov

    Abstract: In this work we propose a HyperTransformer, a Transformer-based model for supervised and semi-supervised few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples. Since the dependence of a small generated CNN model on a specific task is encoded by a high-capacity Transformer model, we effectively decouple the complexity of the large task space…

    Submitted 13 July, 2022; v1 submitted 11 January, 2022; originally announced January 2022.

  7. arXiv:2107.10963  [pdf, other]

    cs.LG cs.CV

    Compositional Models: Multi-Task Learning and Knowledge Transfer with Modular Networks

    Authors: Andrey Zhmoginov, Dina Bashkirova, Mark Sandler

    Abstract: Conditional computation and modular networks have been recently proposed for multitask learning and other problems as a way to decompose problem solving into multiple reusable computational blocks. We propose a new approach for learning modular networks based on the isometric version of ResNet with all residual blocks having the same configuration and the same number of parameters. This architectu…

    Submitted 22 July, 2021; originally announced July 2021.

  8. arXiv:2105.03014  [pdf, other]

    cs.CV

    BasisNet: Two-stage Model Synthesis for Efficient Inference

    Authors: Mingda Zhang, Chun-Te Chu, Andrey Zhmoginov, Andrew Howard, Brendan Jou, Yukun Zhu, Li Zhang, Rebecca Hwa, Adriana Kovashka

    Abstract: In this work, we present BasisNet which combines recent advancements in efficient neural network architectures, conditional computation, and early termination in a simple new form. Our approach incorporates a lightweight model to preview the input and generate input-dependent combination coefficients, which later controls the synthesis of a more accurate specialist model to make final prediction.…

    Submitted 6 May, 2021; originally announced May 2021.

    Comments: To appear, 4th Workshop on Efficient Deep Learning for Computer Vision (ECV2021), CVPR2021 Workshop

  9. arXiv:2104.04657  [pdf, other]

    cs.LG cs.NE

    Meta-Learning Bidirectional Update Rules

    Authors: Mark Sandler, Max Vladymyrov, Andrey Zhmoginov, Nolan Miller, Andrew Jackson, Tom Madams, Blaise Aguera y Arcas

    Abstract: In this paper, we introduce a new type of generalized neural network where neurons and synapses maintain multiple states. We show that classical gradient-based backpropagation in neural networks can be seen as a special case of a two-state network where one state is used for activations and another for gradients, with update rules derived from the chain rule. In our generalized framework, networks…

    Submitted 11 June, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

    Comments: ICML 2021, 17 pages

  10. arXiv:2012.05578  [pdf, other]

    cs.LG cs.CV

    Large-Scale Generative Data-Free Distillation

    Authors: Liangchen Luo, Mark Sandler, Zi Lin, Andrey Zhmoginov, Andrew Howard

    Abstract: Knowledge distillation is one of the most popular and effective techniques for knowledge transfer, model compression and semi-supervised learning. Most existing distillation approaches require the access to original or augmented training samples. But this can be problematic in practice due to privacy, proprietary and availability concerns. Recent work has put forward some methods to tackle this pr…

    Submitted 10 December, 2020; originally announced December 2020.

  11. arXiv:2008.04965  [pdf, other]

    cs.CV cs.LG

    Image segmentation via Cellular Automata

    Authors: Mark Sandler, Andrey Zhmoginov, Liangcheng Luo, Alexander Mordvintsev, Ettore Randazzo, Blaise Agüera y Arcas

    Abstract: In this paper, we propose a new approach for building cellular automata to solve real-world segmentation problems. We design and train a cellular automaton that can successfully segment high-resolution images. We consider a colony that densely inhabits the pixel grid, and all cells are governed by a randomized update that uses the current state, the color, and the state of the $3\times 3$ neighbor…

    Submitted 12 August, 2020; v1 submitted 11 August, 2020; originally announced August 2020.

  12. arXiv:1909.03205  [pdf, other]

    cs.CV

    Non-discriminative data or weak model? On the relative importance of data and model resolution

    Authors: Mark Sandler, Jonathan Baccash, Andrey Zhmoginov, Andrew Howard

    Abstract: We explore the question of how the resolution of the input image ("input resolution") affects the performance of a neural network when compared to the resolution of the hidden layers ("internal resolution"). Adjusting these characteristics is frequently used as a hyperparameter providing a trade-off between model performance and accuracy. An intuitive interpretation is that the reduced information…

    Submitted 17 October, 2019; v1 submitted 7 September, 2019; originally announced September 2019.

    Comments: ICCV 2019 Workshop on Real-World Recognition from Low-Quality Images and Videos

  13. arXiv:1907.09578  [pdf, other]

    cs.CV cs.IT cs.LG

    Information-Bottleneck Approach to Salient Region Discovery

    Authors: Andrey Zhmoginov, Ian Fischer, Mark Sandler

    Abstract: We propose a new method for learning image attention masks in a semi-supervised setting based on the Information Bottleneck principle. Provided with a set of labeled images, the mask generation model is minimizing mutual information between the input and the masked image while maximizing the mutual information between the same masked image and the image label. In contrast with other approaches, ou…

    Submitted 14 February, 2020; v1 submitted 22 July, 2019; originally announced July 2019.

  14. arXiv:1810.10703  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    K for the Price of 1: Parameter-efficient Multi-task and Transfer Learning

    Authors: Pramod Kaushik Mudrakarta, Mark Sandler, Andrey Zhmoginov, Andrew Howard

    Abstract: We introduce a novel method that enables parameter-efficient transfer and multi-task learning with deep neural networks. The basic approach is to learn a model patch - a small set of parameters - that will specialize to each task, instead of fine-tuning the last layer or the entire network. For instance, we show that learning a set of scales and biases is sufficient to convert a pretrained network…

    Submitted 23 February, 2019; v1 submitted 24 October, 2018; originally announced October 2018.

    Comments: published at ICLR 2019

  15. arXiv:1801.04381  [pdf, other]

    cs.CV

    MobileNetV2: Inverted Residuals and Linear Bottlenecks

    Authors: Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen

    Abstract: In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic se…

    Submitted 21 March, 2019; v1 submitted 12 January, 2018; originally announced January 2018.

    Journal ref: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 4510-4520

  16. arXiv:1712.02950  [pdf, other]

    cs.CV cs.LG stat.ML

    CycleGAN, a Master of Steganography

    Authors: Casey Chu, Andrey Zhmoginov, Mark Sandler

    Abstract: CycleGAN (Zhu et al. 2017) is one recent successful approach to learn a transformation between two image distributions. In a series of experiments, we demonstrate an intriguing property of the model: CycleGAN learns to "hide" information about a source image into the images it generates in a nearly imperceptible, high-frequency signal. This trick ensures that the generator can recover the original…

    Submitted 16 December, 2017; v1 submitted 8 December, 2017; originally announced December 2017.

    Comments: NIPS 2017, workshop on Machine Deception

  17. arXiv:1702.06257  [pdf, other]

    cs.CV

    The Power of Sparsity in Convolutional Neural Networks

    Authors: Soravit Changpinyo, Mark Sandler, Andrey Zhmoginov

    Abstract: Deep convolutional networks are well-known for their high computational and memory demands. Given limited resources, how does one design a network that balances its size, training time, and prediction accuracy? A surprisingly effective approach to trade accuracy for size and speed is to simply reduce the number of channels in each convolutional layer by a fixed fraction and retrain the network. In…

    Submitted 20 February, 2017; originally announced February 2017.

  18. arXiv:1606.04189  [pdf, other]

    cs.CV cs.LG cs.NE

    Inverting face embeddings with convolutional neural networks

    Authors: Andrey Zhmoginov, Mark Sandler

    Abstract: Deep neural networks have dramatically advanced the state of the art for many areas of machine learning. Recently they have been shown to have a remarkable ability to generate highly complex visual artifacts such as images and text rather than simply recognize them. In this work we use neural networks to effectively invert low-dimensional face embeddings while producing realistically looking con…

    Submitted 7 July, 2016; v1 submitted 13 June, 2016; originally announced June 2016.