
Showing 1–23 of 23 results for author: Knyazev, B

  1. arXiv:2406.00153  [pdf, other]

    cs.LG

    $μ$LO: Compute-Efficient Meta-Generalization of Learned Optimizers

    Authors: Benjamin Thérien, Charles-Étienne Joseph, Boris Knyazev, Edouard Oyallon, Irina Rish, Eugene Belilovsky

    Abstract: Learned optimizers (LOs) can significantly reduce the wall-clock training time of neural networks, substantially reducing training costs. However, they often suffer from poor meta-generalization, especially when training networks larger than those seen during meta-training. To address this, we use the recently proposed Maximal Update Parametrization ($μ$P), which allows zero-shot generalization of…

    Submitted 31 May, 2024; originally announced June 2024.

  2. arXiv:2405.16287  [pdf, other]

    cs.LG

    LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters

    Authors: Xinyu Zhou, Boris Knyazev, Alexia Jolicoeur-Martineau, Jie Fu

    Abstract: A good initialization of deep learning models is essential since it can help them converge better and faster. However, pretraining large models is unaffordable for many researchers, which makes a desired prediction for initial parameters more necessary nowadays. Graph HyperNetworks (GHNs), one approach to predicting model parameters, have recently shown strong performance in initializing large vis…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 16 pages

  3. arXiv:2403.12143  [pdf, other]

    cs.LG cs.AI stat.ML

    Graph Neural Networks for Learning Equivariant Representations of Neural Networks

    Authors: Miltiadis Kofinas, Boris Knyazev, Yan Zhang, Yunlu Chen, Gertjan J. Burghouts, Efstratios Gavves, Cees G. M. Snoek, David W. Zhang

    Abstract: Neural networks that process the parameters of other neural networks find applications in domains as diverse as classifying implicit neural representations, generating neural network weights, and predicting generalization errors. However, existing approaches either overlook the inherent permutation symmetry in the neural network or rely on intricate weight-sharing patterns to achieve equivariance,…

    Submitted 20 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: In ICLR 2024. Source code: https://github.com/mkofinas/neural-graphs

  4. arXiv:2312.02204  [pdf, other]

    cs.LG

    Can We Learn Communication-Efficient Optimizers?

    Authors: Charles-Étienne Joseph, Benjamin Thérien, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky

    Abstract: Communication-efficient variants of SGD, specifically local SGD, have received a great deal of interest in recent years. These approaches compute multiple gradient steps locally, that is on each worker, before averaging model parameters, helping relieve the critical communication bottleneck in distributed deep learning training. Although many variants of these approaches have been proposed, they c…

    Submitted 2 December, 2023; originally announced December 2023.

  5. arXiv:2303.04143  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?

    Authors: Boris Knyazev, Doha Hwang, Simon Lacoste-Julien

    Abstract: Pretraining a neural network on a large dataset is becoming a cornerstone in machine learning that is within the reach of only a few communities with large-resources. We aim at an ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high quality ImageNet parameters of other neural networks. By using predicted parameters for i…

    Submitted 31 May, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: ICML 2023, camera ready (7 tables with extra results added), code and models are at https://github.com/SamsungSAILMontreal/ghn3

  6. arXiv:2209.14764  [pdf, other]

    cs.LG

    Model Zoos: A Dataset of Diverse Populations of Neural Network Models

    Authors: Konstantin Schürholt, Diyar Taskiran, Boris Knyazev, Xavier Giró-i-Nieto, Damian Borth

    Abstract: In the last years, neural networks (NN) have evolved from laboratory environments to the state-of-the-art for many real-world problems. It was shown that NN models (i.e., their weights and biases) evolve on unique trajectories in weight space during training. Following, a population of such neural network models (referred to as model zoo) would form structures in weight space. We think that the ge…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks

  7. arXiv:2209.14733  [pdf, other]

    cs.LG cs.CV

    Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights

    Authors: Konstantin Schürholt, Boris Knyazev, Xavier Giró-i-Nieto, Damian Borth

    Abstract: Learning representations of neural network weights given a model zoo is an emerging and challenging area with many potential applications from model inspection, to neural architecture search or knowledge distillation. Recently, an autoencoder trained on a model zoo was able to learn a hyper-representation, which captures intrinsic and extrinsic properties of the models in the zoo. In this work, we…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022). arXiv admin note: text overlap with arXiv:2207.10951

  8. arXiv:2207.10951  [pdf, other]

    cs.LG

    Hyper-Representations for Pre-Training and Transfer Learning

    Authors: Konstantin Schürholt, Boris Knyazev, Xavier Giró-i-Nieto, Damian Borth

    Abstract: Learning representations of neural network weights given a model zoo is an emerging and challenging area with many potential applications from model inspection, to neural architecture search or knowledge distillation. Recently, an autoencoder trained on a model zoo was able to learn a hyper-representation, which captures intrinsic and extrinsic properties of the models in the zoo. In this work, we…

    Submitted 22 July, 2022; originally announced July 2022.

    Journal ref: First Workshop of Pre-training: Perspectives, Pitfalls, and Paths Forward at ICML 2022, Baltimore, Maryland, USA, PMLR 162, 2022

  9. arXiv:2207.10049  [pdf, other]

    cs.CV cs.AI cs.LG

    Pretraining a Neural Network before Knowing Its Architecture

    Authors: Boris Knyazev

    Abstract: Training large neural networks is possible by training a smaller hypernetwork that predicts parameters for the large ones. A recently released Graph HyperNetwork (GHN) trained this way on one million smaller ImageNet architectures is able to predict parameters for large unseen networks such as ResNet-50. While networks with predicted parameters lose performance on the source task, the predicted pa…

    Submitted 20 July, 2022; originally announced July 2022.

    Comments: Accepted at ICML 2022 Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward, source code is available at https://github.com/facebookresearch/ppuda

  10. arXiv:2201.09871  [pdf, other]

    cs.LG cs.AI

    On Evaluation Metrics for Graph Generative Models

    Authors: Rylee Thompson, Boris Knyazev, Elahe Ghalebi, Jungtaek Kim, Graham W. Taylor

    Abstract: In image generation, generative models can be evaluated naturally by visually inspecting model outputs. However, this is not always the case for graph generative models (GGMs), making their evaluation challenging. Currently, the standard process for evaluating GGMs suffers from three critical limitations: i) it does not produce a single score which makes model selection challenging, ii) in many ca…

    Submitted 27 April, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

    Comments: Published as a conference paper at ICLR 2022

  11. arXiv:2110.15481  [pdf, other]

    cs.LG stat.ML

    Brick-by-Brick: Combinatorial Construction with Deep Reinforcement Learning

    Authors: Hyunsoo Chung, Jungtaek Kim, Boris Knyazev, Jinhwi Lee, Graham W. Taylor, Jaesik Park, Minsu Cho

    Abstract: Discovering a solution in a combinatorial space is prevalent in many real-world problems but it is also challenging due to diverse complex constraints and the vast number of possible combinations. To address such a problem, we introduce a novel formulation, combinatorial construction, which requires a building agent to assemble unit primitives (i.e., LEGO bricks) sequentially -- every connection b…

    Submitted 28 October, 2021; originally announced October 2021.

    Comments: 21 pages, 13 figures, 7 tables. Accepted at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  12. arXiv:2110.13100  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Parameter Prediction for Unseen Deep Architectures

    Authors: Boris Knyazev, Michal Drozdzal, Graham W. Taylor, Adriana Romero-Soriano

    Abstract: Deep learning has been successful in automating the design of features in machine learning pipelines. However, the algorithms optimizing neural network parameters remain largely hand-designed and computationally inefficient. We study if we can use deep learning to directly predict these parameters by exploiting the past knowledge of training other networks. We introduce a large-scale dataset of di…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021 camera ready, the code is available at https://github.com/facebookresearch/ppuda

  13. arXiv:2007.05756  [pdf, other]

    cs.CV cs.LG stat.ML

    Generative Compositional Augmentations for Scene Graph Prediction

    Authors: Boris Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky

    Abstract: Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of vision and language. We consider a challenging problem of compositional generalization that emerges in this task due to a long tail data distribution. Current scene graph generation models are trained on a tiny fraction of the distribution corresponding to the…

    Submitted 1 October, 2021; v1 submitted 11 July, 2020; originally announced July 2020.

    Comments: ICCV 2021 camera ready. Added more baselines, combining GANs with Neural Motifs and t-sne visualizations. Code is available at https://github.com/bknyaz/sgg

  14. arXiv:2005.08230  [pdf, other]

    cs.CV cs.LG

    Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation

    Authors: Boris Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky

    Abstract: Scene graph generation (SGG) aims to predict graph-structured descriptions of input images, in the form of objects and relationships between them. This task is becoming increasingly useful for progress at the interface of vision and language. Here, it is important - yet challenging - to perform well on novel (zero-shot) or rare (few-shot) compositions of objects and relationships. In this paper, w…

    Submitted 17 August, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

    Comments: accepted at BMVC 2020, the code is available at https://github.com/bknyaz/sgg

  15. arXiv:1909.10367  [pdf, other]

    stat.ML cs.AI cs.LG

    Learning Temporal Attention in Dynamic Graphs with Bilinear Interactions

    Authors: Boris Knyazev, Carolyn Augusta, Graham W. Taylor

    Abstract: Reasoning about graphs evolving over time is a challenging concept in many domains, such as bioinformatics, physics, and social networks. We consider a common case in which edges can be short term interactions (e.g., messaging) or long term structural connections (e.g., friendship). In practice, long term edges are often specified by humans. Human-specified edges can be both expensive to produce a…

    Submitted 18 June, 2020; v1 submitted 23 September, 2019; originally announced September 2019.

    Comments: 15 pages, source code is available at https://github.com/uoguelph-mlrg/LDG

  16. Diffraction of Bessel beams on 2D amplitude gratings -- a new branch in the Talbot effect study

    Authors: I. A. Kotelnikov, O. E. Kameshkov, B. A. Knyazev

    Abstract: In this paper, an analytical theory for the diffraction of a Bessel beam of arbitrary order $J_l(κr)$ on a 2D amplitude grating is presented. The diffraction pattern in the main and fractional Talbot planes under certain conditions is a lattice of annular microbeams, the diameters of which depend on the grating period, the illuminating beam diameter, the number of the Talbot plane, and the topolog…

    Submitted 27 August, 2019; originally announced August 2019.

  17. arXiv:1907.09000  [pdf, other]

    cs.CV cs.LG

    Image Classification with Hierarchical Multigraph Networks

    Authors: Boris Knyazev, Xiao Lin, Mohamed R. Amer, Graham W. Taylor

    Abstract: Graph Convolutional Networks (GCNs) are a class of general models that can learn from graph structured data. Despite being general, GCNs are admittedly inferior to convolutional neural networks (CNNs) when applied to vision tasks, mainly due to the lack of domain knowledge that is hardcoded into CNNs, such as spatially oriented translation invariant filters. However, a great advantage of GCNs is t…

    Submitted 21 July, 2019; originally announced July 2019.

    Comments: 13 pages, BMVC 2019

  18. arXiv:1905.02850  [pdf, other]

    cs.LG cs.AI stat.ML

    Understanding Attention and Generalization in Graph Neural Networks

    Authors: Boris Knyazev, Graham W. Taylor, Mohamed R. Amer

    Abstract: We aim to better understand attention over nodes in graph neural networks (GNNs) and identify factors influencing its effectiveness. We particularly focus on the ability of attention GNNs to generalize to larger, more complex or noisy graphs. Motivated by insights from the work on Graph Isomorphism Networks, we design simple graph reasoning tasks that allow us to study attention in a controlled en…

    Submitted 28 October, 2019; v1 submitted 7 May, 2019; originally announced May 2019.

    Comments: NeurIPS 2019, camera-ready and supplementary material

  19. arXiv:1811.09595  [pdf, other]

    cs.LG cs.AI stat.ML

    Spectral Multigraph Networks for Discovering and Fusing Relationships in Molecules

    Authors: Boris Knyazev, Xiao Lin, Mohamed R. Amer, Graham W. Taylor

    Abstract: Spectral Graph Convolutional Networks (GCNs) are a generalization of convolutional networks to learning on graph-structured data. Applications of spectral GCNs have been successful, but limited to a few problems where the graph is fixed, such as shape correspondence and node classification. In this work, we address this limitation by revisiting a particular family of spectral graph networks, Cheby…

    Submitted 23 November, 2018; originally announced November 2018.

    Comments: 11 pages, 5 figures, NIPS 2018 Workshop on Machine Learning for Molecules and Materials

  20. arXiv:1711.04598  [pdf, other]

    cs.CV

    Convolutional neural networks pretrained on large face recognition datasets for emotion classification from video

    Authors: Boris Knyazev, Roman Shvetsov, Natalia Efremova, Artem Kuharenko

    Abstract: In this paper we describe a solution to our entry for the emotion recognition challenge EmotiW 2017. We propose an ensemble of several models, which capture spatial and audio features from videos. Spatial features are captured by convolutional neural networks, pretrained on large face recognition datasets. We show that usage of strong industry-level face recognition networks increases the accuracy…

    Submitted 13 November, 2017; originally announced November 2017.

    Comments: 4 pages

  21. arXiv:1606.00611  [pdf, other]

    cs.CV cs.LG cs.NE

    Recursive Autoconvolution for Unsupervised Learning of Convolutional Neural Networks

    Authors: Boris Knyazev, Erhardt Barth, Thomas Martinetz

    Abstract: In visual recognition tasks, such as image classification, unsupervised learning exploits cheap unlabeled data and can help to solve these tasks more efficiently. We show that the recursive autoconvolution operator, adopted from physics, boosts existing unsupervised methods by learning more discriminative filters. We take well established convolutional neural networks and train their filters layer…

    Submitted 26 March, 2017; v1 submitted 2 June, 2016; originally announced June 2016.

    Comments: 8 pages, accepted to International Joint Conference on Neural Networks (IJCNN 2017)

  22. arXiv:1301.3715  [pdf, ps, other]

    physics.optics math-ph

    Diffraction of surface wave on conducting rectangular wedge

    Authors: Igor A. Kotelnikov, Vasily V. Gerasimov, Boris A. Knyazev

    Abstract: Diffraction of a surface wave on a rectangular wedge with impedance faces is studied using the Sommerfeld-Malyuzhinets technique. An analog of Landau's bypass rule in the theory of plasma waves is introduced for selection of a correct branch of the Sommerfeld integral, and the exact solution is given in terms of imaginary error function. The formula derived is valid both in the near-field and far-…

    Submitted 16 January, 2013; originally announced January 2013.

    Comments: 26 pages, 11 figures, 33 references

    Journal ref: Phys. Rev. A 87, 023828, 2013

  23. arXiv:physics/9808003  [pdf, ps, other]

    physics.acc-ph physics.optics

    Concentrator of laser energy for thin vapour cloud production near a surface

    Authors: P. I. Melnikov, B. A. Knyazev, J. B. Greenly

    Abstract: A novel scheme is presented for production of a thin ($<1$ mm) uniform vapor layer over a large surface area ($>100$ cm$^2$) by pulsed laser ablation of a solid surface. Instead of dispersing the laser energy uniformly over the surface, a modified Fabry-Perot interferometer is employed to concentrate the laser energy in very narrow closely-spaced concentric rings. This approach may be optimized…

    Submitted 9 September, 1998; v1 submitted 6 August, 1998; originally announced August 1998.

    Comments: 8 pages, 2 figures