
$μ$LO: Compute-Efficient Meta-Generalization of Learned Optimizers

Authors: Benjamin Thérien, Charles-Étienne Joseph, Boris Knyazev, Edouard Oyallon, Irina Rish, Eugene Belilovsky

Abstract: Learned optimizers (LOs) can significantly reduce the wall-clock training time of neural networks, substantially reducing training costs. However, they often suffer from poor meta-generalization, especially when training networks larger than those seen during meta-training. To address this, we use the recently proposed Maximal Update Parametrization ($μ$P), which allows zero-shot generalization of optimizer hyperparameters from smaller to larger models. We extend $μ$P theory to learned optimizers, treating the meta-training problem as finding the learned optimizer under $μ$P. Our evaluation shows that LOs meta-trained with $μ$P substantially improve meta-generalization as compared to LOs trained under standard parametrization (SP). Notably, when applied to large-width models, our best $μ$LO, trained for 103 GPU-hours, matches or exceeds the performance of VeLO, the largest publicly available learned optimizer, meta-trained with 4000 TPU-months of compute. Moreover, $μ$LOs demonstrate better generalization than their SP counterparts to deeper networks and to much longer training horizons (25 times longer) than those seen during meta-training.

Submitted 31 May, 2024; originally announced June 2024.
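To make the μP idea above concrete, the sketch below shows per-layer multipliers that could wrap the raw output of a learned optimizer. It is a hypothetical illustration, not the μLO code: it assumes the Adam-style μP convention in which hidden and output weight matrices scale their updates by `base_fan_in / fan_in` relative to a small base model, while input weights and biases keep an O(1) multiplier. The names `mup_update_multiplier` and `scale_updates` are invented for illustration.

```python
# Hypothetical sketch of muP-style per-layer update multipliers that could wrap the
# raw output of a learned optimizer. Assumes the Adam-like muP convention: hidden and
# output weight matrices scale updates by base_fan_in / fan_in, while input weights
# and biases keep an O(1) multiplier. Not the muLO implementation.


def mup_update_multiplier(layer_type: str, fan_in: int, base_fan_in: int) -> float:
    """Factor applied to a raw update for one layer of the target model."""
    if layer_type in ("input", "bias"):
        return 1.0
    if layer_type in ("hidden", "output"):
        return base_fan_in / fan_in
    raise ValueError(f"unknown layer type: {layer_type}")


def scale_updates(raw_updates, layers, base_fan_in):
    """raw_updates: name -> list of floats; layers: name -> (layer_type, fan_in)."""
    scaled = {}
    for name, (layer_type, fan_in) in layers.items():
        mult = mup_update_multiplier(layer_type, fan_in, base_fan_in)
        scaled[name] = [mult * u for u in raw_updates[name]]
    return scaled


if __name__ == "__main__":
    # Meta-trained at hidden width 256, applied to a model of width 1024.
    layers = {"embed": ("input", 32), "fc1": ("hidden", 1024), "head": ("output", 1024)}
    raw = {"embed": [0.1], "fc1": [0.4], "head": [0.4]}
    print(scale_updates(raw, layers, base_fan_in=256))
```

Because the multiplier depends only on the ratio of widths, the same meta-learned update rule can in principle be reused on wider models without retuning, which is the property the abstract exploits.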

arXiv:2405.16287

LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters

Authors: Xinyu Zhou, Boris Knyazev, Alexia Jolicoeur-Martineau, Jie Fu

Abstract: A good initialization of deep learning models is essential since it can help them converge better and faster. However, pretraining large models is unaffordable for many researchers, which makes predicting good initial parameters increasingly desirable. Graph HyperNetworks (GHNs), one approach to predicting model parameters, have recently shown strong performance in initializing large vision models. Unfortunately, predicting parameters of very wide networks relies on copying small chunks of parameters multiple times and requires an extremely large number of parameters to support full prediction, which greatly hinders their adoption in practice. To address this limitation, we propose LoGAH (Low-rank GrAph Hypernetworks), a GHN with a low-rank parameter decoder that expands to significantly wider networks without requiring as excessive an increase in parameters as previous attempts. LoGAH allows us to predict the parameters of 774-million-parameter neural networks in a memory-efficient manner. We show that vision and language models (i.e., ViT and GPT-2) initialized with LoGAH achieve better performance than those initialized randomly or using existing hypernetworks. Furthermore, we show promising transfer learning results when training LoGAH on small datasets and using the predicted parameters to initialize larger tasks. The code is available at https://github.com/Blackzxy/LoGAH.

Submitted 25 May, 2024; originally announced May 2024.

Comments: 16 pages
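The benefit of a low-rank parameter decoder can be seen by counting how many values must be emitted per weight matrix. The sketch below is a hypothetical illustration, not the LoGAH code: it assumes the decoder outputs factors `A` (m x r) and `B` (r x n) so that `W = A @ B`, and the rank `r = 32` is an arbitrary choice. The function names are invented.

```python
# Hypothetical illustration of a low-rank parameter decoder's output size;
# names and the rank value are assumptions, not taken from the LoGAH code.


def full_decoder_outputs(m: int, n: int) -> int:
    """Values a decoder must emit to predict an m x n weight matrix directly."""
    return m * n


def low_rank_decoder_outputs(m: int, n: int, r: int) -> int:
    """Values needed when the decoder emits factors A (m x r) and B (r x n)."""
    return r * (m + n)


def compose_low_rank(A, B):
    """Reconstruct W = A @ B from the predicted factors (plain nested lists)."""
    rank = len(B)
    cols = len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(rank)) for j in range(cols)] for row in A]


if __name__ == "__main__":
    # GPT-2 Large (774M parameters) uses a hidden size of 1280.
    m = n = 1280
    print(full_decoder_outputs(m, n))          # 1638400
    print(low_rank_decoder_outputs(m, n, 32))  # 81920
```

For a square 1280 x 1280 matrix, the low-rank form needs 20 times fewer outputs, and the gap widens as width grows, since the cost scales with m + n rather than m · n.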

arXiv:2403.12143

Graph Neural Networks for Learning Equivariant Representations of Neural Networks

Authors: Miltiadis Kofinas, Boris Knyazev, Yan Zhang, Yunlu Chen, Gertjan J. Burghouts, Efstratios Gavves, Cees G. M. Snoek, David W. Zhang

Abstract: Neural networks that process the parameters of other neural networks find applications in domains as diverse as classifying implicit neural representations, generating neural network weights, and predicting generalization errors. However, existing approaches either overlook the inherent permutation symmetry in the neural network or rely on intricate weight-sharing patterns to achieve equivariance, while ignoring the impact of the network architecture itself. In this work, we propose to represent neural networks as computational graphs of parameters, which allows us to harness powerful graph neural networks and transformers that preserve permutation symmetry. Consequently, our approach enables a single model to encode neural computational graphs with diverse architectures. We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations, predicting generalization performance, and learning to optimize, while consistently outperforming state-of-the-art methods. The source code is available at https://github.com/mkofinas/neural-graphs.

Submitted 20 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: In ICLR 2024. Source code: https://github.com/mkofinas/neural-graphs

arXiv:2312.02204

Can We Learn Communication-Efficient Optimizers?

Authors: Charles-Étienne Joseph, Benjamin Thérien, Abhinav Moudgil, Boris Knyazev, Eugene Belilovsky

Abstract: Communication-efficient variants of SGD, specifically local SGD, have received a great deal of interest in recent years. These approaches compute multiple gradient steps locally, that is, on each worker, before averaging model parameters, helping relieve the critical communication bottleneck in distributed deep learning training. Although many variants of these approaches have been proposed, they can sometimes lag behind state-of-the-art adaptive optimizers for deep learning. In this work, we investigate whether the recent progress in the emerging area of learned optimizers can close this gap while remaining communication-efficient. Specifically, we meta-learn how to perform global updates given an update from local SGD iterations. Our results demonstrate that learned optimizers can substantially outperform local SGD and its sophisticated variants while maintaining their communication efficiency. Learned optimizers can even generalize to unseen and much larger datasets and architectures, including ImageNet and ViTs, and to unseen modalities such as language modeling. We therefore demonstrate the potential of learned optimizers for improving communication-efficient distributed learning.

Submitted 2 December, 2023; originally announced December 2023.
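The local SGD pattern described above (several local steps per worker, then one global update from the workers' deltas) can be sketched on a toy 1-D least-squares problem. This is an illustrative assumption throughout: `average_update` implements plain local SGD, and a learned optimizer as in the paper would replace it with a meta-learned function of the deltas, which is not shown here.

```python
# Minimal local SGD on a 1-D least-squares problem (illustrative assumptions throughout).
# A learned optimizer, as studied in the paper, would replace `average_update`
# with a meta-learned function of the workers' deltas; that part is not shown.


def grad(w: float, x: float, y: float) -> float:
    """Gradient of 0.5 * (w * x - y) ** 2 with respect to the scalar weight w."""
    return (w * x - y) * x


def local_sgd_round(w_global, worker_data, local_steps, lr):
    """Each worker runs `local_steps` SGD steps from w_global and returns its delta."""
    deltas = []
    for data in worker_data:
        w = w_global
        for t in range(local_steps):
            x, y = data[t % len(data)]
            w -= lr * grad(w, x, y)
        deltas.append(w - w_global)
    return deltas


def average_update(deltas):
    """Standard local SGD global step: apply the mean of the workers' deltas."""
    return sum(deltas) / len(deltas)


def train(worker_data, rounds, local_steps, lr, global_update=average_update):
    w = 0.0
    for _ in range(rounds):  # one communication (averaging) per round
        w += global_update(local_sgd_round(w, worker_data, local_steps, lr))
    return w


if __name__ == "__main__":
    # Two workers whose data are both consistent with y = 3x.
    workers = [[(1.0, 3.0), (2.0, 6.0)], [(0.5, 1.5), (1.5, 4.5)]]
    print(train(workers, rounds=20, local_steps=4, lr=0.1))  # close to 3.0
```

Passing a different `global_update` callable is the only change needed to swap averaging for another rule, which mirrors where the paper places its learned component: communication stays at one exchange per round.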

arXiv:2303.04143

Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?

Authors: Boris Knyazev, Doha Hwang, Simon Lacoste-Julien

Abstract: Pretraining a neural network on a large dataset is becoming a cornerstone in machine learning that is within the reach of only a few communities with large resources. We aim at the ambitious goal of democratizing pretraining. Towards that goal, we train and release a single neural network that can predict high-quality ImageNet parameters of other neural networks. By using predicted parameters for initialization, we are able to boost training of diverse ImageNet models available in PyTorch. When transferred to other datasets, models initialized with predicted parameters also converge faster and reach competitive final performance.

Submitted 31 May, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

Comments: ICML 2023, camera ready (7 tables with extra results added), code and models are at https://github.com/SamsungSAILMontreal/ghn3

arXiv:2209.14764

Model Zoos: A Dataset of Diverse Populations of Neural Network Models

Authors: Konstantin Schürholt, Diyar Taskiran, Boris Knyazev, Xavier Giró-i-Nieto, Damian Borth

Abstract: In recent years, neural networks (NNs) have evolved from laboratory environments to the state of the art for many real-world problems. It was shown that NN models (i.e., their weights and biases) evolve on unique trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a model zoo) would form structures in weight space. We think that the geometry, curvature and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such model zoos, one could investigate novel approaches to (i) analyze models, (ii) discover unknown learning dynamics, (iii) learn rich representations of such populations, or (iv) exploit the model zoos for generative modelling of NN weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on populations of NNs. With this work, we publish a novel dataset of model zoos containing systematically generated and diverse populations of NN models for further research. In total, the proposed model zoo dataset is based on eight image datasets, consists of 27 model zoos trained with varying hyperparameter combinations, and includes 50,360 unique NN models as well as their sparsified twins, resulting in over 3,844,360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks.
The dataset can be found at www.modelzoos.cc.

Submitted 29 September, 2022; originally announced September 2022.

Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks

arXiv:2209.14733

Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights

Authors: Konstantin Schürholt, Boris Knyazev, Xavier Giró-i-Nieto, Damian Borth

Abstract: Learning representations of neural network weights given a model zoo is an emerging and challenging area with many potential applications, from model inspection to neural architecture search or knowledge distillation. Recently, an autoencoder trained on a model zoo was able to learn a hyper-representation, which captures intrinsic and extrinsic properties of the models in the zoo. In this work, we extend hyper-representations for generative use to sample new model weights. We propose layer-wise loss normalization, which we demonstrate is key to generating high-performing models, and several sampling methods based on the topology of hyper-representations. The models generated using our methods are diverse, performant and capable of outperforming strong baselines as evaluated on several downstream tasks: initialization, ensemble sampling and transfer learning. Our results indicate the potential of knowledge aggregation from model zoos to new models via hyper-representations, thereby paving the way for novel research directions.

Submitted 29 September, 2022; originally announced September 2022.

Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022). arXiv admin note: text overlap with arXiv:2207.10951
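The motivation for layer-wise loss normalization is that a plain reconstruction loss over all weights is dominated by layers with large weight magnitudes. The sketch below assumes normalization by the variance of each layer's target weights; the exact normalization used in the paper may differ, and the helper names are hypothetical.

```python
# Hypothetical sketch: reconstruction loss normalized per layer by the variance of
# that layer's target weights, so layers with large weights do not dominate.
# The exact normalization used in the paper may differ.


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def layerwise_normalized_loss(pred_layers, target_layers, eps=1e-8):
    """Average over layers of MSE / Var(target weights of that layer)."""
    terms = [mse(p, t) / (variance(t) + eps) for p, t in zip(pred_layers, target_layers)]
    return sum(terms) / len(terms)


if __name__ == "__main__":
    targets = [[1.0, -1.0], [10.0, -10.0]]
    preds = [[1.1, -0.9], [11.0, -9.0]]  # same 10% relative error in both layers
    plain = sum(mse(p, t) for p, t in zip(preds, targets)) / 2
    print(plain)                                      # dominated by the second layer
    print(layerwise_normalized_loss(preds, targets))  # both layers weigh equally
```

With the plain average, the second layer contributes about 99% of the loss even though both layers have the same relative error; after normalization, each contributes equally.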

arXiv:2207.10951

Hyper-Representations for Pre-Training and Transfer Learning

Authors: Konstantin Schürholt, Boris Knyazev, Xavier Giró-i-Nieto, Damian Borth

Abstract: Learning representations of neural network weights given a model zoo is an emerging and challenging area with many potential applications, from model inspection to neural architecture search or knowledge distillation. Recently, an autoencoder trained on a model zoo was able to learn a hyper-representation, which captures intrinsic and extrinsic properties of the models in the zoo. In this work, we extend hyper-representations for generative use to sample new model weights as pre-training. We propose layer-wise loss normalization, which we demonstrate is key to generating high-performing models, and a sampling method based on the empirical density of hyper-representations. The models generated using our methods are diverse, performant and capable of outperforming conventional baselines for transfer learning. Our results indicate the potential of knowledge aggregation from model zoos to new models via hyper-representations, thereby paving the way for novel research directions.

Submitted 22 July, 2022; originally announced July 2022.

Journal ref: First Workshop of Pre-training: Perspectives, Pitfalls, and Paths Forward at ICML 2022, Baltimore, Maryland, USA, PMLR 162, 2022

arXiv:2207.10049

Pretraining a Neural Network before Knowing Its Architecture

Authors: Boris Knyazev

Abstract: Training large neural networks is possible by training a smaller hypernetwork that predicts parameters for the large ones. A recently released Graph HyperNetwork (GHN) trained this way on one million smaller ImageNet architectures is able to predict parameters for large unseen networks such as ResNet-50. While networks with predicted parameters lose performance on the source task, the predicted parameters have been found useful for fine-tuning on other tasks. We study whether fine-tuning based on the same GHN is still useful on novel strong architectures that were published after the GHN had been trained. We found that for recent architectures such as ConvNeXt, GHN initialization becomes less useful than for ResNet-50. One potential reason is the increased distribution shift of novel architectures from those used to train the GHN. We also found that the predicted parameters lack the diversity necessary to successfully fine-tune parameters with gradient descent. We alleviate this limitation by applying simple post-processing techniques to predicted parameters before fine-tuning them on a target task, which improves fine-tuning of ResNet-50 and ConvNeXt.

Submitted 20 July, 2022; originally announced July 2022.

Comments: Accepted at ICML 2022 Workshop on Pre-training: Perspectives, Pitfalls, and Paths Forward, source code is available at https://github.com/facebookresearch/ppuda

arXiv:2201.09871

On Evaluation Metrics for Graph Generative Models

Authors: Rylee Thompson, Boris Knyazev, Elahe Ghalebi, Jungtaek Kim, Graham W. Taylor

Abstract: In image generation, generative models can be evaluated naturally by visually inspecting model outputs. However, this is not always the case for graph generative models (GGMs), making their evaluation challenging. Currently, the standard process for evaluating GGMs suffers from three critical limitations: i) it does not produce a single score, which makes model selection challenging, ii) in many cases it fails to consider underlying edge and node features, and iii) it is prohibitively slow to perform. In this work, we mitigate these issues by searching for scalar, domain-agnostic, and scalable metrics for evaluating and ranking GGMs. To this end, we study existing GGM metrics and neural-network-based metrics emerging from generative models of images that use embeddings extracted from a task-specific network. Motivated by the power of certain Graph Neural Networks (GNNs) to extract meaningful graph representations without any training, we introduce several metrics based on the features extracted by an untrained random GNN. We design experiments to thoroughly test metrics on their ability to measure the diversity and fidelity of generated graphs, as well as their sample and computational efficiency. Depending on the quantity of samples, we recommend one of two random-GNN-based metrics that we show to be more expressive than pre-existing metrics. While we focus on applying these metrics to GGM evaluation, in practice they make it easy to compute the dissimilarity between any two sets of graphs regardless of domain.
Our code is released at: https://github.com/uoguelph-mlrg/GGM-metrics.

Submitted 27 April, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

Comments: Published as a conference paper at ICLR 2022
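The core idea of a random-GNN metric (embed graphs with an untrained GNN, then compare the two sets of embeddings) can be sketched as follows. This is a simplified, hypothetical variant: a GIN-style sum aggregation with fixed random weights and a tanh activation (an assumption), followed by the Euclidean distance between mean embeddings, which equals MMD with a linear kernel. The paper studies several metrics on such features; this is not its exact recipe.

```python
import math
import random

# Hypothetical sketch of a random-GNN graph metric: embed graphs with an untrained
# GIN-style network (fixed random weights, tanh activation as an assumption), then
# compare two sets by the Euclidean distance between their mean embeddings
# (equivalently, MMD with a linear kernel).


def random_gnn_embedding(adj, weights, rounds=2):
    """adj: list of neighbour lists. Node features start as all-ones vectors."""
    dim = len(weights)
    h = [[1.0] * dim for _ in adj]
    for _ in range(rounds):
        new_h = []
        for v, nbrs in enumerate(adj):
            # GIN-style sum aggregation of the node itself and its neighbours
            agg = [h[v][d] + sum(h[u][d] for u in nbrs) for d in range(dim)]
            new_h.append([math.tanh(sum(weights[i][j] * agg[j] for j in range(dim)))
                          for i in range(dim)])
        h = new_h
    # sum pooling over nodes gives one vector per graph
    return [sum(node[d] for node in h) for d in range(dim)]


def mean_embedding_distance(graphs_a, graphs_b, dim=8, seed=0):
    rng = random.Random(seed)  # fixed seed: the same random GNN for both sets
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)] for _ in range(dim)]

    def mean_embedding(graphs):
        embs = [random_gnn_embedding(g, weights) for g in graphs]
        return [sum(e[d] for e in embs) / len(embs) for d in range(dim)]

    ma, mb = mean_embedding(graphs_a), mean_embedding(graphs_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ma, mb)))


if __name__ == "__main__":
    path = [[1], [0, 2], [1]]
    triangle = [[1, 2], [0, 2], [0, 1]]
    print(mean_embedding_distance([path], [path]))      # 0.0
    print(mean_embedding_distance([path], [triangle]))  # positive
```

No training is involved: the fixed seed makes the random network, and therefore the metric, reproducible across evaluations.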

arXiv:2110.15481

Brick-by-Brick: Combinatorial Construction with Deep Reinforcement Learning

Authors: Hyunsoo Chung, Jungtaek Kim, Boris Knyazev, Jinhwi Lee, Graham W. Taylor, Jaesik Park, Minsu Cho

Abstract: Discovering a solution in a combinatorial space is prevalent in many real-world problems, but it is also challenging due to diverse complex constraints and the vast number of possible combinations. To address such a problem, we introduce a novel formulation, combinatorial construction, which requires a building agent to assemble unit primitives (i.e., LEGO bricks) sequentially: every connection between two bricks must follow a fixed rule, and no bricks may overlap. To construct a target object, we provide the agent with incomplete knowledge about the desired target (i.e., 2D images) instead of exact and explicit volumetric information. This problem requires a comprehensive understanding of partial information and long-term planning to append a brick sequentially, which leads us to employ reinforcement learning. The approach has to consider a variable-sized action space containing a large number of invalid actions, which would cause overlap between bricks. To resolve these issues, our model, dubbed Brick-by-Brick, adopts an action validity prediction network that efficiently filters invalid actions for an actor-critic network. We demonstrate that the proposed method successfully learns to construct an unseen object conditioned on a single image or multiple views of a target object.

Submitted 28 October, 2021; originally announced October 2021.

Comments: 21 pages, 13 figures, 7 tables. Accepted at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
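Filtering invalid actions before the actor samples can be done by masking them out of the policy's softmax. The sketch below is a hypothetical illustration: `validity_probs` stands in for the output of an action validity prediction network, and the 0.5 threshold is an assumption, not a value from the paper.

```python
import math

# Hypothetical sketch: filter invalid actions before the actor's softmax.
# `validity_probs` stands in for the output of an action validity prediction
# network; the 0.5 threshold is an assumption.


def valid_mask(validity_probs, threshold=0.5):
    return [p > threshold for p in validity_probs]


def masked_softmax(logits, valid):
    """Softmax restricted to valid actions; invalid actions get probability 0."""
    allowed = [logit for logit, ok in zip(logits, valid) if ok]
    if not allowed:
        raise ValueError("no valid actions to choose from")
    m = max(allowed)  # subtract the max over valid actions for numerical stability
    exps = [math.exp(logit - m) if ok else 0.0 for logit, ok in zip(logits, valid)]
    z = sum(exps)
    return [e / z for e in exps]


if __name__ == "__main__":
    logits = [1.0, 5.0, 1.0]             # the actor prefers action 1 ...
    mask = valid_mask([0.9, 0.1, 0.8])   # ... but it would cause an overlap
    print(masked_softmax(logits, mask))  # [0.5, 0.0, 0.5]
```

Masking keeps the action space variable-sized without changing the actor's architecture: invalid actions simply receive zero probability.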

arXiv:2110.13100

Parameter Prediction for Unseen Deep Architectures

Authors: Boris Knyazev, Michal Drozdzal, Graham W. Taylor, Adriana Romero-Soriano

Abstract: Deep learning has been successful in automating the design of features in machine learning pipelines. However, the algorithms optimizing neural network parameters remain largely hand-designed and computationally inefficient. We study whether deep learning can be used to directly predict these parameters by exploiting the past knowledge of training other networks. We introduce a large-scale dataset of diverse computational graphs of neural architectures, DeepNets-1M, and use it to explore parameter prediction on CIFAR-10 and ImageNet. By leveraging advances in graph neural networks, we propose a hypernetwork that can predict performant parameters in a single forward pass taking a fraction of a second, even on a CPU. The proposed model achieves surprisingly good performance on unseen and diverse networks. For example, it is able to predict all 24 million parameters of a ResNet-50, achieving 60% accuracy on CIFAR-10. On ImageNet, the top-5 accuracy of some of our networks approaches 50%. Our task, along with the model and results, can potentially lead to a new, more computationally efficient paradigm of training networks. Our model also learns a strong representation of neural architectures, enabling their analysis.

Submitted 25 October, 2021; originally announced October 2021.

Comments: NeurIPS 2021 camera ready, the code is available at https://github.com/facebookresearch/ppuda

arXiv:2007.05756

Generative Compositional Augmentations for Scene Graph Prediction

Authors: Boris Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky

Abstract: Inferring objects and their relationships from an image in the form of a scene graph is useful in many applications at the intersection of vision and language. We consider a challenging problem of compositional generalization that emerges in this task due to a long-tailed data distribution. Current scene graph generation models are trained on a tiny fraction of the distribution corresponding to the most frequent compositions, e.g. <cup, on, table>. However, test images might contain zero- and few-shot compositions of objects and relationships, e.g. <cup, on, surfboard>. Despite each of the object categories and the predicate (e.g. 'on') being frequent in the training data, the models often fail to properly understand such unseen or rare compositions. To improve generalization, it is natural to try to increase the diversity of the training distribution. However, in the graph domain this is non-trivial. To that end, we propose a method to synthesize rare yet plausible scene graphs by perturbing real ones. We then propose and empirically study a model based on conditional generative adversarial networks (GANs) that allows us to generate visual features of perturbed scene graphs and learn from them in a joint fashion. When evaluated on the Visual Genome dataset, our approach yields marginal but consistent improvements in zero- and few-shot metrics. We analyze the limitations of our approach, indicating promising directions for future research.

Submitted 1 October, 2021; v1 submitted 11 July, 2020; originally announced July 2020.

Comments: ICCV 2021 camera ready. Added more baselines, combining GANs with Neural Motifs and t-SNE visualizations. Code is available at https://github.com/bknyaz/sgg

arXiv:2005.08230

Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation

Authors: Boris Knyazev, Harm de Vries, Cătălina Cangea, Graham W. Taylor, Aaron Courville, Eugene Belilovsky

Abstract: Scene graph generation (SGG) aims to predict graph-structured descriptions of input images, in the form of objects and relationships between them. This task is becoming increasingly useful for progress at the interface of vision and language. Here, it is important, yet challenging, to perform well on novel (zero-shot) or rare (few-shot) compositions of objects and relationships. In this paper, we identify two key issues that limit such generalization. First, we show that the standard loss used in this task is unintentionally a function of scene graph density. This leads to the neglect of individual edges in large sparse graphs during training, even though these contain diverse few-shot examples that are important for generalization. Second, the frequency of relationships can create a strong bias in this task, such that a blind model predicting the most frequent relationship achieves good performance. Consequently, some state-of-the-art models exploit this bias to improve results. We show that such models can suffer the most in their ability to generalize to rare compositions, evaluating two different models on the Visual Genome dataset and its more recent, improved version, GQA. To address these issues, we introduce a density-normalized edge loss, which provides more than a two-fold improvement in certain generalization metrics. Compared to other works in this direction, our enhancements require only a few lines of code and no added computational cost.
We also highlight the difficulty of accurately evaluating models using existing metrics, especially on zero/few shots, and introduce a novel weighted metric.

Submitted 17 August, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

Comments: accepted at BMVC 2020, the code is available at https://github.com/bknyaz/sgg
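Why a loss averaged over all node pairs depends on graph density can be shown with a toy calculation. The sketch below contrasts that average with a variant normalized by the number of foreground (annotated) edges; treating this as the paper's exact density-normalized loss is an assumption, and the function names are hypothetical. The point is how the weight of each foreground edge changes with graph size.

```python
# Hypothetical sketch contrasting an edge loss averaged over all node pairs with a
# variant normalized by the number of foreground (annotated) edges. The exact form
# of the paper's density-normalized loss is an assumption here.


def mean_over_all_pairs(fg_losses, bg_losses):
    """Each foreground edge has weight 1 / (|E_fg| + |E_bg|)."""
    return (sum(fg_losses) + sum(bg_losses)) / (len(fg_losses) + len(bg_losses))


def fg_normalized(fg_losses, bg_losses):
    """Each foreground edge has weight 1 / |E_fg|, independent of graph size."""
    return (sum(fg_losses) + sum(bg_losses)) / max(1, len(fg_losses))


if __name__ == "__main__":
    fg = [1.0, 1.0, 1.0]    # three annotated relationships, each poorly predicted
    dense_bg = [0.0] * 7    # small graph: 10 node pairs in total
    sparse_bg = [0.0] * 97  # large graph: 100 node pairs in total
    print(mean_over_all_pairs(fg, dense_bg), mean_over_all_pairs(fg, sparse_bg))  # 0.3 0.03
    print(fg_normalized(fg, dense_bg), fg_normalized(fg, sparse_bg))              # 1.0 1.0
```

Under the all-pairs average, the same three mistakes cost ten times less in the larger, sparser graph, so its edges are effectively neglected during training; normalizing by the foreground count removes that dependence at no extra computational cost.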

arXiv:1909.10367

Learning Temporal Attention in Dynamic Graphs with Bilinear Interactions

Authors: Boris Knyazev, Carolyn Augusta, Graham W. Taylor

Abstract: Reasoning about graphs evolving over time is a challenging problem in many domains, such as bioinformatics, physics, and social networks. We consider a common case in which edges can be short-term interactions (e.g., messaging) or long-term structural connections (e.g., friendship). In practice, long-term edges are often specified by humans. Human-specified edges can be both expensive to produce and suboptimal for the downstream task. To alleviate these issues, we propose a model based on temporal point processes and variational autoencoders that learns to infer temporal attention between nodes by observing node communication. As temporal attention drives between-node feature propagation, using the dynamics of node interactions to learn this key component provides more flexibility while simultaneously avoiding issues associated with human-specified edges. We also propose a bilinear transformation layer for pairs of node features instead of the concatenation typically used in prior work, and demonstrate its superior performance in all cases. In experiments on two datasets in the dynamic link prediction task, our model often outperforms the baseline model that requires a human-specified graph. Moreover, our learned attention is semantically interpretable and infers connections similar to actual graphs.

Submitted 18 June, 2020; v1 submitted 23 September, 2019; originally announced September 2019.

Comments: 15 pages, source code is available at https://github.com/uoguelph-mlrg/LDG
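The difference between a bilinear layer and concatenation for a pair of node features can be written out directly. The sketch below uses the standard bilinear form z_k = xᵀ W_k y + b_k next to a linear layer on the concatenation [x; y]; shapes and values are illustrative only and are not taken from the paper's code.

```python
# Hypothetical sketch of a bilinear layer for a pair of node features,
# z_k = x^T W_k y + b_k, next to the concatenation baseline z_k = v_k . [x; y] + c_k.
# Shapes and values are illustrative only.


def bilinear(x, y, W, b):
    """W has one (len(x) x len(y)) matrix per output; b has one bias per output."""
    return [
        sum(x[i] * Wk[i][j] * y[j] for i in range(len(x)) for j in range(len(y))) + bk
        for Wk, bk in zip(W, b)
    ]


def concat_linear(x, y, V, c):
    """Linear layer applied to the concatenation [x; y]."""
    xy = list(x) + list(y)
    return [sum(v * u for v, u in zip(vk, xy)) + ck for vk, ck in zip(V, c)]


if __name__ == "__main__":
    x, y = [1.0, 2.0], [3.0, 4.0]
    identity = [[[1.0, 0.0], [0.0, 1.0]]]                       # W_1 = I gives x . y
    print(bilinear(x, y, identity, [0.0]))                      # [11.0]
    print(concat_linear(x, y, [[1.0, 1.0, 1.0, 1.0]], [0.0]))   # [10.0]
```

The bilinear form can express multiplicative interactions between the two nodes' features, such as their dot product, which a single linear layer on the concatenation cannot represent; the cost is one len(x) x len(y) matrix per output.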

arXiv:1908.10518

doi 10.1088/2040-8986/ab877d

Diffraction of Bessel beams on 2D amplitude gratings -- a new branch in the Talbot effect study

Authors: I. A. Kotelnikov, O. E. Kameshkov, B. A. Knyazev

Abstract: In this paper, an analytical theory for the diffraction of a Bessel beam of arbitrary order $J_l(κr)$ on a 2D amplitude grating is presented. Under certain conditions, the diffraction pattern in the main and fractional Talbot planes is a lattice of annular microbeams whose diameters depend on the grating period, the illuminating beam diameter, the number of the Talbot plane, and the topological charge $l$. For the rings near the optical axis, the topological charge reproduces the $l$ of the illuminating beam. Experiments carried out on the Novosibirsk free electron laser at a wavelength $λ = 141$ μm using gratings with hole diameters down to $d \approx 2λ$, as well as numerical simulations, support the theory well. Since Laguerre-Gaussian beams can be represented as a superposition of Bessel beams, the results of this paper can be applied to the analysis of the Talbot effect with Laguerre-Gaussian beams.

Submitted 27 August, 2019; originally announced August 2019.
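For reference, the standard textbook relations behind this abstract, written in LaTeX. The grating period $p$, the longitudinal wavenumber $k_z$ and the fraction $m/n$ are notation introduced here, and the Talbot length is the classical paraxial result for a periodic grating, not the paper's derivation for Bessel-beam illumination:

```latex
\begin{aligned}
u_l(r,\varphi,z) &= J_l(\kappa r)\, e^{i l \varphi}\, e^{i k_z z},
  &\qquad \kappa^2 + k_z^2 &= k^2 = \left(\frac{2\pi}{\lambda}\right)^2,\\
z_T &= \frac{2p^2}{\lambda},
  &\qquad z_{m/n} &= \frac{m}{n}\, z_T \quad (\text{fractional Talbot planes}).
\end{aligned}
```

The phase factor $e^{i l \varphi}$ carries the topological charge $l$ that, according to the abstract, reappears in the rings near the optical axis.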

arXiv:1907.09000 [pdf, other]

Image Classification with Hierarchical Multigraph Networks

Authors: Boris Knyazev, Xiao Lin, Mohamed R. Amer, Graham W. Taylor

Abstract: Graph Convolutional Networks (GCNs) are a class of general models that can learn from graph-structured data. Despite being general, GCNs are admittedly inferior to convolutional neural networks (CNNs) when applied to vision tasks, mainly due to the lack of domain knowledge that is hardcoded into CNNs, such as spatially oriented, translation-invariant filters. However, a great advantage of GCNs is the ability to work on irregular inputs, such as superpixels of images. This could significantly reduce the computational cost of image reasoning tasks. Another key advantage inherent to GCNs is the natural ability to model multirelational data. Building upon these two promising properties, in this work we show best practices for designing GCNs for image classification; in some cases, the resulting models even outperform CNNs on the MNIST, CIFAR-10 and PASCAL image datasets.

Submitted 21 July, 2019; originally announced July 2019.

Comments: 13 pages, BMVC 2019
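The abstract points to two properties of GCNs: operating on superpixels and modeling multirelational data. A minimal NumPy sketch of both, assuming superpixels have already been extracted (for example with SLIC) and are summarized by normalized centroid coordinates and mean colors. The Gaussian-kernel relations, the bandwidths `sigma_xy` and `sigma_rgb`, and the summed relation-specific layer are illustrative choices, not the paper's exact architecture:

```python
import numpy as np

def gaussian_adjacency(feats, sigma):
    """Dense adjacency exp(-||f_i - f_j||^2 / sigma^2) with the diagonal zeroed (no self-loops)."""
    diff = feats[:, None, :] - feats[None, :, :]
    A = np.exp(-(diff ** 2).sum(axis=-1) / sigma ** 2)
    np.fill_diagonal(A, 0.0)
    return A

def superpixel_multigraph(coords, colors, sigma_xy=0.1, sigma_rgb=0.1):
    """Stack two relations over the same superpixel nodes: spatial proximity and color similarity."""
    return np.stack([gaussian_adjacency(coords, sigma_xy),
                     gaussian_adjacency(colors, sigma_rgb)])  # shape (R=2, N, N)

def multigraph_layer(A, H, W):
    """Sum relation-specific messages, sum_r A_r H W_r, followed by ReLU."""
    return np.maximum(0.0, np.einsum("rij,jd,rdo->io", A, H, W))

rng = np.random.default_rng(0)
N = 6                                          # hypothetical number of superpixels
coords, colors = rng.random((N, 2)), rng.random((N, 3))
A = superpixel_multigraph(coords, colors)
H = np.concatenate([coords, colors], axis=1)   # node features: position and mean color
W = rng.normal(size=(2, H.shape[1], 8))        # one weight matrix per relation
print(multigraph_layer(A, H, W).shape)         # (6, 8)
```

A practical model would normalize each adjacency matrix (for example, row-normalize it) before message passing; that step is omitted here for brevity.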

arXiv:1905.02850 [pdf, other]

Understanding Attention and Generalization in Graph Neural Networks

Authors: Boris Knyazev, Graham W. Taylor, Mohamed R. Amer

Abstract: We aim to better understand attention over nodes in graph neural networks (GNNs) and identify factors influencing its effectiveness. We particularly focus on the ability of attention GNNs to generalize to larger, more complex or noisy graphs. Motivated by insights from the work on Graph Isomorphism Networks, we design simple graph reasoning tasks that allow us to study attention in a controlled environment. We find that under typical conditions the effect of attention is negligible or even harmful, but under certain conditions it provides an exceptional gain in performance of more than 60% in some of our classification tasks. Satisfying these conditions in practice is challenging and often requires optimal initialization or supervised training of attention. We propose an alternative recipe and train attention in a weakly-supervised fashion that approaches the performance of supervised models, and, compared to unsupervised models, improves results on several synthetic as well as real datasets. Source code and datasets are available at https://github.com/bknyaz/graph_attention_pool.

Submitted 28 October, 2019; v1 submitted 7 May, 2019; originally announced May 2019.

Comments: NeurIPS 2019, camera-ready and supplementary material
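The abstract studies attention over nodes used to pool a graph. A minimal NumPy sketch of threshold-based attention pooling, assuming a single linear scoring vector `w` and a softmax over the nodes of one graph. The function name, the scoring function and the threshold value are hypothetical, and the weakly-supervised training of attention proposed in the paper is not shown:

```python
import numpy as np

def attention_pool(X, A, w, threshold):
    """Score nodes with a linear projection, softmax over nodes, and drop nodes
    whose attention falls below `threshold`. Kept features are scaled by attention."""
    scores = X @ w
    alpha = np.exp(scores - scores.max())    # numerically stable softmax
    alpha /= alpha.sum()
    keep = alpha > threshold
    X_pooled = X[keep] * alpha[keep, None]
    A_pooled = A[np.ix_(keep, keep)]         # induced subgraph on the kept nodes
    return X_pooled, A_pooled, alpha

# Toy graph: 4 fully connected nodes with one scalar feature each.
X = np.array([[4.0], [0.0], [3.5], [0.0]])
A = np.ones((4, 4)) - np.eye(4)
Xp, Ap, alpha = attention_pool(X, A, w=np.array([1.0]), threshold=0.1)
print(Xp.shape, Ap.shape)                    # (2, 1) (2, 2)
```

If `threshold` is set too high, no node survives; a practical implementation would guard against that, for example by always keeping the top-scoring node.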

arXiv:1811.09595 [pdf, other]

Spectral Multigraph Networks for Discovering and Fusing Relationships in Molecules

Authors: Boris Knyazev, Xiao Lin, Mohamed R. Amer, Graham W. Taylor

Abstract: Spectral Graph Convolutional Networks (GCNs) are a generalization of convolutional networks to learning on graph-structured data. Applications of spectral GCNs have been successful, but limited to a few problems where the graph is fixed, such as shape correspondence and node classification. In this work, we address this limitation by revisiting a particular family of spectral graph networks, Chebyshev GCNs, showing its efficacy in solving graph classification tasks with a variable graph structure and size. Chebyshev GCNs restrict graphs to have at most one edge between any pair of nodes. To lift this restriction, we propose a novel multigraph network that learns from multi-relational graphs. We model learned edges with abstract meaning and experiment with different ways to fuse the representations extracted from annotated and learned edges, achieving competitive results on a variety of chemical classification benchmarks.

Submitted 23 November, 2018; originally announced November 2018.

Comments: 11 pages, 5 figures, NIPS 2018 Workshop on Machine Learning for Molecules and Materials
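Chebyshev GCNs filter node features with a polynomial in the rescaled graph Laplacian $\tilde{L} = 2L/\lambda_{\max} - I$, computed by the recurrence $T_k = 2\tilde{L}T_{k-1} - T_{k-2}$. A minimal NumPy sketch of one multigraph layer, assuming one dense adjacency matrix per relation (for example, annotated bonds and a learned relation), a separate filter per relation, and fusion by concatenation, which is only one of the fusion options the abstract mentions; all function names are hypothetical:

```python
import numpy as np

def scaled_laplacian(A):
    """L~ = 2 L / lambda_max - I for the normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    lam_max = np.linalg.eigvalsh(L).max()
    return 2.0 * L / lam_max - np.eye(len(A))

def cheb_basis(L_tilde, X, K):
    """Stack T_0 X, ..., T_{K-1} X using T_k = 2 L~ T_{k-1} - T_{k-2}; shape (N, K*d)."""
    Tx = [X]
    if K > 1:
        Tx.append(L_tilde @ X)
    for _ in range(2, K):
        Tx.append(2.0 * L_tilde @ Tx[-1] - Tx[-2])
    return np.concatenate(Tx, axis=1)

def multigraph_cheb_layer(adjs, X, weights, K):
    """Apply a separate Chebyshev filter per relation and concatenate the outputs."""
    outs = [cheb_basis(scaled_laplacian(A), X, K) @ W for A, W in zip(adjs, weights)]
    return np.concatenate(outs, axis=1)

rng = np.random.default_rng(0)
A1 = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)  # chain
A2 = np.ones((4, 4)) - np.eye(4)                                                     # fully connected
X = rng.normal(size=(4, 3))
K, out = 3, 5
weights = [rng.normal(size=(K * 3, out)) for _ in range(2)]
print(multigraph_cheb_layer([A1, A2], X, weights, K).shape)  # (4, 10)
```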

arXiv:1711.04598 [pdf, other]

Convolutional neural networks pretrained on large face recognition datasets for emotion classification from video

Authors: Boris Knyazev, Roman Shvetsov, Natalia Efremova, Artem Kuharenko

Abstract: In this paper, we describe our entry for the emotion recognition challenge EmotiW 2017. We propose an ensemble of several models, which capture spatial and audio features from videos. Spatial features are captured by convolutional neural networks pretrained on large face recognition datasets. We show that the use of strong, industry-level face recognition networks increases the accuracy of emotion recognition. Using our ensemble, we improve on the previous best result on the test set by about 1%, achieving 60.03% classification accuracy without any use of visual temporal information.

Submitted 13 November, 2017; originally announced November 2017.

Comments: 4 pages
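The abstract describes an ensemble of models over spatial (face) and audio features but does not state how their outputs are combined. A minimal NumPy sketch of one common option, a weighted average of per-model class probabilities; the weighting scheme, the function name `fuse_scores` and the toy probabilities are assumptions, not the authors' method or data:

```python
import numpy as np

def fuse_scores(prob_list, weights=None):
    """Weighted average of per-model class probabilities, each of shape (n_videos, n_classes).
    Returns the predicted class per video and the fused probabilities."""
    probs = np.stack(prob_list)                    # (n_models, n_videos, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()              # normalize so fused rows still sum to 1
    fused = np.tensordot(weights, probs, axes=1)   # (n_videos, n_classes)
    return fused.argmax(axis=1), fused

face = np.array([[0.6, 0.4]])   # hypothetical probabilities for one video, two classes
audio = np.array([[0.2, 0.8]])
print(fuse_scores([face, audio])[0])          # [1]
print(fuse_scores([face, audio], [4, 1])[0])  # [0]
```

In practice the per-model weights would be tuned on a validation set; equal weights are only a starting point.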

arXiv:1606.00611 [pdf, other]

Recursive Autoconvolution for Unsupervised Learning of Convolutional Neural Networks

Authors: Boris Knyazev, Erhardt Barth, Thomas Martinetz

Abstract: In visual recognition tasks, such as image classification, unsupervised learning exploits cheap unlabeled data and can help to solve these tasks more efficiently. We show that the recursive autoconvolution operator, adopted from physics, boosts existing unsupervised methods by learning more discriminative filters. We take well-established convolutional neural networks and train their filters layer-wise. In addition, based on previous work, we design a network which extracts more than 600k features per sample, but with the total number of trainable parameters greatly reduced by introducing shared filters in higher layers. We evaluate our networks on the MNIST, CIFAR-10, CIFAR-100 and STL-10 image classification benchmarks and report several state-of-the-art results among unsupervised methods.

Submitted 26 March, 2017; v1 submitted 2 June, 2016; originally announced June 2016.

Comments: 8 pages, accepted to International Joint Conference on Neural Networks (IJCNN 2017)
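Autoconvolution convolves an image with itself, $X_n = X_{n-1} * X_{n-1}$, which is cheap to compute in the Fourier domain because $\mathcal{F}(X * X) = \mathcal{F}(X)^2$. A minimal NumPy sketch, assuming zero padding to obtain the full linear (not circular) convolution, a central crop back to the input size, and per-step standardization; the cropping and normalization choices and the function names are assumptions of this sketch rather than the paper's exact procedure:

```python
import numpy as np

def autoconv(x):
    """Full 2D linear autoconvolution x * x via the FFT, zero-padded to avoid wrap-around."""
    h, w = x.shape
    s = (2 * h - 1, 2 * w - 1)
    return np.real(np.fft.ifft2(np.fft.fft2(x, s=s) ** 2))

def recursive_autoconv(x, n):
    """Apply autoconvolution n times; after each step keep the central h x w crop
    and standardize to zero mean and unit variance (both assumptions)."""
    h, w = x.shape
    out = [x]
    for _ in range(n):
        y = autoconv(out[-1])
        top, left = (y.shape[0] - h) // 2, (y.shape[1] - w) // 2
        y = y[top:top + h, left:left + w]
        y = (y - y.mean()) / (y.std() + 1e-8)
        out.append(y)
    return out

x = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.round(autoconv(x)).astype(int))  # full autoconvolution: [[1, 4, 4], [6, 20, 16], [9, 24, 16]]

maps = recursive_autoconv(np.random.default_rng(0).random((8, 8)), n=2)
print(len(maps), maps[-1].shape)          # 3 (8, 8)
```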

arXiv:1301.3715 [pdf, ps, other]

doi 10.1103/PhysRevA.87.023828

Diffraction of surface wave on conducting rectangular wedge

Authors: Igor A. Kotelnikov, Vasily V. Gerasimov, Boris A. Knyazev

Abstract: Diffraction of a surface wave on a rectangular wedge with impedance faces is studied using the Sommerfeld-Malyuzhinets technique. An analog of Landau's bypass rule in the theory of plasma waves is introduced for selection of the correct branch of the Sommerfeld integral, and the exact solution is given in terms of the imaginary error function. The formula derived is valid in both the near-field and far-wave zones. It is shown that a diffracted surface wave is completely scattered into freely propagating electromagnetic waves, and neither reflected nor transmitted surface waves are generated in the case of bare metals, which have a positive real part of the surface impedance. The scattered waves propagate predominantly at a grazing angle along the direction of propagation of the incident surface wave and mainly in the upper hemisphere with respect to the wedge face. The profile of radiated intensity is nonmonotonic and does not resemble the surface wave profile, which evanesces exponentially with the distance from the wedge face. Comparison with experiments carried out in the terahertz spectral range at the Novosibirsk free electron laser shows good agreement between theory and experiment.

Submitted 16 January, 2013; originally announced January 2013.

Comments: 26 pages, 11 figures, 33 references

Journal ref: Phys. Rev. A 87, 023828, 2013

arXiv:physics/9808003 [pdf, ps, other]

doi 10.1016/S0168-9002(98)00453-7

Concentrator of laser energy for thin vapour cloud production near a surface

Authors: P. I. Melnikov, B. A. Knyazev, J. B. Greenly

Abstract: A novel scheme is presented for production of a thin ($<1$ mm) uniform vapor layer over a large surface area ($>100$ cm$^2$) by pulsed laser ablation of a solid surface. Instead of dispersing the laser energy uniformly over the surface, a modified Fabry-Perot interferometer is employed to concentrate the laser energy in very narrow, closely spaced concentric rings. This approach can be optimized to minimize the total laser energy required for the desired vapor density. Furthermore, since the vapor is produced from a small fraction of the total surface area, the local ablation depth is large, which minimizes the fraction of surface contamination in the vapor. Key words: laser evaporation, thin gas layer formation.

Submitted 9 September, 1998; v1 submitted 6 August, 1998; originally announced August 1998.

Comments: 8 pages, 2 figures

Showing 1–23 of 23 results for author: Knyazev, B