-
Equilibrium Aggregation: Encoding Sets via Optimization
Authors:
Sergey Bartunov,
Fabian B. Fuchs,
Timothy Lillicrap
Abstract:
Processing sets or other unordered, potentially variable-sized inputs in neural networks is usually handled by aggregating a number of input tensors into a single representation. While a number of aggregation methods already exist from simple sum pooling to multi-head attention, they are limited in their representational power both from theoretical and empirical perspectives. On the search of a principally more powerful aggregation strategy, we propose an optimization-based method called Equilibrium Aggregation. We show that many existing aggregation methods can be recovered as special cases of Equilibrium Aggregation and that it is provably more efficient in some important cases. Equilibrium Aggregation can be used as a drop-in replacement in many existing architectures and applications. We validate its efficiency on three different tasks: median estimation, class counting, and molecular property prediction. In all experiments, Equilibrium Aggregation achieves higher performance than the other aggregation techniques we test.
Submitted 3 July, 2022; v1 submitted 25 February, 2022;
originally announced February 2022.
-
Learning a Large Neighborhood Search Algorithm for Mixed Integer Programs
Authors:
Nicolas Sonnerat,
Pengming Wang,
Ira Ktena,
Sergey Bartunov,
Vinod Nair
Abstract:
Large Neighborhood Search (LNS) is a combinatorial optimization heuristic that starts with an assignment of values for the variables to be optimized, and iteratively improves it by searching a large neighborhood around the current assignment. In this paper we consider a learning-based LNS approach for mixed integer programs (MIPs). We train a Neural Diving model to represent a probability distribution over assignments, which, together with an off-the-shelf MIP solver, generates an initial assignment. Formulating the subsequent search steps as a Markov Decision Process, we train a Neural Neighborhood Selection policy to select a search neighborhood at each step, which is searched using a MIP solver to find the next assignment. The policy network is trained using imitation learning. We propose a target policy for imitation that, given enough compute resources, is guaranteed to select the neighborhood containing the optimal next assignment amongst all possible choices for the neighborhood of a specified size. Our approach matches or outperforms all the baselines on five real-world MIP datasets with large-scale instances from diverse applications, including two production applications at Google. It achieves $2\times$ to $37.8\times$ better average primal gap than the best baseline on three of the datasets at large running times.
Submitted 20 May, 2022; v1 submitted 21 July, 2021;
originally announced July 2021.
-
Computer-Aided Design as Language
Authors:
Yaroslav Ganin,
Sergey Bartunov,
Yujia Li,
Ethan Keller,
Stefano Saliceti
Abstract:
Computer-Aided Design (CAD) applications are used in manufacturing to model everything from coffee mugs to sports cars. These programs are complex and require years of training and experience to master. A component of all CAD models particularly difficult to make are the highly structured 2D sketches that lie at the heart of every 3D construction. In this work, we propose a machine learning model capable of automatically generating such sketches. Through this, we pave the way for developing intelligent tools that would help engineers create better designs with less effort. Our method is a combination of a general-purpose language modeling technique alongside an off-the-shelf data serialization protocol. We show that our approach has enough flexibility to accommodate the complexity of the domain and performs well for both unconditional synthesis and image-to-sketch translation.
Submitted 6 May, 2021;
originally announced May 2021.
-
Solving Mixed Integer Programs Using Neural Networks
Authors:
Vinod Nair,
Sergey Bartunov,
Felix Gimeno,
Ingrid von Glehn,
Pawel Lichocki,
Ivan Lobov,
Brendan O'Donoghue,
Nicolas Sonnerat,
Christian Tjandraatmadja,
Pengming Wang,
Ravichandra Addanki,
Tharindi Hapuarachchi,
Thomas Keck,
James Keeling,
Pushmeet Kohli,
Ira Ktena,
Yujia Li,
Oriol Vinyals,
Yori Zwols
Abstract:
Mixed Integer Programming (MIP) solvers rely on an array of sophisticated heuristics developed with decades of research to solve large-scale MIP instances encountered in practice. Machine learning offers to automatically construct better heuristics from data by exploiting shared structure among instances in the data. This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one. Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP. Neural Diving learns a deep neural network to generate multiple partial assignments for its integer variables, and the resulting smaller MIPs for un-assigned variables are solved with SCIP to construct high quality joint assignments. Neural Branching learns a deep neural network to make variable selection decisions in branch-and-bound to bound the objective value gap with a small tree. This is done by imitating a new variant of Full Strong Branching we propose that scales to large instances using GPUs. We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each. Most instances in all the datasets combined have $10^3-10^6$ variables and constraints after presolve, which is significantly larger than previous learning approaches. Comparing solvers with respect to primal-dual gap averaged over a held-out set of instances, the learning-augmented SCIP is 2x to 10x better on all datasets except one on which it is $10^5$x better, at large time limits. To the best of our knowledge, ours is the first learning approach to demonstrate such large improvements over SCIP on both large-scale real-world application datasets and MIPLIB.
Submitted 29 July, 2021; v1 submitted 23 December, 2020;
originally announced December 2020.
-
Meta-Learning Deep Energy-Based Memory Models
Authors:
Sergey Bartunov,
Jack W Rae,
Simon Osindero,
Timothy P Lillicrap
Abstract:
We study the problem of learning associative memory -- a system which is able to retrieve a remembered pattern based on its distorted or incomplete version. Attractor networks provide a sound model of associative memory: patterns are stored as attractors of the network dynamics and associative retrieval is performed by running the dynamics starting from a query pattern until it converges to an attractor. In such models the dynamics are often implemented as an optimization procedure that minimizes an energy function, such as in the classical Hopfield network. In general it is difficult to derive a writing rule for a given dynamics and energy that is both compressive and fast. Thus, most research in energy-based memory has been limited either to tractable energy models not expressive enough to handle complex high-dimensional objects such as natural images, or to models that do not offer fast writing. We present a novel meta-learning approach to energy-based memory models (EBMM) that allows one to use an arbitrary neural architecture as an energy model and quickly store patterns in its weights. We demonstrate experimentally that our EBMM approach can build compressed memories for synthetic and natural data, and is capable of associative retrieval that outperforms existing memory systems in terms of the reconstruction error and compression rate.
Submitted 20 April, 2021; v1 submitted 7 October, 2019;
originally announced October 2019.
-
Meta-Learning Neural Bloom Filters
Authors:
Jack W Rae,
Sergey Bartunov,
Timothy P Lillicrap
Abstract:
There has been a recent trend in training neural networks to replace data structures that have been crafted by hand, with an aim for faster execution, better accuracy, or greater compression. In this setting, a neural data structure is instantiated by training a network over many epochs of its inputs until convergence. In applications where inputs arrive at high throughput, or are ephemeral, training a network from scratch is not practical. This motivates the need for few-shot neural data structures. In this paper we explore the learning of approximate set membership over a set of data in one-shot via meta-learning. We propose a novel memory architecture, the Neural Bloom Filter, which is able to achieve significant compression gains over classical Bloom Filters and existing memory-augmented neural networks.
Submitted 10 June, 2019;
originally announced June 2019.
-
Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures
Authors:
Sergey Bartunov,
Adam Santoro,
Blake A. Richards,
Luke Marris,
Geoffrey E. Hinton,
Timothy Lillicrap
Abstract:
The backpropagation of error algorithm (BP) is impossible to implement in a real brain. The recent success of deep networks in machine learning and AI, however, has inspired proposals for understanding how the brain might learn across multiple layers, and hence how it might approximate BP. As of yet, none of these proposals have been rigorously evaluated on tasks where BP-guided deep learning has proved critical, or in architectures more structured than simple fully-connected networks. Here we present results on scaling up biologically motivated models of deep learning on datasets which need deep networks with appropriate architectures to achieve good performance. We present results on the MNIST, CIFAR-10, and ImageNet datasets and explore variants of target-propagation (TP) and feedback alignment (FA) algorithms, and explore performance in both fully- and locally-connected architectures. We also introduce weight-transport-free variants of difference target propagation (DTP) modified to remove backpropagation from the penultimate layer. Many of these algorithms perform well for MNIST, but for CIFAR and ImageNet we find that TP and FA variants perform significantly worse than BP, especially for networks composed of locally connected units, opening questions about whether new architectures and algorithms are required to scale these approaches. Our results and implementation details help establish baselines for biologically motivated deep learning schemes going forward.
Submitted 20 November, 2018; v1 submitted 12 July, 2018;
originally announced July 2018.
-
Adaptive Cardinality Estimation
Authors:
Oleg Ivanov,
Sergey Bartunov
Abstract:
In this paper we address the cardinality estimation problem, which is an important subproblem in query optimization. Query optimization is the part of every relational DBMS responsible for finding the best way to execute a given query. These ways are called plans. The execution times of different plans may differ by several orders of magnitude, so the query optimizer has a great influence on overall DBMS performance. We consider the cost-based query optimization approach, as it is the most popular one. It has been observed that the quality of cost-based optimization depends heavily on the quality of cardinality estimation. The cardinality of a plan node is the number of tuples it returns.
In the paper we propose a novel cardinality estimation approach that uses machine learning methods. The main idea of the approach is to use execution statistics of previously executed queries to improve cardinality estimates. We call this approach adaptive cardinality estimation to reflect this point. The approach is general, flexible, and easy to implement. The experimental evaluation shows that this approach significantly increases the quality of cardinality estimation, and therefore improves DBMS performance for some queries by several times or even by several dozen times.
Submitted 22 November, 2017;
originally announced November 2017.
-
StarCraft II: A New Challenge for Reinforcement Learning
Authors:
Oriol Vinyals,
Timo Ewalds,
Sergey Bartunov,
Petko Georgiev,
Alexander Sasha Vezhnevets,
Michelle Yeo,
Alireza Makhzani,
Heinrich Küttler,
John Agapiou,
Julian Schrittwieser,
John Quan,
Stephen Gaffney,
Stig Petersen,
Karen Simonyan,
Tom Schaul,
Hado van Hasselt,
David Silver,
Timothy Lillicrap,
Kevin Calderone,
Paul Keet,
Anthony Brunasso,
David Lawrence,
Anders Ekermo,
Jacob Repp,
Rodney Tsing
Abstract:
This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. This domain poses a new grand challenge for reinforcement learning, representing a more difficult class of problems than considered in most prior work. It is a multi-agent problem with multiple players interacting; there is imperfect information due to a partially observed map; it has a large action space involving the selection and control of hundreds of units; it has a large state space that must be observed solely from raw input feature planes; and it has delayed credit assignment requiring long-term strategies over thousands of steps. We describe the observation, action, and reward specification for the StarCraft II domain and provide an open source Python-based interface for communicating with the game engine. In addition to the main game maps, we provide a suite of mini-games focusing on different elements of StarCraft II gameplay. For the main game maps, we also provide an accompanying dataset of game replay data from human expert players. We give initial baseline results for neural networks trained from this data to predict game outcomes and player actions. Finally, we present initial baseline results for canonical deep reinforcement learning agents applied to the StarCraft II domain. On the mini-games, these agents learn to achieve a level of play that is comparable to a novice player. However, when trained on the main game, these agents are unable to make significant progress. Thus, SC2LE offers a new and challenging environment for exploring deep reinforcement learning algorithms and architectures.
Submitted 16 August, 2017;
originally announced August 2017.
-
Fast Adaptation in Generative Models with Generative Matching Networks
Authors:
Sergey Bartunov,
Dmitry P. Vetrov
Abstract:
Despite recent advances, the remaining bottlenecks in deep generative models are the necessity of extensive training and difficulties with generalization from a small number of training examples. We develop a new generative model called Generative Matching Network, which is inspired by the recently proposed matching networks for one-shot learning in discriminative tasks. By conditioning on the additional input dataset, our model can instantly learn new concepts that were not available in the training data but conform to a similar generative process. The proposed framework does not explicitly restrict diversity of the conditioning data and also does not require an extensive inference procedure for training or adaptation. Our experiments on the Omniglot dataset demonstrate that Generative Matching Networks significantly improve predictive performance on the fly as more additional data becomes available and outperform existing state-of-the-art conditional generative models.
Submitted 5 September, 2017; v1 submitted 7 December, 2016;
originally announced December 2016.
-
One-shot Learning with Memory-Augmented Neural Networks
Authors:
Adam Santoro,
Sergey Bartunov,
Matthew Botvinick,
Daan Wierstra,
Timothy Lillicrap
Abstract:
Despite recent breakthroughs in the applications of deep neural networks, one setting that presents a persistent challenge is that of "one-shot learning." Traditional gradient-based networks require a lot of data to learn, often through extensive iterative training. When new data is encountered, the models must inefficiently relearn their parameters to adequately incorporate the new information without catastrophic interference. Architectures with augmented memory capacities, such as Neural Turing Machines (NTMs), offer the ability to quickly encode and retrieve new information, and hence can potentially obviate the downsides of conventional models. Here, we demonstrate the ability of a memory-augmented neural network to rapidly assimilate new data, and leverage this data to make accurate predictions after only a few samples. We also introduce a new method for accessing an external memory that focuses on memory content, unlike previous methods that additionally use memory location-based focusing mechanisms.
Submitted 19 May, 2016;
originally announced May 2016.
-
Breaking Sticks and Ambiguities with Adaptive Skip-gram
Authors:
Sergey Bartunov,
Dmitry Kondrashkin,
Anton Osokin,
Dmitry Vetrov
Abstract:
The recently proposed Skip-gram model is a powerful method for learning high-dimensional word representations that capture rich semantic relationships between words. However, Skip-gram, like most prior work on learning word representations, does not take into account word ambiguity and maintains only a single representation per word. Although a number of Skip-gram modifications have been proposed to overcome this limitation and learn multi-prototype word representations, they either require a known number of word meanings or learn them using greedy heuristic approaches. In this paper we propose the Adaptive Skip-gram model, which is a nonparametric Bayesian extension of Skip-gram capable of automatically learning the required number of representations for all words at a desired semantic resolution. We derive an efficient online variational learning algorithm for the model and empirically demonstrate its efficiency on the word-sense induction task.
Submitted 15 November, 2015; v1 submitted 25 February, 2015;
originally announced February 2015.
-
Thermonuclear Burning Regimes and the Use of SNe Ia in Cosmology
Authors:
E. I. Sorokina,
S. I. Blinnikov,
O. S. Bartunov
Abstract:
The calculations of the light curves of thermonuclear supernovae are carried out by a method of multi-group radiation hydrodynamics. The effects of spectral lines and expansion opacity are taken into account. The predictions for UBVI fluxes are given. The values of rise time for B and V bands found in our calculations are in good agreement with the observed values. We explain why our results for the rise time have more solid physical justification than those obtained by other authors. It is shown that small variations in the chemical composition of the ejecta, produced in the explosions with different regimes of nuclear burning, can influence drastically the light curve decline in the B band and, to a lesser extent, in the V band. We argue that recent results on positive cosmological constant Lambda, found from the high redshift supernova observations, could be wrong in the case of possible variations of the preferred mode of nuclear burning in the earlier Universe.
Submitted 2 October, 1999; v1 submitted 30 June, 1999;
originally announced June 1999.
-
A comparative modeling of supernova 1993J
Authors:
S. I. Blinnikov,
R. Eastman,
O. S. Bartunov,
V. A. Popolitov,
S. E. Woosley
Abstract:
The light curve of Supernova (SN) 1993J is calculated using two approaches to radiation transport as exemplified by the two computer codes, STELLA and EDDINGTON. Particular attention is paid to shock breakout and the photometry in the U, B, and V bands during the first 120 days. The hydrodynamical model, the explosion of a 13 Msun star which had lost most of its hydrogenic envelope to a companion, is the same in each calculation. The comparison elucidates differences between the approaches and also serves to validate the results of both. STELLA includes implicit hydrodynamics and is able to model supernova evolution at early times, before the expansion is homologous. STELLA also employs multi-group photonics and is able to follow the radiation as it decouples from the matter. EDDINGTON uses a different algorithm for integrating the transport equation, assumes homologous expansion, and uses a finer frequency resolution. Good agreement is achieved between the two codes only when compatible physical assumptions are made about the opacity. A new result for SN 1993J is a prediction of the continuum spectrum near the shock breakout (calculated by STELLA) which is superior to the results of other standard single energy group hydrocodes such as VISPHOT or TITAN. Based on the results of our independent codes, we discuss the uncertainties involved in the current time dependent models of supernova light curves.
Submitted 6 November, 1997;
originally announced November 1997.
-
The rate of Supernovae from the combined sample of five searches
Authors:
E. Cappellaro,
M. Turatto,
D. Yu. Tsvetkov,
O. S. Bartunov,
C. Pollas,
R. Evans,
M. Hamuy
Abstract:
With the purpose to obtain new estimates of the rate of supernovae we joined the logs of five SN searches, namely the Asiago, Crimea, Calán-Tololo and OCA photographic surveys and the visual search by Evans (the sample counts 110 SNe). We found that the most prolific galaxies are late spirals in which most SNe are of type II (0.88 SNu). SN Ib/c are rarer than SN Ia (0.16 and 0.24 SNu, respectively), ruling out previous claims of a very high rate of SNIb/c. We also found that the rate of SN Ia in ellipticals (0.13 SNu) is smaller than in spirals, supporting the hypothesis of different ages of the progenitor systems in early and late type galaxies. Finally, we estimated that even assuming that separate classes of faint SN Ia and SN II do exist (SNe 1991bg and 1987A could be the respective prototypes) the overall SN rate is raised only by 20-30%, therefore excluding that faint SNe represent the majority of SN explosions. Also, the bright SNIIn are intrinsically very rare (2 to 5% of all SNII in spirals).
Submitted 22 November, 1996;
originally announced November 1996.
-
The Rate of Supernovae. II. the Selection Effects and the Frequencies Per Unit Blue Luminosity
Authors:
E. Cappellaro,
M. Turatto,
Benetti,
D. Yu. Tsvetkov,
O. S. Bartunov,
I. N. Makarova
Abstract:
We present new estimates of the observed rates of SNe determined with the control time method applied to the files of observations of two long term, photographic SN searches carried out at the Asiago and Sternberg Observatories. Our calculations are applied to a galaxy sample extracted from RC3, in which 65 SNe have been discovered. This relatively large number of SNe has been redistributed in the different morphological classes of host galaxies giving the respective SN rates. The magnitude of two biases, the overexposure of the central part of galaxies and the inclination of the spiral parent galaxies, has been estimated. We show that due to overexposure an increasing fraction of SNe is lost in galaxies of increasing distance. Also, a reduced number of SNe is discovered in inclined galaxies ($i>30^\circ$): SNII and Ib are more affected than Ia, as well as SNe in Sbc-Sd galaxies with respect to other spirals. We strengthen previous findings that the SN rate is proportional to the galaxy blue luminosity for all SN and Hubble types. Other sources of errors, besides those due to the statistics of the events, have been investigated, in particular those related to the adopted SN parameters (Cappellaro et al. (\cite{paper1})) and the correction factors for overexposure and inclination. Moreover, we show that the frequencies of SNe per unit luminosity vary if different sources for the parameters of the sample galaxies are adopted, thus hampering the comparison of SN rates based on different galaxy samples. The overall rates per unit blue luminosity are similar to the previous
Submitted 25 February, 1993;
originally announced February 1993.