-
Solving Mixed Integer Programs Using Neural Networks
Authors:
Vinod Nair,
Sergey Bartunov,
Felix Gimeno,
Ingrid von Glehn,
Pawel Lichocki,
Ivan Lobov,
Brendan O'Donoghue,
Nicolas Sonnerat,
Christian Tjandraatmadja,
Pengming Wang,
Ravichandra Addanki,
Tharindi Hapuarachchi,
Thomas Keck,
James Keeling,
Pushmeet Kohli,
Ira Ktena,
Yujia Li,
Oriol Vinyals,
Yori Zwols
Abstract:
Mixed Integer Programming (MIP) solvers rely on an array of sophisticated heuristics developed with decades of research to solve large-scale MIP instances encountered in practice. Machine learning offers to automatically construct better heuristics from data by exploiting shared structure among instances in the data. This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one. Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP. Neural Diving learns a deep neural network to generate multiple partial assignments for its integer variables, and the resulting smaller MIPs for un-assigned variables are solved with SCIP to construct high quality joint assignments. Neural Branching learns a deep neural network to make variable selection decisions in branch-and-bound to bound the objective value gap with a small tree. This is done by imitating a new variant of Full Strong Branching we propose that scales to large instances using GPUs. We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each. Most instances in all the datasets combined have $10^3$ to $10^6$ variables and constraints after presolve, which is significantly larger than previous learning approaches. Comparing solvers with respect to primal-dual gap averaged over a held-out set of instances, the learning-augmented SCIP is 2x to 10x better on all datasets except one on which it is $10^5$x better, at large time limits. To the best of our knowledge, ours is the first learning approach to demonstrate such large improvements over SCIP on both large-scale real-world application datasets and MIPLIB.
Submitted 29 July, 2021; v1 submitted 23 December, 2020;
originally announced December 2020.
-
Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search
Authors:
Lars Buesing,
Theophane Weber,
Yori Zwols,
Sebastien Racaniere,
Arthur Guez,
Jean-Baptiste Lespiau,
Nicolas Heess
Abstract:
Learning policies on data synthesized by models can in principle quench the thirst of reinforcement learning algorithms for large amounts of real experience, which is often costly to acquire. However, simulating plausible experience de novo is a hard problem for many complex environments, often resulting in biases for model-based policy evaluation and search. Instead of de novo synthesis of data, here we assume logged, real experience and model alternative outcomes of this experience under counterfactual actions, actions that were not actually taken. Based on this, we propose the Counterfactually-Guided Policy Search (CF-GPS) algorithm for learning policies in POMDPs from off-policy experience. It leverages structural causal models for counterfactual evaluation of arbitrary policies on individual off-policy episodes. CF-GPS can improve on vanilla model-based RL algorithms by making use of available logged data to de-bias model predictions. In contrast to off-policy algorithms based on Importance Sampling which re-weight data, CF-GPS leverages a model to explicitly consider alternative outcomes, allowing the algorithm to make better use of experience data. We find empirically that these advantages translate into improved policy evaluation and search results on a non-trivial grid-world task. Finally, we show that CF-GPS generalizes the previously proposed Guided Policy Search and that reparameterization-based algorithms such as Stochastic Value Gradient can be interpreted as counterfactual methods.
Submitted 15 November, 2018;
originally announced November 2018.
-
Generative Temporal Models with Spatial Memory for Partially Observed Environments
Authors:
Marco Fraccaro,
Danilo Jimenez Rezende,
Yori Zwols,
Alexander Pritzel,
S. M. Ali Eslami,
Fabio Viola
Abstract:
In model-based reinforcement learning, generative and temporal models of environments can be leveraged to boost agent performance, either by tuning the agent's representations during training or via use as part of an explicit planning mechanism. However, their application in practice has been limited to simplistic environments, due to the difficulty of training such models in larger, potentially partially-observed and 3D environments. In this work we introduce a novel action-conditioned generative model of such challenging environments. The model features a non-parametric spatial memory system in which we store learned, disentangled representations of the environment. Low-dimensional spatial updates are computed using a state-space model that makes use of knowledge on the prior dynamics of the moving agent, and high-dimensional visual observations are modelled with a Variational Auto-Encoder. The result is a scalable architecture capable of performing coherent predictions over hundreds of time steps across a range of partially observed 2D and 3D environments.
Submitted 19 July, 2018; v1 submitted 25 April, 2018;
originally announced April 2018.
-
PathNet: Evolution Channels Gradient Descent in Super Neural Networks
Authors:
Chrisantha Fernando,
Dylan Banarse,
Charles Blundell,
Yori Zwols,
David Ha,
Andrei A. Rusu,
Alexander Pritzel,
Daan Wierstra
Abstract:
For artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network, permitting parameter reuse, without catastrophic forgetting. PathNet is a first step in this direction. It is a neural network algorithm that uses agents embedded in the neural network whose task is to discover which parts of the network to re-use for new tasks. Agents are pathways (views) through the network which determine the subset of parameters that are used and updated by the forwards and backwards passes of the backpropagation algorithm. During learning, a tournament selection genetic algorithm is used to select pathways through the neural network for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost function. We demonstrate successful transfer learning: fixing the parameters along a path learned on task A and re-evolving a new population of paths for task B allows task B to be learned faster than it could be learned from scratch or after fine-tuning. Paths evolved on task B re-use parts of the optimal path evolved on task A. Positive transfer was demonstrated for binary MNIST, CIFAR, and SVHN supervised learning classification tasks, and a set of Atari and Labyrinth reinforcement learning tasks, suggesting PathNets have general applicability for neural network training. Finally, PathNet also significantly improves the robustness to hyperparameter choices of a parallel asynchronous reinforcement learning algorithm (A3C).
Submitted 30 January, 2017;
originally announced January 2017.
-
Patterns of conjunctive forks
Authors:
Vašek Chvátal,
František Matúš,
Yori Zwólš
Abstract:
Three events in a probability space form a conjunctive fork if they satisfy specific constraints on conditional independence and covariances. Patterns of conjunctive forks within collections of events are characterized by means of systems of linear equations that have positive solutions. This characterization allows patterns of conjunctive forks to be recognized in polynomial time. Relations to previous work on causal betweenness and on patterns of conditional independence among random variables are discussed.
Submitted 29 August, 2016; v1 submitted 13 August, 2016;
originally announced August 2016.
-
Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions
Authors:
Peter Sunehag,
Richard Evans,
Gabriel Dulac-Arnold,
Yori Zwols,
Daniel Visentin,
Ben Coppin
Abstract:
Many real-world problems come with action spaces represented as feature vectors. Although high-dimensional control is a largely unsolved problem, there has recently been progress for modest dimensionalities. Here we report on a successful attempt at addressing problems of dimensionality as high as $2000$, of a particular form. Motivated by important applications such as recommendation systems that do not fit the standard reinforcement learning frameworks, we introduce Slate Markov Decision Processes (slate-MDPs). A slate-MDP is an MDP with a combinatorial action space consisting of slates (tuples) of primitive actions of which one is executed in an underlying MDP. The agent does not control the choice of this executed action and the action might not even be from the slate, e.g., for recommendation systems for which all recommendations can be ignored. We use deep Q-learning based on feature representations of both the state and action to learn the value of whole slates. Unlike existing methods, we optimize for both the combinatorial and sequential aspects of our tasks. The new agent's superiority over agents that ignore either the combinatorial or the sequential long-term value aspect is demonstrated on a range of environments with dynamics from a real-world recommendation system. Further, we use deep deterministic policy gradients to learn a policy that, for each position of the slate, guides attention towards the part of the action space in which the value is the highest, and we only evaluate actions in this area. The attention is used within a sequentially greedy procedure leveraging submodularity. Finally, we show how introducing risk-seeking can dramatically improve the agent's performance and ability to discover more far-reaching strategies.
Submitted 16 December, 2015; v1 submitted 3 December, 2015;
originally announced December 2015.
-
Minimum length path decompositions
Authors:
Dariusz Dereniowski,
Wieslaw Kubiak,
Yori Zwols
Abstract:
We consider a bi-criteria generalization of the pathwidth problem, where, for given integers $k,l$ and a graph $G$, we ask whether there exists a path decomposition $\mathcal{P}$ of $G$ such that the width of $\mathcal{P}$ is at most $k$ and the number of bags in $\mathcal{P}$, i.e., the length of $\mathcal{P}$, is at most $l$.
We provide a complete complexity classification of the problem in terms of $k$ and $l$ for general graphs. Contrary to the original pathwidth problem, which is fixed-parameter tractable with respect to $k$, we prove that the generalized problem is NP-complete for any fixed $k\geq 4$, and is also NP-complete for any fixed $l\geq 2$. On the other hand, we give a polynomial-time algorithm that, for any (possibly disconnected) graph $G$ and integers $k\leq 3$ and $l>0$, constructs a path decomposition of width at most $k$ and length at most $l$, if any exists.
As a by-product, we obtain an almost complete classification of the problem in terms of $k$ and $l$ for connected graphs. Namely, the problem is NP-complete for any fixed $k\geq 5$ and it is polynomial-time for any $k\leq 3$. This leaves open the case $k=4$ for connected graphs.
Submitted 12 February, 2013;
originally announced February 2013.
-
A De Bruijn-Erdos theorem for chordal graphs
Authors:
Laurent Beaudou,
Adrian Bondy,
Xiaomin Chen,
Ehsan Chiniforooshan,
Maria Chudnovsky,
Vasek Chvatal,
Nicolas Fraiman,
Yori Zwols
Abstract:
A special case of a combinatorial theorem of De Bruijn and Erdos asserts that every noncollinear set of n points in the plane determines at least n distinct lines. Chen and Chvatal suggested a possible generalization of this assertion in metric spaces with appropriately defined lines. We prove this generalization in all metric spaces induced by connected chordal graphs.
Submitted 30 January, 2012;
originally announced January 2012.
-
Lines in hypergraphs
Authors:
Laurent Beaudou,
Adrian Bondy,
Xiaomin Chen,
Ehsan Chiniforooshan,
Maria Chudnovsky,
Vasek Chvatal,
Nicolas Fraiman,
Yori Zwols
Abstract:
One of the De Bruijn - Erdos theorems deals with finite hypergraphs where every two vertices belong to precisely one hyperedge. It asserts that, except in the perverse case where a single hyperedge equals the whole vertex set, the number of hyperedges is at least the number of vertices and the two numbers are equal if and only if the hypergraph belongs to one of simply described families, near-pencils and finite projective planes. Chen and Chvatal proposed to define the line uv in a 3-uniform hypergraph as the set of vertices that consists of u, v, and all w such that {u,v,w} is a hyperedge. With this definition, the De Bruijn - Erdos theorem is easily seen to be equivalent to the following statement: If no four vertices in a 3-uniform hypergraph carry two or three hyperedges, then, except in the perverse case where one of the lines equals the whole vertex set, the number of lines is at least the number of vertices and the two numbers are equal if and only if the hypergraph belongs to one of two simply described families. Our main result generalizes this statement by allowing any four vertices to carry three hyperedges (but keeping two forbidden): the conclusion remains the same except that a third simply described family, complements of Steiner triple systems, appears in the extremal case.
Submitted 1 December, 2011;
originally announced December 2011.