-
Decision-focused predictions via pessimistic bilevel optimization: a computational study
Authors:
Víctor Bucarey,
Sophia Calderón,
Gonzalo Muñoz,
Frederic Semet
Abstract:
Dealing with uncertainty in optimization parameters is an important and longstanding challenge. Typically, uncertain parameters are predicted accurately, and then a deterministic optimization problem is solved. However, the decisions produced by this so-called \emph{predict-then-optimize} procedure can be highly sensitive to uncertain parameters. In this work, we contribute to recent efforts in producing \emph{decision-focused} predictions, i.e., to build predictive models that are constructed with the goal of minimizing a \emph{regret} measure on the decisions taken with them. We begin by formulating the exact expected regret minimization as a pessimistic bilevel optimization model. Then, we establish NP-completeness of this problem, even in a heavily restricted case. Using duality arguments, we reformulate it as a non-convex quadratic optimization problem. Finally, we show various computational techniques to achieve tractability. We report extensive computational results on shortest-path instances with uncertain cost vectors. Our results indicate that our approach can improve training performance over the approach of Elmachtoub and Grigas (2022), a state-of-the-art method for decision-focused learning.
Submitted 26 May, 2024; v1 submitted 29 December, 2023;
originally announced December 2023.
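To make the regret notion above concrete, here is a minimal sketch of our own (not the authors' bilevel model) that computes the regret of a single predicted cost vector on a toy shortest-path instance by enumerating all paths. Ties among paths that are optimal for the predicted costs are broken against the decision maker, in the spirit of the pessimistic formulation mentioned in the abstract. The graph, cost values, and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def all_paths(edges: List[Edge], source: str, target: str) -> List[List[Edge]]:
    """Enumerate every source-target path of a small DAG by depth-first search."""
    succ: Dict[str, List[str]] = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    paths: List[List[Edge]] = []

    def dfs(node: str, path: List[Edge]) -> None:
        if node == target:
            paths.append(list(path))
            return
        for nxt in succ.get(node, []):
            path.append((node, nxt))
            dfs(nxt, path)
            path.pop()

    dfs(source, [])
    return paths


def path_cost(path: List[Edge], cost: Dict[Edge, float]) -> float:
    return sum(cost[e] for e in path)


def pessimistic_regret(edges, source, target, predicted, true, tol=1e-9):
    """True cost of the worst path that is optimal under the predicted costs,
    minus the true optimal cost (ties are broken against the decision maker)."""
    paths = all_paths(edges, source, target)
    best_pred = min(path_cost(p, predicted) for p in paths)
    tied = [p for p in paths if path_cost(p, predicted) <= best_pred + tol]
    worst_true = max(path_cost(p, true) for p in tied)
    return worst_true - min(path_cost(p, true) for p in paths)


edges = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
true_cost = {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "b"): 1.0, ("b", "t"): 3.0}
pred_cost = {("s", "a"): 3.0, ("a", "t"): 3.0, ("s", "b"): 1.0, ("b", "t"): 1.0}
zero_cost = {e: 0.0 for e in edges}
print(pessimistic_regret(edges, "s", "t", pred_cost, true_cost))  # 2.0
print(pessimistic_regret(edges, "s", "t", zero_cost, true_cost))  # 2.0: every path ties
```

The regret is zero for any prediction whose optimal decisions are truly optimal, regardless of how far the predicted numbers are from the true ones; the all-zero prediction shows how adversarial tie-breaking makes an uninformative prediction costly.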
-
Computational Tradeoffs of Optimization-Based Bound Tightening in ReLU Networks
Authors:
Fabian Badilla,
Marcos Goycoolea,
Gonzalo Muñoz,
Thiago Serra
Abstract:
The use of Mixed-Integer Linear Programming (MILP) models to represent neural networks with Rectified Linear Unit (ReLU) activations has become increasingly widespread in the last decade. This has enabled the use of MILP technology to test (or stress) their behavior, to adversarially improve their training, and to embed them in optimization models leveraging their predictive power. Many of these MILP models rely on activation bounds, that is, bounds on the input values of each neuron. In this work, we explore the tradeoff between the tightness of these bounds and the computational effort of solving the resulting MILP models. We provide guidelines for implementing these models based on the impact of network structure, regularization, and rounding.
Submitted 30 January, 2024; v1 submitted 27 December, 2023;
originally announced December 2023.
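For context on what "activation bounds" are, the sketch below computes them with interval bound propagation, the cheapest common approach; it is not the optimization-based bound tightening studied in the paper, which would instead minimize and maximize each neuron's input with an LP or MILP over the preceding layers. The toy network and function names are hypothetical.

```python
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight matrix W, bias vector b)


def affine_bounds(W, b, lo, hi):
    """Interval-arithmetic bounds on W @ x + b over the box lo <= x <= hi."""
    new_lo, new_hi = [], []
    for row, bias in zip(W, b):
        l, h = bias, bias
        for w, xl, xh in zip(row, lo, hi):
            # A nonnegative weight is smallest at the lower end, a negative one at the upper end.
            if w >= 0:
                l += w * xl
                h += w * xh
            else:
                l += w * xh
                h += w * xl
        new_lo.append(l)
        new_hi.append(h)
    return new_lo, new_hi


def activation_bounds(layers: List[Layer], lo, hi):
    """Pre-activation bounds (lower, upper) for every layer, via interval bound propagation."""
    bounds = []
    for W, b in layers:
        pre_lo, pre_hi = affine_bounds(W, b, lo, hi)
        bounds.append((pre_lo, pre_hi))
        # ReLU is monotone, so it maps [l, u] to [max(0, l), max(0, u)].
        lo = [max(0.0, v) for v in pre_lo]
        hi = [max(0.0, v) for v in pre_hi]
    return bounds


layers = [
    ([[1.0, -1.0], [1.0, 1.0]], [0.0, -1.0]),  # hidden neurons: x1 - x2 and x1 + x2 - 1
    ([[1.0, 1.0]], [0.0]),                     # output: sum of the two ReLUs
]
for pre_lo, pre_hi in activation_bounds(layers, [0.0, 0.0], [1.0, 1.0]):
    print(pre_lo, pre_hi)
# [-1.0, -1.0] [1.0, 1.0]
# [0.0] [2.0]
```

Over the unit box the output actually never exceeds 1 (for example at x = (1, 0)), so the interval bound of 2 is loose. Tighter bounds generally shrink the big-M constants in the MILP, at the price of solving extra optimization problems per neuron, which is the tradeoff the abstract refers to.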
-
When Deep Learning Meets Polyhedral Theory: A Survey
Authors:
Joey Huchette,
Gonzalo Muñoz,
Thiago Serra,
Calvin Tsay
Abstract:
In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks in tasks such as computer vision and natural language processing. Meanwhile, the structure of neural networks converged back to simpler representations based on piecewise constant and piecewise linear functions such as the Rectified Linear Unit (ReLU), which became the most commonly used type of activation function in neural networks. That made certain types of network structure, such as the typical fully-connected feedforward neural network, amenable to analysis through polyhedral theory and to the application of methodologies such as Linear Programming (LP) and Mixed-Integer Linear Programming (MILP) for a variety of purposes. In this paper, we survey the main topics emerging from this fast-paced area of work, which bring a fresh perspective to understanding neural networks in more detail as well as to applying linear optimization techniques to train, verify, and reduce the size of such networks.
Submitted 31 August, 2023; v1 submitted 29 April, 2023;
originally announced May 2023.
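One way to see the polyhedral structure mentioned above: once an activation pattern (which ReLUs are on) is fixed, each ReLU acts as either the identity or zero, so the network is affine on the inputs sharing that pattern, and that set is described, up to its boundary, by linear inequalities. The rough illustration below (our own, with a hypothetical toy network) counts distinct patterns on a sampling grid, which only gives a lower bound on the number of activation regions meeting the box.

```python
from itertools import product
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight matrix W, bias vector b)


def activation_pattern(layers: List[Layer], x) -> Tuple[bool, ...]:
    """Which ReLUs are active (pre-activation > 0) at input x, layer by layer."""
    pattern: List[bool] = []
    h = list(x)
    for W, b in layers:
        pre = [sum(w * v for w, v in zip(row, h)) + bias for row, bias in zip(W, b)]
        pattern.extend(p > 0 for p in pre)
        h = [max(0.0, p) for p in pre]
    return tuple(pattern)


def count_patterns(layers: List[Layer], box: List[Tuple[float, float]], steps: int) -> int:
    """Distinct activation patterns met on a uniform grid over the box (a lower bound)."""
    axes = [[lo + (hi - lo) * k / (steps - 1) for k in range(steps)] for lo, hi in box]
    return len({activation_pattern(layers, x) for x in product(*axes)})


# One input, three hidden ReLUs with breakpoints at x = 0, 1, 2.
layers = [([[1.0], [1.0], [1.0]], [0.0, -1.0, -2.0])]
print(count_patterns(layers, [(-1.0, 3.0)], steps=401))  # 4
```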
-
Out of Distribution Data Detection Using Dropout Bayesian Neural Networks
Authors:
Andre T. Nguyen,
Fred Lu,
Gary Lopez Munoz,
Edward Raff,
Charles Nicholas,
James Holt
Abstract:
We explore the utility of information contained within a dropout-based Bayesian neural network (BNN) for the task of detecting out-of-distribution (OOD) data. We first show how previous attempts to leverage the randomized embeddings induced by the intermediate layers of a dropout BNN can fail due to the distance metric used. We introduce an alternative approach to measuring embedding uncertainty, justify its use theoretically, and demonstrate how incorporating embedding uncertainty improves OOD data identification across three tasks: image classification, language classification, and malware detection.
Submitted 17 February, 2022;
originally announced February 2022.
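The abstract does not spell out the authors' alternative uncertainty measure, so the sketch below only shows how randomized embeddings arise from Monte Carlo dropout and summarizes them with one simple statistic (mean squared Euclidean distance to the mean embedding). It is an illustration under these assumptions, not the paper's method; the weights, input, and function names are hypothetical, and the dropout rate p must be below 1.

```python
import random
from typing import List, Sequence


def dropout_embeddings(x: Sequence[float], W: List[List[float]], p: float,
                       samples: int, seed: int = 0) -> List[List[float]]:
    """Monte Carlo dropout: zero each input with probability p (0 <= p < 1),
    rescale survivors by 1/(1-p), then apply the (hypothetical) embedding layer W."""
    rng = random.Random(seed)
    out = []
    for _ in range(samples):
        kept = [v / (1.0 - p) if rng.random() >= p else 0.0 for v in x]
        out.append([sum(w * v for w, v in zip(row, kept)) for row in W])
    return out


def total_variance(embs: List[List[float]]) -> float:
    """Mean squared Euclidean distance of the sampled embeddings to their mean."""
    n, dim = len(embs), len(embs[0])
    mean = [sum(e[j] for e in embs) / n for j in range(dim)]
    return sum((e[j] - mean[j]) ** 2 for e in embs for j in range(dim)) / n


W = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
x = [1.0, 2.0, 3.0]
print(total_variance(dropout_embeddings(x, W, p=0.0, samples=50)))      # 0.0
print(total_variance(dropout_embeddings(x, W, p=0.5, samples=50)) > 0)  # True
```

Because the abstract stresses that the choice of distance metric matters, the Euclidean spread used here should be read as a placeholder that a different metric could replace.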
-
Automatic Yara Rule Generation Using Biclustering
Authors:
Edward Raff,
Richard Zak,
Gary Lopez Munoz,
William Fleming,
Hyrum S. Anderson,
Bobby Filar,
Charles Nicholas,
James Holt
Abstract:
Yara rules are a ubiquitous tool among cybersecurity practitioners and analysts. Developing high-quality Yara rules to detect a malware family of interest can be labor- and time-intensive, even for expert users. Few tools exist and relatively little work has been done on how to automate the generation of Yara rules for specific families. In this paper, we leverage large n-grams ($n \geq 8$) combined with a new biclustering algorithm to construct simple Yara rules more effectively than currently available software. Our method, AutoYara, is fast, allowing for deployment on low-resource equipment for teams that deploy to remote networks. Our results demonstrate that AutoYara can help reduce analyst workload by producing rules with useful true-positive rates while maintaining low false-positive rates, sometimes matching or even outperforming human analysts. In addition, real-world testing by malware analysts indicates AutoYara could reduce analyst time spent constructing Yara rules by 44-86%, allowing them to spend their time on the more advanced malware that current tools can't handle. Code will be made available at https://github.com/NeuromorphicComputationResearchProgram .
Submitted 5 September, 2020;
originally announced September 2020.
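To illustrate the kind of output involved, the sketch below extracts 8-byte n-grams, keeps those present in every sample of a family and absent from a benign set, and renders them as a Yara rule. This plain set filter is a stand-in chosen for brevity; it is not AutoYara's biclustering algorithm. The samples, rule name, and function names are hypothetical; the rule layout follows standard Yara syntax (hex strings in braces, an `all of them` condition).

```python
from typing import List, Sequence, Set


def ngrams(data: bytes, n: int = 8) -> Set[bytes]:
    """All length-n byte substrings of a sample."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def candidate_ngrams(family: Sequence[bytes], benign: Sequence[bytes],
                     n: int = 8, max_strings: int = 3) -> List[bytes]:
    """n-grams present in every family sample and in no benign sample."""
    common = set.intersection(*(ngrams(s, n) for s in family))
    for s in benign:
        common -= ngrams(s, n)
    return sorted(common)[:max_strings]  # arbitrary but deterministic choice


def yara_rule(name: str, grams: Sequence[bytes]) -> str:
    """Render the chosen n-grams as hex strings in a rule that requires all of them."""
    lines = [f"rule {name}", "{", "    strings:"]
    for i, g in enumerate(grams):
        hexed = " ".join(f"{b:02X}" for b in g)
        lines.append(f"        $s{i} = {{ {hexed} }}")
    lines += ["    condition:", "        all of them", "}"]
    return "\n".join(lines)


family = [b"AAAAHEADERX1BBBB", b"CCHEADERX1DD"]
benign = [b"nothing to see here"]
print(yara_rule("toy_family", candidate_ngrams(family, benign)))
```

Here the only shared 8-gram is `HEADERX1`, so the printed rule contains the single string `$s0 = { 48 45 41 44 45 52 58 31 }`. A real generator must also handle the case where no candidate survives, since a rule with an empty `strings` section is not useful.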
-
Principled Deep Neural Network Training through Linear Programming
Authors:
Daniel Bienstock,
Gonzalo Muñoz,
Sebastian Pokutta
Abstract:
Deep learning has received much attention lately due to the impressive empirical performance achieved by training algorithms. Consequently, a need for a better theoretical understanding of these problems has become more evident in recent years. In this work, using a unified framework, we show that there exists a polyhedron which encodes simultaneously all possible deep neural network training problems that can arise from a given architecture, activation functions, loss function, and sample-size. Notably, the size of the polyhedral representation depends only linearly on the sample-size, and a better dependency on several other network parameters is unlikely (assuming $P\neq NP$). Additionally, we use our polyhedral representation to obtain new and better computational complexity results for training problems of well-known neural network architectures. Our results provide a new perspective on training problems through the lens of polyhedral theory and reveal a strong structure arising from these problems.
Submitted 1 March, 2022; v1 submitted 7 October, 2018;
originally announced October 2018.
-
New Limits of Treewidth-based tractability in Optimization
Authors:
Yuri Faenza,
Gonzalo Muñoz,
Sebastian Pokutta
Abstract:
Sparse structures are frequently sought when pursuing tractability in optimization problems. They are exploited from both theoretical and computational perspectives to handle complex problems that become manageable when sparsity is present. An example of this type of structure is given by treewidth: a graph-theoretical parameter that measures how "tree-like" a graph is. This parameter has been used for decades for analyzing the complexity of various optimization problems and for obtaining tractable algorithms for problems where this parameter is bounded. The goal of this work is to contribute to the understanding of the limits of treewidth-based tractability in optimization. Our results are as follows. First, we prove that, in a certain sense, the already known positive results on extension complexity based on low treewidth are the best possible. Second, under mild assumptions, we prove that treewidth is the only graph-theoretical parameter that yields tractability for a wide class of optimization problems, a fact well known in Graphical Models in Machine Learning and in Constraint Satisfaction Problems, which here we extend to an approximation setting in Optimization.
Submitted 20 March, 2019; v1 submitted 6 July, 2018;
originally announced July 2018.
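Computing treewidth exactly is NP-hard in general, so in practice it is often bounded with elimination heuristics. The sketch below (our own illustration, not taken from the paper) implements the min-degree heuristic: repeatedly eliminate a vertex of minimum degree after turning its neighborhood into a clique. The largest degree seen at elimination time is the width of a tree decomposition, hence an upper bound on treewidth. Function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def from_edges(edges: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Hashable, Set[Hashable]]:
    """Symmetric adjacency sets from an undirected edge list."""
    adj: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def treewidth_upper_bound(adj: Dict[Hashable, Set[Hashable]]) -> int:
    """Min-degree elimination; returns an upper bound on the treewidth."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    width = 0
    while g:
        v = min(g, key=lambda u: len(g[u]))
        nbrs = g.pop(v)
        width = max(width, len(nbrs))
        for a in nbrs:
            # Remove v and connect its former neighbors pairwise (fill-in edges).
            g[a].discard(v)
            g[a] |= nbrs - {a}
    return width


path = from_edges([(1, 2), (2, 3), (3, 4)])
cycle = from_edges([(i, (i + 1) % 5) for i in range(5)])
clique = from_edges([(i, j) for i in range(4) for j in range(i + 1, 4)])
print(treewidth_upper_bound(path), treewidth_upper_bound(cycle), treewidth_upper_bound(clique))  # 1 2 3
```

On these three graphs the heuristic happens to be exact (trees have treewidth 1, cycles 2, and a clique on k vertices k - 1); on general graphs it may overestimate.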
-
LP approximations to mixed-integer polynomial optimization problems
Authors:
Daniel Bienstock,
Gonzalo Munoz
Abstract:
We present a class of linear programming approximations for constrained optimization problems. In the case of mixed-integer polynomial optimization problems, if the intersection graph of the constraints has bounded tree-width our construction yields a class of linear size formulations that attain any desired tolerance. As a result, we obtain an approximation scheme for the "AC-OPF" problem on graphs with bounded tree-width. We also describe a more general construction for pure binary optimization problems where individual constraints are available through a membership oracle; if the intersection graph for the constraints has bounded tree-width our construction is of linear size and exact. This improves on a number of results in the literature, both from the perspective of formulation size and generality.
Submitted 18 October, 2016; v1 submitted 1 January, 2015;
originally announced January 2015.
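The tree-width condition above is stated on the intersection graph of the constraints. The sketch below builds that graph assuming the convention commonly used in this line of work: one vertex per variable, and an edge whenever two variables appear together in some constraint. Each constraint is represented only by its support (the variables it mentions); the example systems and function name are hypothetical.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple


def intersection_graph(supports: Iterable[Iterable[str]]) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """One vertex per variable; an edge whenever two variables share a constraint."""
    nodes: Set[str] = set()
    edges: Set[Tuple[str, str]] = set()
    for support in supports:
        vs = sorted(set(support))
        nodes.update(vs)
        edges.update(combinations(vs, 2))
    return nodes, edges


# Chain-structured system, e.g. x1*x2 <= 1, x2 + x3**2 >= 0, x3 - x4 == 0: a path (tree-width 1).
print(sorted(intersection_graph([["x1", "x2"], ["x2", "x3"], ["x3", "x4"]])[1]))
# [('x1', 'x2'), ('x2', 'x3'), ('x3', 'x4')]
# One dense constraint over all four variables: a 4-clique (tree-width 3).
print(len(intersection_graph([["x1", "x2", "x3", "x4"]])[1]))  # 6
```

Only which variables interact matters here, not the polynomial degrees or coefficients, which is why sparse, chain-like constraint systems such as those arising on low tree-width networks yield small tree-width.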