-
A GPU-Accelerated Bi-linear ADMM Algorithm for Distributed Sparse Machine Learning
Authors:
Alireza Olama,
Andreas Lundell,
Jan Kronqvist,
Elham Ahmadi,
Eduardo Camponogara
Abstract:
This paper introduces the Bi-linear consensus Alternating Direction Method of Multipliers (Bi-cADMM), aimed at solving large-scale regularized Sparse Machine Learning (SML) problems defined over a network of computational nodes. Mathematically, these are stated as minimization problems with convex local loss functions over a global decision vector, subject to an explicit $\ell_0$ norm constraint to enforce the desired sparsity. The considered SML problem generalizes different sparse regression and classification models, such as sparse linear and logistic regression, sparse softmax regression, and sparse support vector machines. Bi-cADMM leverages a bi-linear consensus reformulation of the original non-convex SML problem and a hierarchical decomposition strategy that divides the problem into smaller sub-problems amenable to parallel computing. In Bi-cADMM, this decomposition strategy is based on a two-phase approach. Initially, it performs a sample decomposition of the data and distributes local datasets across computational nodes. Subsequently, a delayed feature decomposition of the data is conducted on Graphics Processing Units (GPUs) available to each node. This methodology allows Bi-cADMM to undertake computationally intensive data-centric computations on GPUs, while CPUs handle more cost-effective computations. The proposed algorithm is implemented within an open-source Python package called Parallel Sparse Fitting Toolbox (PsFiT), which is publicly available. Finally, computational experiments demonstrate the efficiency and scalability of our algorithm through numerical benchmarks across various SML problems featuring distributed datasets.
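To make the problem class concrete, here is a minimal sketch of a consensus ADMM for sparse least squares with an explicit $\ell_0$ constraint, using a sample decomposition across nodes. Note that this is NOT Bi-cADMM: it applies the simpler, well-known heuristic of projecting the averaged iterate onto the $\ell_0$ "ball" (keeping the k largest-magnitude entries) instead of the bi-linear reformulation, and it ignores the GPU feature decomposition. The function names and the defaults `rho=30.0`, `iters=300` are assumptions chosen for illustration.

```python
import numpy as np


def project_l0(v, k):
    """Euclidean projection onto {z : ||z||_0 <= k}: keep the k largest-magnitude entries."""
    if k >= v.size:
        return v.copy()
    z = np.zeros_like(v)
    if k > 0:
        idx = np.argpartition(np.abs(v), -k)[-k:]
        z[idx] = v[idx]
    return z


def consensus_admm_sparse_ls(blocks, k, rho=30.0, iters=300):
    """Heuristic consensus ADMM for min sum_i 0.5*||A_i x - b_i||^2 s.t. ||x||_0 <= k.

    blocks: list of (A_i, b_i) pairs, one per node (sample decomposition).
    Each node keeps a local copy x_i; the global z carries the l0 constraint.
    """
    n = blocks[0][0].shape[1]
    xs = [np.zeros(n) for _ in blocks]
    us = [np.zeros(n) for _ in blocks]  # scaled dual variables
    z = np.zeros(n)
    # Precompute the local system matrices A_i^T A_i + rho*I and vectors A_i^T b_i.
    lhs = [A.T @ A + rho * np.eye(n) for A, _ in blocks]
    rhs0 = [A.T @ b for A, b in blocks]
    for _ in range(iters):
        # Local least-squares updates (would run in parallel on separate nodes).
        for i in range(len(blocks)):
            xs[i] = np.linalg.solve(lhs[i], rhs0[i] + rho * (z - us[i]))
        # Global update: average the local estimates, then enforce sparsity.
        z = project_l0(np.mean([x + u for x, u in zip(xs, us)], axis=0), k)
        # Dual updates push each local copy toward consensus with z.
        for i in range(len(blocks)):
            us[i] += xs[i] - z
    return z
```

Because the projection step is nonconvex, this iteration carries no global optimality guarantee; on noise-free, well-conditioned data it typically recovers a planted sparse support.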
Submitted 26 June, 2024; v1 submitted 25 May, 2024;
originally announced May 2024.
-
A cutting plane algorithm for globally solving low dimensional k-means clustering problems
Authors:
Martin Ryner,
Jan Kronqvist,
Johan Karlsson
Abstract:
Clustering is one of the most fundamental tools in data science and machine learning, and k-means clustering is one of the most common such methods. There is a variety of approximate algorithms for the k-means problem, but computing the globally optimal solution is in general NP-hard. In this paper we consider the k-means problem for instances with low dimensional data and formulate it as a structured concave assignment problem. This allows us to exploit the low dimensional structure and solve the problem to global optimality within reasonable time for large data sets with several clusters. The method builds on iteratively solving a small concave problem and a large linear programming problem. This gives a sequence of feasible solutions along with bounds which we show converges to zero optimality gap. The paper combines methods from global optimization theory to accelerate the procedure, and we provide numerical results on their performance.
Submitted 21 February, 2024;
originally announced February 2024.
-
Solution Polishing via Path Relinking for Continuous Black-Box Optimization
Authors:
Dimitri Papageorgiou,
Jan Kronqvist,
Asha Ramanujam,
James Kor,
Youngdae Kim,
Can Li
Abstract:
When faced with a limited budget of function evaluations, state-of-the-art black-box optimization (BBO) solvers struggle to obtain globally, or sometimes even locally, optimal solutions. In such cases, one may pursue solution polishing, i.e., a computational method to improve (or ``polish'') an incumbent solution, typically via some sort of evolutionary algorithm involving two or more solutions. While solution polishing in ``white-box'' optimization has existed for years, relatively little has been published regarding its application in costly-to-evaluate BBO. To fill this void, we explore two novel methods for performing solution polishing along one-dimensional curves rather than along straight lines. We introduce a convex quadratic program that can generate promising curves through multiple elite solutions, i.e., via path relinking, or around a single elite solution. In comparing four solution polishing techniques for continuous BBO, we show that solution polishing along a curve is competitive with solution polishing using a state-of-the-art BBO solver.
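To illustrate what polishing "along a curve" rather than "along a straight line" means, the sketch below passes a quadratic curve through three elite solutions by Lagrange interpolation and samples the objective along it. This interpolating curve is a hypothetical stand-in for the paper's convex quadratic program, and the grid search along the curve stands in for whatever one-dimensional search the authors use.

```python
import numpy as np


def quadratic_curve(p0, p1, p2):
    """Return c(t) with c(0) = p0, c(0.5) = p1, c(1) = p2 (Lagrange interpolation)."""
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))

    def c(t):
        l0 = 2.0 * (t - 0.5) * (t - 1.0)  # equals 1 at t=0, 0 at t=0.5 and t=1
        l1 = -4.0 * t * (t - 1.0)         # equals 1 at t=0.5, 0 at t=0 and t=1
        l2 = 2.0 * t * (t - 0.5)          # equals 1 at t=1, 0 at t=0 and t=0.5
        return l0 * p0 + l1 * p1 + l2 * p2

    return c


def polish_along_curve(f, curve, num=101):
    """Evaluate f at num equally spaced parameter values and return the best one."""
    ts = np.linspace(0.0, 1.0, num)
    vals = [f(curve(t)) for t in ts]
    best = int(np.argmin(vals))
    return ts[best], curve(ts[best]), vals[best]
```

With elites (0, 0), (1, 1), (2, 0), the curve reaches points such as (0.5, 0.75) that no straight segment between the two outer elites can reach; each call to `f` here would be one costly black-box evaluation.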
Submitted 19 February, 2024;
originally announced February 2024.
-
LineWalker: Line Search for Black Box Derivative-Free Optimization and Surrogate Model Construction
Authors:
Dimitri J. Papageorgiou,
Jan Kronqvist,
Krishnan Kumaran
Abstract:
This paper describes a simple but effective sampling method for optimizing and learning a discrete approximation (or surrogate) of a multi-dimensional function along a one-dimensional line segment of interest. The method does not rely on derivative information and the function to be learned can be a computationally expensive ``black box'' function that must be queried via simulation or other means. It is assumed that the underlying function is noise-free and smooth, although the algorithm can still be effective when the underlying function is piecewise smooth. The method constructs a smooth surrogate on a set of equally-spaced grid points by evaluating the true function at a sparse set of judiciously chosen grid points. At each iteration, the surrogate's non-tabu local minima and maxima are identified as candidates for sampling. Tabu search constructs are also used to promote diversification. If no non-tabu extrema are identified, a simple exploration step is taken by sampling the midpoint of the largest unexplored interval. The algorithm continues until a user-defined function evaluation limit is reached. Numerous examples are shown to illustrate the algorithm's efficacy and superiority relative to state-of-the-art methods, including Bayesian optimization and NOMAD, on primarily nonconvex test functions.
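The sample-selection rule described above can be sketched as follows. Several details are assumptions, not taken from the paper: a point is tabu if it lies within `tabu_radius` grid steps of an already sampled point, ties among candidates are broken by the lowest surrogate value, and the grid endpoints are assumed to be sampled first. Surrogate construction itself is not shown.

```python
import numpy as np


def next_sample(surrogate, sampled, tabu_radius=1):
    """Pick the next grid index to evaluate (simplified LineWalker-style rule).

    surrogate:   surrogate values on an equally spaced grid (length n).
    sampled:     set of grid indices already evaluated; assumed to include 0 and n-1.
    tabu_radius: indices within this distance of a sampled point are tabu (hypothetical).
    Returns a grid index, or None if no unexplored interval remains.
    """
    s = np.asarray(surrogate, dtype=float)
    n = len(s)

    def is_tabu(j):
        return any(abs(j - i) <= tabu_radius for i in sampled)

    # Strict interior local minima and maxima of the surrogate are sampling candidates.
    extrema = [j for j in range(1, n - 1)
               if (s[j] < s[j - 1] and s[j] < s[j + 1])
               or (s[j] > s[j - 1] and s[j] > s[j + 1])]
    candidates = [j for j in extrema if not is_tabu(j)]
    if candidates:
        # Hypothetical tie-break: prefer the candidate with the lowest surrogate value.
        return min(candidates, key=lambda j: s[j])

    # Exploration step: midpoint of the largest unexplored interval (first one on ties).
    pts = sorted(sampled)
    gaps = [(b - a, a, b) for a, b in zip(pts, pts[1:]) if b - a > 1]
    if not gaps:
        return None
    _, a, b = max(gaps, key=lambda g: g[0])
    return (a + b) // 2
```

For a surrogate with a single minimum at index 5 on an 11-point grid, `next_sample(s, {0, 10})` returns 5; once index 5 has been sampled, the minimum becomes tabu and the exploration step returns the midpoint 2 of the interval [0, 5].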
Submitted 19 July, 2023;
originally announced July 2023.
-
Globally solving the Gromov-Wasserstein problem for point clouds in low dimensional Euclidean spaces
Authors:
Martin Ryner,
Jan Kronqvist,
Johan Karlsson
Abstract:
This paper presents a framework for computing the Gromov-Wasserstein problem between two sets of points in low dimensional spaces, where the discrepancy is the squared Euclidean norm. The Gromov-Wasserstein problem is a generalization of the optimal transport problem that finds the assignment between two sets preserving pairwise distances as much as possible. This can be used to quantify the similarity between two formations or shapes, a common problem in AI and machine learning. The problem can be formulated as a Quadratic Assignment Problem (QAP), which is in general computationally intractable even for small problems. Our framework addresses this challenge by reformulating the QAP as an optimization problem with a low-dimensional domain, leveraging the fact that the problem can be expressed as a concave quadratic optimization problem with low rank. The method scales well with the number of points, and it can be used to find the global solution for large-scale problems with thousands of points. We compare the computational complexity of our approach with state-of-the-art methods on synthetic problems and apply it to a near-symmetrical problem which is of particular interest in computational biology.
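For equal-size point sets and one-to-one assignments, the QAP view of the problem can be made concrete: the cost of an assignment compares all pairwise squared Euclidean distances. The sketch below assumes that objective, $\sum_{i,j}(\|x_i-x_j\|^2-\|y_{\sigma(i)}-y_{\sigma(j)}\|^2)^2$, and solves it by exhaustive search over all $n!$ permutations. This brute force is exactly the intractable approach the paper's low-rank reformulation avoids; it only illustrates the objective.

```python
import itertools

import numpy as np


def sq_dists(P):
    """Matrix of pairwise squared Euclidean distances between the rows of P."""
    diff = P[:, None, :] - P[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def gw_cost(X, Y, perm):
    """QAP-form GW objective for the assignment i -> perm[i] (assumed form, see above)."""
    DX = sq_dists(X)
    DY = sq_dists(Y)[np.ix_(perm, perm)]
    return float(np.sum((DX - DY) ** 2))


def brute_force_gw(X, Y):
    """Exhaustive search over all n! assignments; only viable for tiny n."""
    n = len(X)
    best = min(itertools.permutations(range(n)),
               key=lambda p: gw_cost(X, Y, list(p)))
    return list(best), gw_cost(X, Y, list(best))
```

Since the cost depends only on pairwise distances, it is invariant to rotations, reflections, and translations of either point cloud, which is what makes it suitable for comparing shapes.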
Submitted 18 July, 2023;
originally announced July 2023.
-
Alternating mixed-integer programming and neural network training for approximating stochastic two-stage problems
Authors:
Jan Kronqvist,
Boda Li,
Jan Rolfes,
Shudian Zhao
Abstract:
The presented work addresses two-stage stochastic programs (2SPs), a broadly applicable model to capture optimization problems subject to uncertain parameters with adjustable decision variables. In case the adjustable or second-stage variables contain discrete decisions, the corresponding 2SPs are known to be NP-complete. The standard approach of forming a single-stage deterministic equivalent problem can be computationally challenging even for small instances, as the number of variables and constraints scales with the number of scenarios. To avoid forming a potentially huge MILP problem, we build upon an approach of approximating the expected value of the second-stage problem by a neural network (NN) and encoding the resulting NN into the first-stage problem. The proposed algorithm alternates between optimizing the first-stage variables and retraining the NN. We demonstrate the value of our approach with the example of computing operating points in power systems by showing that the alternating approach provides improved first-stage decisions and a tighter approximation between the expected objective and its neural network approximation.
Submitted 19 July, 2023; v1 submitted 11 May, 2023;
originally announced May 2023.
-
A Column Generation Approach for Radiation Therapy Patient Scheduling with Planned Machine Unavailability and Uncertain Future Arrivals
Authors:
Sara Frimodig,
Per Enqvist,
Jan Kronqvist
Abstract:
The number of cancer cases per year is rapidly increasing worldwide. In radiation therapy (RT), radiation from linear accelerators is used to kill malignant tumor cells. Scheduling patients for RT is difficult both due to the numerous medical and technical constraints, and because of the stochastic inflow of patients with different urgency levels. In this paper, a Column Generation (CG) approach is proposed for the RT patient scheduling problem. The model includes all the constraints necessary for the generated schedules to work in practice, including for example different machine compatibilities, individualized patient protocols, and multiple hospital sites. The model is the first to include planned interruptions in treatments due to maintenance on machines, which is an important aspect when scheduling patients in practice, as it can create bottlenecks in the patient flow. Different methods to ensure that there are available resources for high priority patients at arrival are compared, including static and dynamic time reservation. Data from Iridium Netwerk, the largest cancer center in Belgium, is used to evaluate the CG approach. The results show that the dynamic time reservation method outperforms the other methods used to handle uncertainty in future urgent patients. A sensitivity analysis also shows that the dynamic time reservation method is robust to fluctuations in arrival rates. The CG approach produces schedules that fulfill all the medical and technical constraints posed at Iridium Netwerk with acceptable computation times.
Submitted 20 March, 2023;
originally announced March 2023.
-
A mixed-integer approximation of robust optimization problems with mixed-integer adjustments
Authors:
Jan Kronqvist,
Boda Li,
Jan Rolfes
Abstract:
In the present article we propose a mixed-integer approximation of adjustable-robust optimization (ARO) problems that have both continuous and discrete variables on the lowest level. As these trilevel problems are notoriously hard to solve, we restrict ourselves to weakly-connected instances. Our approach allows us to approximate, and in some cases exactly represent, the trilevel problem as a single-level mixed-integer problem. This allows us to leverage the computational efficiency of state-of-the-art mixed-integer programming solvers. We demonstrate the value of this approach by applying it to the optimization of power systems, particularly to the control of smart converters.
Submitted 19 July, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Model-based feature selection for neural networks: A mixed-integer programming approach
Authors:
Shudian Zhao,
Calvin Tsay,
Jan Kronqvist
Abstract:
In this work, we develop a novel input feature selection framework for ReLU-based deep neural networks (DNNs), which builds upon a mixed-integer optimization approach. While the method is generally applicable to various classification tasks, we focus on finding input features for image classification for clarity of presentation. The idea is to use a trained DNN, or an ensemble of trained DNNs, to identify the salient input features. The input feature selection is formulated as a sequence of mixed-integer linear programming (MILP) problems that find sets of sparse inputs that maximize the classification confidence of each category. These ``inverse'' problems are regularized by the number of inputs selected for each category and by distribution constraints. Numerical results on the well-known MNIST and FashionMNIST datasets show that the proposed input feature selection allows us to drastically reduce the size of the input to $\sim$15\% while maintaining good classification accuracy. This allows us to design DNNs with significantly fewer connections, reducing computational effort and producing DNNs that are more robust towards adversarial attacks.
Submitted 20 February, 2023;
originally announced February 2023.
-
Sparse Convex Optimization Toolkit: A Mixed-Integer Framework
Authors:
Alireza Olama,
Eduardo Camponogara,
Jan Kronqvist
Abstract:
This paper proposes an open-source distributed solver for solving Sparse Convex Optimization (SCO) problems over computational networks. Motivated by past algorithmic advances in mixed-integer optimization, the Sparse Convex Optimization Toolkit (SCOT) adopts a mixed-integer approach to find exact solutions to SCO problems. In particular, SCOT brings together various techniques to transform the original SCO problem into an equivalent convex Mixed-Integer Nonlinear Programming (MINLP) problem that can benefit from high-performance and parallel computing platforms. To solve the equivalent mixed-integer problem, we present the Distributed Hybrid Outer Approximation (DiHOA) algorithm that builds upon the LP/NLP based branch-and-bound and is tailored for this specific problem structure. The DiHOA algorithm combines the so-called single- and multi-tree outer approximation, naturally integrates a decentralized algorithm for distributed convex nonlinear subproblems, and utilizes enhancement techniques such as quadratic cuts. Finally, we present detailed computational experiments that show the benefit of our solver through numerical benchmarks on 140 SCO problems with distributed datasets. To show the overall efficiency of SCOT we also provide performance profiles comparing SCOT to other state-of-the-art MINLP solvers.
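Outer approximation, on which DiHOA builds, relies on the fact that a linearization of a convex function is a global underestimator. The sketch below shows only that basic building block: generating gradient cuts and evaluating the resulting polyhedral lower bound. It does not show the single-/multi-tree logic, the distributed subproblems, or the quadratic cuts mentioned above, and the function names are hypothetical.

```python
import numpy as np


def oa_cut(f, grad, xk):
    """Linearize a convex f at xk: f(x) >= c + g @ x holds for every x."""
    fk, gk = f(xk), np.asarray(grad(xk), dtype=float)
    return fk - gk @ xk, gk


def oa_lower_bound(cuts, x):
    """Polyhedral outer approximation: the pointwise max of all cuts collected so far."""
    return max(c + g @ x for c, g in cuts)
```

In an outer approximation algorithm, such cuts are added to a mixed-integer linear master problem, whose optimal value is then a valid lower bound on the convex MINLP; the bound is exact at every point where a cut was generated.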
Submitted 30 October, 2022;
originally announced October 2022.
-
P-split formulations: A class of intermediate formulations between big-M and convex hull for disjunctive constraints
Authors:
Jan Kronqvist,
Ruth Misener,
Calvin Tsay
Abstract:
We develop a class of mixed-integer formulations for disjunctive constraints intermediate to the big-M and convex hull formulations in terms of relaxation strength. The main idea is to capture the best of both the big-M and convex hull formulations: a computationally light formulation with a tight relaxation. The "P-split" formulations are based on a lifted transformation that splits convex additively separable constraints into P partitions and forms the convex hull of the linearized and partitioned disjunction. The "P-split" formulations are derived for disjunctive constraints with convex constraints within each disjunct, and we generalize the results for the case with nonconvex constraints within the disjuncts. We analyze the continuous relaxation of the P-split formulations and show that, under certain assumptions, the formulations form a hierarchy starting from a big-M equivalent and converging to the convex hull. The goal of the P-split formulations is to form strong approximations of the convex hull through a computationally simpler formulation. We computationally compare the P-split formulations against big-M and convex hull formulations on 344 test instances. The test problems include K-means clustering, semi-supervised clustering, P_ball problems, and optimization over trained ReLU neural networks. The computational results show promising potential of the P-split formulations. For many of the test problems, P-split formulations are solved with a similar number of explored nodes as the convex hull formulation, while reducing the solution time by an order of magnitude and outperforming big-M both in time and number of explored nodes.
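The splitting step can be written out for a single convex additively separable constraint. The following is a sketch in our own notation (the index sets $S_1,\dots,S_P$, which partition $\{1,\dots,n\}$, and the auxiliary variables $\alpha_s$ are labels chosen here, not necessarily the paper's); the equivalence holds because one may always take $\alpha_s = \sum_{i\in S_s} g_i(x_i)$.

```latex
\sum_{i=1}^{n} g_i(x_i) \le b
\quad\Longleftrightarrow\quad
\exists\, \alpha \in \mathbb{R}^{P}:\;
\sum_{i \in S_s} g_i(x_i) \le \alpha_s \quad (s = 1,\dots,P),
\qquad \sum_{s=1}^{P} \alpha_s \le b .
```

With $P = 1$ the constraint stays aggregated, while $P = n$ keeps every term separate. The convex hull construction over the partitioned disjunction, which the abstract describes, is not reproduced in this fragment.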
Submitted 27 May, 2024; v1 submitted 10 February, 2022;
originally announced February 2022.
-
Maximizing information from chemical engineering data sets: Applications to machine learning
Authors:
Alexander Thebelt,
Johannes Wiebe,
Jan Kronqvist,
Calvin Tsay,
Ruth Misener
Abstract:
It is well-documented how artificial intelligence can have (and already is having) a big impact on chemical engineering. But classical machine learning approaches may be weak for many chemical engineering applications. This review discusses how challenging data characteristics arise in chemical engineering applications. We identify four characteristics of data arising in chemical engineering applications that make applying classical artificial intelligence approaches difficult: (1) high variance, low volume data, (2) low variance, high volume data, (3) noisy/corrupt/missing data, and (4) restricted data with physics-based limitations. For each of these four data characteristics, we discuss applications where these data characteristics arise and show how current chemical engineering research is extending the fields of data science and machine learning to incorporate these challenges. Finally, we identify several challenges for future research.
Submitted 24 January, 2022;
originally announced January 2022.
-
Partition-based formulations for mixed-integer optimization of trained ReLU neural networks
Authors:
Calvin Tsay,
Jan Kronqvist,
Alexander Thebelt,
Ruth Misener
Abstract:
This paper introduces a class of mixed-integer formulations for trained ReLU neural networks. The approach balances model size and tightness by partitioning node inputs into a number of groups and forming the convex hull over the partitions via disjunctive programming. At one extreme, one partition per input recovers the convex hull of a node, i.e., the tightest possible formulation for each node. For fewer partitions, we develop smaller relaxations that approximate the convex hull, and show that they outperform existing formulations. Specifically, we propose strategies for partitioning variables based on theoretical motivations and validate these strategies using extensive computational experiments. Furthermore, the proposed scheme complements known algorithmic approaches, e.g., optimization-based bound tightening captures dependencies within a partition.
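For reference, the standard big-M formulation of a single ReLU node $y = \max\{0,\, w^\top x + b\}$, one of the existing formulations that partition-based approaches aim to tighten, can be written as follows. The pre-activation bounds $L \le w^\top x + b \le U$ with $L < 0 < U$ are assumed to be known, e.g., from bound tightening; the notation is ours. Setting $\sigma = 1$ forces $y = w^\top x + b \ge 0$, and setting $\sigma = 0$ forces $y = 0 \ge w^\top x + b$.

```latex
\begin{aligned}
&y \ge w^\top x + b, \qquad y \ge 0,\\
&y \le w^\top x + b - L\,(1 - \sigma), \qquad y \le U\,\sigma, \qquad \sigma \in \{0,1\}.
\end{aligned}
```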
Submitted 20 October, 2021; v1 submitted 8 February, 2021;
originally announced February 2021.
-
Between steps: Intermediate relaxations between big-M and convex hull formulations
Authors:
Jan Kronqvist,
Ruth Misener,
Calvin Tsay
Abstract:
This work develops a class of relaxations in between the big-M and convex hull formulations of disjunctions, drawing advantages from both. The proposed "P-split" formulations split convex additively separable constraints into P partitions and form the convex hull of the partitioned disjuncts. Parameter P represents the trade-off of model size vs. relaxation strength. We examine the novel formulations and prove that, under certain assumptions, the relaxations form a hierarchy starting from a big-M equivalent and converging to the convex hull. We computationally compare the proposed formulations to big-M and convex hull formulations on a test set including: K-means clustering, P_ball problems, and ReLU neural networks. The computational results show that the intermediate P-split formulations can form strong outer approximations of the convex hull with fewer variables and constraints than the extended convex hull formulations, giving significant computational advantages over both the big-M and convex hull.
Submitted 29 January, 2021;
originally announced January 2021.
-
ENTMOOT: A Framework for Optimization over Ensemble Tree Models
Authors:
Alexander Thebelt,
Jan Kronqvist,
Miten Mistry,
Robert M. Lee,
Nathan Sudermann-Merx,
Ruth Misener
Abstract:
Gradient boosted trees and other regression tree models perform well in a wide range of real-world, industrial applications. These tree models (i) offer insight into important prediction features, (ii) effectively manage sparse data, and (iii) have excellent prediction capabilities. Despite their advantages, they are generally unpopular for decision-making tasks and black-box optimization, which is due to their difficult-to-optimize structure and the lack of a reliable uncertainty measure. ENTMOOT is our new framework for integrating (already trained) tree models into larger optimization problems. The contributions of ENTMOOT include: (i) explicitly introducing a reliable uncertainty measure that is compatible with tree models, (ii) solving the larger optimization problems that incorporate these uncertainty aware tree models, (iii) proving that the solutions are globally optimal, i.e., no better solution exists. In particular, we show how the ENTMOOT approach allows a simple integration of tree models into decision-making and black-box optimization, where it proves to be a strong competitor to commonly-used frameworks.
Submitted 18 May, 2021; v1 submitted 10 March, 2020;
originally announced March 2020.