Search | arXiv e-print repository

Catastrophe Insurance: An Adaptive Robust Optimization Approach

Authors: Dimitris Bertsimas, Cynthia Zeng

Abstract: The escalating frequency and severity of natural disasters, exacerbated by climate change, underscore the critical role of insurance in facilitating recovery and promoting investments in risk reduction. This work introduces a novel Adaptive Robust Optimization (ARO) framework tailored for the calculation of catastrophe insurance premiums, with a case study applied to the United States National Flo… ▽ More The escalating frequency and severity of natural disasters, exacerbated by climate change, underscore the critical role of insurance in facilitating recovery and promoting investments in risk reduction. This work introduces a novel Adaptive Robust Optimization (ARO) framework tailored for the calculation of catastrophe insurance premiums, with a case study applied to the United States National Flood Insurance Program (NFIP). To the best of our knowledge, it is the first time an ARO approach has been applied to for disaster insurance pricing. Our methodology is designed to protect against both historical and emerging risks, the latter predicted by machine learning models, thus directly incorporating amplified risks induced by climate change. Using the US flood insurance data as a case study, optimization models demonstrate effectiveness in covering losses and produce surpluses, with a smooth balance transition through parameter fine-tuning. Among tested optimization models, results show ARO models with conservative parameter values achieving low number of insolvent states with the least insurance premium charged. Overall, optimization frameworks offer versatility and generalizability, making it adaptable to a variety of natural disaster scenarios, such as wildfires, droughts, etc. This work not only advances the field of insurance premium modeling but also serves as a vital tool for policymakers and stakeholders in building resilience to the growing risks of natural catastrophes. △ Less

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2403.19871 [pdf, other]

Towards Stable Machine Learning Model Retraining via Slowly Varying Sequences

Authors: Dimitris Bertsimas, Vassilis Digalakis Jr, Yu Ma, Phevos Paschalidis

Abstract: We consider the task of retraining machine learning (ML) models when new batches of data become available. Existing methods focus largely on greedy approaches to find the best-performing model for each batch, without considering the stability of the model's structure across retraining iterations. In this study, we propose a methodology for finding sequences of ML models that are stable across retr… ▽ More We consider the task of retraining machine learning (ML) models when new batches of data become available. Existing methods focus largely on greedy approaches to find the best-performing model for each batch, without considering the stability of the model's structure across retraining iterations. In this study, we propose a methodology for finding sequences of ML models that are stable across retraining iterations. We develop a mixed-integer optimization formulation that is guaranteed to recover Pareto optimal models (in terms of the predictive power-stability trade-off) and an efficient polynomial-time algorithm that performs well in practice. We focus on retaining consistent analytical insights - which is important to model interpretability, ease of implementation, and fostering trust with users - by using custom-defined distance metrics that can be directly incorporated into the optimization problem. Our method shows stronger stability than greedily trained models with a small, controllable sacrifice in predictive power, as evidenced through a real-world case study in a major hospital system in Connecticut. △ Less

Submitted 22 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

arXiv:2311.06960 [pdf, other]

Robust Regression over Averaged Uncertainty

Authors: Dimitris Bertsimas, Yu Ma

Abstract: We propose a new formulation of robust regression by integrating all realizations of the uncertainty set and taking an averaged approach to obtain the optimal solution for the ordinary least-squared regression problem. We show that this formulation surprisingly recovers ridge regression and establishes the missing link between robust optimization and the mean squared error approaches for existing… ▽ More We propose a new formulation of robust regression by integrating all realizations of the uncertainty set and taking an averaged approach to obtain the optimal solution for the ordinary least-squared regression problem. We show that this formulation surprisingly recovers ridge regression and establishes the missing link between robust optimization and the mean squared error approaches for existing regression problems. We first prove the equivalence for four uncertainty sets: ellipsoidal, box, diamond, and budget, and provide closed-form formulations of the penalty term as a function of the sample size, feature size, as well as perturbation protection strength. We then show in synthetic datasets with different levels of perturbations, a consistent improvement of the averaged formulation over the existing worst-case formulation in out-of-sample performance. Importantly, as the perturbation level increases, the improvement increases, confirming our method's advantage in high-noise environments. We report similar improvements in the out-of-sample datasets in real-world regression problems obtained from UCI datasets. △ Less

Submitted 12 November, 2023; originally announced November 2023.

arXiv:2311.01742 [pdf, other]

Global Optimization: A Machine Learning Approach

Authors: Dimitris Bertsimas, Georgios Margaritis

Abstract: Many approaches for addressing Global Optimization problems typically rely on relaxations of nonlinear constraints over specific mathematical primitives. This is restricting in applications with constraints that are black-box, implicit or consist of more general primitives. Trying to address such limitations, Bertsimas and Ozturk (2023) proposed OCTHaGOn as a way of solving black-box global optimi… ▽ More Many approaches for addressing Global Optimization problems typically rely on relaxations of nonlinear constraints over specific mathematical primitives. This is restricting in applications with constraints that are black-box, implicit or consist of more general primitives. Trying to address such limitations, Bertsimas and Ozturk (2023) proposed OCTHaGOn as a way of solving black-box global optimization problems by approximating the nonlinear constraints using hyperplane-based Decision-Trees and then using those trees to construct a unified mixed integer optimization (MIO) approximation of the original problem. We provide extensions to this approach, by (i) approximating the original problem using other MIO-representable ML models besides Decision Trees, such as Gradient Boosted Trees, Multi Layer Perceptrons and Suport Vector Machines, (ii) proposing adaptive sampling procedures for more accurate machine learning-based constraint approximations, (iii) utilizing robust optimization to account for the uncertainty of the sample-dependent training of the ML models, and (iv) leveraging a family of relaxations to address the infeasibilities of the final MIO approximation. We then test the enhanced framework in 81 Global Optimization instances. We show improvements in solution feasibility and optimality in the majority of instances. We also compare against BARON, showing improved optimality gaps or solution times in 11 instances. △ Less

Submitted 3 November, 2023; originally announced November 2023.

Comments: Submitted to the Journal of Global Optimization. 35 pages

arXiv:2309.08162 [pdf, other]

Adaptive Pricing in Unit Commitment Under Load and Capacity Uncertainty

Authors: Dimitris Bertsimas, Angelos G. Koulouras

Abstract: The increase of renewables in the grid and the volatility of the load create uncertainties in the day-ahead prices of electricity markets. Adaptive robust optimization (ARO) and stochastic optimization have been used to make commitment and dispatch decisions that adapt to the load and capacity uncertainty. These approaches have been successfully applied in practice but current pricing approaches u… ▽ More The increase of renewables in the grid and the volatility of the load create uncertainties in the day-ahead prices of electricity markets. Adaptive robust optimization (ARO) and stochastic optimization have been used to make commitment and dispatch decisions that adapt to the load and capacity uncertainty. These approaches have been successfully applied in practice but current pricing approaches used by US Independent System Operators (marginal pricing) and proposed in the literature (convex hull pricing) have two major disadvantages: a) they are deterministic in nature, that is they do not adapt to the load and capacity uncertainty, and b) require uplift payments to the generators that are typically determined by ad hoc procedures and create inefficiencies that motivate self-scheduling. In this work, we extend pay-as-bid and uniform pricing mechanisms to propose the first adaptive pricing method in electricity markets that adapts to the load and capacity uncertainty, eliminates post-market uplifts and deters self-scheduling, addressing both disadvantages. △ Less

Submitted 15 September, 2023; originally announced September 2023.

arXiv:2307.12409 [pdf, other]

A Machine Learning Approach to Two-Stage Adaptive Robust Optimization

Authors: Dimitris Bertsimas, Cheol Woo Kim

Abstract: We propose an approach based on machine learning to solve two-stage linear adaptive robust optimization (ARO) problems with binary here-and-now variables and polyhedral uncertainty sets. We encode the optimal here-and-now decisions, the worst-case scenarios associated with the optimal here-and-now decisions, and the optimal wait-and-see decisions into what we denote as the strategy. We solve multi… ▽ More We propose an approach based on machine learning to solve two-stage linear adaptive robust optimization (ARO) problems with binary here-and-now variables and polyhedral uncertainty sets. We encode the optimal here-and-now decisions, the worst-case scenarios associated with the optimal here-and-now decisions, and the optimal wait-and-see decisions into what we denote as the strategy. We solve multiple similar ARO instances in advance using the column and constraint generation algorithm and extract the optimal strategies to generate a training set. We train a machine learning model that predicts high-quality strategies for the here-and-now decisions, the worst-case scenarios associated with the optimal here-and-now decisions, and the wait-and-see decisions. We also introduce an algorithm to reduce the number of different target classes the machine learning algorithm needs to be trained on. We apply the proposed approach to the facility location, the multi-item inventory control and the unit commitment problems. Our approach solves ARO problems drastically faster than the state-of-the-art algorithms with high accuracy. △ Less

Submitted 7 December, 2023; v1 submitted 23 July, 2023; originally announced July 2023.

arXiv:2305.17299 [pdf, other]

Improving Stability in Decision Tree Models

Authors: Dimitris Bertsimas, Vassilis Digalakis Jr

Abstract: Owing to their inherently interpretable structure, decision trees are commonly used in applications where interpretability is essential. Recent work has focused on improving various aspects of decision trees, including their predictive power and robustness; however, their instability, albeit well-documented, has been addressed to a lesser extent. In this paper, we take a step towards the stabiliza… ▽ More Owing to their inherently interpretable structure, decision trees are commonly used in applications where interpretability is essential. Recent work has focused on improving various aspects of decision trees, including their predictive power and robustness; however, their instability, albeit well-documented, has been addressed to a lesser extent. In this paper, we take a step towards the stabilization of decision tree models through the lens of real-world health care applications due to the relevance of stability and interpretability in this space. We introduce a new distance metric for decision trees and use it to determine a tree's level of stability. We propose a novel methodology to train stable decision trees and investigate the existence of trade-offs that are inherent to decision tree models - including between stability, predictive power, and interpretability. We demonstrate the value of the proposed methodology through an extensive quantitative and qualitative analysis of six case studies from real-world health care applications, and we show that, on average, with a small 4.6% decrease in predictive power, we gain a significant 38% improvement in the model's stability. △ Less

Submitted 26 May, 2023; originally announced May 2023.

arXiv:2305.12292 [pdf, other]

Optimal Low-Rank Matrix Completion: Semidefinite Relaxations and Eigenvector Disjunctions

Authors: Dimitris Bertsimas, Ryan Cory-Wright, Sean Lo, Jean Pauphilet

Abstract: Low-rank matrix completion consists of computing a matrix of minimal complexity that recovers a given set of observations as accurately as possible. Unfortunately, existing methods for matrix completion are heuristics that, while highly scalable and often identifying high-quality solutions, do not possess any optimality guarantees. We reexamine matrix completion with an optimality-oriented eye. We… ▽ More Low-rank matrix completion consists of computing a matrix of minimal complexity that recovers a given set of observations as accurately as possible. Unfortunately, existing methods for matrix completion are heuristics that, while highly scalable and often identifying high-quality solutions, do not possess any optimality guarantees. We reexamine matrix completion with an optimality-oriented eye. We reformulate these low-rank problems as convex problems over the non-convex set of projection matrices and implement a disjunctive branch-and-bound scheme that solves them to certifiable optimality. Further, we derive a novel and often tight class of convex relaxations by decomposing a low-rank matrix as a sum of rank-one matrices and incentivizing that two-by-two minors in each rank-one matrix have determinant zero. In numerical experiments, our new convex relaxations decrease the optimality gap by two orders of magnitude compared to existing attempts, and our disjunctive branch-and-bound scheme solves nxn rank-r matrix completion problems to certifiable optimality in hours for n<=150 and r<=5. △ Less

Submitted 26 January, 2024; v1 submitted 20 May, 2023; originally announced May 2023.

Comments: Updated version with new numerics showcasing relaxation for rank k>1

arXiv:2304.04308 [pdf, other]

Ensemble Modeling for Time Series Forecasting: an Adaptive Robust Optimization Approach

Authors: Dimitris Bertsimas, Leonard Boussioux

Abstract: Accurate time series forecasting is critical for a wide range of problems with temporal data. Ensemble modeling is a well-established technique for leveraging multiple predictive models to increase accuracy and robustness, as the performance of a single predictor can be highly variable due to shifts in the underlying data distribution. This paper proposes a new methodology for building robust ense… ▽ More Accurate time series forecasting is critical for a wide range of problems with temporal data. Ensemble modeling is a well-established technique for leveraging multiple predictive models to increase accuracy and robustness, as the performance of a single predictor can be highly variable due to shifts in the underlying data distribution. This paper proposes a new methodology for building robust ensembles of time series forecasting models. Our approach utilizes Adaptive Robust Optimization (ARO) to construct a linear regression ensemble in which the models' weights can adapt over time. We demonstrate the effectiveness of our method through a series of synthetic experiments and real-world applications, including air pollution management, energy consumption forecasting, and tropical cyclone intensity forecasting. Our results show that our adaptive ensembles outperform the best ensemble member in hindsight by 16-26% in root mean square error and 14-28% in conditional value at risk and improve over competitive ensemble techniques. △ Less

Submitted 9 April, 2023; originally announced April 2023.

arXiv:2303.07695 [pdf, other]

A Stochastic Benders Decomposition Scheme for Large-Scale Stochastic Network Design

Authors: Dimitris Bertsimas, Ryan Cory-Wright, Jean Pauphilet, Periklis Petridis

Abstract: Network design problems involve constructing edges in a transportation or supply chain network to minimize construction and daily operational costs. We study a stochastic version where operational costs are uncertain due to fluctuating demand and estimated as a sample average from historical data. This problem is computationally challenging, and instances with as few as 100 nodes often cannot be s… ▽ More Network design problems involve constructing edges in a transportation or supply chain network to minimize construction and daily operational costs. We study a stochastic version where operational costs are uncertain due to fluctuating demand and estimated as a sample average from historical data. This problem is computationally challenging, and instances with as few as 100 nodes often cannot be solved to optimality using current decomposition techniques. We propose a stochastic variant of Benders decomposition that mitigates the high computational cost of generating each cut by sampling a subset of the data at each iteration and nonetheless generates deterministically valid cuts, rather than the probabilistically valid cuts frequently proposed in the stochastic optimization literature, via a dual averaging technique. We implement both single-cut and multi-cut variants of this Benders decomposition, as well as a variant that uses clustering of the historical scenarios. To our knowledge, this is the first single-tree implementation of Benders decomposition that facilitates sampling. On instances with 100-200 nodes and relatively complete recourse, our algorithm achieves 5-7% optimality gaps, compared with 16-27% for deterministic Benders schemes, and scales to instances with 700 nodes and 50 commodities within hours. Beyond network design, our strategy could be adapted to generic two-stage stochastic mixed-integer optimization problems where second-stage costs are estimated via a sample average. △ Less

Submitted 29 April, 2024; v1 submitted 14 March, 2023; originally announced March 2023.

arXiv:2303.06515 [pdf, other]

Multistage Stochastic Optimization via Kernels

Authors: Dimitris Bertsimas, Kimberly Villalobos Carballo

Abstract: We develop a non-parametric, data-driven, tractable approach for solving multistage stochastic optimization problems in which decisions do not affect the uncertainty. The proposed framework represents the decision variables as elements of a reproducing kernel Hilbert space and performs functional stochastic gradient descent to minimize the empirical regularized loss. By incorporating sparsificatio… ▽ More We develop a non-parametric, data-driven, tractable approach for solving multistage stochastic optimization problems in which decisions do not affect the uncertainty. The proposed framework represents the decision variables as elements of a reproducing kernel Hilbert space and performs functional stochastic gradient descent to minimize the empirical regularized loss. By incorporating sparsification techniques based on function subspace projections we are able to overcome the computational complexity that standard kernel methods introduce as the data size increases. We prove that the proposed approach is asymptotically optimal for multistage stochastic optimization with side information. Across various computational experiments on stochastic inventory management problems, {our method performs well in multidimensional settings} and remains tractable when the data size is large. Lastly, by computing lower bounds for the optimal loss of the inventory control problem, we show that the proposed method produces decision rules with near-optimal average performance. △ Less

Submitted 11 March, 2023; originally announced March 2023.

arXiv:2302.10369 [pdf, other]

The Benefit of Uncertainty Coupling in Robust and Adaptive Robust Optimization

Authors: Dimitris Bertsimas, Liangyuan Na, Bartolomeo Stellato

Abstract: Despite the modeling power for problems under uncertainty, robust optimization (RO) and adaptive robust optimization (ARO) can exhibit too conservative solutions in terms of objective value degradation compared to the nominal case. One of the main reasons behind this conservatism is that, in many practical applications, uncertain constraints are directly designed as constraint-wise without taking… ▽ More Despite the modeling power for problems under uncertainty, robust optimization (RO) and adaptive robust optimization (ARO) can exhibit too conservative solutions in terms of objective value degradation compared to the nominal case. One of the main reasons behind this conservatism is that, in many practical applications, uncertain constraints are directly designed as constraint-wise without taking into account couplings over multiple constraints. In this paper, we define a coupled uncertainty set as the intersection between a constraint-wise uncertainty set and a coupling set. We study the benefit of coupling in alleviating conservatism in RO and ARO. We provide theoretical tight and computable upper and lower bounds on the objective value improvement of RO and ARO problems under coupled uncertainty over constraint-wise uncertainty. In addition, we relate the power of adaptability over static solutions with the coupling of uncertainty set. Computational results demonstrate the benefit of coupling in applications. △ Less

Submitted 20 February, 2023; originally announced February 2023.

Comments: 51 pages, 13 figures

arXiv:2210.08326 [pdf, ps, other]

Distributionally Robust Causal Inference with Observational Data

Authors: Dimitris Bertsimas, Kosuke Imai, Michael Lingzhi Li

Abstract: We consider the estimation of average treatment effects in observational studies and propose a new framework of robust causal inference with unobserved confounders. Our approach is based on distributionally robust optimization and proceeds in two steps. We first specify the maximal degree to which the distribution of unobserved potential outcomes may deviate from that of observed outcomes. We then… ▽ More We consider the estimation of average treatment effects in observational studies and propose a new framework of robust causal inference with unobserved confounders. Our approach is based on distributionally robust optimization and proceeds in two steps. We first specify the maximal degree to which the distribution of unobserved potential outcomes may deviate from that of observed outcomes. We then derive sharp bounds on the average treatment effects under this assumption. Our framework encompasses the popular marginal sensitivity model as a special case, and we demonstrate how the proposed methodology can address a primary challenge of the marginal sensitivity model that it produces uninformative results when unobserved confounders substantially affect treatment and outcome. Specifically, we develop an alternative sensitivity model, called the distributional sensitivity model, under the assumption that heterogeneity of treatment effect due to unobserved variables is relatively small. Unlike the marginal sensitivity model, the distributional sensitivity model allows for potential lack of overlap and often produces informative bounds even when unobserved variables substantially affect both treatment and outcome. Finally, we show how to extend the distributional sensitivity model to difference-in-differences designs and settings with instrumental variables. Through simulation and empirical studies, we demonstrate the applicability of the proposed methodology. △ Less

Submitted 2 February, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

arXiv:2210.06590 [pdf, other]

Sparse PCA: a Geometric Approach

Authors: Dimitris Bertsimas, Driss Lahlou Kitane

Abstract: We consider the problem of maximizing the variance explained from a data matrix using orthogonal sparse principal components that have a support of fixed cardinality. While most existing methods focus on building principal components (PCs) iteratively through deflation, we propose GeoSPCA, a novel algorithm to build all PCs at once while satisfying the orthogonality constraints which brings substa… ▽ More We consider the problem of maximizing the variance explained from a data matrix using orthogonal sparse principal components that have a support of fixed cardinality. While most existing methods focus on building principal components (PCs) iteratively through deflation, we propose GeoSPCA, a novel algorithm to build all PCs at once while satisfying the orthogonality constraints which brings substantial benefits over deflation. This novel approach is based on the left eigenvalues of the covariance matrix which helps circumvent the non-convexity of the problem by approximating the optimal solution using a binary linear optimization problem that can find the optimal solution. The resulting approximation can be used to tackle different versions of the sparse PCA problem including the case in which the principal components share the same support or have disjoint supports and the Structured Sparse PCA problem. We also propose optimality bounds and illustrate the benefits of GeoSPCA in selected real world problems both in terms of explained variance, sparsity and tractability. Improvements vs. the greedy algorithm, which is often at par with state-of-the-art techniques, reaches up to 24% in terms of variance while solving real world problems with 10,000s of variables and support cardinality of 100s in minutes. We also apply GeoSPCA in a face recognition problem yielding more than 10% improvement vs. other PCA based technique such as structured sparse PCA. △ Less

Submitted 12 October, 2022; originally announced October 2022.

Comments: 35 pages, 10 figures

arXiv:2209.06341 [pdf, other]

Decarbonizing OCP

Authors: Dimitris Bertsimas, Ryan Cory-Wright, Vassilis Digalakis Jr

Abstract: We present our collaboration with the OCP Group, one of the world's largest producers of phosphate and phosphate-based products, to reduce OCP's carbon emissions significantly. We study the problem of decarbonizing OCP's electricity supply by installing a mixture of solar panels and batteries to minimize its time-discounted investment cost plus the cost of satisfying its remaining demand via the n… ▽ More We present our collaboration with the OCP Group, one of the world's largest producers of phosphate and phosphate-based products, to reduce OCP's carbon emissions significantly. We study the problem of decarbonizing OCP's electricity supply by installing a mixture of solar panels and batteries to minimize its time-discounted investment cost plus the cost of satisfying its remaining demand via the national grid. OCP is currently designing its renewable investment strategy, using insights gleaned from our optimization model, and has pledged to invest 130 billion MAD (approx. 13 billion USD) in a green initiative by 2027, a subset of which involves decarbonization. We immunize our model against deviations between forecast and realized solar generation output via a combination of robust and distributionally robust optimization. To account for variability in daily solar generation, we propose a data-driven robust optimization approach that prevents excessive conservatism by averaging across uncertainty sets. To protect against variability in seasonal weather patterns induced by climate change, we invoke distributionally robust optimization techniques. Under a 10 billion MAD investment by OCP, the proposed methodology reduces the carbon emissions which arise from OCP's energy needs by more than 70% while generating a net present value (NPV) of 5 billion MAD over twenty years. Moreover, a 20 billion MAD investment induces a 95% reduction in carbon emissions and generates an NPV of around 2 billion MAD. To fulfill the Paris climate agreement, rapidly decarbonizing the global economy in a financially sustainable fashion is imperative. Accordingly, this work develops a robust optimization methodology that enables OCP to decarbonize at a profit by purchasing solar panels and batteries. Moreover, the methodology could be applied to decarbonize other industrial consumers. △ Less

Submitted 16 June, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

Comments: Submitted to MSOM on 08/2022

arXiv:2206.00176 [pdf, other]

Learning Sparse Nonlinear Dynamics via Mixed-Integer Optimization

Authors: Dimitris Bertsimas, Wes Gurnee

Abstract: Discovering governing equations of complex dynamical systems directly from data is a central problem in scientific machine learning. In recent years, the sparse identification of nonlinear dynamics (SINDy) framework, powered by heuristic sparse regression methods, has become a dominant tool for learning parsimonious models. We propose an exact formulation of the SINDy problem using mixed-integer o… ▽ More Discovering governing equations of complex dynamical systems directly from data is a central problem in scientific machine learning. In recent years, the sparse identification of nonlinear dynamics (SINDy) framework, powered by heuristic sparse regression methods, has become a dominant tool for learning parsimonious models. We propose an exact formulation of the SINDy problem using mixed-integer optimization (MIO) to solve the sparsity constrained regression problem to provable optimality in seconds. On a large number of canonical ordinary and partial differential equations, we illustrate the dramatic improvement of our approach in accurate model discovery while being more sample efficient, robust to noise, and flexible in accommodating physical constraints. △ Less

Submitted 31 May, 2022; originally announced June 2022.

arXiv:2202.06017 [pdf, other]

Global Optimization via Optimal Decision Trees

Authors: Dimitris Bertsimas, Berk Öztürk

Abstract: The global optimization literature places large emphasis on reducing intractable optimization problems into more tractable structured optimization forms. In order to achieve this goal, many existing methods are restricted to optimization over explicit constraints and objectives that use a subset of possible mathematical primitives. These are limiting in real-world contexts where more general expli… ▽ More The global optimization literature places large emphasis on reducing intractable optimization problems into more tractable structured optimization forms. In order to achieve this goal, many existing methods are restricted to optimization over explicit constraints and objectives that use a subset of possible mathematical primitives. These are limiting in real-world contexts where more general explicit and black box constraints appear. Leveraging the dramatic speed improvements in mixed-integer optimization (MIO) and recent research in machine learning, we propose a new method to learn MIO-compatible approximations of global optimization problems using optimal decision trees with hyperplanes (OCT-Hs). This constraint learning approach only requires a bounded variable domain, and can address both explicit and inexplicit constraints. We solve the MIO approximation efficiently to find a near-optimal, near-feasible solution to the global optimization problem. We further improve the solution using a series of projected gradient descent iterations. We test the method on a number of numerical benchmarks from the literature as well as real-world design problems, demonstrating its promise in finding global optima efficiently. △ Less

Submitted 12 February, 2022; originally announced February 2022.

Comments: 52 pages, 9 figures, 10 tables. Submitted to Operations Research

arXiv:2112.09279 [pdf, other]

Robust Upper Bounds for Adversarial Training

Authors: Dimitris Bertsimas, Xavier Boix, Kimberly Villalobos Carballo, Dick den Hertog

Abstract: Many state-of-the-art adversarial training methods for deep learning leverage upper bounds of the adversarial loss to provide security guarantees against adversarial attacks. Yet, these methods rely on convex relaxations to propagate lower and upper bounds for intermediate layers, which affect the tightness of the bound at the output layer. We introduce a new approach to adversarial training by mi… ▽ More Many state-of-the-art adversarial training methods for deep learning leverage upper bounds of the adversarial loss to provide security guarantees against adversarial attacks. Yet, these methods rely on convex relaxations to propagate lower and upper bounds for intermediate layers, which affect the tightness of the bound at the output layer. We introduce a new approach to adversarial training by minimizing an upper bound of the adversarial loss that is based on a holistic expansion of the network instead of separate bounds for each layer. This bound is facilitated by state-of-the-art tools from Robust Optimization; it has closed-form and can be effectively trained using backpropagation. We derive two new methods with the proposed approach. The first method (Approximated Robust Upper Bound or aRUB) uses the first order approximation of the network as well as basic tools from Linear Robust Optimization to obtain an empirical upper bound of the adversarial loss that can be easily implemented. The second method (Robust Upper Bound or RUB), computes a provable upper bound of the adversarial loss. Across a variety of tabular and vision data sets we demonstrate the effectiveness of our approach -- RUB is substantially more robust than state-of-the-art methods for larger perturbations, while aRUB matches the performance of state-of-the-art methods for small perturbations. △ Less

Submitted 5 April, 2023; v1 submitted 16 December, 2021; originally announced December 2021.

arXiv:2111.04469 [pdf, other]

Mixed-Integer Optimization with Constraint Learning

Authors: Donato Maragno, Holly Wiberg, Dimitris Bertsimas, S. Ilker Birbil, Dick den Hertog, Adejuyigbe Fajemisin

Abstract: We establish a broad methodological foundation for mixed-integer optimization with learned constraints. We propose an end-to-end pipeline for data-driven decision making in which constraints and objectives are directly learned from data using machine learning, and the trained models are embedded in an optimization formulation. We exploit the mixed-integer optimization-representability of many mach… ▽ More We establish a broad methodological foundation for mixed-integer optimization with learned constraints. We propose an end-to-end pipeline for data-driven decision making in which constraints and objectives are directly learned from data using machine learning, and the trained models are embedded in an optimization formulation. We exploit the mixed-integer optimization-representability of many machine learning methods, including linear models, decision trees, ensembles, and multi-layer perceptrons, which allows us to capture various underlying relationships between decisions, contextual variables, and outcomes. We also introduce two approaches for handling the inherent uncertainty of learning from data. First, we characterize a decision trust region using the convex hull of the observations, to ensure credible recommendations and avoid extrapolation. We efficiently incorporate this representation using column generation and propose a more flexible formulation to deal with low-density regions and high-dimensional datasets. Then, we propose an ensemble learning approach that enforces constraint satisfaction over multiple bootstrapped estimators or multiple algorithms. In combination with domain-driven components, the embedded models and trust region define a mixed-integer optimization problem for prescription generation. We implement this framework as a Python package (OptiCL) for practitioners. We demonstrate the method in both World Food Programme planning and chemotherapy optimization. The case studies illustrate the framework's ability to generate high-quality prescriptions as well as the value added by the trust region, the use of ensembles to control model robustness, the consideration of multiple machine learning methods, and the inclusion of multiple learned constraints. △ Less

Submitted 26 October, 2023; v1 submitted 4 November, 2021; originally announced November 2021.

arXiv:2109.12701 [pdf, other]

Sparse Plus Low Rank Matrix Decomposition: A Discrete Optimization Approach

Authors: Dimitris Bertsimas, Ryan Cory-Wright, Nicholas A. G. Johnson

Abstract: We study the Sparse Plus Low-Rank decomposition problem (SLR), which is the problem of decomposing a corrupted data matrix into a sparse matrix of perturbations plus a low-rank matrix containing the ground truth. SLR is a fundamental problem in Operations Research and Machine Learning which arises in various applications, including data compression, latent semantic indexing, collaborative filterin… ▽ More We study the Sparse Plus Low-Rank decomposition problem (SLR), which is the problem of decomposing a corrupted data matrix into a sparse matrix of perturbations plus a low-rank matrix containing the ground truth. SLR is a fundamental problem in Operations Research and Machine Learning which arises in various applications, including data compression, latent semantic indexing, collaborative filtering, and medical imaging. We introduce a novel formulation for SLR that directly models its underlying discreteness. For this formulation, we develop an alternating minimization heuristic that computes high-quality solutions and a novel semidefinite relaxation that provides meaningful bounds for the solutions returned by our heuristic. We also develop a custom branch-and-bound algorithm that leverages our heuristic and convex relaxations to solve small instances of SLR to certifiable (near) optimality. Given an input $n$-by-$n$ matrix, our heuristic scales to solve instances where $n=10000$ in minutes, our relaxation scales to instances where $n=200$ in hours, and our branch-and-bound algorithm scales to instances where $n=25$ in minutes. Our numerical results demonstrate that our approach outperforms existing state-of-the-art approaches in terms of rank, sparsity, and mean-square error while maintaining a comparable runtime. △ Less

Submitted 1 October, 2023; v1 submitted 26 September, 2021; originally announced September 2021.

Journal ref: Journal of Machine Learning Research, 24(267), 1-51 (2023)

arXiv:2105.05947 [pdf, other]

A new perspective on low-rank optimization

Authors: Dimitris Bertsimas, Ryan Cory-Wright, Jean Pauphilet

Abstract: A key question in many low-rank problems throughout optimization, machine learning, and statistics is to characterize the convex hulls of simple low-rank sets and judiciously apply these convex hulls to obtain strong yet computationally tractable convex relaxations. We invoke the matrix perspective function - the matrix analog of the perspective function - and characterize explicitly the convex hu… ▽ More A key question in many low-rank problems throughout optimization, machine learning, and statistics is to characterize the convex hulls of simple low-rank sets and judiciously apply these convex hulls to obtain strong yet computationally tractable convex relaxations. We invoke the matrix perspective function - the matrix analog of the perspective function - and characterize explicitly the convex hull of epigraphs of simple matrix convex functions under low-rank constraints. Further, we combine the matrix perspective function with orthogonal projection matrices-the matrix analog of binary variables which capture the row-space of a matrix-to develop a matrix perspective reformulation technique that reliably obtains strong relaxations for a variety of low-rank problems, including reduced rank regression, non-negative matrix factorization, and factor analysis. Moreover, we establish that these relaxations can be modeled via semidefinite constraints and thus optimized over tractably. The proposed approach parallels and generalizes the perspective reformulation technique in mixed-integer optimization and leads to new relaxations for a broad class of problems. △ Less

Submitted 2 March, 2022; v1 submitted 12 May, 2021; originally announced May 2021.

Comments: Major revision submitted to Mathematical Programming

arXiv:2103.02506 [pdf, ps, other]

Stochastic Cutting Planes for Data-Driven Optimization

Authors: Dimitris Bertsimas, Michael Lingzhi Li

Abstract: We introduce a stochastic version of the cutting-plane method for a large class of data-driven Mixed-Integer Nonlinear Optimization (MINLO) problems. We show that under very weak assumptions the stochastic algorithm is able to converge to an $ε$-optimal solution with high probability. Numerical experiments on several problems show that stochastic cutting planes is able to deliver a multiple order-… ▽ More We introduce a stochastic version of the cutting-plane method for a large class of data-driven Mixed-Integer Nonlinear Optimization (MINLO) problems. We show that under very weak assumptions the stochastic algorithm is able to converge to an $ε$-optimal solution with high probability. Numerical experiments on several problems show that stochastic cutting planes is able to deliver a multiple order-of-magnitude speedup compared to the standard cutting-plane method. We further experimentally explore the lower limits of sampling for stochastic cutting planes and show that for many problems, a sampling size of $O(\sqrt[3]{n})$ appears to be sufficient for high quality solutions. △ Less

Submitted 3 March, 2021; originally announced March 2021.

arXiv:2102.10773 [pdf, other]

Slowly Varying Regression under Sparsity

Authors: Dimitris Bertsimas, Vassilis Digalakis Jr, Michael Linghzi Li, Omar Skali Lami

Abstract: We present the framework of slowly varying regression under sparsity, allowing sparse regression models to exhibit slow and sparse variations. The problem of parameter estimation is formulated as a mixed-integer optimization problem. We demonstrate that it can be precisely reformulated as a binary convex optimization problem through a novel relaxation technique. This relaxation involves a new equa… ▽ More We present the framework of slowly varying regression under sparsity, allowing sparse regression models to exhibit slow and sparse variations. The problem of parameter estimation is formulated as a mixed-integer optimization problem. We demonstrate that it can be precisely reformulated as a binary convex optimization problem through a novel relaxation technique. This relaxation involves a new equality on Moore-Penrose inverses, convexifying the non-convex objective function while matching the original objective on all feasible binary points. This enables us to efficiently solve the problem to provable optimality using a cutting plane-type algorithm. We develop a highly optimized implementation of this algorithm, substantially improving upon the asymptotic computational complexity of a straightforward implementation. Additionally, we propose a fast heuristic method that guarantees a feasible solution and, as empirically illustrated, produces high-quality warm-start solutions for the binary optimization problem. To tune the framework's hyperparameters, we suggest a practical procedure relying on binary search that, under certain assumptions, is guaranteed to recover the true model parameters. On both synthetic and real-world datasets, we demonstrate that the resulting algorithm outperforms competing formulations in comparable times across various metrics, including estimation accuracy, predictive power, and computational time. The algorithm is highly scalable, allowing us to train models with thousands of parameters. Our implementation is available open-source at https://github.com/vvdigalakis/SSVRegression.git. △ Less

Submitted 11 November, 2023; v1 submitted 21 February, 2021; originally announced February 2021.

Comments: Submitted to Operations Research. First submission: 02/2021

arXiv:2102.07309 [pdf, other]

doi 10.1002/nav.22007

Where to locate COVID-19 mass vaccination facilities?

Authors: Dimitris Bertsimas, Vassilis Digalakis Jr., Alexander Jacquillat, Michael Lingzhi Li, Alessandro Previero

Abstract: The outbreak of COVID-19 led to a record-breaking race to develop a vaccine. However, the limited vaccine capacity creates another massive challenge: how to distribute vaccines to mitigate the near-end impact of the pandemic? In the United States in particular, the new Biden administration is launching mass vaccination sites across the country, raising the obvious question of where to locate these… ▽ More The outbreak of COVID-19 led to a record-breaking race to develop a vaccine. However, the limited vaccine capacity creates another massive challenge: how to distribute vaccines to mitigate the near-end impact of the pandemic? In the United States in particular, the new Biden administration is launching mass vaccination sites across the country, raising the obvious question of where to locate these clinics to maximize the effectiveness of the vaccination campaign. This paper tackles this question with a novel data-driven approach to optimize COVID-19 vaccine distribution. We first augment a state-of-the-art epidemiological model, called DELPHI, to capture the effects of vaccinations and the variability in mortality rates across age groups. We then integrate this predictive model into a prescriptive model to optimize the location of vaccination sites and subsequent vaccine allocation. The model is formulated as a bilinear, non-convex optimization model. To solve it, we propose a coordinate descent algorithm that iterates between optimizing vaccine distribution and simulating the dynamics of the pandemic. As compared to benchmarks based on demographic and epidemiological information, the proposed optimization approach increases the effectiveness of the vaccination campaign by an estimated $20\%$, saving an extra $4000$ extra lives in the United States over a three-month period. The proposed solution achieves critical fairness objectives -- by reducing the death toll of the pandemic in several states without hurting others -- and is highly robust to uncertainties and forecast errors -- by achieving similar benefits under a vast range of perturbations. △ Less

Submitted 18 July, 2021; v1 submitted 14 February, 2021; originally announced February 2021.

arXiv:2012.13331 [pdf, other]

Computation of Convex Hull Prices in Electricity Markets with Non-Convexities using Dantzig-Wolfe Decomposition

Authors: Panagiotis Andrianesis, Dimitris Bertsimas, Michael C. Caramanis, William W. Hogan

Abstract: The presence of non-convexities in electricity markets has been an active research area for about two decades. The -- inevitable under current marginal cost pricing -- problem of guaranteeing that no market participant incurs losses in the day-ahead market is addressed in current practice through make-whole payments a.k.a. uplift. Alternative pricing rules have been studied to deal with this probl… ▽ More The presence of non-convexities in electricity markets has been an active research area for about two decades. The -- inevitable under current marginal cost pricing -- problem of guaranteeing that no market participant incurs losses in the day-ahead market is addressed in current practice through make-whole payments a.k.a. uplift. Alternative pricing rules have been studied to deal with this problem. Among them, Convex Hull (CH) prices associated with minimum uplift have attracted significant attention. Several US Independent System Operators (ISOs) have considered CH prices but resorted to approximations, mainly because determining exact CH prices is computationally challenging, while providing little intuition about the price formation rationale. In this paper, we describe the CH price estimation problem by relying on Dantzig-Wolfe decomposition and Column Generation, as a tractable, highly paralellizable, and exact method -- i.e., yielding exact, not approximate, CH prices -- with guaranteed finite convergence. Moreover, the approach provides intuition on the underlying price formation rationale. A test bed of stylized examples provide an exposition of the intuition in the CH price formation. In addition, a realistic ISO dataset is used to support scalability and validate the proof-of-concept. △ Less

Submitted 24 October, 2021; v1 submitted 24 December, 2020; originally announced December 2020.

Comments: 11 pages

arXiv:2012.04419 [pdf, ps, other]

Pareto Adaptive Robust Optimality via a Fourier-Motzkin Elimination Lens

Authors: Dimitris Bertsimas, Stefan ten Eikelder, Dick den Hertog, Nikolaos Trichakis

Abstract: We formalize the concept of Pareto Adaptive Robust Optimality (PARO) for linear Adaptive Robust Optimization (ARO) problems. A worst-case optimal solution pair of here-and-now decisions and wait-and-see decisions is PARO if it cannot be Pareto dominated by another solution, i.e., there does not exist another such pair that performs at least as good in all scenarios in the uncertainty set and stric… ▽ More We formalize the concept of Pareto Adaptive Robust Optimality (PARO) for linear Adaptive Robust Optimization (ARO) problems. A worst-case optimal solution pair of here-and-now decisions and wait-and-see decisions is PARO if it cannot be Pareto dominated by another solution, i.e., there does not exist another such pair that performs at least as good in all scenarios in the uncertainty set and strictly better in at least one scenario. We argue that, unlike PARO, extant solution approaches -- including those that adopt Pareto Robust Optimality from static robust optimization -- could fail in ARO and yield solutions that can be Pareto dominated. The latter could lead to inefficiencies and suboptimal performance in practice. We prove the existence of PARO solutions, and present particular approaches for finding and approximating such solutions. We present numerical results for a facility location problem that demonstrate the practical value of PARO solutions. Our analysis of PARO relies on an application of Fourier-Motzkin Elimination as a proof technique. We demonstrate how this technique can be valuable in the analysis of ARO problems, besides PARO. In particular, we employ it to devise more concise and more insightful proofs of known results on (worst-case) optimality of decision rule structures. △ Less

Submitted 5 May, 2022; v1 submitted 8 December, 2020; originally announced December 2020.

Comments: Revised version. 41 pages, 3 figures

arXiv:2009.10395 [pdf, other]

doi 10.1287/opre.2021.2182

Mixed-Projection Conic Optimization: A New Paradigm for Modeling Rank Constraints

Authors: Dimitris Bertsimas, Ryan Cory-Wright, Jean Pauphilet

Abstract: We propose a framework for modeling and solving low-rank optimization problems to certifiable optimality. We introduce symmetric projection matrices that satisfy $Y^2=Y$, the matrix analog of binary variables that satisfy $z^2=z$, to model rank constraints. By leveraging regularization and strong duality, we prove that this modeling paradigm yields tractable convex optimization problems over the n… ▽ More We propose a framework for modeling and solving low-rank optimization problems to certifiable optimality. We introduce symmetric projection matrices that satisfy $Y^2=Y$, the matrix analog of binary variables that satisfy $z^2=z$, to model rank constraints. By leveraging regularization and strong duality, we prove that this modeling paradigm yields tractable convex optimization problems over the non-convex set of orthogonal projection matrices. Furthermore, we design outer-approximation algorithms to solve low-rank problems to certifiable optimality, compute lower bounds via their semidefinite relaxations, and provide near-optimal solutions through rounding and local search techniques. We implement these numerical ingredients and, for the first time, solve low-rank optimization problems to certifiable optimality. Using currently available spatial branch-and-bound codes, not tailored to projection matrices, we can scale our exact (resp. near-exact) algorithms to matrices with up to 30 (resp. 600) rows/columns. Our algorithms also supply certifiably near-optimal solutions for larger problem sizes and outperform existing heuristics, by deriving an alternative to the popular nuclear norm relaxation which generalizes the perspective relaxation from vectors to matrices. All in all, our framework, which we name Mixed-Projection Conic Optimization, solves low-rank problems to certifiable optimality in a tractable and unified fashion. △ Less

Submitted 2 April, 2021; v1 submitted 22 September, 2020; originally announced September 2020.

Comments: major revision submitted to Operations Research

Journal ref: Operations Research, Articles in Advance 2021

arXiv:2006.16509 [pdf, other]

From predictions to prescriptions: A data-driven response to COVID-19

Authors: Dimitris Bertsimas, Léonard Boussioux, Ryan Cory Wright, Arthur Delarue, Vassilis Digalakis Jr., Alexandre Jacquillat, Driss Lahlou Kitane, Galit Lukin, Michael Lingzhi Li, Luca Mingardi, Omid Nohadani, Agni Orfanoudaki, Theodore Papalexopoulos, Ivan Paskov, Jean Pauphilet, Omar Skali Lami, Bartolomeo Stellato, Hamza Tazi Bouardi, Kimberly Villalobos Carballo, Holly Wiberg, Cynthia Zeng

Abstract: The COVID-19 pandemic has created unprecedented challenges worldwide. Strained healthcare providers make difficult decisions on patient triage, treatment and care management on a daily basis. Policy makers have imposed social distancing measures to slow the disease, at a steep economic price. We design analytical tools to support these decisions and combat the pandemic. Specifically, we propose a… ▽ More The COVID-19 pandemic has created unprecedented challenges worldwide. Strained healthcare providers make difficult decisions on patient triage, treatment and care management on a daily basis. Policy makers have imposed social distancing measures to slow the disease, at a steep economic price. We design analytical tools to support these decisions and combat the pandemic. Specifically, we propose a comprehensive data-driven approach to understand the clinical characteristics of COVID-19, predict its mortality, forecast its evolution, and ultimately alleviate its impact. By leveraging cohort-level clinical data, patient-level hospital data, and census-level epidemiological data, we develop an integrated four-step approach, combining descriptive, predictive and prescriptive analytics. First, we aggregate hundreds of clinical studies into the most comprehensive database on COVID-19 to paint a new macroscopic picture of the disease. Second, we build personalized calculators to predict the risk of infection and mortality as a function of demographics, symptoms, comorbidities, and lab values. Third, we develop a novel epidemiological model to project the pandemic's spread and inform social distancing policies. Fourth, we propose an optimization model to re-allocate ventilators and alleviate shortages. Our results have been used at the clinical level by several hospitals to triage patients, guide care management, plan ICU capacity, and re-distribute ventilators. At the policy level, they are currently supporting safe back-to-work policies at a major institution and equitable vaccine distribution planning at a major pharmaceutical company, and have been integrated into the US Center for Disease Control's pandemic forecast. △ Less

Submitted 29 June, 2020; originally announced June 2020.

Comments: Submitted to PNAS

arXiv:2005.05195 [pdf, other]

Solving Large-Scale Sparse PCA to Certifiable (Near) Optimality

Authors: Dimitris Bertsimas, Ryan Cory-Wright, Jean Pauphilet

Abstract: Sparse principal component analysis (PCA) is a popular dimensionality reduction technique for obtaining principal components which are linear combinations of a small subset of the original features. Existing approaches cannot supply certifiably optimal principal components with more than $p=100s$ of variables. By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem,… ▽ More Sparse principal component analysis (PCA) is a popular dimensionality reduction technique for obtaining principal components which are linear combinations of a small subset of the original features. Existing approaches cannot supply certifiably optimal principal components with more than $p=100s$ of variables. By reformulating sparse PCA as a convex mixed-integer semidefinite optimization problem, we design a cutting-plane method which solves the problem to certifiable optimality at the scale of selecting k=5 covariates from p=300 variables, and provides small bound gaps at a larger scale. We also propose a convex relaxation and greedy rounding scheme that provides bound gaps of $1-2\%$ in practice within minutes for $p=100$s or hours for $p=1,000$s and is therefore a viable alternative to the exact method at scale. Using real-world financial and medical datasets, we illustrate our approach's ability to derive interpretable principal components tractably at scale. △ Less

Submitted 25 August, 2021; v1 submitted 11 May, 2020; originally announced May 2020.

Comments: Revision submitted to JMLR

Journal ref: Journal of Machine Learning Research 23(13):1-35, 2022

arXiv:1910.09092 [pdf, ps, other]

Fast Exact Matrix Completion: A Unified Optimization Framework for Matrix Completion

Authors: Dimitris Bertsimas, Michael Lingzhi Li

Abstract: We formulate the problem of matrix completion with and without side information as a non-convex optimization problem. We design fastImpute based on non-convex gradient descent and show it converges to a global minimum that is guaranteed to recover closely the underlying matrix while it scales to matrices of sizes beyond $10^5 \times 10^5$. We report experiments on both synthetic and real-world dat… ▽ More We formulate the problem of matrix completion with and without side information as a non-convex optimization problem. We design fastImpute based on non-convex gradient descent and show it converges to a global minimum that is guaranteed to recover closely the underlying matrix while it scales to matrices of sizes beyond $10^5 \times 10^5$. We report experiments on both synthetic and real-world datasets that show fastImpute is competitive in both the accuracy of the matrix recovered and the time needed across all cases. Furthermore, when a high number of entries are missing, fastImpute is over $75\%$ lower in MAPE and $15$ times faster than current state-of-the-art matrix completion methods in both the case with side information and without. △ Less

Submitted 31 December, 2020; v1 submitted 20 October, 2019; originally announced October 2019.

Journal ref: Journal of Machine Learning Research 21 (2020) 1-43

arXiv:1910.03143 [pdf, other]

doi 10.1016/j.orl.2019.12.003

On Polyhedral and Second-Order Cone Decompositions of Semidefinite Optimization Problems

Authors: Dimitris Bertsimas, Ryan Cory-Wright

Abstract: We study a cutting-plane method for semidefinite optimization problems (SDOs), and supply a proof of the method's convergence, under a boundedness assumption. By relating the method's rate of convergence to an initial outer approximation's diameter, we argue that the method performs well when initialized with a second-order-cone approximation, instead of a linear approximation. We invoke the metho… ▽ More We study a cutting-plane method for semidefinite optimization problems (SDOs), and supply a proof of the method's convergence, under a boundedness assumption. By relating the method's rate of convergence to an initial outer approximation's diameter, we argue that the method performs well when initialized with a second-order-cone approximation, instead of a linear approximation. We invoke the method to provide bound gaps of 0.5-6.5% for sparse PCA problems with $1000$s of covariates, and solve nuclear norm problems over 500x500 matrices. △ Less

Submitted 2 December, 2019; v1 submitted 7 October, 2019; originally announced October 2019.

Comments: Submitted minor revision to Operations Research Letters; removed footnotes and corrected some minor typos in previous version

Journal ref: Operations Research Letters 48(1):78-85 (2020)

arXiv:1907.07307 [pdf, other]

Dynamic optimization with side information

Authors: Dimitris Bertsimas, Christopher McCord, Bradley Sturt

Abstract: We develop a tractable and flexible approach for incorporating side information into dynamic optimization under uncertainty. The proposed framework uses predictive machine learning methods (such as $k$-nearest neighbors, kernel regression, and random forests) to weight the relative importance of various data-driven uncertainty sets in a robust optimization formulation. Through a novel measure conc… ▽ More We develop a tractable and flexible approach for incorporating side information into dynamic optimization under uncertainty. The proposed framework uses predictive machine learning methods (such as $k$-nearest neighbors, kernel regression, and random forests) to weight the relative importance of various data-driven uncertainty sets in a robust optimization formulation. Through a novel measure concentration result for a class of machine learning methods, we prove that the proposed approach is asymptotically optimal for multi-period stochastic programming with side information. We also describe a general-purpose approximation for these optimization problems, based on overlap** linear decision rules, which is computationally tractable and produces high-quality solutions for dynamic problems with many stages. Across a variety of examples in inventory management, finance, and shipment planning, our method achieves improvements of up to 15\% over alternatives and requires less than one minute of computation time on problems with twelve stages. △ Less

Submitted 21 July, 2020; v1 submitted 16 July, 2019; originally announced July 2019.

arXiv:1907.07142 [pdf, other]

Two-stage sample robust optimization

Authors: Dimitris Bertsimas, Shimrit Shtern, Bradley Sturt

Abstract: We investigate a simple approximation scheme, based on overlap** linear decision rules, for solving data-driven two-stage distributionally robust optimization problems with the type-$\infty$ Wasserstein ambiguity set. Our main result establishes that this approximation scheme is asymptotically optimal for two-stage stochastic linear optimization problems; that is, under mild assumptions, the opt… ▽ More We investigate a simple approximation scheme, based on overlap** linear decision rules, for solving data-driven two-stage distributionally robust optimization problems with the type-$\infty$ Wasserstein ambiguity set. Our main result establishes that this approximation scheme is asymptotically optimal for two-stage stochastic linear optimization problems; that is, under mild assumptions, the optimal cost and optimal first-stage decisions obtained by approximating the robust optimization problem converge to those of the underlying stochastic problem as the number of data points grows to infinity. These guarantees notably apply to two-stage stochastic problems that do not have relatively complete recourse, which arise frequently in applications. In this context, we show through numerical experiments that the approximation scheme is practically tractable and produces decisions which significantly outperform those obtained from state-of-the-art data-driven alternatives. △ Less

Submitted 3 November, 2020; v1 submitted 16 July, 2019; originally announced July 2019.

arXiv:1907.02206 [pdf, other]

Online Mixed-Integer Optimization in Milliseconds

Authors: Dimitris Bertsimas, Bartolomeo Stellato

Abstract: We propose a method to solve online mixed-integer optimization (MIO) problems at very high speed using machine learning. By exploiting the repetitive nature of online optimization, we are able to greatly speedup the solution time. Our approach encodes the optimal solution into a small amount of information denoted as strategy using the Voice of Optimization framework proposed in [BS21]. In this wa… ▽ More We propose a method to solve online mixed-integer optimization (MIO) problems at very high speed using machine learning. By exploiting the repetitive nature of online optimization, we are able to greatly speedup the solution time. Our approach encodes the optimal solution into a small amount of information denoted as strategy using the Voice of Optimization framework proposed in [BS21]. In this way the core part of the optimization algorithm becomes a multiclass classification problem which can be solved very quickly. In this work, we extend that framework to real-time and high-speed applications focusing on parametric mixed-integer quadratic optimization (MIQO). We propose an extremely fast online optimization algorithm consisting of a feedforward neural network (NN) evaluation and a linear system solution where the matrix has already been factorized. Therefore, this online approach does not require any solver nor iterative algorithm. We show the speed of the proposed method both in terms of total computations required and measured execution time. We estimate the number of floating point operations (flops) required to completely recover the optimal solution as a function of the problem dimensions. Compared to state-of-the-art MIO routines, the online running time of our method is very predictable and can be lower than a single matrix factorization time. We benchmark our method against the state-of-the-art solver Gurobi obtaining from two to three orders of magnitude speedups on examples from fuel cell energy management, sparse portfolio optimization and motion planning with obstacle avoidance. △ Less

Submitted 22 March, 2021; v1 submitted 3 July, 2019; originally announced July 2019.

arXiv:1907.02109 [pdf, other]

doi 10.1137/20M1346778

A unified approach to mixed-integer optimization problems with logical constraints

Authors: Dimitris Bertsimas, Ryan Cory-Wright, Jean Pauphilet

Abstract: We propose a unified framework to address a family of classical mixed-integer optimization problems with logically constrained decision variables, including network design, facility location, unit commitment, sparse portfolio selection, binary quadratic optimization, sparse principal analysis and sparse learning problems. These problems exhibit logical relationships between continuous and discrete… ▽ More We propose a unified framework to address a family of classical mixed-integer optimization problems with logically constrained decision variables, including network design, facility location, unit commitment, sparse portfolio selection, binary quadratic optimization, sparse principal analysis and sparse learning problems. These problems exhibit logical relationships between continuous and discrete variables, which are usually reformulated linearly using a big-M formulation. In this work, we challenge this longstanding modeling practice and express the logical constraints in a non-linear way. By imposing a regularization condition, we reformulate these problems as convex binary optimization problems, which are solvable using an outer-approximation procedure. In numerical experiments, we establish that a general-purpose numerical strategy, which combines cutting-plane, first-order and local search methods, solves these problems faster and at a larger scale than state-of-the-art mixed-integer linear or second-order cone methods. Our approach successfully solves network design problems with 100s of nodes and provides solutions up to 40\% better than the state-of-the-art; sparse portfolio selection problems with up to 3,200 securities compared with 400 securities for previous attempts; and sparse regression problems with up to 100,000 covariates. △ Less

Submitted 25 January, 2021; v1 submitted 3 July, 2019; originally announced July 2019.

Comments: Revised version (including title change). The old title was "A unified approach to mixed-integer optimization: Nonlinear formulations and scalable algorithms"

arXiv:1906.10283 [pdf, other]

doi 10.1007/s10107-019-01419-7

Certifiably Optimal Sparse Inverse Covariance Estimation

Authors: Dimitris Bertsimas, Jourdain Lamperski, Jean Pauphilet

Abstract: We consider the maximum likelihood estimation of sparse inverse covariance matrices. We demonstrate that current heuristic approaches primarily encourage robustness, instead of the desired sparsity. We give a novel approach that solves the cardinality constrained likelihood problem to certifiable optimality. The approach uses techniques from mixed-integer optimization and convex optimization, and… ▽ More We consider the maximum likelihood estimation of sparse inverse covariance matrices. We demonstrate that current heuristic approaches primarily encourage robustness, instead of the desired sparsity. We give a novel approach that solves the cardinality constrained likelihood problem to certifiable optimality. The approach uses techniques from mixed-integer optimization and convex optimization, and provides a high-quality solution with a guarantee on its suboptimality, even if the algorithm is terminated early. Using a variety of synthetic and real datasets, we demonstrate that our approach can solve problems where the dimension of the inverse covariance matrix is up to 1,000s. We also demonstrate that our approach produces significantly sparser solutions than Glasso and other popular learning procedures, makes less false discoveries, while still maintaining state-of-the-art accuracy. △ Less

Submitted 24 June, 2019; originally announced June 2019.

MSC Class: 90C11; 90C22; 62H12

Journal ref: Mathematical Programming 184 (2020) 491-530

arXiv:1902.03272 [pdf, ps, other]

doi 10.1016/j.orl.2020.02.008

Scalable Holistic Linear Regression

Authors: Dimitris Bertsimas, Michael Lingzhi Li

Abstract: We propose a new scalable algorithm for holistic linear regression building on Bertsimas & King (2016). Specifically, we develop new theory to model significance and multicollinearity as lazy constraints rather than checking the conditions iteratively. The resulting algorithm scales with the number of samples $n$ in the 10,000s, compared to the low 100s in the previous framework. Computational res… ▽ More We propose a new scalable algorithm for holistic linear regression building on Bertsimas & King (2016). Specifically, we develop new theory to model significance and multicollinearity as lazy constraints rather than checking the conditions iteratively. The resulting algorithm scales with the number of samples $n$ in the 10,000s, compared to the low 100s in the previous framework. Computational results on real and synthetic datasets show it greatly improves from previous algorithms in accuracy, false detection rate, computational time and scalability. △ Less

Submitted 3 March, 2020; v1 submitted 8 February, 2019; originally announced February 2019.

Comments: Accepted by Operation Research Letters

arXiv:1812.09991 [pdf, other]

The Voice of Optimization

Authors: Dimitris Bertsimas, Bartolomeo Stellato

Abstract: We introduce the idea that using optimal classification trees (OCTs) and optimal classification trees with-hyperplanes (OCT-Hs), interpretable machine learning algorithms developed by Bertsimas and Dunn [2017, 2018], we are able to obtain insight on the strategy behind the optimal solution in continuous and mixed-integer convex optimization problem as a function of key parameters that affect the p… ▽ More We introduce the idea that using optimal classification trees (OCTs) and optimal classification trees with-hyperplanes (OCT-Hs), interpretable machine learning algorithms developed by Bertsimas and Dunn [2017, 2018], we are able to obtain insight on the strategy behind the optimal solution in continuous and mixed-integer convex optimization problem as a function of key parameters that affect the problem. In this way, optimization is not a black box anymore. Instead, we redefine optimization as a multiclass classification problem where the predictor gives insights on the logic behind the optimal solution. In other words, OCTs and OCT-Hs give optimization a voice. We show on several realistic examples that the accuracy behind our method is in the 90%-100% range, while even when the predictions are not correct, the degree of suboptimality or infeasibility is very low. We compare optimal strategy predictions of OCTs and OCT-Hs and feedforward neural networks (NNs) and conclude that the performance of OCT-Hs and NNs is comparable. OCTs are somewhat weaker but often competitive. Therefore, our approach provides a novel insightful understanding of optimal strategies to solve a broad class of continuous and mixed-integer optimization problems. △ Less

Submitted 2 June, 2020; v1 submitted 24 December, 2018; originally announced December 2018.

arXiv:1812.06647 [pdf, ps, other]

Interpretable Matrix Completion: A Discrete Optimization Approach

Authors: Dimitris Bertsimas, Michael Lingzhi Li

Abstract: We consider the problem of matrix completion on an $n \times m$ matrix. We introduce the problem of Interpretable Matrix Completion that aims to provide meaningful insights for the low-rank matrix using side information. We show that the problem can be reformulated as a binary convex optimization problem. We design OptComplete, based on a novel concept of stochastic cutting planes to enable effici… ▽ More We consider the problem of matrix completion on an $n \times m$ matrix. We introduce the problem of Interpretable Matrix Completion that aims to provide meaningful insights for the low-rank matrix using side information. We show that the problem can be reformulated as a binary convex optimization problem. We design OptComplete, based on a novel concept of stochastic cutting planes to enable efficient scaling of the algorithm up to matrices of sizes $n=10^6$ and $m=10^6$. We report experiments on both synthetic and real-world datasets that show that OptComplete has favorable scaling behavior and accuracy when compared with state-of-the-art methods for other types of matrix completion, while providing insight on the factors that affect the matrix. △ Less

Submitted 3 March, 2020; v1 submitted 17 December, 2018; originally announced December 2018.

Comments: Submitted to Operational Research

arXiv:1811.00138 [pdf, other]

doi 10.1287/ijoc.2021.1127

A Scalable Algorithm For Sparse Portfolio Selection

Authors: Dimitris Bertsimas, Ryan Cory-Wright

Abstract: The sparse portfolio selection problem is one of the most famous and frequently-studied problems in the optimization and financial economics literatures. In a universe of risky assets, the goal is to construct a portfolio with maximal expected return and minimum variance, subject to an upper bound on the number of positions, linear inequalities and minimum investment constraints. Existing certifia… ▽ More The sparse portfolio selection problem is one of the most famous and frequently-studied problems in the optimization and financial economics literatures. In a universe of risky assets, the goal is to construct a portfolio with maximal expected return and minimum variance, subject to an upper bound on the number of positions, linear inequalities and minimum investment constraints. Existing certifiably optimal approaches to this problem do not converge within a practical amount of time at real world problem sizes with more than 400 securities. In this paper, we propose a more scalable approach. By imposing a ridge regularization term, we reformulate the problem as a convex binary optimization problem, which is solvable via an efficient outer-approximation procedure. We propose various techniques for improving the performance of the procedure, including a heuristic which supplies high-quality warm-starts, a preprocessing technique for decreasing the gap at the root node, and an analytic technique for strengthening our cuts. We also study the problem's Boolean relaxation, establish that it is second-order-cone representable, and supply a sufficient condition for its tightness. In numerical experiments, we establish that the outer-approximation procedure gives rise to dramatic speedups for sparse portfolio selection problems. △ Less

Submitted 28 March, 2021; v1 submitted 31 October, 2018; originally announced November 2018.

Comments: Minor revision submitted to INFORMS Journal on Computing

Journal ref: INFORMS Journal on Computing, Articles in Advance, 2022

arXiv:1807.02812 [pdf, other]

A Scalable Algorithm for Two-Stage Adaptive Linear Optimization

Authors: Dimitris Bertsimas, Shimrit Shtern

Abstract: The column-and-constraint generation (CCG) method was introduced by \citet{Zeng2013} for solving two-stage adaptive optimization. We found that the CCG method is quite scalable, but sometimes, and in some applications often, produces infeasible first-stage solutions, even though the problem is feasible. In this research, we extend the CCG method in a way that (a) maintains scalability and (b) alwa… ▽ More The column-and-constraint generation (CCG) method was introduced by \citet{Zeng2013} for solving two-stage adaptive optimization. We found that the CCG method is quite scalable, but sometimes, and in some applications often, produces infeasible first-stage solutions, even though the problem is feasible. In this research, we extend the CCG method in a way that (a) maintains scalability and (b) always produces feasible first-stage decisions if they exist. We compare our method to several recently proposed methods and find that it reaches high accuracies faster and solves significantly larger problems. △ Less

Submitted 8 July, 2018; originally announced July 2018.

arXiv:1711.09974 [pdf, other]

Bootstrap Robust Prescriptive Analytics

Authors: Dimitris Bertsimas, Bart Van Parys

Abstract: We address the problem of prescribing an optimal decision in a framework where the cost function depends on uncertain problem parameters that need to be learned from data. Earlier work proposed prescriptive formulations based on supervised machine learning methods. These prescriptive methods can factor in contextual information on a potentially large number of covariates to take context specific a… ▽ More We address the problem of prescribing an optimal decision in a framework where the cost function depends on uncertain problem parameters that need to be learned from data. Earlier work proposed prescriptive formulations based on supervised machine learning methods. These prescriptive methods can factor in contextual information on a potentially large number of covariates to take context specific actions which are superior to any static decision. When working with noisy or corrupt data, however, such nominal prescriptive methods can be prone to adverse overfitting phenomena and fail to generalize on out-of-sample data. In this paper we combine ideas from robust optimization and the statistical bootstrap to propose novel prescriptive methods which safeguard against overfitting. We show indeed that a particular entropic robust counterpart to such nominal formulations guarantees good performance on synthetic bootstrap data. As bootstrap data is often a sensible proxy to actual out-of-sample data, our robust counterpart can be interpreted to directly encourage good out-of-sample performance. The associated robust prescriptive methods furthermore reduce to convenient tractable convex optimization problems in the context of local learning methods such as nearest neighbors and Nadaraya-Watson learning. We illustrate our data-driven decision-making framework and our novel robustness notion on a small newsvendor problem. △ Less

Submitted 8 June, 2021; v1 submitted 27 November, 2017; originally announced November 2017.

arXiv:1710.01352 [pdf, other]

doi 10.1007/s10994-021-06085-5

Sparse Classification: a scalable discrete optimization perspective

Authors: Dimitris Bertsimas, Jean Pauphilet, Bart Van Parys

Abstract: We formulate the sparse classification problem of $n$ samples with $p$ features as a binary convex optimization problem and propose a cutting-plane algorithm to solve it exactly. For sparse logistic regression and sparse SVM, our algorithm finds optimal solutions for $n$ and $p$ in the $10,000$s within minutes. On synthetic data our algorithm achieves perfect support recovery in the large sample r… ▽ More We formulate the sparse classification problem of $n$ samples with $p$ features as a binary convex optimization problem and propose a cutting-plane algorithm to solve it exactly. For sparse logistic regression and sparse SVM, our algorithm finds optimal solutions for $n$ and $p$ in the $10,000$s within minutes. On synthetic data our algorithm achieves perfect support recovery in the large sample regime. Namely, there exists a $n_0$ such that the algorithm takes a long time to find the optimal solution and does not recover the correct support for $n<n_0$, while for $n\geqslant n_0$, the algorithm quickly detects all the true features, and does not return any false features. In contrast, while Lasso accurately detects all the true features, it persistently returns incorrect features, even as the number of observations increases. Consequently, on numerous real-world experiments, our outer-approximation algorithms returns sparser classifiers while achieving similar predictive accuracy as Lasso. To support our observations, we analyze conditions on the sample size needed to ensure full support recovery in classification. Under some assumptions on the data generating process, we prove that information-theoretic limitations impose $n_0 < C \left(2 + σ^2\right) k \log(p-k)$, for some constant $C>0$. △ Less

Submitted 30 June, 2020; v1 submitted 3 October, 2017; originally announced October 2017.

Journal ref: Machine Learning, 2021

arXiv:1709.10030 [pdf, other]

Sparse Hierarchical Regression with Polynomials

Authors: Dimitris Bertsimas, Bart Van Parys

Abstract: We present a novel method for exact hierarchical sparse polynomial regression. Our regressor is that degree $r$ polynomial which depends on at most $k$ inputs, counting at most $\ell$ monomial terms, which minimizes the sum of the squares of its prediction errors. The previous hierarchical sparse specification aligns well with modern big data settings where many inputs are not relevant for predict… ▽ More We present a novel method for exact hierarchical sparse polynomial regression. Our regressor is that degree $r$ polynomial which depends on at most $k$ inputs, counting at most $\ell$ monomial terms, which minimizes the sum of the squares of its prediction errors. The previous hierarchical sparse specification aligns well with modern big data settings where many inputs are not relevant for prediction purposes and the functional complexity of the regressor needs to be controlled as to avoid overfitting. We present a two-step approach to this hierarchical sparse regression problem. First, we discard irrelevant inputs using an extremely fast input ranking heuristic. Secondly, we take advantage of modern cutting plane methods for integer optimization to solve our resulting reduced hierarchical $(k, \ell)$-sparse problem exactly. The ability of our method to identify all $k$ relevant inputs and all $\ell$ monomial terms is shown empirically to experience a phase transition. Crucially, the same transition also presents itself in our ability to reject all irrelevant features and monomials as well. In the regime where our method is statistically powerful, its computational complexity is interestingly on par with Lasso based heuristics. The presented work fills a void in terms of a lack of powerful disciplined nonlinear sparse regression methods in high-dimensional settings. Our method is shown empirically to scale to regression problems with $n\approx 10,000$ observations for input dimension $p\approx 1,000$. △ Less

Submitted 28 September, 2017; originally announced September 2017.

arXiv:1709.10029 [pdf, other]

Sparse High-Dimensional Regression: Exact Scalable Algorithms and Phase Transitions

Authors: Dimitris Bertsimas, Bart Van Parys

Abstract: We present a novel binary convex reformulation of the sparse regression problem that constitutes a new duality perspective. We devise a new cutting plane method and provide evidence that it can solve to provable optimality the sparse regression problem for sample sizes n and number of regressors p in the 100,000s, that is two orders of magnitude better than the current state of the art, in seconds… ▽ More We present a novel binary convex reformulation of the sparse regression problem that constitutes a new duality perspective. We devise a new cutting plane method and provide evidence that it can solve to provable optimality the sparse regression problem for sample sizes n and number of regressors p in the 100,000s, that is two orders of magnitude better than the current state of the art, in seconds. The ability to solve the problem for very high dimensions allows us to observe new phase transition phenomena. Contrary to traditional complexity theory which suggests that the difficulty of a problem increases with problem size, the sparse regression problem has the property that as the number of samples $n$ increases the problem becomes easier in that the solution recovers 100% of the true signal, and our approach solves the problem extremely fast (in fact faster than Lasso), while for small number of samples n, our approach takes a larger amount of time to solve the problem, but importantly the optimal solution provides a statistically more relevant regressor. We argue that our exact sparse regression approach presents a superior alternative over heuristic methods available at present. △ Less

Submitted 28 September, 2017; originally announced September 2017.

arXiv:1708.04527 [pdf, other]

The Trimmed Lasso: Sparsity and Robustness

Authors: Dimitris Bertsimas, Martin S. Copenhaver, Rahul Mazumder

Abstract: Nonconvex penalty methods for sparse modeling in linear regression have been a topic of fervent interest in recent years. Herein, we study a family of nonconvex penalty functions that we call the trimmed Lasso and that offers exact control over the desired level of sparsity of estimators. We analyze its structural properties and in doing so show the following: 1) Drawing parallels between robust… ▽ More Nonconvex penalty methods for sparse modeling in linear regression have been a topic of fervent interest in recent years. Herein, we study a family of nonconvex penalty functions that we call the trimmed Lasso and that offers exact control over the desired level of sparsity of estimators. We analyze its structural properties and in doing so show the following: 1) Drawing parallels between robust statistics and robust optimization, we show that the trimmed-Lasso-regularized least squares problem can be viewed as a generalized form of total least squares under a specific model of uncertainty. In contrast, this same model of uncertainty, viewed instead through a robust optimization lens, leads to the convex SLOPE (or OWL) penalty. 2) Further, in relating the trimmed Lasso to commonly used sparsity-inducing penalty functions, we provide a succinct characterization of the connection between trimmed-Lasso- like approaches and penalty functions that are coordinate-wise separable, showing that the trimmed penalties subsume existing coordinate-wise separable penalties, with strict containment in general. 3) Finally, we describe a variety of exact and heuristic algorithms, both existing and new, for trimmed Lasso regularized estimation problems. We include a comparison between the different approaches and an accompanying implementation of the algorithms. △ Less

Submitted 15 August, 2017; originally announced August 2017.

Comments: 32 pages (excluding appendix); 4 figures

arXiv:1605.02347 [pdf, other]

The Power and Limits of Predictive Approaches to Observational-Data-Driven Optimization

Authors: Dimitris Bertsimas, Nathan Kallus

Abstract: While data-driven decision-making is transforming modern operations, most large-scale data is of an observational nature, such as transactional records. These data pose unique challenges in a variety of operational problems posed as stochastic optimization problems, including pricing and inventory management, where one must evaluate the effect of a decision, such as price or order quantity, on an… ▽ More While data-driven decision-making is transforming modern operations, most large-scale data is of an observational nature, such as transactional records. These data pose unique challenges in a variety of operational problems posed as stochastic optimization problems, including pricing and inventory management, where one must evaluate the effect of a decision, such as price or order quantity, on an uncertain cost/reward variable, such as demand, based on historical data where decision and outcome may be confounded. Often, the data lacks the features necessary to enable sound assessment of causal effects and/or the strong assumptions necessary may be dubious. Nonetheless, common practice is to assign a decision an objective value equal to the best prediction of cost/reward given the observation of the decision in the data. While in general settings this identification is spurious, for optimization purposes it is only the objective value of the final decision that matters, rather than the validity of any model used to arrive at it. In this paper, we formalize this statement in the case of observational-data-driven optimization and study both the power and limits of predictive approaches to observational-data-driven optimization with a particular focus on pricing. We provide rigorous bounds on optimality gaps of such approaches even when optimal decisions cannot be identified from data. To study potential limits of predictive approaches in real datasets, we develop a new hypothesis test for causal-effect objective optimality. Applying it to interest-rate-setting data, we empirically demonstrate that predictive approaches can be powerful in practice but with some critical limitations. △ Less

Submitted 20 May, 2017; v1 submitted 8 May, 2016; originally announced May 2016.

arXiv:1604.06837 [pdf, other]

Certifiably Optimal Low Rank Factor Analysis

Authors: Dimitris Bertsimas, Martin S. Copenhaver, Rahul Mazumder

Abstract: Factor Analysis (FA) is a technique of fundamental importance that is widely used in classical and modern multivariate statistics, psychometrics and econometrics. In this paper, we revisit the classical rank-constrained FA problem, which seeks to approximate an observed covariance matrix ($\boldsymbolΣ$), by the sum of a Positive Semidefinite (PSD) low-rank component ($\boldsymbolΘ$) and a diagona… ▽ More Factor Analysis (FA) is a technique of fundamental importance that is widely used in classical and modern multivariate statistics, psychometrics and econometrics. In this paper, we revisit the classical rank-constrained FA problem, which seeks to approximate an observed covariance matrix ($\boldsymbolΣ$), by the sum of a Positive Semidefinite (PSD) low-rank component ($\boldsymbolΘ$) and a diagonal matrix ($\boldsymbolΦ$) (with nonnegative entries) subject to $\boldsymbolΣ- \boldsymbolΦ$ being PSD. We propose a flexible family of rank-constrained, nonlinear Semidefinite Optimization based formulations for this task. We introduce a reformulation of the problem as a smooth optimization problem with convex compact constraints and propose a unified algorithmic framework, utilizing state of the art techniques in nonlinear optimization to obtain high-quality feasible solutions for our proposed formulation. At the same time, by using a variety of techniques from discrete and global optimization, we show that these solutions are certifiably optimal in many cases, even for problems with thousands of variables. Our techniques are general and make no assumption on the underlying problem data. The estimator proposed herein, aids statistical interpretability, provides computational scalability and significantly improved accuracy when compared to current, publicly available popular methods for rank-constrained FA. We demonstrate the effectiveness of our proposal on an array of synthetic and real-life datasets. To our knowledge, this is the first paper that demonstrates how a previously intractable rank-constrained optimization problem can be solved to provable optimality by coupling developments in convex analysis and in discrete optimization. △ Less

Submitted 22 April, 2016; originally announced April 2016.

Journal ref: JMLR 18(29) (2017)

arXiv:1507.03133 [pdf, other]

Best Subset Selection via a Modern Optimization Lens

Authors: Dimitris Bertsimas, Angela King, Rahul Mazumder

Abstract: In the last twenty-five years (1990-2014), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 200 billion factor speedup in solving Mixed Integer Optimization (MIO) problems. We present a MIO approach for solving the classical best subset selection problem of choosing $k$ out of $p$ features in linear regression given $n$ observations.… ▽ More In the last twenty-five years (1990-2014), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 200 billion factor speedup in solving Mixed Integer Optimization (MIO) problems. We present a MIO approach for solving the classical best subset selection problem of choosing $k$ out of $p$ features in linear regression given $n$ observations. We develop a discrete extension of modern first order continuous optimization methods to find high quality feasible solutions that we use as warm starts to a MIO solver that finds provably optimal solutions. The resulting algorithm (a) provides a solution with a guarantee on its suboptimality even if we terminate the algorithm early, (b) can accommodate side constraints on the coefficients of the linear regression and (c) extends to finding best subset solutions for the least absolute deviation loss function. Using a wide variety of synthetic and real datasets, we demonstrate that our approach solves problems with $n$ in the 1000s and $p$ in the 100s in minutes to provable optimality, and finds near optimal solutions for $n$ in the 100s and $p$ in the 1000s in minutes. We also establish via numerical experiments that the MIO approach performs better than {\texttt {Lasso}} and other popularly used sparse learning procedures, in terms of achieving sparse solutions with good predictive power. △ Less

Submitted 11 July, 2015; originally announced July 2015.

Comments: This is a revised version (May, 2015) of the first submission in June 2014

arXiv:1411.6160 [pdf, ps, other]

Characterization of the equivalence of robustification and regularization in linear and matrix regression

Authors: Dimitris Bertsimas, Martin S. Copenhaver

Abstract: The notion of develo** statistical methods in machine learning which are robust to adversarial perturbations in the underlying data has been the subject of increasing interest in recent years. A common feature of this work is that the adversarial robustification often corresponds exactly to regularization methods which appear as a loss function plus a penalty. In this paper we deepen and extend… ▽ More The notion of develo** statistical methods in machine learning which are robust to adversarial perturbations in the underlying data has been the subject of increasing interest in recent years. A common feature of this work is that the adversarial robustification often corresponds exactly to regularization methods which appear as a loss function plus a penalty. In this paper we deepen and extend the understanding of the connection between robustification and regularization (as achieved by penalization) in regression problems. Specifically, (a) in the context of linear regression, we characterize precisely under which conditions on the model of uncertainty used and on the loss function penalties robustification and regularization are equivalent, and (b) we extend the characterization of robustification and regularization to matrix regression problems (matrix completion and Principal Component Analysis). △ Less

Submitted 25 February, 2017; v1 submitted 22 November, 2014; originally announced November 2014.

MSC Class: 62J; 90C25; 49M29; 90C11; 15A83

Showing 1–50 of 59 results for author: Bertsimas, D