-
An MILP-Based Solution Scheme for Factored and Robust Factored Markov Decision Processes
Authors:
Huikang Liu,
Wolfram Wiesemann,
Man-Chung Yue
Abstract:
Factored Markov decision processes (MDPs) are a prominent paradigm within the artificial intelligence community for modeling and solving large-scale MDPs whose rewards and dynamics decompose into smaller, loosely interacting components. Through the use of dynamic Bayesian networks and context-specific independence, factored MDPs can achieve an exponential reduction in the state space of an MDP and thus scale to problem sizes that are beyond the reach of classical MDP algorithms. However, factored MDPs are typically solved using custom-designed algorithms that can require meticulous implementations and considerable fine-tuning. In this paper, we propose a mathematical programming approach to solving factored MDPs. In contrast to existing solution schemes, our approach leverages off-the-shelf solvers, which allows for a streamlined implementation and maintenance; it effectively capitalizes on the factored structure present in both state and action spaces; and it readily extends to the largely unexplored class of robust factored MDPs, whose transition kernels are only known to reside in a pre-specified ambiguity set. Our numerical experiments demonstrate the potential of our approach.
Submitted 2 April, 2024;
originally announced April 2024.
-
It's All in the Mix: Wasserstein Machine Learning with Mixed Features
Authors:
Reza Belbasi,
Aras Selvi,
Wolfram Wiesemann
Abstract:
Problem definition: The recent advent of data-driven and end-to-end decision-making across different areas of operations management has led to an ever closer integration of prediction models from machine learning and optimization models from operations research. A key challenge in this context is the presence of estimation errors in the prediction models, which tend to be amplified by the subsequent optimization model -- a phenomenon that is often referred to as the Optimizer's Curse or the Error-Maximization Effect of Optimization.
Methodology/results: A contemporary approach to combat such estimation errors is offered by distributionally robust problem formulations that consider all data-generating distributions close to the empirical distribution derived from historical samples, where 'closeness' is determined by the Wasserstein distance. While those techniques show significant promise in problems where all input features are continuous, they scale exponentially when binary and/or categorical features are present. This paper demonstrates that such mixed-feature problems can indeed be solved in polynomial time. We present a practically efficient algorithm to solve mixed-feature problems, and we compare our method against alternative techniques both theoretically and empirically on standard benchmark instances.
Managerial implications: Data-driven operations management problems often involve prediction models with discrete features. We develop and analyze a methodology that faithfully accounts for the presence of discrete features, and we demonstrate that our approach can significantly outperform existing methods that are agnostic to the presence of discrete features, both theoretically and across standard benchmark instances.
Submitted 19 December, 2023;
originally announced December 2023.
-
Differential Privacy via Distributionally Robust Optimization
Authors:
Aras Selvi,
Huikang Liu,
Wolfram Wiesemann
Abstract:
In recent years, differential privacy has emerged as the de facto standard for sharing statistics of datasets while limiting the disclosure of private information about the involved individuals. This is achieved by randomly perturbing the statistics to be published, which in turn leads to a privacy-accuracy trade-off: larger perturbations provide stronger privacy guarantees, but they result in less accurate statistics that offer lower utility to the recipients. Of particular interest are therefore optimal mechanisms that provide the highest accuracy for a pre-selected level of privacy. To date, work in this area has focused on specifying families of perturbations a priori and subsequently proving their asymptotic and/or best-in-class optimality. In this paper, we develop a class of mechanisms that enjoy non-asymptotic and unconditional optimality guarantees. To this end, we formulate the mechanism design problem as an infinite-dimensional distributionally robust optimization problem. We show that the problem affords a strong dual, and we exploit this duality to develop converging hierarchies of finite-dimensional upper and lower bounding problems. Our upper (primal) bounds correspond to implementable perturbations whose suboptimality can be bounded by our lower (dual) bounds. Both bounding problems can be solved within seconds via cutting plane techniques that exploit the inherent problem structure. Our numerical experiments demonstrate that our perturbations can outperform the previously best results from the literature on artificial as well as standard benchmark problems.
Submitted 23 May, 2024; v1 submitted 25 April, 2023;
originally announced April 2023.
-
On Approximations of Data-Driven Chance Constrained Programs over Wasserstein Balls
Authors:
Zhi Chen,
Daniel Kuhn,
Wolfram Wiesemann
Abstract:
Distributionally robust chance constrained programs minimize a deterministic cost function subject to the satisfaction of one or more safety conditions with high probability, given that the probability distribution of the uncertain problem parameters affecting the safety condition(s) is only known to belong to some ambiguity set. We study three popular approximation schemes for distributionally robust chance constrained programs over Wasserstein balls, where the ambiguity set contains all probability distributions within a certain Wasserstein distance to a reference distribution. The first approximation replaces the chance constraint with a bound on the conditional value-at-risk, the second approximation decouples different safety conditions via Bonferroni's inequality, and the third approximation restricts the expected violation of the safety condition(s) so that the chance constraint is satisfied. We show that the conditional value-at-risk approximation can be characterized as a tight convex approximation, which complements earlier findings on classical (non-robust) chance constraints, and we offer a novel interpretation in terms of transportation savings. We also show that the three approximations can perform arbitrarily poorly in data-driven settings, and that they are generally incomparable with each other.
Submitted 20 November, 2022; v1 submitted 1 June, 2022;
originally announced June 2022.
-
Robust Phi-Divergence MDPs
Authors:
Chin Pang Ho,
Marek Petrik,
Wolfram Wiesemann
Abstract:
In recent years, robust Markov decision processes (MDPs) have emerged as a prominent modeling framework for dynamic decision problems affected by uncertainty. In contrast to classical MDPs, which only account for stochasticity by modeling the dynamics through a stochastic process with a known transition kernel, robust MDPs additionally account for ambiguity by optimizing in view of the most adverse transition kernel from a prescribed ambiguity set. In this paper, we develop a novel solution framework for robust MDPs with s-rectangular ambiguity sets that decomposes the problem into a sequence of robust Bellman updates and simplex projections. Exploiting the rich structure present in the simplex projections corresponding to phi-divergence ambiguity sets, we show that the associated s-rectangular robust MDPs can be solved substantially faster than with state-of-the-art commercial solvers as well as a recent first-order solution scheme, thus rendering them attractive alternatives to classical MDPs in practical applications.
Submitted 12 January, 2023; v1 submitted 27 May, 2022;
originally announced May 2022.
-
Wasserstein Logistic Regression with Mixed Features
Authors:
Aras Selvi,
Mohammad Reza Belbasi,
Martin B Haugh,
Wolfram Wiesemann
Abstract:
Recent work has leveraged the popular distributionally robust optimization paradigm to combat overfitting in classical logistic regression. While the resulting classification scheme displays a promising performance in numerical experiments, it is inherently limited to numerical features. In this paper, we show that distributionally robust logistic regression with mixed (i.e., numerical and categorical) features, despite amounting to an optimization problem of exponential size, admits a polynomial-time solution scheme. We subsequently develop a practically efficient column-and-constraint approach that solves the problem as a sequence of polynomial-time solvable exponential conic programs. Our model retains many of the desirable theoretical features of previous works, but -- in contrast to the literature -- it does not admit an equivalent representation as a regularized logistic regression, that is, it represents a genuinely novel variant of logistic regression. We show that our method outperforms both the unregularized and the regularized logistic regression on categorical as well as mixed-feature benchmark instances.
Submitted 14 January, 2023; v1 submitted 26 May, 2022;
originally announced May 2022.
-
A Unified Theory of Robust and Distributionally Robust Optimization via the Primal-Worst-Equals-Dual-Best Principle
Authors:
Jianzhe Zhen,
Daniel Kuhn,
Wolfram Wiesemann
Abstract:
Robust and distributionally robust optimization are modeling paradigms for decision-making under uncertainty where the uncertain parameters are only known to reside in an uncertainty set or are governed by any probability distribution from within an ambiguity set, respectively, and a decision is sought that minimizes a cost function under the most adverse outcome of the uncertainty. In this paper, we develop a rigorous and general theory of robust and distributionally robust nonlinear optimization using the language of convex analysis. Our framework is based on a generalized 'primal-worst-equals-dual-best' principle that establishes strong duality between a semi-infinite primal worst and a non-convex dual best formulation, both of which admit finite convex reformulations. This principle offers an alternative formulation for robust optimization problems that obviates the need to mobilize the machinery of abstract semi-infinite duality theory to prove strong duality in distributionally robust optimization. We illustrate the modeling power of our approach through convex reformulations for distributionally robust optimization problems whose ambiguity sets are defined through general optimal transport distances, which generalize earlier results for Wasserstein ambiguity sets.
Submitted 19 July, 2023; v1 submitted 3 May, 2021;
originally announced May 2021.
-
A Planner-Trader Decomposition for Multi-Market Hydro Scheduling
Authors:
Kilian Schindler,
Napat Rujeerapaiboon,
Daniel Kuhn,
Wolfram Wiesemann
Abstract:
Peak/off-peak spreads on European electricity forward and spot markets are eroding due to the ongoing nuclear phaseout in Germany and the steady growth in photovoltaic capacity. The reduced profitability of peak/off-peak arbitrage forces hydropower producers to recover part of their original profitability on the reserve markets. We propose a bi-layer stochastic programming framework for the optimal operation of a fleet of interconnected hydropower plants that sells energy on both the spot and the reserve markets. The outer layer (the planner's problem) optimizes end-of-day reservoir filling levels over one year, whereas the inner layer (the trader's problem) selects optimal hourly market bids within each day. Using an information restriction whereby the planner prescribes the end-of-day reservoir targets one day in advance, we prove that the trader's problem simplifies from an infinite-dimensional stochastic program with 25 stages to a finite two-stage stochastic program with only two scenarios. Substituting this reformulation back into the outer layer and approximating the reservoir targets by affine decision rules allows us to simplify the planner's problem from an infinite-dimensional stochastic program with 365 stages to a two-stage stochastic program that can conveniently be solved via the sample average approximation. Numerical experiments based on a cascade in the Salzburg region of Austria demonstrate the effectiveness of the suggested framework.
Submitted 2 September, 2022; v1 submitted 3 March, 2021;
originally announced March 2021.
-
Partial Policy Iteration for L1-Robust Markov Decision Processes
Authors:
Chin Pang Ho,
Marek Petrik,
Wolfram Wiesemann
Abstract:
Robust Markov decision processes (MDPs) make it possible to compute reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially-known transition probabilities. Unfortunately, accounting for uncertainty in the transition probabilities significantly increases the computational complexity of solving robust MDPs, which severely limits their scalability. This paper describes new efficient algorithms for solving the common class of robust MDPs with s- and sa-rectangular ambiguity sets defined by weighted $L_1$ norms. We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs. We also propose fast methods for computing the robust Bellman operator in quasi-linear time, nearly matching the linear complexity of the non-robust Bellman operator. Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach, which uses linear programming solvers combined with robust value iteration.
Submitted 16 June, 2020;
originally announced June 2020.
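To make the robust Bellman computation mentioned in the abstract above more concrete, here is a minimal sketch of the inner worst-case problem for a single state-action pair. It assumes an unweighted $L_1$ ball of radius `kappa` around a nominal distribution `p`, which is a simplification of the weighted norms the paper studies, and it uses the well-known sort-and-shift idea rather than the paper's own algorithms. The function name `worst_case_l1` and its parameters are hypothetical.

```python
def worst_case_l1(p, v, kappa):
    """Return min of q.v over distributions q with ||q - p||_1 <= kappa.

    Assumption: unweighted L1 ball around the nominal distribution p.
    v holds the value of each successor state.
    """
    q = list(p)
    # Shifting mass eps to the lowest-value state costs 2 * eps in L1 norm.
    i_min = min(range(len(v)), key=lambda i: v[i])
    eps = min(kappa / 2.0, 1.0 - q[i_min])
    q[i_min] += eps
    # Take the same amount of mass away from the highest-value states first.
    for i in sorted(range(len(v)), key=lambda i: v[i], reverse=True):
        if eps <= 0.0:
            break
        if i == i_min:
            continue
        take = min(eps, q[i])
        q[i] -= take
        eps -= take
    return sum(qi * vi for qi, vi in zip(q, v))
```

The sort dominates the running time, so this sketch runs in O(n log n) per state-action pair, which is one way to read the "quasi-linear" claim in the simplest setting.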
-
On Linear Optimization over Wasserstein Balls
Authors:
Man-Chung Yue,
Daniel Kuhn,
Wolfram Wiesemann
Abstract:
Wasserstein balls, which contain all probability measures within a pre-specified Wasserstein distance to a reference measure, have recently enjoyed wide popularity in the distributionally robust optimization and machine learning communities to formulate and solve data-driven optimization problems with rigorous statistical guarantees. In this technical note we prove that the Wasserstein ball is weakly compact under mild conditions, and we offer necessary and sufficient conditions for the existence of optimal solutions. We also characterize the sparsity of solutions if the Wasserstein ball is centred at a discrete reference measure. In comparison with the existing literature, which has proved similar results under different conditions, our proofs are self-contained and shorter, yet mathematically rigorous, and our necessary and sufficient conditions for the existence of optimal solutions are easily verifiable in practice.
Submitted 6 June, 2021; v1 submitted 15 April, 2020;
originally announced April 2020.
-
Optimistic Distributionally Robust Optimization for Nonparametric Likelihood Approximation
Authors:
Viet Anh Nguyen,
Soroosh Shafieezadeh-Abadeh,
Man-Chung Yue,
Daniel Kuhn,
Wolfram Wiesemann
Abstract:
The likelihood function is a fundamental component in Bayesian statistics. However, evaluating the likelihood of an observation is computationally intractable in many applications. In this paper, we propose a non-parametric approximation of the likelihood that identifies a probability measure which lies in the neighborhood of the nominal measure and that maximizes the probability of observing the given sample point. We show that when the neighborhood is constructed by the Kullback-Leibler divergence, by moment conditions or by the Wasserstein distance, then our "optimistic likelihood" can be determined through the solution of a convex optimization problem, and it admits an analytical expression in particular cases. We also show that the posterior inference problem with our optimistic likelihood approximation enjoys strong theoretical performance guarantees, and it performs competitively in a probabilistic classification task.
Submitted 23 October, 2019;
originally announced October 2019.
-
Calculating Optimistic Likelihoods Using (Geodesically) Convex Optimization
Authors:
Viet Anh Nguyen,
Soroosh Shafieezadeh-Abadeh,
Man-Chung Yue,
Daniel Kuhn,
Wolfram Wiesemann
Abstract:
A fundamental problem arising in many areas of machine learning is the evaluation of the likelihood of a given observation under different nominal distributions. Frequently, these nominal distributions are themselves estimated from data, which makes them susceptible to estimation errors. We thus propose to replace each nominal distribution with an ambiguity set containing all distributions in its vicinity and to evaluate an "optimistic likelihood", that is, the maximum of the likelihood over all distributions in the ambiguity set. When the proximity of distributions is quantified by the Fisher-Rao distance or the Kullback-Leibler divergence, the emerging optimistic likelihoods can be computed efficiently using either geodesic or standard convex optimization techniques. We showcase the advantages of working with optimistic likelihoods on a classification problem using synthetic as well as empirical data.
Submitted 17 October, 2019;
originally announced October 2019.
-
Data-Driven Chance Constrained Programs over Wasserstein Balls
Authors:
Zhi Chen,
Daniel Kuhn,
Wolfram Wiesemann
Abstract:
We provide an exact deterministic reformulation for data-driven chance constrained programs over Wasserstein balls. For individual chance constraints as well as joint chance constraints with right-hand side uncertainty, our reformulation amounts to a mixed-integer conic program. In the special case of a Wasserstein ball with the $1$-norm or the $\infty$-norm, the cone is the nonnegative orthant, and the chance constrained program can be reformulated as a mixed-integer linear program. Our reformulation compares favourably to several state-of-the-art data-driven optimization schemes in our numerical experiments.
Submitted 31 May, 2022; v1 submitted 1 September, 2018;
originally announced September 2018.
-
K-Adaptability in Two-Stage Mixed-Integer Robust Optimization
Authors:
Anirudh Subramanyam,
Chrysanthos E. Gounaris,
Wolfram Wiesemann
Abstract:
We study two-stage robust optimization problems with mixed discrete-continuous decisions in both stages. Despite their broad range of applications, these problems pose two fundamental challenges: (i) they constitute infinite-dimensional problems that require a finite-dimensional approximation, and (ii) the presence of discrete recourse decisions typically prohibits duality-based solution schemes. We address the first challenge by studying a $K$-adaptability formulation that selects $K$ candidate recourse policies before observing the realization of the uncertain parameters and that implements the best of these policies after the realization is known. We address the second challenge through a branch-and-bound scheme that enjoys asymptotic convergence in general and finite convergence under specific conditions. We illustrate the performance of our algorithm in numerical experiments involving benchmark data from several application domains.
Submitted 27 July, 2018; v1 submitted 21 June, 2017;
originally announced June 2017.
-
Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization
Authors:
Napat Rujeerapaiboon,
Kilian Schindler,
Daniel Kuhn,
Wolfram Wiesemann
Abstract:
Plain vanilla K-means clustering has proven to be successful in practice, yet it suffers from outlier sensitivity and may produce highly unbalanced clusters. To mitigate both shortcomings, we formulate a joint outlier detection and clustering problem, which assigns a prescribed number of datapoints to an auxiliary outlier cluster and performs cardinality-constrained K-means clustering on the residual dataset, treating the cluster cardinalities as a given input. We cast this problem as a mixed-integer linear program (MILP) that admits tractable semidefinite and linear programming relaxations. We propose deterministic rounding schemes that transform the relaxed solutions to feasible solutions for the MILP. We also prove that these solutions are optimal in the MILP if a cluster separation condition holds.
Submitted 10 January, 2019; v1 submitted 22 May, 2017;
originally announced May 2017.
-
Scenario Reduction Revisited: Fundamental Limits and Guarantees
Authors:
Napat Rujeerapaiboon,
Kilian Schindler,
Daniel Kuhn,
Wolfram Wiesemann
Abstract:
The goal of scenario reduction is to approximate a given discrete distribution with another discrete distribution that has fewer atoms. We distinguish continuous scenario reduction, where the new atoms may be chosen freely, and discrete scenario reduction, where the new atoms must be chosen from among the existing ones. Using the Wasserstein distance as a measure of proximity between distributions, we identify those $n$-point distributions on the unit ball that are least susceptible to scenario reduction, i.e., that have maximum Wasserstein distance to their closest $m$-point distributions for some prescribed $m<n$. We also provide sharp bounds on the added benefit of continuous over discrete scenario reduction. Finally, to the best of our knowledge, we propose the first polynomial-time constant-factor approximations for both discrete and continuous scenario reduction as well as the first exact exponential-time algorithms for continuous scenario reduction.
Submitted 15 January, 2017;
originally announced January 2017.
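As a small illustration of the discrete scenario reduction problem described in the abstract above, the sketch below enumerates every subset of `m` retained atoms and scores it by the type-1 Wasserstein distance under the Euclidean norm. Both the distance type and the norm are assumptions made for the example, and this brute-force enumeration is exponential in the number of atoms; it is not one of the paper's algorithms. The name `discrete_scenario_reduction` is hypothetical.

```python
from itertools import combinations


def discrete_scenario_reduction(points, probs, m):
    """Brute-force search for the best m retained atoms.

    Assumptions: type-1 Wasserstein distance, Euclidean ground metric.
    Returns (indices of retained atoms, their new probabilities, distance).
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    best_cost, best_subset = float("inf"), None
    for subset in combinations(range(len(points)), m):
        # For a fixed support, each atom optimally ships its whole mass
        # to the nearest retained atom.
        cost = sum(p * min(dist(x, points[j]) for j in subset)
                   for x, p in zip(points, probs))
        if cost < best_cost:
            best_cost, best_subset = cost, subset

    # Aggregate the shipped mass on the retained atoms.
    new_probs = {j: 0.0 for j in best_subset}
    for x, p in zip(points, probs):
        nearest = min(best_subset, key=lambda j: dist(x, points[j]))
        new_probs[nearest] += p
    return best_subset, new_probs, best_cost
```

For example, calling it with `points = [(0.0,), (1.0,), (10.0,)]`, `probs = [0.4, 0.4, 0.2]` and `m = 2` compares the three possible pairs of retained atoms and merges the two nearby atoms into one.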
-
Chebyshev Inequalities for Products of Random Variables
Authors:
Napat Rujeerapaiboon,
Daniel Kuhn,
Wolfram Wiesemann
Abstract:
We derive sharp probability bounds on the tails of a product of symmetric non-negative random variables using only information about their first two moments. If the covariance matrix of the random variables is known exactly, these bounds can be computed numerically using semidefinite programming. If only an upper bound on the covariance matrix is available, the probability bounds on the right tails can be evaluated analytically. The bounds under precise and imprecise covariance information coincide for all left tails as well as for all right tails corresponding to quantiles that are either sufficiently small or sufficiently large. We also prove that all left probability bounds reduce to the trivial bound 1 if the number of random variables in the product exceeds an explicit threshold. Thus, in the worst case, the weak-sense geometric random walk defined through the running product of the random variables is absorbed at 0 with certainty as soon as time exceeds the given threshold. The techniques devised for constructing Chebyshev bounds for products can also be used to derive Chebyshev bounds for sums, maxima and minima of non-negative random variables.
Submitted 18 May, 2016;
originally announced May 2016.