-
End-to-end Conditional Robust Optimization
Authors:
Abhilash Chenreddy,
Erick Delage
Abstract:
The field of Contextual Optimization (CO) integrates machine learning and optimization to solve decision-making problems under uncertainty. Recently, a risk-sensitive variant of CO, known as Conditional Robust Optimization (CRO), combines uncertainty quantification with robust optimization in order to promote safety and reliability in high-stakes applications. Exploiting modern differentiable optimization methods, we propose a novel end-to-end approach to train a CRO model in a way that accounts for both the empirical risk of the prescribed decisions and the quality of conditional coverage of the contextual uncertainty set that supports them. While guarantees of success for the latter objective are impossible to obtain from the point of view of conformal prediction theory, high-quality conditional coverage is achieved empirically by ingeniously employing a logistic regression differentiable layer within the calculation of coverage quality in our training loss. We show that the proposed training algorithms produce decisions that outperform the traditional estimate-then-optimize approaches.
Submitted 7 March, 2024;
originally announced March 2024.
-
Conformal Inverse Optimization
Authors:
Bo Lin,
Erick Delage,
Timothy C. Y. Chan
Abstract:
Inverse optimization has been increasingly used to estimate unknown parameters in an optimization model based on decision data. We show that such a point estimation is insufficient in a prescriptive setting where the estimated parameters are used to prescribe new decisions. The prescribed decisions may be low-quality and misaligned with human intuition and thus are unlikely to be adopted. To tackle this challenge, we propose conformal inverse optimization, which seeks to learn an uncertainty set for the unknown parameters and then solve a robust optimization model to prescribe new decisions. Under mild assumptions, we show that our method enjoys provable guarantees on solution quality, as evaluated using both the ground-truth parameters and the decision maker's perception of the unknown parameters. Our method demonstrates strong empirical performance compared to classic inverse optimization.
Submitted 15 May, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
A Survey of Contextual Optimization Methods for Decision Making under Uncertainty
Authors:
Utsav Sadana,
Abhilash Chenreddy,
Erick Delage,
Alexandre Forel,
Emma Frejinger,
Thibaut Vidal
Abstract:
Recently there has been a surge of interest in operations research (OR) and the machine learning (ML) community in combining prediction algorithms and optimization techniques to solve decision-making problems in the face of uncertainty. This gave rise to the field of contextual optimization, under which data-driven procedures are developed to prescribe actions to the decision-maker that make the best use of the most recently updated information. A large variety of models and methods have been presented in both OR and ML literature under a variety of names, including data-driven optimization, prescriptive optimization, predictive stochastic programming, policy optimization, (smart) predict/estimate-then-optimize, decision-focused learning, (task-based) end-to-end learning/forecasting/optimization, etc. Focusing on single and two-stage stochastic programming problems, this review article identifies three main frameworks for learning policies from data and discusses their strengths and limitations. We present the existing models and methods under a uniform notation and terminology and classify them according to the three main frameworks identified. Our objective with this survey is to both strengthen the general understanding of this active field of research and stimulate further theoretical and algorithmic advancements in integrating ML and stochastic programming.
Submitted 2 February, 2024; v1 submitted 17 June, 2023;
originally announced June 2023.
-
Robust Data-driven Prescriptiveness Optimization
Authors:
Mehran Poursoltani,
Erick Delage,
Angelos Georghiou
Abstract:
The abundance of data has led to the emergence of a variety of optimization techniques that attempt to leverage available side information to provide more anticipative decisions. The wide range of methods and contexts of application have motivated the design of a universal unitless measure of performance known as the coefficient of prescriptiveness. This coefficient was designed to quantify both the quality of contextual decisions compared to a reference one and the prescriptive power of side information. To identify policies that maximize the former in a data-driven context, this paper introduces a distributionally robust contextual optimization model where the coefficient of prescriptiveness substitutes for the classical empirical risk minimization objective. We present a bisection algorithm to solve this model, which relies on solving a series of linear programs when the distributional ambiguity set has an appropriate nested form and polyhedral structure. Studying a contextual shortest path problem, we evaluate the robustness of the resulting policies against alternative methods when the out-of-sample dataset is subject to varying amounts of distribution shift.
Submitted 3 June, 2024; v1 submitted 9 June, 2023;
originally announced June 2023.
-
On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes
Authors:
Jia Lin Hau,
Erick Delage,
Mohammad Ghavamzadeh,
Marek Petrik
Abstract:
Optimizing static risk-averse objectives in Markov decision processes is difficult because they do not admit standard dynamic programming equations common in Reinforcement Learning (RL) algorithms. Dynamic programming decompositions that augment the state space with discrete risk levels have recently gained popularity in the RL community. Prior work has shown that these decompositions are optimal when the risk level is discretized sufficiently. However, we show that these popular decompositions for Conditional-Value-at-Risk (CVaR) and Entropic-Value-at-Risk (EVaR) are inherently suboptimal regardless of the discretization level. In particular, we show that a saddle point property assumed to hold in prior literature may be violated. However, a decomposition does hold for Value-at-Risk and our proof demonstrates how this risk measure differs from CVaR and EVaR. Our findings are significant because risk-averse algorithms are used in high-stake environments, making their correctness much more critical.
Submitted 23 April, 2024; v1 submitted 24 April, 2023;
originally announced April 2023.
-
Risk-Aware Bid Optimization for Online Display Advertisement
Authors:
Rui Fan,
Erick Delage
Abstract:
This research focuses on the bid optimization problem in the real-time bidding setting for online display advertisements, where an advertiser, or the advertiser's agent, has access to the features of the website visitor and the type of ad slots, to decide the optimal bid prices given a predetermined total advertisement budget. We propose a risk-aware data-driven bid optimization model that maximizes the expected profit for the advertiser by exploiting historical data to design upfront a bidding policy, mapping the type of advertisement opportunity to a bid price, and accounting for the risk of violating the budget constraint during a given period of time. After employing a Lagrangian relaxation, we derive a parametrized closed-form expression for the optimal bidding strategy. Using a real-world dataset, we demonstrate that our risk-averse method can effectively control the risk of overspending the budget while achieving a competitive level of profit compared with the risk-neutral model and a state-of-the-art data-driven risk-aware bidding approach.
Submitted 27 October, 2022;
originally announced October 2022.
-
A Double-oracle, Logic-based Benders decomposition approach to solve the K-adaptability problem
Authors:
Alireza Ghahtarani,
Ahmed Saif,
Alireza Ghasemi,
Erick Delage
Abstract:
We propose a novel approach to solve K-adaptability problems with convex objective and constraints and integer first-stage decisions. A logic-based Benders decomposition is applied to handle the first-stage decisions in a master problem; the sub-problem thus becomes a min-max-min robust combinatorial optimization problem that is solved via a double-oracle algorithm that iteratively generates adverse scenarios and recourse decisions and assigns scenarios to K subsets of the decisions by solving p-center problems. Extensions of the proposed approach to handle parameter uncertainty in both the first-stage objective and the second-stage constraints are also provided. We show that the proposed algorithm converges to an optimal solution and terminates in a finite number of iterations. Numerical results obtained from experiments on benchmark instances of the adaptive shortest path problem, the regular knapsack problem, and a generic K-adaptability problem demonstrate the performance advantage of the proposed approach when compared to state-of-the-art methods in the literature.
Submitted 7 September, 2022;
originally announced September 2022.
-
WaveCorr: Correlation-savvy Deep Reinforcement Learning for Portfolio Management
Authors:
Saeed Marzban,
Erick Delage,
Jonathan Yumeng Li,
Jeremie Desgagne-Bouchard,
Carl Dussault
Abstract:
The problem of portfolio management represents an important and challenging class of dynamic decision-making problems, where rebalancing decisions need to be made over time with the consideration of many factors such as investors' preferences, trading environments, and market conditions. In this paper, we present a new portfolio policy network architecture for deep reinforcement learning (DRL) that can exploit cross-asset dependency information more effectively and achieve better performance than state-of-the-art architectures. In particular, we introduce a new property, referred to as asset permutation invariance, for portfolio policy networks that exploit multi-asset time series data, and design the first portfolio policy network, named WaveCorr, that preserves this invariance property when treating asset correlation information. At the core of our design is an innovative permutation invariant correlation processing layer. An extensive set of experiments is conducted using data from both Canadian (TSX) and American (S&P 500) stock markets, and WaveCorr consistently outperforms other architectures with an impressive 3%-25% absolute improvement in terms of average annual return, and up to more than 200% relative improvement in average Sharpe ratio. We also measured an improvement of a factor of up to 5 in the stability of performance under random choices of initial asset ordering and weights. The stability of the network has been found particularly valuable by our industrial partner.
Submitted 28 September, 2021; v1 submitted 14 September, 2021;
originally announced September 2021.
-
Deep Reinforcement Learning for Equal Risk Pricing and Hedging under Dynamic Expectile Risk Measures
Authors:
Saeed Marzban,
Erick Delage,
Jonathan Yumeng Li
Abstract:
Recently equal risk pricing, a framework for fair derivative pricing, was extended to consider dynamic risk measures. However, all current implementations either employ a static risk measure that violates time consistency, or are based on traditional dynamic programming solution schemes that are impracticable in problems with a large number of underlying assets (due to the curse of dimensionality) or with incomplete asset dynamics information. In this paper, we extend for the first time a famous off-policy deterministic actor-critic deep reinforcement learning (ACRL) algorithm to the problem of solving a risk-averse Markov decision process that models risk using a time-consistent recursive expectile risk measure. This new ACRL algorithm allows us to identify high-quality time-consistent hedging policies (and equal risk prices) for options, such as basket options, that cannot be handled using traditional methods, or in contexts where only historical trajectories of the underlying assets are available. Our numerical experiments, which involve both a simple vanilla option and a more exotic basket option, confirm that the new ACRL algorithm can produce 1) in simple environments, nearly optimal hedging policies and highly accurate prices, simultaneously for a range of maturities; 2) in complex environments, good quality policies and prices using a reasonable amount of computing resources; and 3) overall, hedging strategies that actually outperform the strategies produced using static risk measures when the risk is evaluated at later points of time.
Submitted 8 September, 2021;
originally announced September 2021.
-
Deep Reinforcement Learning for Optimal Stopping with Application in Financial Engineering
Authors:
Abderrahim Fathan,
Erick Delage
Abstract:
Optimal stopping is the problem of deciding the right time at which to take a particular action in a stochastic system, in order to maximize an expected reward. It has many applications in areas such as finance, healthcare, and statistics. In this paper, we employ deep Reinforcement Learning (RL) to learn optimal stopping policies in two financial engineering applications: namely option pricing, and optimal option exercise. We present for the first time a comprehensive empirical evaluation of the quality of optimal stopping policies identified by three state-of-the-art deep RL algorithms: double deep Q-learning (DDQN), categorical distributional RL (C51), and Implicit Quantile Networks (IQN). In the case of option pricing, our findings indicate that in a theoretical Black-Scholes environment, IQN successfully identifies nearly optimal prices. On the other hand, it is slightly outperformed by C51 when confronted with real stock data movements in a put option exercise problem that involves assets from the S&P500 index. More importantly, the C51 algorithm is able to identify an optimal stopping policy that achieves 8% more out-of-sample returns than the best of four natural benchmark policies. We conclude with a discussion of our findings which should pave the way for relevant future research.
Submitted 18 May, 2021;
originally announced May 2021.
-
Robustifying Conditional Portfolio Decisions via Optimal Transport
Authors:
Viet Anh Nguyen,
Fan Zhang,
Shanshan Wang,
Jose Blanchet,
Erick Delage,
Yinyu Ye
Abstract:
We propose a data-driven portfolio selection model that integrates side information, conditional estimation and robustness using the framework of distributionally robust optimization. Conditioning on the observed side information, the portfolio manager solves an allocation problem that minimizes the worst-case conditional risk-return trade-off, subject to all possible perturbations of the covariate-return probability distribution in an optimal transport ambiguity set. Despite the non-linearity of the objective function in the probability measure, we show that the distributionally robust portfolio allocation with side information problem can be reformulated as a finite-dimensional optimization problem. If portfolio decisions are made based on either the mean-variance or the mean-Conditional Value-at-Risk criterion, the resulting reformulation can be further simplified to second-order or semi-definite cone programs. Empirical studies in the US equity market demonstrate the advantage of our integrative framework against other benchmarks.
Submitted 9 April, 2024; v1 submitted 30 March, 2021;
originally announced March 2021.
-
Distributionally Robust Local Non-parametric Conditional Estimation
Authors:
Viet Anh Nguyen,
Fan Zhang,
Jose Blanchet,
Erick Delage,
Yinyu Ye
Abstract:
Conditional estimation given specific covariate values (i.e., local conditional estimation or functional estimation) is ubiquitously useful with applications in engineering, social and natural sciences. Existing data-driven non-parametric estimators mostly focus on structured homogeneous data (e.g., weakly independent and stationary data), thus they are sensitive to adversarial noise and may perform poorly under a low sample size. To alleviate these issues, we propose a new distributionally robust estimator that generates non-parametric local estimates by minimizing the worst-case conditional expected loss over all adversarial distributions in a Wasserstein ambiguity set. We show that despite being generally intractable, the local estimator can be efficiently found via convex optimization under broadly applicable settings, and it is robust to the corruption and heterogeneity of the data. Experiments with synthetic and MNIST datasets show the competitive performance of this new class of estimators.
Submitted 11 October, 2020;
originally announced October 2020.
-
The value of randomized strategies in distributionally robust risk averse network interdiction games
Authors:
Utsav Sadana,
Erick Delage
Abstract:
Conditional Value at Risk (CVaR) is widely used to account for the preferences of a risk-averse agent in the extreme loss scenarios. To study the effectiveness of randomization in interdiction games with an interdictor that is both risk and ambiguity averse, we introduce a distributionally robust network interdiction game where the interdictor randomizes over the feasible interdiction plans in order to minimize the worst-case CVaR of the flow with respect to both the unknown distribution of the capacity of the arcs and his mixed strategy over interdicted arcs. The flow player, on the contrary, maximizes the total flow in the network. By using the budgeted uncertainty set, we control the degree of conservatism in the model and reformulate the interdictor's non-linear problem as a bi-convex optimization problem. For solving this problem to any given optimality level, we devise a spatial branch and bound algorithm that uses the McCormick inequalities and reduced reformulation linearization technique (RRLT) to obtain a convex relaxation of the problem. We also develop a column generation algorithm to identify the optimal support of the convex relaxation, which is then used in the coordinate descent algorithm to determine the upper bounds. The efficiency and convergence of the spatial branch and bound algorithm are established in the numerical experiments. Further, our numerical experiments show that randomized strategies can have significantly better in-sample and out-of-sample performance than optimal deterministic ones.
Submitted 17 March, 2020;
originally announced March 2020.
-
Equal Risk Pricing and Hedging of Financial Derivatives with Convex Risk Measures
Authors:
Saeed Marzban,
Erick Delage,
Jonathan Yumeng Li
Abstract:
In this paper, we consider the problem of equal risk pricing and hedging in which the fair price of an option is the price that exposes both sides of the contract to the same level of risk. Focusing for the first time on the context where risk is measured according to convex risk measures, we establish that the problem reduces to solving independently the writer and the buyer's hedging problem with zero initial capital. By further imposing that the risk measures decompose in a way that satisfies a Markovian property, we provide dynamic programming equations that can be used to solve the hedging problems for both the case of European and American options. All of our results are general enough to accommodate situations where the risk is measured according to a worst-case risk measure as is typically done in robust optimization. Our numerical study illustrates the advantages of equal risk pricing over schemes that only account for a single party, pricing based on quadratic hedging (i.e. $ε$-arbitrage pricing), or pricing based on a fixed equivalent martingale measure (i.e. Black-Scholes pricing). In particular, the numerical results confirm that when employing an equal risk price both the writer and the buyer end up being exposed to risks that are more similar and on average smaller than what they would experience with the other approaches.
Submitted 16 September, 2020; v1 submitted 7 February, 2020;
originally announced February 2020.
-
A Unified Framework for Dynamic Pari-Mutuel Information Market Design
Authors:
Shipra Agrawal,
Erick Delage,
Mark Peters,
Zizhuo Wang,
Yinyu Ye
Abstract:
Recently, several new pari-mutuel mechanisms have been introduced to organize markets for contingent claims. Hanson introduced a market maker derived from the logarithmic scoring rule, and later Chen and Pennock developed a cost function formulation for the market maker. On the other hand, the SCPM model of Peters et al. is based on ideas from a call auction setting using a convex optimization model. In this work, we develop a unified framework that bridges these seemingly unrelated models for centrally organizing contingent claim markets. The framework, developed as a generalization of the SCPM, will support many desirable properties such as proper scoring, truthful bidding (in a myopic sense), efficient computation, and guarantees on worst case loss. In fact, our unified framework will allow us to express various proper scoring rules, existing or new, from classical utility functions in a convex optimization problem representing the market organizer. Additionally, we utilize concepts from duality to show that the market model is equivalent to a risk minimization problem where a convex risk measure is employed. This will allow us to more clearly understand the differences in the risk attitudes adopted by various mechanisms, and particularly deepen our intuition about popular mechanisms like Hanson's market-maker. In aggregate, we believe this work advances our understanding of the objectives that the market organizer is optimizing in popular pari-mutuel mechanisms by recasting them into one unified framework.
Submitted 13 February, 2009;
originally announced February 2009.