-
Dynamic Capital Requirements for Markov Decision Processes
Authors:
William B. Haskell,
Abhishek Gupta,
Shi** Shao
Abstract:
We build on the theory of capital requirements (CRs) to create a new framework for modeling dynamic risk preferences. The key question is how to evaluate the risk of a payoff stream sequentially as new information is revealed. In our model, we associate each payoff stream with a disbursement strategy and a premium schedule to form a triple of stochastic processes. We characterize risk preferences in terms of a single set, which we call the risk frontier, that specifies the acceptable triples. We then propose the generalized capital requirement (GCR), which evaluates the risk of a payoff stream by minimizing the premium schedule over acceptable triples. We apply this model to a risk-aware decision maker (DM) who controls a Markov decision process (MDP) and wants to find a policy that minimizes the GCR of its payoff stream. The resulting GCR-MDP recovers many well-known risk-aware MDPs as special cases. To make this approach computationally viable, we obtain the temporal decomposition of the GCR in terms of the risk frontier. Then, we connect the temporal decomposition with the notion of an information state to compactly capture the dependence of the DM's risk preferences on the problem history, so that augmented dynamic programming can be used to compute an optimal policy. We report numerical experiments for the GCR-minimizing newsvendor.
Submitted 11 January, 2024;
originally announced January 2024.
-
Robustness to Modeling Errors in Risk-Sensitive Markov Decision Problems with Markov Risk Measures
Authors:
Shi** Shao,
Abhishek Gupta,
William B. Haskell
Abstract:
We consider risk-sensitive Markov decision processes (MDPs), where the MDP model is influenced by a parameter which takes values in a compact metric space. We identify sufficient conditions under which small perturbations in the model parameters lead to small changes in the optimal value function and optimal policy. We further establish the robustness of the risk-sensitive optimal policies to modeling errors. Implications of the results for data-driven decision-making, decision-making with preference uncertainty, and systems with changing noise distributions are discussed.
Submitted 26 September, 2022;
originally announced September 2022.
-
Preference Robust Optimization with Quasi-Concave Choice Functions for Multi-Attribute Prospects
Authors:
Jian Wu,
William B. Haskell,
Wenjie Huang,
Huifu Xu
Abstract:
Preference robust choice models concern decision-making problems where the decision maker's (DM's) utility/risk preferences are ambiguous and the evaluation is based on the worst-case utility function/risk measure from a set of plausible utility functions/risk measures. The current preference robust choice models are mostly built upon von Neumann-Morgenstern expected utility theory, the theory of convex risk measures, or Yaari's dual theory of choice, all of which assume the DM's preferences satisfy some specified axioms. In this paper, we extend the preference robust approach to a broader class of choice functions which satisfy monotonicity and quasi-concavity in the space of multi-attribute random prospects. While our new model is non-parametric and significantly extends the coverage of decision-making problems, it also brings new computational challenges due to the non-convexity of the optimization formulations, which arises from the non-concavity of the class of quasi-concave choice functions. To tackle these challenges, we develop a sorting-based algorithm that efficiently evaluates the robust choice function (RCF) by solving a sequence of linear programming problems. Then, we show how to solve preference robust optimization (PRO) problems by solving a sequence of convex optimization problems. We test our robust choice model and computational scheme on a single-attribute portfolio optimization problem and a multi-attribute capital allocation problem.
Submitted 5 April, 2022; v1 submitted 30 August, 2020;
originally announced August 2020.
-
Asymptotic Analysis for Data-Driven Inventory Policies
Authors:
Xun Zhang,
Zhisheng Ye,
William B. Haskell
Abstract:
We study periodic review stochastic inventory control in the data-driven setting, where the retailer makes ordering decisions based only on historical demand observations without any knowledge of the probability distribution of the demand. Since an (s, S)-policy is optimal when the demand distribution is known, we investigate the statistical properties of the data-driven (s, S)-policy obtained by recursively computing the empirical cost-to-go functions. This policy is inherently challenging to analyze because the recursion induces propagation of the estimation error backwards in time. In this work, we establish the asymptotic properties of this data-driven policy by fully accounting for the error propagation. First, we rigorously show the consistency of the estimated parameters by filling in some gaps (due to unaccounted-for error propagation) in existing studies. In this setting, empirical process theory (EPT) cannot be directly applied to show asymptotic normality, because the error propagation means that the empirical cost-to-go functions for the estimated parameters are not i.i.d. sums. Our main methodological innovation is an asymptotic representation for multi-sample U-processes in terms of i.i.d. sums. This representation enables us to apply EPT to derive the influence functions of the estimated parameters and to establish joint asymptotic normality. Based on these results, we also propose an entirely data-driven estimator of the optimal expected cost and derive its asymptotic distribution. We demonstrate some useful applications of our asymptotic results, including sample size determination and interval estimation. The results from our numerical simulations conform to our theoretical analysis.
Submitted 4 November, 2021; v1 submitted 19 August, 2020;
originally announced August 2020.
-
A dynamic analytic method for risk-aware controlled martingale problems
Authors:
Jukka Isohätälä,
William B. Haskell
Abstract:
We present a new, tractable method for solving and analyzing risk-aware control problems over finite and infinite discounted time horizons, where the dynamics of the controlled process are described as a martingale problem. Assuming general Polish state and action spaces, and using generalized, relaxed controls, we state a risk-aware dynamic optimal control problem of minimizing the risk of costs described by a generic risk function. We then construct an alternative formulation that takes the form of a nonlinear programming problem, constrained by the dynamic (i.e., time-dependent) and linear Kolmogorov forward equation describing the distribution of the state and accumulated costs. We show that the formulations are equivalent, and that the optimal control process can be taken to be Markov in the controlled process state, running costs, and time. We further prove that, under additional conditions, the optimal value is attained. An example numerical problem is presented and solved.
Submitted 22 June, 2020;
originally announced June 2020.
-
Convergence of Recursive Stochastic Algorithms using Wasserstein Divergence
Authors:
Abhishek Gupta,
William B. Haskell
Abstract:
This paper develops a unified framework, based on iterated random operator theory, to analyze the convergence of constant stepsize recursive stochastic algorithms (RSAs). RSAs use randomization to efficiently compute expectations, and so their iterates form a stochastic process. The key idea of our analysis is to lift the RSA into an appropriate higher-dimensional space and then express it as an equivalent Markov chain. Instead of determining the convergence of this Markov chain (which may not converge under constant stepsize), we study the convergence of the distribution of this Markov chain. To study this, we define a new notion of Wasserstein divergence. We show that if the distribution of the iterates in the Markov chain satisfies a contraction property with respect to the Wasserstein divergence, then the Markov chain admits an invariant distribution. We show that convergence of a large family of constant stepsize RSAs can be understood using this framework, and we provide several detailed examples.
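A minimal sketch of the phenomenon described above, on a toy example of our own: constant-stepsize SGD on f(x) = E[(x - xi)^2]/2 with xi ~ N(0, 1), i.e. x_{k+1} = x_k - a(x_k - xi_k). The iterates never settle, but two copies driven by the same noise contract by a factor (1 - a) per step, and the empirical variance approaches the stationary value a/(2 - a). The function names are hypothetical, and the coupling argument uses the ordinary Wasserstein-1 distance rather than the paper's Wasserstein divergence.

```python
import random


def constant_step_sgd(x0, a, n_steps, rng):
    """Iterate x_{k+1} = x_k - a * (x_k - xi_k) with xi_k ~ N(0, 1).

    This is constant-stepsize SGD on f(x) = E[(x - xi)^2] / 2; returns all iterates."""
    xs = []
    x = x0
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)
        x -= a * (x - xi)
        xs.append(x)
    return xs


def coupled_distance(x0, y0, a, n_steps, seed=0):
    """Run two chains on the same noise (a synchronous coupling).

    Their gap shrinks by a factor (1 - a) per step (up to rounding), which
    upper-bounds the Wasserstein-1 distance between the laws of the two chains."""
    rng = random.Random(seed)
    x, y = x0, y0
    for _ in range(n_steps):
        xi = rng.gauss(0.0, 1.0)
        x -= a * (x - xi)
        y -= a * (y - xi)
    return abs(x - y)
```

With a = 0.1 the stationary variance is 0.1/1.9, roughly 0.0526; a smaller stepsize tightens the invariant distribution around the minimizer 0 at the cost of slower mixing.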
Submitted 5 January, 2021; v1 submitted 25 March, 2020;
originally announced March 2020.
-
A Randomized Nonlinear Rescaling Method in Large-Scale Constrained Convex Optimization
Authors:
Bo Wei,
William B. Haskell,
Sixiang Zhao
Abstract:
We propose a new randomized algorithm that solves, with high probability, convex optimization problems with a large number of constraints. Existing methods such as interior-point or Newton-type algorithms are hard to apply to such problems because of their expensive computation and storage requirements for Hessians and matrix inversions. Our algorithm is based on nonlinear rescaling (NLR), which is a primal-dual-type algorithm by Griva and Polyak [Math. Program., 106(2):237-259, 2006]. NLR introduces an equivalent problem through a transformation of the constraint functions, minimizes the corresponding augmented Lagrangian for given dual variables, and then uses this minimizer to update the dual variables for the next iteration. The primal update at each iteration is the solution of an unconstrained finite-sum minimization problem whose terms are weighted by the current dual variables. We use randomized first-order algorithms, which are especially well suited to these primal updates. In particular, we use the scaled dual variables as the sampling distribution for each primal update, and we show that this distribution is optimal among all probability distributions. We conclude by demonstrating the favorable numerical performance of our algorithm.
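The idea of using the scaled dual variables as a sampling distribution can be sketched as importance sampling over a dual-weighted finite sum sum_i lambda_i * grad phi_i(x), where phi_i stands in for the transformed constraint terms: draw one index i with probability lambda_i / sum_j lambda_j and rescale its gradient by sum_j lambda_j. This gives an unbiased gradient estimate while evaluating only one term per step. The helper name and the toy quadratic terms are hypothetical; the sketch does not reproduce the NLR transformation or the paper's optimality argument for this distribution.

```python
import random


def sampled_dual_weighted_grad(grad_fns, duals, x, rng):
    """One-sample unbiased estimate of sum_i duals[i] * grad_fns[i](x).

    Index i is drawn with probability duals[i] / sum(duals); only that term's
    gradient is evaluated, then rescaled by sum(duals)."""
    total = sum(duals)
    i = rng.choices(range(len(duals)), weights=duals, k=1)[0]
    return [total * gi for gi in grad_fns[i](x)]


# Toy terms phi_i(x) = (x - c_i)^2 / 2 in one dimension; their gradients are x - c_i.
grad_fns = [lambda x, c=c: [x[0] - c] for c in (0.0, 1.0, 4.0)]
duals = [0.5, 1.5, 2.0]
```

At x = 1 the exact weighted gradient is 0.5(1) + 1.5(0) + 2(-3) = -5.5, and averaging many calls to the estimator approaches that value.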
Submitted 24 March, 2020;
originally announced March 2020.
-
A Flexible Multi-Facility Capacity Expansion Problem with Risk Aversion
Authors:
Sixiang Zhao,
William B. Haskell,
Michel-Alexandre Cardin
Abstract:
This paper studies flexible multi-facility capacity expansion with risk aversion. In this setting, the decision maker can periodically expand the capacity of facilities given observations of uncertain demand. We model this situation as a multi-stage stochastic programming problem. We express risk aversion in this problem through conditional value-at-risk (CVaR), and we formulate a mean-CVaR objective. To solve the multi-stage problem, we optimize over decision rules. In particular, we approximate the full policy space of the problem with a tractable family of if-then policies. Subsequently, we propose a decomposition algorithm to optimize the decision rule. This algorithm decomposes the model over scenarios and updates solutions via the subgradients of the recourse function. We demonstrate that this algorithm can quickly converge to high-performance policies. To illustrate the practical effectiveness of this method, we present a case study on the waste-to-energy system in Singapore. The simulation results show that, by adjusting the weight factor of the objective function, decision makers are able to trade off between a risk-averse policy that has a higher expected cost but a lower value-at-risk and a risk-neutral policy that has a lower expected cost but a higher value-at-risk.
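A minimal sketch of a mean-CVaR objective evaluated on sampled scenario costs, using the Rockafellar-Uryasev representation CVaR_alpha(C) = min over eta of { eta + E[(C - eta)^+] / (1 - alpha) }. The weighting (1 - w) E[C] + w CVaR_alpha(C) is one common form and is our assumption, since the abstract does not state the exact objective; the function names are hypothetical.

```python
def empirical_cvar(costs, alpha):
    """CVaR_alpha of an empirical cost distribution via Rockafellar-Uryasev:
    CVaR_alpha(C) = min_eta { eta + E[(C - eta)^+] / (1 - alpha) }."""
    n = len(costs)

    def ru_objective(eta):
        return eta + sum(max(c - eta, 0.0) for c in costs) / (n * (1.0 - alpha))

    # The objective is convex and piecewise linear in eta with kinks at the
    # sample points, so its minimum is attained at one of them.
    return min(ru_objective(eta) for eta in costs)


def mean_cvar(costs, alpha, weight):
    """Weighted objective (1 - weight) * E[C] + weight * CVaR_alpha(C)."""
    mean = sum(costs) / len(costs)
    return (1.0 - weight) * mean + weight * empirical_cvar(costs, alpha)
```

For the costs 1, ..., 10 and alpha = 0.9, CVaR is the mean of the worst 10% of outcomes (here 10), so w = 0.5 gives 0.5 * 5.5 + 0.5 * 10 = 7.75; raising w shifts the preferred policy toward lighter cost tails.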
Submitted 13 May, 2019;
originally announced May 2019.
-
A Multi-Level Simulation Optimization Approach for Quantile Functions
Authors:
Songhao Wang,
Szu Hui Ng,
William Benjamin Haskell
Abstract:
The quantile is a popular performance measure for evaluating the variability and risk of a stochastic system. To reduce risk, decision makers are typically interested in selecting the actions that minimize the tail quantiles of some loss distributions. When the loss distribution is observed via simulations, evaluating and optimizing its quantile functions can be challenging, especially when the simulations are expensive, as it may take a large number of simulation runs to obtain accurate quantile estimators. In this work, we propose a multi-level metamodel (co-kriging) based algorithm to optimize quantile functions more efficiently. Utilizing the non-decreasing property of quantile functions, we first search on cheaper and informative lower quantiles, which are more accurate and easier to optimize. The quantile level iteratively increases to the objective level while the search focuses on the promising regions identified at the previous levels. This enables us to leverage the accurate information from the lower quantiles to find the optima faster and improve algorithm efficiency.
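The non-decreasing property exploited above can be seen on a plain empirical quantile estimator: for levels alpha_1 <= alpha_2, the estimate at alpha_1 never exceeds the estimate at alpha_2. The sketch below uses the lower empirical quantile (the ceil(n * alpha)-th smallest sample), which is one standard convention and our choice here; it illustrates the property only and does not implement the co-kriging metamodel.

```python
import math


def empirical_quantile(samples, level):
    """Lower empirical quantile for level in (0, 1]: the ceil(n * level)-th smallest sample."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * level))
    return ordered[rank - 1]
```

For the five losses 5, 1, 4, 2, 3, the levels 0.1, 0.5, and 0.9 give 1, 3, and 5, and sweeping the level upward never decreases the estimate.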
Submitted 17 January, 2019;
originally announced January 2019.
-
An Accelerated Fitted Value Iteration Algorithm for MDPs with Finite and Vector-Valued Action Space
Authors:
Sixiang Zhao,
William B. Haskell,
Michel-Alexandre Cardin
Abstract:
This paper studies an accelerated fitted value iteration (FVI) algorithm to solve high-dimensional Markov decision processes (MDPs). FVI is an approximate dynamic programming algorithm that has desirable theoretical properties. However, it can be intractable when the action space is finite but vector-valued. To solve such MDPs via FVI, we first approximate the value functions by a two-layer neural network (NN) with rectified linear unit (ReLU) activation functions. We then verify that such approximators are strong enough for the MDP. To speed up FVI, we recast the action selection problem as a two-stage stochastic programming problem, where the resulting recourse function comes from the two-layer NN. The action selection problem is then solved with a specialized multi-cut decomposition algorithm. More specifically, we design valid cuts by exploiting the structure of the approximated value functions to update the actions. We prove that the decomposition can find the global optimal solution in a finite number of iterations and that the overall accelerated FVI is consistent. Finally, we verify the performance of the FVI algorithm on a multi-facility capacity investment problem (MCIP). A comprehensive numerical study shows that FVI is significantly accelerated without sacrificing much precision.
Submitted 25 November, 2020; v1 submitted 16 January, 2019;
originally announced January 2019.
-
Model and Reinforcement Learning for Markov Games with Risk Preferences
Authors:
Wenjie Huang,
Pham Viet Hai,
William B. Haskell
Abstract:
We motivate and propose a new model for non-cooperative Markov games that considers the interactions of risk-aware players. This model characterizes the time-consistent dynamic "risk" from both stochastic state transitions (inherent to the game) and randomized mixed strategies (due to all other players). An appropriate risk-aware equilibrium concept is proposed, and the existence of such equilibria in stationary strategies is demonstrated by an application of Kakutani's fixed point theorem. We further propose a simulation-based Q-learning type algorithm for risk-aware equilibrium computation. This algorithm works with a special form of minimax risk measures that can naturally be written as saddle-point stochastic optimization problems, and it covers many widely investigated risk measures. Finally, the almost sure convergence of this simulation-based algorithm to an equilibrium is demonstrated under some mild conditions. Our numerical experiments on a two-player queuing game validate the properties of our model and algorithm, and demonstrate their worth and applicability in real-life competitive decision-making.
Submitted 21 November, 2019; v1 submitted 15 January, 2019;
originally announced January 2019.
-
Risk aware minimum principle for optimal control of stochastic differential equations
Authors:
Jukka Isohätälä,
William B. Haskell
Abstract:
We present a probabilistic formulation of risk-aware optimal control problems for stochastic differential equations. In our framework, risk awareness is captured by objective functions in which the risk-neutral expectation is replaced by a risk function, a nonlinear functional of random variables that accounts for the controller's risk preferences. We state and prove a risk-aware minimum principle that is a parsimonious generalization of the well-known risk-neutral stochastic Pontryagin minimum principle. As our main results, we give necessary as well as sufficient conditions for optimality of control processes taking values in probability measures defined on a given action space. We show that, remarkably, going from the risk-neutral to the risk-aware case, the minimum principle is simply modified by the introduction of one additional real-valued stochastic process that acts as a risk adjustment factor for given cost rate and terminal cost functions. This adjustment process is explicitly given as the expectation, conditional on the filtration at the given time, of an appropriately defined functional derivative of the risk function evaluated at the random total cost. Our results rely on the Fréchet differentiability of the risk function, and for completeness, we prove under mild assumptions the existence of Fréchet derivatives of some common risk functions. We give a simple application of the results to a portfolio allocation problem and show that the risk awareness of the objective function gives rise to a risk premium term that is characterized by the risk adjustment process described above. This suggests uses of our results in, e.g., pricing of risk modeled by generic risk functions in financial applications.
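To make the risk adjustment factor concrete, take the entropic risk function rho(X) = (1/theta) log E[exp(theta X)] on a finite probability space. This is our choice of example; the abstract does not list which risk functions are covered. Its derivative in direction H is E[Z H] with density Z = exp(theta X) / E[exp(theta X)], so Z plays the role of a risk adjustment weight, whereas in the risk-neutral case rho = E the weight is identically 1. The sketch below (hypothetical function names) computes rho and Z and checks the derivative formula by finite differences.

```python
import math


def entropic_risk(values, probs, theta):
    """rho(X) = (1/theta) * log E[exp(theta * X)] on a finite space, theta > 0."""
    m = max(values)  # shift by the max for numerical stability
    s = sum(p * math.exp(theta * (v - m)) for v, p in zip(values, probs))
    return m + math.log(s) / theta


def entropic_density(values, probs, theta):
    """Z = exp(theta * X) / E[exp(theta * X)]; then d/dt rho(X + t H) at t = 0 is E[Z H]."""
    m = max(values)
    w = [math.exp(theta * (v - m)) for v in values]
    norm = sum(p * wi for p, wi in zip(probs, w))
    return [wi / norm for wi in w]
```

Because Z integrates to 1 and puts more weight on high-cost outcomes, E[Z H] tilts the sensitivity toward bad states, which is the qualitative role the abstract assigns to the risk adjustment process.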
Submitted 18 October, 2019; v1 submitted 21 December, 2018;
originally announced December 2018.
-
Corporative Stochastic Approximation with Random Constraint Sampling for Semi-Infinite Programming
Authors:
Bo Wei,
William B. Haskell,
Sixiang Zhao
Abstract:
We develop a corporative stochastic approximation (CSA) type algorithm for semi-infinite programming (SIP), where the cut generation problem is solved inexactly. First, we provide general error bounds for inexact CSA. Then, we propose two specific random constraint sampling schemes to approximately solve the cut generation problem. When the objective and constraint functions are generally convex, we show that our randomized CSA algorithms achieve an $\mathcal{O}(1/\sqrt{N})$ rate of convergence in expectation (in terms of optimality gap as well as SIP constraint violation). When the objective and constraint functions are all strongly convex, this rate can be improved to $\mathcal{O}(1/N)$.
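A minimal sketch of a CSA iteration with random constraint sampling, on a toy problem of our own: minimize (x - 2)^2 over x in [-5, 5] subject to x - t <= 0 for all t in [1, 3], whose solution is x* = 1. At each step the cut generation problem max_t (x - t) is solved inexactly over a fresh random sample of t values. If the sampled violation is within a tolerance eta_k, the iterate is recorded and the step follows the objective gradient; otherwise it follows the gradient of the most violated sampled constraint. The output is the stepsize-weighted average of the recorded iterates. The function name, stepsizes, tolerances, and sample size are illustrative assumptions, not the paper's choices.

```python
import random


def csa_random_sampling(n_iters=10_000, n_samples=50, seed=0):
    """CSA-style sketch for: minimize (x - 2)^2 over [-5, 5]
    subject to x - t <= 0 for all t in [1, 3] (solution x* = 1).

    The cut generation problem max_t (x - t) is solved inexactly by sampling
    n_samples values of t uniformly at each iteration."""
    rng = random.Random(seed)
    x = 0.0
    weighted_sum, weight_total = 0.0, 0.0
    for k in range(1, n_iters + 1):
        step = 0.1 / k ** 0.5  # stepsize gamma_k (illustrative choice)
        tol = 0.5 / k ** 0.5   # constraint tolerance eta_k (illustrative choice)
        t_min = min(rng.uniform(1.0, 3.0) for _ in range(n_samples))
        violation = x - t_min  # sampled estimate of max_t (x - t)
        if violation <= tol:
            # Nearly feasible: record the iterate and follow the objective gradient.
            weighted_sum += step * x
            weight_total += step
            grad = 2.0 * (x - 2.0)
        else:
            # Too infeasible: follow the gradient of the most violated sampled constraint.
            grad = 1.0
        x = min(5.0, max(-5.0, x - step * grad))  # projection onto [-5, 5]
    return weighted_sum / weight_total
```

Because the sampled minimum of t is never below 1, the returned point sits slightly above x* = 1; more samples per iteration and more iterations shrink that gap.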
Submitted 21 December, 2018;
originally announced December 2018.
-
Index-Based Policy for Risk-Averse Multi-Armed Bandit
Authors:
Jianyu Xu,
William B. Haskell,
Zhisheng Ye
Abstract:
The multi-armed bandit (MAB) is a classical online optimization model for the trade-off between exploration and exploitation. The traditional MAB is concerned with finding the arm that minimizes the mean cost. However, minimizing the mean does not take the risk of the problem into account. We aim to accommodate risk-averse decision makers. In this work, we introduce a coherent risk measure as the criterion to form a risk-averse MAB. In particular, we derive an index-based online sampling framework for the risk-averse MAB. We develop this framework in detail for three specific risk measures, namely the conditional value-at-risk, the mean-deviation, and the shortfall risk measures. Under each risk measure, we establish the convergence rate of the upper bound on the pseudo regret, defined as the difference between the expectation of the empirical risk based on the observation sequence and the true risk of the optimal arm.
Submitted 14 September, 2018;
originally announced September 2018.
-
Preference Elicitation and Robust Optimization with Multi-Attribute Quasi-Concave Choice Functions
Authors:
William B. Haskell,
Wenjie Huang,
Huifu Xu
Abstract:
A decision maker's preferences are often captured by choice functions, which are used to rank prospects. In this paper, we consider ambiguity in choice functions over a multi-attribute prospect space. Our main result is a robust preference model where the optimal decision is based on the worst-case choice function from an ambiguity set constructed through preference elicitation with pairwise comparisons of prospects. Unlike existing work in the area, we focus on quasi-concave choice functions rather than concave functions, which enables us to cover a wide range of utility/risk preference problems, including multi-attribute expected utility and $S$-shaped aspirational risk preferences. The robust choice function is increasing and quasi-concave but not necessarily translation invariant, a key property of monetary risk measures. We propose two approaches, based respectively on the support functions and the level functions of quasi-concave functions, to develop tractable formulations of the maximin preference robust optimization model. The former gives rise to a mixed integer linear programming problem, whereas the latter is equivalent to solving a sequence of convex risk minimization problems. To assess the effectiveness of the proposed robust preference optimization model and numerical schemes, we apply them to a security budget allocation problem and report preliminary experimental results.
Submitted 17 May, 2018;
originally announced May 2018.
-
Stochastic Approximation for Risk-aware Markov Decision Processes
Authors:
Wenjie Huang,
William B. Haskell
Abstract:
We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point problem. The outer loop performs $Q$-learning to compute an optimal risk-aware policy. Several widely investigated risk measures (e.g., conditional value-at-risk, optimized certainty equivalent, and absolute semi-deviation) are covered by our algorithm. Almost sure convergence and the convergence rate of the algorithm are established. For an error tolerance $\varepsilon>0$ for the optimal $Q$-value estimation gap and learning rate $k\in(1/2,\,1]$, the overall convergence rate of our algorithm is $\Omega\big((\ln(1/(\delta\varepsilon))/\varepsilon^{2})^{1/k}+(\ln(1/\varepsilon))^{1/(1-k)}\big)$ with probability at least $1-\delta$.
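For CVaR, the inner risk-estimation problem reduces to the Rockafellar-Uryasev minimization over a scalar eta, which stochastic approximation can solve from samples alone. The sketch below is our own illustration of that inner loop only, with a hypothetical function name and stepsizes chosen for the example: eta follows a stochastic subgradient with stepsize 0.05/k^0.6, and a running average of the sampled objective tracks CVaR. It does not implement the general saddle-point formulation or the Q-learning outer loop.

```python
import random


def sa_cvar(sample, alpha, n_iters=200_000, seed=0):
    """Estimate (VaR, CVaR) of X at level alpha by stochastic approximation.

    eta takes stochastic subgradient steps on eta + E[(X - eta)^+] / (1 - alpha)
    (the Rockafellar-Uryasev objective); cvar is a running average of the
    sampled objective evaluated at the current eta."""
    rng = random.Random(seed)
    eta, cvar = 0.0, 0.0
    for k in range(1, n_iters + 1):
        x = sample(rng)
        # Running average of the sampled objective at the current eta.
        cvar += (eta + max(x - eta, 0.0) / (1.0 - alpha) - cvar) / k
        # Stochastic subgradient in eta: 1 - 1{x > eta} / (1 - alpha).
        subgrad = 1.0 - (1.0 if x > eta else 0.0) / (1.0 - alpha)
        eta -= 0.05 / k ** 0.6 * subgrad
    return eta, cvar
```

For X uniform on [0, 1] and alpha = 0.9, the true VaR is 0.9 and the true CVaR is 0.95; calling `sa_cvar(lambda rng: rng.random(), 0.9)` returns estimates close to both.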
Submitted 3 December, 2019; v1 submitted 11 May, 2018;
originally announced May 2018.
-
An Inexact Primal-Dual Algorithm for Semi-Infinite Programming
Authors:
Bo Wei,
William B. Haskell,
Sixiang Zhao
Abstract:
This paper considers an inexact primal-dual algorithm for semi-infinite programming (SIP) for which it provides general error bounds. To implement the dual variable update, we create a new prox function for nonnegative measures which turns out to be a generalization of the Kullback-Leibler divergence for probability distributions. We show that under suitable conditions on the error, this algorithm achieves an $\mathcal{O}(1/\sqrt{K})$ rate of convergence in terms of the optimality gap and constraint violation. We then use our general error bounds to analyze the convergence and sample complexity of a specific primal-dual SIP algorithm based on Monte Carlo integration. Finally, we provide numerical experiments to demonstrate the performance of our algorithm.
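One standard extension of the Kullback-Leibler divergence to nonnegative (unnormalized) weights is the generalized KL divergence, D(lambda, mu) = sum_i lambda_i log(lambda_i / mu_i) - lambda_i + mu_i, which reduces to KL when both arguments sum to 1. We do not claim this is the paper's exact prox function. Assuming the dual measure is supported on finitely many constraint indices, the sketch below (hypothetical names) computes D and the closed-form prox step lambda_i = mu_i exp(s g_i), which maximizes s <g, lambda> - D(lambda, mu) over lambda >= 0 without any normalization constraint.

```python
import math


def generalized_kl(lam, mu):
    """sum_i lam_i * log(lam_i / mu_i) - lam_i + mu_i for nonnegative weight vectors.

    Uses the convention 0 * log 0 = 0; mu_i must be positive wherever lam_i > 0."""
    total = 0.0
    for l, m in zip(lam, mu):
        if l > 0.0:
            total += l * math.log(l / m)
        total += m - l
    return total


def entropic_dual_step(mu, constraint_values, step):
    """argmin over lam >= 0 of -step * <g, lam> + generalized_kl(lam, mu).

    Setting the gradient -step * g_i + log(lam_i / mu_i) to zero gives the
    multiplicative update lam_i = mu_i * exp(step * g_i)."""
    return [m * math.exp(step * g) for m, g in zip(mu, constraint_values)]
```

The update raises the weight on violated constraints (g_i > 0) and lowers it on satisfied ones, and because the total mass is not renormalized, the dual variable stays a general nonnegative measure rather than a probability distribution.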
Submitted 15 January, 2019; v1 submitted 28 March, 2018;
originally announced March 2018.
-
Distributionally Robust Optimization for Sequential Decision Making
Authors:
Zhi Chen,
Pengqian Yu,
William B. Haskell
Abstract:
The distributionally robust Markov Decision Process (MDP) approach asks for a distributionally robust policy that achieves the maximal expected total reward under the most adversarial distribution of uncertain parameters. In this paper, we study distributionally robust MDPs whose ambiguity sets for the uncertain parameters take a form that can easily incorporate both generalized-moment information and statistical-distance information about the uncertainty. In this way, we generalize existing work on distributionally robust MDPs with generalized-moment-based and statistical-distance-based ambiguity sets, incorporating information from the former class (such as moments and dispersions) into the latter class, which critically depends on empirical observations of the uncertain parameters. We show that, under this form of ambiguity sets, the resulting distributionally robust MDP remains tractable under mild technical conditions. More specifically, a distributionally robust policy can be constructed by solving a sequence of one-stage convex optimization subproblems.
Submitted 9 October, 2018; v1 submitted 15 January, 2018;
originally announced January 2018.
-
An Inexact Primal-Dual Smoothing Framework for Large-Scale Non-Bilinear Saddle Point Problems
Authors:
Le Thi Khanh Hien,
Renbo Zhao,
William B. Haskell
Abstract:
We develop an inexact primal-dual first-order smoothing framework to solve a class of non-bilinear saddle point problems with primal strong convexity. Compared with existing methods, our framework significantly improves the primal oracle complexity while retaining a competitive dual oracle complexity. In addition, we consider the situation where the primal-dual coupling term has a large number of component functions. To handle this situation efficiently, we develop a randomized version of our smoothing framework, which allows the primal and dual sub-problems in each iteration to be solved inexactly, in expectation, by randomized algorithms. The convergence of this framework is analyzed both in expectation and with high probability. In terms of the primal and dual oracle complexities, this framework significantly improves over its deterministic counterpart. As an important application, we adapt both frameworks to solve convex optimization problems with many functional constraints. To obtain an $\varepsilon$-optimal and $\varepsilon$-feasible solution, both frameworks achieve the best-known oracle complexities.
Submitted 24 July, 2023; v1 submitted 9 November, 2017;
originally announced November 2017.
-
An Empirical Dynamic Programming Algorithm for Continuous MDPs
Authors:
William B. Haskell,
Rahul Jain,
Hiteshi Sharma,
Pengqian Yu
Abstract:
We propose universal randomized function approximation-based empirical value iteration (EVI) algorithms for Markov decision processes. The 'empirical' nature comes from each iteration being carried out empirically, using samples of the next state obtained from simulation. This makes the Bellman operator a random operator. Two methods of function approximation, a parametric one using a parametric function space and a non-parametric one using a Reproducing Kernel Hilbert Space (RKHS), are then combined with EVI. Both function spaces have the universal function approximation property, and basis functions are picked randomly. The convergence analysis uses a random operator framework with techniques from the theory of stochastic dominance. Finite-time sample complexity bounds are derived for both universal approximate dynamic programming algorithms. Numerical experiments support the versatility and effectiveness of this approach.
Submitted 23 April, 2019; v1 submitted 21 September, 2017;
originally announced September 2017.
-
A Unified Framework for Stochastic Matrix Factorization via Variance Reduction
Authors:
Renbo Zhao,
William B. Haskell,
Jiashi Feng
Abstract:
We propose a unified framework to speed up existing stochastic matrix factorization (SMF) algorithms via variance reduction. Our framework is general and subsumes several well-known SMF formulations in the literature. We perform a non-asymptotic convergence analysis of our framework and derive computational and sample complexities for our algorithm to converge to an $ε$-stationary point in expectation. In addition, extensive experiments for a wide class of SMF formulations demonstrate that our framework consistently yields faster convergence and a more accurate output dictionary vis-à-vis state-of-the-art frameworks.
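Variance reduction replaces the plain stochastic gradient with an estimator anchored at a periodically refreshed snapshot point. This sketch shows a generic SVRG-style estimator on a hypothetical scalar least-squares problem, not on a matrix factorization objective. We do not claim it matches the paper's framework, which is specific to SMF formulations. The function names, data, and step size are illustrative.

```python
import random


def component_grad(a, b, i, x):
    """Gradient of the component f_i(x) = 0.5 * (a_i * x - b_i)^2 for scalar x."""
    return a[i] * (a[i] * x - b[i])


def svrg(a, b, step, epochs=50, inner=None, seed=0):
    """SVRG-style variance-reduced SGD for min_x (1/n) * sum_i f_i(x)."""
    rng = random.Random(seed)
    n = len(a)
    inner = inner or 2 * n
    snapshot = 0.0
    for _ in range(epochs):
        # One full pass computes the exact gradient at the snapshot point.
        full_grad = sum(component_grad(a, b, i, snapshot) for i in range(n)) / n
        x = snapshot
        for _ in range(inner):
            i = rng.randrange(n)
            # Unbiased gradient estimate whose variance vanishes as both
            # x and the snapshot approach the minimizer.
            v = component_grad(a, b, i, x) - component_grad(a, b, i, snapshot) + full_grad
            x -= step * v
        snapshot = x
    return snapshot


# Hypothetical data: a_i in [1.0, 1.9], b_i close to 3 * a_i.
a = [1.0 + (i % 10) / 10 for i in range(100)]
b = [3.0 * ai + 0.5 * (-1) ** i for i, ai in enumerate(a)]
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
x_hat = svrg(a, b, step=0.1 / max(ai * ai for ai in a))
print(x_hat, x_star)
```

Unlike plain SGD with a constant step, the iterates converge to the exact minimizer. The estimator's variance shrinks as the snapshot improves.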
Submitted 21 May, 2017; v1 submitted 19 May, 2017;
originally announced May 2017.
-
Stochastic L-BFGS: Improved Convergence Rates and Practical Acceleration Strategies
Authors:
Renbo Zhao,
William B. Haskell,
Vincent Y. F. Tan
Abstract:
We revisit the stochastic limited-memory BFGS (L-BFGS) algorithm. By proposing a new framework for the convergence analysis, we prove improved convergence rates and computational complexities of the stochastic L-BFGS algorithms compared to previous works. In addition, we propose several practical acceleration strategies to speed up the empirical performance of such algorithms. We also provide theoretical analyses for most of the strategies. Experiments on large-scale logistic and ridge regression problems demonstrate that our proposed strategies yield significant improvements vis-à-vis competing state-of-the-art algorithms.
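Stochastic L-BFGS methods typically apply the inverse-Hessian approximation through the classical two-loop recursion. What is stochastic is how the curvature pairs (s, y) are formed, for example from minibatch gradient differences or subsampled Hessian-vector products. This is a minimal sketch of the textbook recursion, not the paper's specific variant or acceleration strategies. The pairs come from a hypothetical 2 x 2 quadratic, so y = A s holds exactly.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: returns H @ grad, where H is the L-BFGS inverse-Hessian
    approximation built from curvature pairs (s_i, y_i) ordered oldest to newest."""
    q = list(grad)
    history = []
    for s, y in zip(reversed(s_list), reversed(y_list)):  # newest pair first
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        history.append((rho, alpha, s, y))
    # Initial matrix H0 = gamma * I, with the usual scaling from the newest pair.
    gamma = dot(s_list[-1], y_list[-1]) / dot(y_list[-1], y_list[-1])
    r = [gamma * qi for qi in q]
    for rho, alpha, s, y in reversed(history):  # oldest pair first
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return r


A = [[3.0, 1.0], [1.0, 2.0]]  # hypothetical positive definite Hessian
s_list = [[1.0, 0.0], [0.5, 1.0]]
y_list = [matvec(A, s) for s in s_list]
r = lbfgs_direction(y_list[-1], s_list, y_list)
print(r)  # secant condition: reproduces s_list[-1] up to rounding
```

The final line checks the secant property: the approximation maps the newest y to the newest s. Because every pair satisfies s^T y > 0 here, the approximation is positive definite and yields a descent direction.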
Submitted 24 October, 2017; v1 submitted 31 March, 2017;
originally announced April 2017.
-
Approximate Value Iteration for Risk-aware Markov Decision Processes
Authors:
Pengqian Yu,
William B. Haskell,
Huan Xu
Abstract:
We consider large-scale Markov decision processes (MDPs) with a risk measure of variability in cost, under the risk-aware MDPs paradigm. Previous studies showed that risk-aware MDPs, based on a minimax approach to handling risk, can be solved using dynamic programming for small- to medium-sized problems. However, due to the "curse of dimensionality", MDPs that model real-life problems are typically prohibitively large for such approaches. In this paper, we employ an approximate dynamic programming approach and develop a family of simulation-based algorithms to approximately solve large-scale risk-aware MDPs. In parallel, we develop a unified convergence analysis technique to derive sample complexity bounds for this new family of algorithms.
Submitted 16 May, 2017; v1 submitted 5 January, 2017;
originally announced January 2017.
-
Random constraint sampling and duality for convex optimization
Authors:
William B. Haskell,
Yu Pengqian
Abstract:
We are interested in solving convex optimization problems with large numbers of constraints. Randomized algorithms, such as random constraint sampling, have been very successful in giving nearly optimal solutions to such problems. In this paper, we combine random constraint sampling with the classical primal-dual algorithm for convex optimization problems with large numbers of constraints, and we give a convergence rate analysis. We then report numerical experiments that verify the effectiveness of this algorithm.
Submitted 26 November, 2016; v1 submitted 21 October, 2016;
originally announced October 2016.
-
Empirical Dynamic Programming
Authors:
William B. Haskell,
Rahul Jain,
Dileep Kalathil
Abstract:
We propose empirical dynamic programming algorithms for Markov decision processes (MDPs). In these algorithms, the exact expectation in the Bellman operator in classical value iteration is replaced by an empirical estimate to get 'empirical value iteration' (EVI). Policy evaluation and policy improvement in classical policy iteration are also replaced by simulation to get 'empirical policy iteration' (EPI). Thus, these empirical dynamic programming algorithms involve iteration of a random operator, the empirical Bellman operator. We introduce notions of probabilistic fixed points for such random monotone operators and develop a stochastic dominance framework for their convergence analysis, which we use to give sample complexity bounds for both EVI and EPI. We then provide several variations and extensions, including asynchronous empirical dynamic programming and the minimax empirical dynamic program, and show how this approach can also be used to solve the dynamic newsvendor problem. Preliminary experimental results suggest a faster rate of convergence than stochastic approximation algorithms.
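Empirical value iteration, as described above, replaces the exact expectation in the Bellman operator with a sample average over simulated next states. This minimal sketch rests on several assumptions:

- The two-state MDP is hypothetical.
- The "simulator" simply samples from the known transition probabilities.
- The sample size and iteration count are arbitrary.

Exact value iteration is included for comparison. EPI is not shown.

```python
import random


def exact_value_iteration(P, R, gamma, iters=200):
    """Classical value iteration with exact expectations."""
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [max(R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
                 for a in range(len(P[s])))
             for s in range(len(P))]
    return V


def empirical_value_iteration(P, R, gamma, num_samples, iters=60, seed=0):
    """Value iteration in which E[V(s')] is replaced by an average over next
    states drawn from a simulator (here: sampling from P itself)."""
    rng = random.Random(seed)
    states = range(len(P))
    V = [0.0] * len(P)
    for _ in range(iters):
        new_V = []
        for s in states:
            q_values = []
            for a in range(len(P[s])):
                # Fresh simulated next states at every iteration, so the
                # empirical Bellman operator is a random operator.
                nxt = rng.choices(states, weights=P[s][a], k=num_samples)
                q_values.append(R[s][a] + gamma * sum(V[j] for j in nxt) / num_samples)
            new_V.append(max(q_values))
        V = new_V
    return V


# Hypothetical two-state, two-action MDP.
P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0], [2.0, 0.5]]
exact = exact_value_iteration(P, R, gamma=0.8)
approx = empirical_value_iteration(P, R, gamma=0.8, num_samples=2000)
print("exact:", exact, "empirical:", approx)
```

For this MDP, the exact values are 95/17 and 120/17. The empirical iterates fluctuate around them with an error that shrinks as `num_samples` grows.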
Submitted 22 November, 2013;
originally announced November 2013.
-
Stochastic dominance-constrained Markov decision processes
Authors:
William B. Haskell,
Rahul Jain
Abstract:
We are interested in risk constraints for infinite-horizon discrete-time Markov decision processes (MDPs). Starting with average reward MDPs, we show that increasing concave stochastic dominance constraints on the empirical distribution of reward lead to linear constraints on occupation measures. The optimal policy for the resulting class of dominance-constrained MDPs is obtained by solving a linear program. We compute the dual of this linear program to obtain average dynamic programming optimality equations that reflect the dominance constraint. In particular, a new pricing term corresponding to the dominance constraint appears in the optimality equations. We show that many types of stochastic orders can be used in place of the increasing concave stochastic order. We also carry out a parallel development for discounted reward MDPs with stochastic dominance constraints. The paper concludes with a portfolio optimization example.
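The increasing concave (second-order) dominance constraint compares expected shortfalls E[(eta - X)_+] across all targets eta. This sketch checks it for discrete distributions. It relies on a standard fact: when the benchmark has finite support, only targets at the benchmark's realizations need to be checked. Each shortfall is linear in the probabilities of X. A constraint of this form therefore stays linear when those probabilities are themselves linear in an occupation measure, which is consistent with the linear-programming formulation above. The function names and example distributions are hypothetical.

```python
def shortfall(values, probs, eta):
    """Expected shortfall below a target: E[(eta - X)_+] for a discrete X."""
    return sum(p * max(eta - v, 0.0) for v, p in zip(values, probs))


def icv_dominates(x_vals, x_probs, y_vals, y_probs, tol=1e-12):
    """True if X dominates the benchmark Y in the increasing concave order, i.e.
    E[(eta - X)_+] <= E[(eta - Y)_+] for every eta. For a benchmark Y with finite
    support, it suffices to check eta at the realizations of Y."""
    return all(shortfall(x_vals, x_probs, eta) <= shortfall(y_vals, y_probs, eta) + tol
               for eta in y_vals)


spread = ([1.0, 3.0], [0.5, 0.5])  # mean 2, risky
sure = ([2.0], [1.0])              # mean 2, riskless
print(icv_dominates(*sure, *spread))  # riskless reward dominates the spread-out one
print(icv_dominates(*spread, *sure))  # spread-out reward fails the check at eta = 2
```

Both distributions have the same mean, so the comparison turns entirely on downside risk, which is what the increasing concave order measures.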
Submitted 20 June, 2012;
originally announced June 2012.