Dynamic Capital Requirements for Markov Decision Processes

Authors: William B. Haskell, Abhishek Gupta, Shiping Shao

Abstract: We build on the theory of capital requirements (CRs) to create a new framework for modeling dynamic risk preferences. The key question is how to evaluate the risk of a payoff stream sequentially as new information is revealed. In our model, we associate each payoff stream with a disbursement strategy and a premium schedule to form a triple of stochastic processes. We characterize risk preferences in terms of a single set that we call the risk frontier which characterizes acceptable triples. We then propose the generalized capital requirement (GCR) which evaluates the risk of a payoff stream by minimizing the premium schedule over acceptable triples. We apply this model to a risk-aware decision maker (DM) who controls a Markov decision process (MDP) and wants to find a policy to minimize the GCR of its payoff stream. The resulting GCR-MDP recovers many well-known risk-aware MDPs as special cases. To make this approach computationally viable, we obtain the temporal decomposition of the GCR in terms of the risk frontier. Then, we connect the temporal decomposition with the notion of an information state to compactly capture the dependence of DM's risk preferences on the problem history, where augmented dynamic programming can be used to compute an optimal policy. We report numerical experiments for the GCR-minimizing newsvendor.

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2209.12937 [pdf, ps, other]

Robustness to Modeling Errors in Risk-Sensitive Markov Decision Problems with Markov Risk Measures

Authors: Shiping Shao, Abhishek Gupta, William B. Haskell

Abstract: We consider risk-sensitive Markov decision processes (MDPs), where the MDP model is influenced by a parameter which takes values in a compact metric space. We identify sufficient conditions under which small perturbations in the model parameters lead to small changes in the optimal value function and optimal policy. We further establish the robustness of the risk-sensitive optimal policies to modeling errors. Implications of the results for data-driven decision-making, decision-making with preference uncertainty, and systems with changing noise distributions are discussed.

Submitted 26 September, 2022; originally announced September 2022.

Comments: 24 pages, submitted to SIAM Journal on Control and Optimization

arXiv:2008.13309 [pdf, other]

Preference Robust Optimization with Quasi-Concave Choice Functions for Multi-Attribute Prospects

Authors: Jian Wu, William B. Haskell, Wenjie Huang, Huifu Xu

Abstract: Preference robust choice models concern decision-making problems where the decision maker's (DM) utility/risk preferences are ambiguous and the evaluation is based on the worst-case utility function/risk measure from a set of plausible utility functions/risk measures. The current preference robust choice models are mostly built upon von Neumann-Morgenstern expected utility theory, the theory of convex risk measures, or Yaari's dual theory of choice, all of which assume the DM's preferences satisfy some specified axioms. In this paper, we extend the preference robust approach to a broader class of choice functions which satisfy monotonicity and quasi-concavity in the space of multi-attribute random prospects. While our new model is non-parametric and significantly extends the coverage of decision-making problems, it also brings new computational challenges due to the non-convexity of the optimization formulations, which arises from the non-concavity of the class of quasi-concave choice functions. To tackle these challenges, we develop a sorting-based algorithm that efficiently evaluates the robust choice function (RCF) by solving a sequence of linear programming problems. Then, we show how to solve preference robust optimization (PRO) problems by solving a sequence of convex optimization problems. We test our robust choice model and computational scheme on a single-attribute portfolio optimization problem and a multi-attribute capital allocation problem.

Submitted 5 April, 2022; v1 submitted 30 August, 2020; originally announced August 2020.

Comments: 59 pages, 6 figures, submitted

arXiv:2008.08275

Asymptotic Analysis for Data-Driven Inventory Policies

Authors: Xun Zhang, Zhisheng Ye, William B. Haskell

Abstract: We study periodic review stochastic inventory control in the data-driven setting where the retailer makes ordering decisions based only on historical demand observations without any knowledge of the probability distribution of the demand. Since an (s, S)-policy is optimal when the demand distribution is known, we investigate the statistical properties of the data-driven (s, S)-policy obtained by recursively computing the empirical cost-to-go functions. This policy is inherently challenging to analyze because the recursion induces propagation of the estimation error backwards in time. In this work, we establish the asymptotic properties of this data-driven policy by fully accounting for the error propagation. First, we rigorously show the consistency of the estimated parameters by filling in some gaps (due to unaccounted error propagation) in the existing studies. In this setting, empirical process theory (EPT) cannot be directly applied to show asymptotic normality. To explain, the empirical cost-to-go functions for the estimated parameters are not i.i.d. sums due to the error propagation. Our main methodological innovation comes from an asymptotic representation for multi-sample U-processes in terms of i.i.d. sums. This representation enables us to apply EPT to derive the influence functions of the estimated parameters and to establish joint asymptotic normality. Based on these results, we also propose an entirely data-driven estimator of the optimal expected cost and we derive its asymptotic distribution. We demonstrate some useful applications of our asymptotic results, including sample size determination and interval estimation. The results from our numerical simulations conform to our theoretical analysis.

Submitted 4 November, 2021; v1 submitted 19 August, 2020; originally announced August 2020.

Comments: The authors plan to include the updated version in a research proposal. To avoid possible inconvenience, the authors have decided to remove the updated version for now.
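
As a reading aid for the entry above, here is a minimal sketch of how an (s, S)-policy operates. It does not reproduce the paper's data-driven estimator. The function name `simulate_s_S`, the zero lead time, the backlogging of unmet demand, and the strict `inventory < s` reorder test are all assumptions made for illustration; conventions differ across texts.

```python
def simulate_s_S(initial_inventory, demands, s, S):
    """Simulate a periodic-review (s, S) policy.

    Assumptions (illustrative, not from the paper): orders arrive
    immediately, unmet demand is backlogged (inventory may go negative),
    and an order is placed only when inventory is strictly below s.
    Returns a list of (order_quantity, end_of_period_inventory) pairs.
    """
    inventory = initial_inventory
    history = []
    for demand in demands:
        # Order up to S whenever the inventory position falls below s.
        order = S - inventory if inventory < s else 0
        inventory += order
        # Demand is realized after the ordering decision.
        inventory -= demand
        history.append((order, inventory))
    return history


if __name__ == "__main__":
    for period, (order, level) in enumerate(
        simulate_s_S(5, [3, 4, 2], s=3, S=10), start=1
    ):
        print(f"period {period}: order={order}, ending inventory={level}")
```

In the data-driven setting described in the abstract, s and S would be estimated from historical demand rather than supplied by hand as they are here.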

arXiv:2006.12450 [pdf, other]

A dynamic analytic method for risk-aware controlled martingale problems

Authors: Jukka Isohätälä, William B. Haskell

Abstract: We present a new, tractable method for solving and analyzing risk-aware control problems over finite and infinite, discounted time-horizons where the dynamics of the controlled process are described as a martingale problem. Supposing general Polish state and action spaces, and using generalized, relaxed controls, we state a risk-aware dynamic optimal control problem of minimizing risk of costs described by a generic risk function. We then construct an alternative formulation that takes the form of a nonlinear programming problem, constrained by the dynamic, i.e. time-dependent, and linear Kolmogorov forward equation describing the distribution of the state and accumulated costs. We show that the formulations are equivalent, and that the optimal control process can be taken to be Markov in the controlled process state, running costs, and time. We further prove that under additional conditions, the optimal value is attained. An example numeric problem is presented and solved.

Submitted 22 June, 2020; originally announced June 2020.

MSC Class: Primary 93E20, 60J25; Secondary 60J35, 90C30

arXiv:2003.11403 [pdf, ps, other]

Convergence of Recursive Stochastic Algorithms using Wasserstein Divergence

Authors: Abhishek Gupta, William B. Haskell

Abstract: This paper develops a unified framework, based on iterated random operator theory, to analyze the convergence of constant stepsize recursive stochastic algorithms (RSAs). RSAs use randomization to efficiently compute expectations, and so their iterates form a stochastic process. The key idea of our analysis is to lift the RSA into an appropriate higher-dimensional space and then express it as an equivalent Markov chain. Instead of determining the convergence of this Markov chain (which may not converge under constant stepsize), we study the convergence of the distribution of this Markov chain. To study this, we define a new notion of Wasserstein divergence. We show that if the distribution of the iterates in the Markov chain satisfy a contraction property with respect to the Wasserstein divergence, then the Markov chain admits an invariant distribution. We show that convergence of a large family of constant stepsize RSAs can be understood using this framework, and we provide several detailed examples.

Submitted 5 January, 2021; v1 submitted 25 March, 2020; originally announced March 2020.

Comments: 34 pages, submitted to SIMODS

MSC Class: 93E35; 60J20; 68Q32
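
The distinction drawn in the abstract above, between convergence of the iterates and convergence of their distribution, can be seen in a toy case. The sketch below is an illustrative assumption and not an example from the paper: it runs constant-stepsize stochastic gradient descent on f(x) = E[(x - ξ)²]/2 with ξ ~ N(μ, σ²), so the iterates form the Markov chain x ← x - η(x - ξ). The iterates keep fluctuating, but their distribution settles to one with mean μ and variance ησ²/(2 - η). The function names and parameter values are hypothetical, and no Wasserstein divergence is computed.

```python
import random


def constant_step_iterates(eta, mu, sigma, n_steps, seed=0):
    """Run x <- x - eta * (x - xi) with xi ~ N(mu, sigma^2), starting at 0."""
    rng = random.Random(seed)
    x = 0.0
    path = []
    for _ in range(n_steps):
        xi = rng.gauss(mu, sigma)   # noisy sample; (x - xi) is the stochastic gradient
        x = x - eta * (x - xi)      # constant stepsize: the noise never vanishes
        path.append(x)
    return path


def stationary_variance(eta, sigma):
    """Variance of the invariant distribution of the linear recursion above."""
    return eta * sigma ** 2 / (2.0 - eta)


if __name__ == "__main__":
    path = constant_step_iterates(eta=0.1, mu=2.0, sigma=1.0, n_steps=50_000, seed=1)
    tail = path[1_000:]  # discard the transient
    mean = sum(tail) / len(tail)
    var = sum((x - mean) ** 2 for x in tail) / len(tail)
    print(f"empirical mean {mean:.3f}, empirical variance {var:.4f}")
    print(f"predicted variance {stationary_variance(0.1, 1.0):.4f}")
```

The variance formula follows from the fixed-point equation v = (1 - η)²v + η²σ² for this linear recursion.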

arXiv:2003.10888 [pdf, other]

A Randomized Nonlinear Rescaling Method in Large-Scale Constrained Convex Optimization

Authors: Bo Wei, William B. Haskell, Sixiang Zhao

Abstract: We propose a new randomized algorithm for solving convex optimization problems that have a large number of constraints (with high probability). Existing methods like interior-point or Newton-type algorithms are hard to apply to such problems because they have expensive computation and storage requirements for Hessians and matrix inversions. Our algorithm is based on nonlinear rescaling (NLR), which is a primal-dual-type algorithm by Griva and Polyak [Math. Program., 106(2):237-259, 2006]. NLR introduces an equivalent problem through a transformation of the constraint functions, minimizes the corresponding augmented Lagrangian for given dual variables, and then uses this minimizer to update the dual variables for the next iteration. The primal update at each iteration is the solution of an unconstrained finite sum minimization problem where the terms are weighted by the current dual variables. We use randomized first-order algorithms to do these primal updates, for which they are especially well suited. In particular, we use the scaled dual variables as the sampling distribution for each primal update, and we show that this distribution is the optimal one among all probability distributions. We conclude by demonstrating the favorable numerical performance of our algorithm.

Submitted 24 March, 2020; originally announced March 2020.

arXiv:1905.05328 [pdf, other]

A Flexible Multi-Facility Capacity Expansion Problem with Risk Aversion

Authors: Sixiang Zhao, William B. Haskell, Michel-Alexandre Cardin

Abstract: This paper studies flexible multi-facility capacity expansion with risk aversion. In this setting, the decision maker can periodically expand the capacity of facilities given observations of uncertain demand. We model this situation as a multi-stage stochastic programming problem. We express risk aversion in this problem through conditional value-at-risk (CVaR), and we formulate a mean-CVaR objective. To solve the multi-stage problem, we optimize over decision rules. In particular, we approximate the full policy space of the problem with a tractable family of if-then policies. Subsequently, a decomposition algorithm is proposed to optimize the decision rule. This algorithm decomposes the model over scenarios and it updates solutions via the subgradients of the recourse function. We demonstrate that this algorithm can quickly converge to high-performance policies. To illustrate the practical effectiveness of this method, a case study on the waste-to-energy system in Singapore is presented. These simulation results show that by adjusting the weight factor of the objective function, decision makers are able to trade off between a risk-averse policy that has a higher expected cost but a lower value-at-risk, and a risk-neutral policy that has a lower expected cost but a higher value-at-risk.

Submitted 13 May, 2019; originally announced May 2019.
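
To make the mean-CVaR trade-off in the entry above concrete, the following sketch evaluates an empirical mean-CVaR objective on a sample of costs. It uses the standard Rockafellar-Uryasev representation CVaR_α(X) = min_t { t + E[(X - t)⁺] / (1 - α) }. The convex-combination weighting (1 - λ)·mean + λ·CVaR and the names `empirical_cvar` and `mean_cvar` are assumptions for illustration; the paper's exact objective and weight factor may be defined differently.

```python
def empirical_cvar(costs, alpha):
    """Empirical CVaR of a cost sample at level alpha in [0, 1).

    Uses the Rockafellar-Uryasev formula. The objective is piecewise linear
    and convex in t with breakpoints at the sample points, so it is enough
    to search over the sample itself.
    """
    n = len(costs)
    return min(
        t + sum(max(c - t, 0.0) for c in costs) / ((1.0 - alpha) * n)
        for t in costs
    )


def mean_cvar(costs, alpha, weight):
    """Weighted objective (1 - weight) * mean + weight * CVaR (assumed form)."""
    mean = sum(costs) / len(costs)
    return (1.0 - weight) * mean + weight * empirical_cvar(costs, alpha)


if __name__ == "__main__":
    sample = [float(c) for c in range(1, 11)]  # costs 1, 2, ..., 10
    for w in (0.0, 0.5, 1.0):
        print(f"weight={w}: objective={mean_cvar(sample, alpha=0.8, weight=w):.3f}")
```

Setting the weight to 0 recovers the risk-neutral expected cost, while a weight of 1 scores a policy purely by the average of its worst (1 - α) fraction of outcomes.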

arXiv:1901.05768 [pdf, other]

A Multi-Level Simulation Optimization Approach for Quantile Functions

Authors: Songhao Wang, Szu Hui Ng, William Benjamin Haskell

Abstract: Quantile is a popular performance measure for a stochastic system to evaluate its variability and risk. To reduce the risk, selecting the actions that minimize the tail quantiles of some loss distributions is typically of interest for decision makers. When the loss distribution is observed via simulations, evaluating and optimizing its quantile functions can be challenging, especially when the simulations are expensive, as it may cost a large number of simulation runs to obtain accurate quantile estimators. In this work, we propose a multi-level metamodel (co-kriging) based algorithm to optimize quantile functions more efficiently. Utilizing non-decreasing properties of quantile functions, we first search on cheaper and informative lower quantiles which are more accurate and easier to optimize. The quantile level iteratively increases to the objective level while the search has a focus on the possible promising regions identified by the previous levels. This enables us to leverage the accurate information from the lower quantiles to find the optimums faster and improve algorithm efficiency.

Submitted 17 January, 2019; originally announced January 2019.

arXiv:1901.05154 [pdf, other]

An Accelerated Fitted Value Iteration Algorithm for MDPs with Finite and Vector-Valued Action Space

Authors: Sixiang Zhao, William B. Haskell, Michel-Alexandre Cardin

Abstract: This paper studies an accelerated fitted value iteration (FVI) algorithm to solve high-dimensional Markov decision processes (MDPs). FVI is an approximate dynamic programming algorithm that has desirable theoretical properties. However, it can be intractable when the action space is finite but vector-valued. To solve such MDPs via FVI, we first approximate the value functions by a two-layer neural network (NN) with rectified linear units (ReLU) being activation functions. We then verify that such approximators are strong enough for the MDP. To speed up the FVI, we recast the action selection problem as a two-stage stochastic programming problem, where the resulting recourse function comes from the two-layer NN. Then, the action selection problem is solved with a specialized multi-cut decomposition algorithm. More specifically, we design valid cuts by exploiting the structure of the approximated value functions to update the actions. We prove that the decomposition can find the global optimal solution in a finite number of iterations and the overall accelerated FVI is consistent. Finally, we verify the performance of the FVI algorithm via a multi-facility capacity investment problem (MCIP). A comprehensive numerical study is implemented, where the results show that the FVI is significantly accelerated without sacrificing too much in precision.

Submitted 25 November, 2020; v1 submitted 16 January, 2019; originally announced January 2019.

arXiv:1901.04882 [pdf, other]

Model and Reinforcement Learning for Markov Games with Risk Preferences

Authors: Wenjie Huang, Pham Viet Hai, William B. Haskell

Abstract: We motivate and propose a new model for non-cooperative Markov game which considers the interactions of risk-aware players. This model characterizes the time-consistent dynamic "risk" from both stochastic state transitions (inherent to the game) and randomized mixed strategies (due to all other players). An appropriate risk-aware equilibrium concept is proposed and the existence of such equilibria is demonstrated in stationary strategies by an application of Kakutani's fixed point theorem. We further propose a simulation-based Q-learning type algorithm for risk-aware equilibrium computation. This algorithm works with a special form of minimax risk measures which can naturally be written as saddle-point stochastic optimization problems, and covers many widely investigated risk measures. Finally, the almost sure convergence of this simulation-based algorithm to an equilibrium is demonstrated under some mild conditions. Our numerical experiments on a two player queuing game validate the properties of our model and algorithm, and demonstrate their worth and applicability in real life competitive decision-making.

Submitted 21 November, 2019; v1 submitted 15 January, 2019; originally announced January 2019.

Comments: 38 pages, 6 tables, 5 figures

arXiv:1812.09179 [pdf, ps, other]

Risk aware minimum principle for optimal control of stochastic differential equations

Authors: Jukka Isohätälä, William B. Haskell

Abstract: We present a probabilistic formulation of risk aware optimal control problems for stochastic differential equations. Risk awareness is in our framework captured by objective functions in which the risk neutral expectation is replaced by a risk function, a nonlinear functional of random variables that account for the controller's risk preferences. We state and prove a risk aware minimum principle that is a parsimonious generalization of the well-known risk neutral, stochastic Pontryagin's minimum principle. As our main results we give necessary and also sufficient conditions for optimality of control processes taking values on probability measures defined on a given action space. We show that remarkably, going from the risk neutral to the risk aware case, the minimum principle is simply modified by the introduction of one additional real-valued stochastic process that acts as a risk adjustment factor for given cost rate and terminal cost functions. This adjustment process is explicitly given as the expectation, conditional on the filtration at the given time, of an appropriately defined functional derivative of the risk function evaluated at the random total cost. For our results we rely on the Fréchet differentiability of the risk function, and for completeness, we prove under mild assumptions the existence of Fréchet derivatives of some common risk functions. We give a simple application of the results for a portfolio allocation problem and show that the risk awareness of the objective function gives rise to a risk premium term that is characterized by the risk adjustment process described above. This suggests uses of our results in e.g. pricing of risk modeled by generic risk functions in financial applications.

Submitted 18 October, 2019; v1 submitted 21 December, 2018; originally announced December 2018.

arXiv:1812.09017 [pdf, ps, other]

Corporative Stochastic Approximation with Random Constraint Sampling for Semi-Infinite Programming

Authors: Bo Wei, William B. Haskell, Sixiang Zhao

Abstract: We developed a corporative stochastic approximation (CSA) type algorithm for semi-infinite programming (SIP), where the cut generation problem is solved inexactly. First, we provide general error bounds for inexact CSA. Then, we propose two specific random constraint sampling schemes to approximately solve the cut generation problem. When the objective and constraint functions are generally convex, we show that our randomized CSA algorithms achieve an $\mathcal{O}(1/\sqrt{N})$ rate of convergence in expectation (in terms of optimality gap as well as SIP constraint violation). When the objective and constraint functions are all strongly convex, this rate can be improved to $\mathcal{O}(1/N)$.

Submitted 21 December, 2018; originally announced December 2018.

arXiv:1809.05385 [pdf, ps, other]

Index-Based Policy for Risk-Averse Multi-Armed Bandit

Authors: Jianyu Xu, William B. Haskell, Zhisheng Ye

Abstract: The multi-armed bandit (MAB) is a classical online optimization model for the trade-off between exploration and exploitation. The traditional MAB is concerned with finding the arm that minimizes the mean cost. However, minimizing the mean does not take the risk of the problem into account. We now want to accommodate risk-averse decision makers. In this work, we introduce a coherent risk measure as the criterion to form a risk-averse MAB. In particular, we derive an index-based online sampling framework for the risk-averse MAB. We develop this framework in detail for three specific risk measures, i.e. the conditional value-at-risk, the mean-deviation and the shortfall risk measures. Under each risk measure, the convergence rate for the upper bound on the pseudo regret, defined as the difference between the expectation of the empirical risk based on the observation sequence and the true risk of the optimal arm, is established.

Submitted 14 September, 2018; originally announced September 2018.

arXiv:1805.06632 [pdf, other]

Preference Elicitation and Robust Optimization with Multi-Attribute Quasi-Concave Choice Functions

Authors: William B. Haskell, Wenjie Huang, Huifu Xu

Abstract: Decision maker's preferences are often captured by some choice functions which are used to rank prospects. In this paper, we consider ambiguity in choice functions over a multi-attribute prospect space. Our main result is a robust preference model where the optimal decision is based on the worst-case choice function from an ambiguity set constructed through preference elicitation with pairwise comparisons of prospects. Differing from existing works in the area, our focus is on quasi-concave choice functions rather than concave functions and this enables us to cover a wide range of utility/risk preference problems including multi-attribute expected utility and $S$-shaped aspirational risk preferences. The robust choice function is increasing and quasi-concave but not necessarily translation invariant, a key property of monetary risk measures. We propose two approaches based respectively on the support functions and level functions of quasi-concave functions to develop tractable formulations of the maximin preference robust optimization model. The former gives rise to a mixed integer linear programming problem whereas the latter is equivalent to solving a sequence of convex risk minimization problems. To assess the effectiveness of the proposed robust preference optimization model and numerical schemes, we apply them to a security budget allocation problem and report some preliminary results from experiments.

Submitted 17 May, 2018; originally announced May 2018.

Comments: 36 pages, 4 figures, submitted to Operations Research

arXiv:1805.04238 [pdf, other]

Stochastic Approximation for Risk-aware Markov Decision Processes

Authors: Wenjie Huang, William B. Haskell

Abstract: We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point problem. The outer loop performs $Q$-learning to compute an optimal risk-aware policy. Several widely investigated risk measures (e.g. conditional value-at-risk, optimized certainty equivalent, and absolute semi-deviation) are covered by our algorithm. Almost sure convergence and the convergence rate of the algorithm are established. For an error tolerance $ε>0$ for the optimal $Q$-value estimation gap and learning rate $k\in(1/2,\,1]$, the overall convergence rate of our algorithm is $Ω((\ln(1/δε)/ε^{2})^{1/k}+(\ln(1/ε))^{1/(1-k)})$ with probability at least $1-δ$.

Submitted 3 December, 2019; v1 submitted 11 May, 2018; originally announced May 2018.

Comments: 34 pages, 4 figures, 2 tables

arXiv:1803.10898 [pdf, other]

An Inexact Primal-Dual Algorithm for Semi-Infinite Programming

Authors: Bo Wei, William B. Haskell, Sixiang Zhao

Abstract: This paper considers an inexact primal-dual algorithm for semi-infinite programming (SIP) for which it provides general error bounds. To implement the dual variable update, we create a new prox function for nonnegative measures which turns out to be a generalization of the Kullback-Leibler divergence for probability distributions. We show that under suitable conditions on the error, this algorithm achieves an $\mathcal{O}(1/\sqrt{K})$ rate of convergence in terms of the optimality gap and constraint violation. We then use our general error bounds to analyze the convergence and sample complexity of a specific primal-dual SIP algorithm based on Monte Carlo integration. Finally, we provide numerical experiments to demonstrate the performance of our algorithm.

Submitted 15 January, 2019; v1 submitted 28 March, 2018; originally announced March 2018.

arXiv:1801.04745 [pdf, ps, other]

Distributionally Robust Optimization for Sequential Decision Making

Authors: Zhi Chen, Pengqian Yu, William B. Haskell

Abstract: The distributionally robust Markov Decision Process (MDP) approach asks for a distributionally robust policy that achieves the maximal expected total reward under the most adversarial distribution of uncertain parameters. In this paper, we study distributionally robust MDPs where ambiguity sets for the uncertain parameters are of a format that can easily incorporate in its description the uncertainty's generalized moment as well as statistical distance information. In this way, we generalize existing works on distributionally robust MDP with generalized-moment-based and statistical-distance-based ambiguity sets to incorporate information from the former class such as moments and dispersions to the latter class that critically depends on empirical observations of the uncertain parameters. We show that, under this format of ambiguity sets, the resulting distributionally robust MDP remains tractable under mild technical conditions. To be more specific, a distributionally robust policy can be constructed by solving a sequence of one-stage convex optimization subproblems.

Submitted 9 October, 2018; v1 submitted 15 January, 2018; originally announced January 2018.

arXiv:1711.03669 [pdf, other]

An Inexact Primal-Dual Smoothing Framework for Large-Scale Non-Bilinear Saddle Point Problems

Authors: Le Thi Khanh Hien, Renbo Zhao, William B. Haskell

Abstract: We develop an inexact primal-dual first-order smoothing framework to solve a class of non-bilinear saddle point problems with primal strong convexity. Compared with existing methods, our framework yields a significant improvement over the primal oracle complexity, while it has competitive dual oracle complexity. In addition, we consider the situation where the primal-dual coupling term has a large number of component functions. To efficiently handle this situation, we develop a randomized version of our smoothing framework, which allows the primal and dual sub-problems in each iteration to be inexactly solved by randomized algorithms in expectation. The convergence of this framework is analyzed both in expectation and with high probability. In terms of the primal and dual oracle complexities, this framework significantly improves over its deterministic counterpart. As an important application, we adapt both frameworks for solving convex optimization problems with many functional constraints. To obtain an $\varepsilon$-optimal and $\varepsilon$-feasible solution, both frameworks achieve the best-known oracle complexities.

Submitted 24 July, 2023; v1 submitted 9 November, 2017; originally announced November 2017.

arXiv:1709.07506 [pdf, other]

An Empirical Dynamic Programming Algorithm for Continuous MDPs

Authors: William B. Haskell, Rahul Jain, Hiteshi Sharma, Pengqian Yu

Abstract: We propose universal randomized function approximation-based empirical value iteration (EVI) algorithms for Markov decision processes. The `empirical' nature comes from each iteration being done empirically from samples available from simulations of the next state. This makes the Bellman operator a random operator. A parametric and a non-parametric method for function approximation using a parametric function space and the Reproducing Kernel Hilbert Space (RKHS) respectively are then combined with EVI. Both function spaces have the universal function approximation property. Basis functions are picked randomly. Convergence analysis is done using a random operator framework with techniques from the theory of stochastic dominance. Finite time sample complexity bounds are derived for both universal approximate dynamic programming algorithms. Numerical experiments support the versatility and effectiveness of this approach.

Submitted 23 April, 2019; v1 submitted 21 September, 2017; originally announced September 2017.

Comments: Accepted for publication in IEEE Transactions on Automatic Control

arXiv:1705.06884 [pdf, other]

A Unified Framework for Stochastic Matrix Factorization via Variance Reduction

Authors: Renbo Zhao, William B. Haskell, Jiashi Feng

Abstract: We propose a unified framework to speed up the existing stochastic matrix factorization (SMF) algorithms via variance reduction. Our framework is general and it subsumes several well-known SMF formulations in the literature. We perform a non-asymptotic convergence analysis of our framework and derive computational and sample complexities for our algorithm to converge to an $ε$-stationary point in expectation. In addition, extensive experiments for a wide class of SMF formulations demonstrate that our framework consistently yields faster convergence and a more accurate output dictionary vis-à-vis state-of-the-art frameworks.

Submitted 21 May, 2017; v1 submitted 19 May, 2017; originally announced May 2017.

arXiv:1704.00116 [pdf, other]

doi 10.1109/TSP.2017.2784360

Stochastic L-BFGS: Improved Convergence Rates and Practical Acceleration Strategies

Authors: Renbo Zhao, William B. Haskell, Vincent Y. F. Tan

Abstract: We revisit the stochastic limited-memory BFGS (L-BFGS) algorithm. By proposing a new framework for the convergence analysis, we prove improved convergence rates and computational complexities of the stochastic L-BFGS algorithms compared to previous works. In addition, we propose several practical acceleration strategies to speed up the empirical performance of such algorithms. We also provide theoretical analyses for most of the strategies. Experiments on large-scale logistic and ridge regression problems demonstrate that our proposed strategies yield significant improvements vis-à-vis competing state-of-the-art algorithms.

Submitted 24 October, 2017; v1 submitted 31 March, 2017; originally announced April 2017.

arXiv:1701.01290 [pdf, other]

Approximate Value Iteration for Risk-aware Markov Decision Processes

Authors: Pengqian Yu, William B. Haskell, Huan Xu

Abstract: We consider large-scale Markov decision processes (MDPs) with a risk measure of variability in cost, under the risk-aware MDPs paradigm. Previous studies showed that risk-aware MDPs, based on a minimax approach to handling risk, can be solved using dynamic programming for small to medium sized problems. However, due to the "curse of dimensionality", MDPs that model real-life problems are typically prohibitively large for such approaches. In this paper, we employ an approximate dynamic programming approach, and develop a family of simulation-based algorithms to approximately solve large-scale risk-aware MDPs. In parallel, we develop a unified convergence analysis technique to derive sample complexity bounds for this new family of algorithms.

Submitted 16 May, 2017; v1 submitted 5 January, 2017; originally announced January 2017.

arXiv:1610.06702

Random constraint sampling and duality for convex optimization

Authors: William B. Haskell, Yu Pengqian

Abstract: We are interested in solving convex optimization problems with large numbers of constraints. Randomized algorithms, such as random constraint sampling, have been very successful in giving nearly optimal solutions to such problems. In this paper, we combine random constraint sampling with the classical primal-dual algorithm for convex optimization problems with large numbers of constraints, and we give a convergence rate analysis. We then report numerical experiments that verify the effectiveness of this algorithm.

Submitted 26 November, 2016; v1 submitted 21 October, 2016; originally announced October 2016.

Comments: Substantially revised draft in preparation, with much stronger results

arXiv:1311.5918 [pdf, other]

Empirical Dynamic Programming

Authors: William B. Haskell, Rahul Jain, Dileep Kalathil

Abstract: We propose empirical dynamic programming algorithms for Markov decision processes (MDPs). In these algorithms, the exact expectation in the Bellman operator in classical value iteration is replaced by an empirical estimate to get `empirical value iteration' (EVI). Policy evaluation and policy improvement in classical policy iteration are also replaced by simulation to get `empirical policy iteration' (EPI). Thus, these empirical dynamic programming algorithms involve iteration of a random operator, the empirical Bellman operator. We introduce notions of probabilistic fixed points for such random monotone operators. We develop a stochastic dominance framework for convergence analysis of such operators. We then use this to give sample complexity bounds for both EVI and EPI. We then provide various variations and extensions to asynchronous empirical dynamic programming, the minimax empirical dynamic program, and show how this can also be used to solve the dynamic newsvendor problem. Preliminary experimental results suggest a faster rate of convergence than stochastic approximation algorithms.

Submitted 22 November, 2013; originally announced November 2013.

Comments: 34 Pages, 1 Figure
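
Illustrative note on the entry above: the abstract describes replacing the exact expectation in the Bellman operator with an empirical estimate. The following is a minimal sketch of that idea for a small finite MDP, not the authors' implementation. It assumes a discounted cost-minimization setting; the function name `empirical_value_iteration`, its parameters, and the demo arrays `P_demo` and `c_demo` are all hypothetical.

```python
import numpy as np

def empirical_value_iteration(P, c, gamma, n_samples, n_iters, seed=0):
    """Sketch of empirical value iteration (assumed discounted-cost setting).

    P: array (S, A, S) of transition probabilities, used only as a simulator.
    c: array (S, A) of one-step costs.
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, _ = P.shape
    v = np.zeros(n_states)
    for _ in range(n_iters):
        q = np.empty((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                # Draw simulated next states and average v over them,
                # in place of the exact expectation sum_s' P[s, a, s'] * v[s'].
                nxt = rng.choice(n_states, size=n_samples, p=P[s, a])
                q[s, a] = c[s, a] + gamma * v[nxt].mean()
        # Empirical Bellman update: minimize the estimated cost-to-go.
        v = q.min(axis=1)
    return v

# Hypothetical two-state, one-action demo with deterministic transitions,
# so the empirical average coincides with the exact expectation.
P_demo = np.array([[[1.0, 0.0]], [[1.0, 0.0]]])  # both states move to state 0
c_demo = np.array([[1.0], [0.0]])
v_demo = empirical_value_iteration(P_demo, c_demo, gamma=0.5,
                                   n_samples=10, n_iters=60)
```

With stochastic transitions each pass applies a random operator, so the iterates fluctuate around the true value function rather than converging exactly; larger `n_samples` tightens that fluctuation.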

arXiv:1206.4568 [pdf, ps, other]

Stochastic dominance-constrained Markov decision processes

Authors: William B. Haskell, Rahul Jain

Abstract: We are interested in risk constraints for infinite horizon discrete time Markov decision processes (MDPs). Starting with average reward MDPs, we show that increasing concave stochastic dominance constraints on the empirical distribution of reward lead to linear constraints on occupation measures. The optimal policy for the resulting class of dominance-constrained MDPs is obtained by solving a linear program. We compute the dual of this linear program to obtain average dynamic programming optimality equations that reflect the dominance constraint. In particular, a new pricing term appears in the optimality equations corresponding to the dominance constraint. We show that many types of stochastic orders can be used in place of the increasing concave stochastic order. We also carry out a parallel development for discounted reward MDPs with stochastic dominance constraints. The paper concludes with a portfolio optimization example.

Submitted 20 June, 2012; originally announced June 2012.

Showing 1–26 of 26 results for author: Haskell, W B