From Optimization to Control: Quasi Policy Iteration

Authors: Mohammad Amin Sharifi Kolarijani, Peyman Mohajerin Esfahani

Abstract: Recent control algorithms for Markov decision processes (MDPs) have been designed using an implicit analogy with well-established optimization algorithms. In this paper, we make this analogy explicit across four problem classes with a unified solution characterization. This novel framework, in turn, allows for a systematic transformation of algorithms from one domain to the other. In particular, we identify equivalent optimization and control algorithms that have already been pointed out in the existing literature, but mostly in a scattered way. With this unifying framework in mind, we then exploit two linear structural constraints specific to MDPs for approximating the Hessian in a second-order-type algorithm from optimization, namely, Anderson mixing. This leads to a novel first-order control algorithm that modifies the standard value iteration (VI) algorithm by incorporating two new directions and adaptive step sizes. While the proposed algorithm, coined as quasi-policy iteration, has the same computational complexity as VI, it interestingly exhibits an empirical convergence behavior similar to policy iteration with a very low sensitivity to the discount factor.

Submitted 18 November, 2023; originally announced November 2023.
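
For orientation, the standard value iteration (VI) baseline that the abstract above refers to can be sketched as follows. This is a minimal illustration of plain VI only, not of the quasi-policy iteration algorithm proposed in the paper. It assumes a finite, discounted, cost-minimizing MDP; the names `value_iteration`, `P`, `c`, and `gamma`, and the toy two-state numbers, are hypothetical and not taken from the paper.

```python
def value_iteration(P, c, gamma, tol=1e-10, max_iter=10_000):
    """Plain value iteration for a finite, discounted, cost-minimizing MDP.

    P[a][s][t] : probability of moving from state s to state t under action a
    c[s][a]    : stage cost of taking action a in state s
    gamma      : discount factor in (0, 1)
    Returns the approximate optimal value function and the iteration count.
    """
    n_states, n_actions = len(c), len(c[0])
    v = [0.0] * n_states
    for k in range(1, max_iter + 1):
        # Bellman backup: v_new(s) = min_a [ c(s, a) + gamma * sum_t P(t | s, a) * v(t) ]
        v_new = [
            min(
                c[s][a] + gamma * sum(P[a][s][t] * v[t] for t in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        gap = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if gap < tol:  # stop once successive iterates agree to within tol
            break
    return v, k


# Toy example (hypothetical numbers): state 1 is absorbing and free of cost.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
    [[0.0, 1.0], [0.0, 1.0]],  # action 1: move to (or stay in) state 1
]
c = [[1.0, 2.0], [0.0, 0.0]]   # c[s][a]
v_star, n_iter = value_iteration(P, c, gamma=0.9)
print(v_star, n_iter)
```

In this toy instance the optimal choice in state 0 is to pay 2 once and move to state 1, rather than pay 1 forever (discounted total 10), so the value of state 0 is 2. The Bellman backup is a contraction with modulus `gamma`, which is why plain VI slows down as the discount factor approaches 1.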

arXiv:2307.07357 [pdf, other]

Inverse Optimization for Routing Problems

Authors: Pedro Zattoni Scroccaro, Piet van Beek, Peyman Mohajerin Esfahani, Bilge Atasoy

Abstract: We propose a method for learning decision-makers' behavior in routing problems using Inverse Optimization (IO). The IO framework falls into the supervised learning category and builds on the premise that the target behavior is an optimizer of an unknown cost function. This cost function is to be learned through historical data, and in the context of routing problems, can be interpreted as the routing preferences of the decision-makers. In this view, the main contributions of this study are to propose an IO methodology with a hypothesis function, loss function, and stochastic first-order algorithm tailored to routing problems. We further test our IO approach in the Amazon Last Mile Routing Research Challenge, where the goal is to learn models that replicate the routing preferences of human drivers, using thousands of real-world routing examples. Our final IO-learned routing model achieves a score that ranks 2nd compared with the 48 models that qualified for the final round of the challenge. Our examples and results showcase the flexibility and real-world potential of the proposed IO methodology to learn from decision-makers' decisions in routing problems.

Submitted 18 June, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

arXiv:2306.03202 [pdf, other]

Nonlinear Distributionally Robust Optimization

Authors: Mohammed Rayyan Sheriff, Peyman Mohajerin Esfahani

Abstract: This article focuses on a class of distributionally robust optimization (DRO) problems where, unlike the growing body of the literature, the objective function is potentially nonlinear in the distribution. Existing methods to optimize nonlinear functions in probability space use the Fréchet derivatives, which present both theoretical and computational challenges. Motivated by this, we propose an alternative notion for the derivative and corresponding smoothness based on the Gâteaux (G)-derivative for generic risk measures. These concepts are explained via three running risk measure examples of variance, entropic risk, and risk on finite support sets. We then propose a G-derivative-based Frank-Wolfe (FW) algorithm for generic nonlinear optimization problems in probability spaces and establish its convergence under the proposed notion of smoothness in a completely norm-independent manner. We use the set-up of the FW algorithm to devise a methodology to compute a saddle point of the nonlinear DRO problem. Finally, we validate our theoretical results on two cases of the entropic and variance risk measures in the context of portfolio selection problems. In particular, we analyze their regularity conditions and "sufficient statistic", compute the respective FW-oracle in various settings, and confirm the theoretical outcomes through numerical validation.

Submitted 9 June, 2024; v1 submitted 5 June, 2023; originally announced June 2023.
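
To make the Frank-Wolfe idea in the abstract above concrete, the sketch below runs the classical Frank-Wolfe iteration over distributions on a finite support, i.e. over the probability simplex. It is only an illustration under simplifying assumptions: it uses an ordinary gradient on a finite support rather than the G-derivative framework of the paper, and the function `frank_wolfe_simplex`, the target `q`, and the quadratic objective are hypothetical choices made for this example.

```python
def frank_wolfe_simplex(grad, n, iters=2000):
    """Classical Frank-Wolfe over the probability simplex in R^n.

    grad(p) returns the gradient of a smooth convex objective at p.
    The linear minimization oracle over the simplex returns a vertex,
    namely the point mass on the coordinate with the smallest gradient entry.
    """
    p = [1.0 / n] * n  # start from the uniform distribution
    for k in range(iters):
        g = grad(p)
        i_star = min(range(n), key=lambda i: g[i])  # FW oracle: best vertex
        step = 2.0 / (k + 2)                        # standard open-loop step size
        # Convex combination of the current iterate and the chosen vertex,
        # so every iterate remains a valid probability vector.
        p = [(1.0 - step) * pi for pi in p]
        p[i_star] += step
    return p


# Hypothetical objective: f(p) = 0.5 * ||p - q||^2 with q inside the simplex,
# whose minimizer over the simplex is q itself.
q = [0.5, 0.3, 0.2]


def grad(p):
    return [pi - qi for pi, qi in zip(p, q)]


p_hat = frank_wolfe_simplex(grad, n=len(q))
objective = 0.5 * sum((a - b) ** 2 for a, b in zip(p_hat, q))
print(p_hat, objective)
```

Each iteration only needs a gradient evaluation and an argmin over coordinates, with no projection step, which is the property that makes Frank-Wolfe-type schemes attractive when the feasible set is a set of distributions. With the step size 2/(k+2), the classical analysis gives an O(1/k) decay of the objective gap for smooth convex objectives.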

arXiv:2305.07730 [pdf, other]

Learning in Inverse Optimization: Incenter Cost, Augmented Suboptimality Loss, and Algorithms

Authors: Pedro Zattoni Scroccaro, Bilge Atasoy, Peyman Mohajerin Esfahani

Abstract: In Inverse Optimization (IO), an expert agent solves an optimization problem parametric in an exogenous signal. From a learning perspective, the goal is to learn the expert's cost function given a dataset of signals and corresponding optimal actions. Motivated by the geometry of the IO set of consistent cost vectors, we introduce the "incenter" concept, a new notion akin to circumcenter recently proposed by Besbes et al. (2023). Discussing the geometric and robustness interpretation of the incenter cost vector, we develop corresponding tractable convex reformulations, which are in contrast with the circumcenter, which we show is equivalent to an intractable optimization program. We further propose a novel loss function called Augmented Suboptimality Loss (ASL), a relaxation of the incenter concept for problems with inconsistent data. Exploiting the structure of the ASL, we propose a novel first-order algorithm, which we name Stochastic Approximate Mirror Descent. This algorithm combines stochastic and approximate subgradient evaluations, together with mirror descent update steps, which is provably efficient for the IO problems with discrete feasible sets with high cardinality. We implement the IO approaches developed in this paper as a Python package called InvOpt. Our numerical experiments are reproducible, and the underlying source code is available as examples in the InvOpt package.

Submitted 23 January, 2024; v1 submitted 12 May, 2023; originally announced May 2023.

arXiv:2212.01068 [pdf, other]

Fast Algorithm for Constrained Linear Inverse Problems

Authors: Mohammed Rayyan Sheriff, Floor Fenne Redel, Peyman Mohajerin Esfahani

Abstract: We consider the constrained Linear Inverse Problem (LIP), where a certain atomic norm (like the $\ell_1$ norm) is minimized subject to a quadratic constraint. Typically, such cost functions are non-differentiable, which makes them not amenable to the fast optimization methods existing in practice. We propose two equivalent reformulations of the constrained LIP with improved convex regularity: (i) a smooth convex minimization problem, and (ii) a strongly convex min-max problem. These problems could be solved by applying existing acceleration-based convex optimization methods which provide a better $O(1/k^2)$ theoretical convergence guarantee, improving upon the current best rate of $O(1/k)$. We also provide a novel algorithm named the Fast Linear Inverse Problem Solver (FLIPS), which is tailored to maximally exploit the structure of the reformulations. We demonstrate the performance of FLIPS on the classical problems of Binary Selection, Compressed Sensing, and Image Denoising. We also provide an open-source MATLAB package for these three examples, which can be easily adapted to other LIPs.

Submitted 24 January, 2024; v1 submitted 2 December, 2022; originally announced December 2022.

arXiv:2205.00446 [pdf, ps, other]

Adaptive Composite Online Optimization: Predictions in Static and Dynamic Environments

Authors: Pedro Zattoni Scroccaro, Arman Sharifi Kolarijani, Peyman Mohajerin Esfahani

Abstract: In the past few years, Online Convex Optimization (OCO) has received notable attention in the control literature thanks to its flexible real-time nature and powerful performance guarantees. In this paper, we propose new step-size rules and OCO algorithms that simultaneously exploit gradient predictions, function predictions and dynamics, features particularly pertinent to control applications. The proposed algorithms enjoy static and dynamic regret bounds in terms of the dynamics of the reference action sequence, gradient prediction error, and function prediction error, which are generalizations of known regularity measures from the literature. We present results for both convex and strongly convex costs. We validate the performance of the proposed algorithms in a trajectory tracking case study, as well as portfolio optimization using real-world datasets.

Submitted 14 January, 2023; v1 submitted 1 May, 2022; originally announced May 2022.

arXiv:2105.12022 [pdf, other]

Principal Component Hierarchy for Sparse Quadratic Programs

Authors: Robbie Vreugdenhil, Viet Anh Nguyen, Armin Eftekhari, Peyman Mohajerin Esfahani

Abstract: We propose a novel approximation hierarchy for cardinality-constrained, convex quadratic programs that exploits the rank-dominating eigenvectors of the quadratic matrix. Each level of approximation admits a min-max characterization whose objective function can be optimized over the binary variables analytically, while preserving convexity in the continuous variables. Exploiting this property, we propose two scalable optimization algorithms, coined as the "best response" and the "dual program", that can efficiently screen the potential indices of the nonzero elements of the original program. We show that the proposed methods are competitive with the existing screening methods in the current sparse regression literature, and that they are particularly fast on instances with a high number of measurements in experiments with both synthetic and real datasets.

Submitted 25 May, 2021; originally announced May 2021.

Journal ref: ICML 2021

arXiv:2101.02776 [pdf, other]

The Nonconvex Geometry of Linear Inverse Problems

Authors: Armin Eftekhari, Peyman Mohajerin Esfahani

Abstract: The gauge function, closely related to the atomic norm, measures the complexity of a statistical model, and has found broad applications in machine learning and statistical signal processing. In a high-dimensional learning problem, the gauge function attempts to safeguard against overfitting by promoting a sparse (concise) representation within the learning alphabet. In this work, within the context of linear inverse problems, we pinpoint the source of its success, but also argue that the applicability of the gauge function is inherently limited by its convexity, and showcase several learning problems where the classical gauge function theory fails. We then introduce a new notion of statistical complexity, gauge$_p$ function, which overcomes the limitations of the gauge function. The gauge$_p$ function is a simple generalization of the gauge function that can tightly control the sparsity of a statistical model within the learning alphabet and, perhaps surprisingly, draws further inspiration from the Burer-Monteiro factorization in computational mathematics. We also propose a new learning machine, with the building block of gauge$_p$ function, and arm this machine with a number of statistical guarantees. The potential of the proposed gauge$_p$ function theory is then studied for two stylized applications. Finally, we discuss the computational aspects and, in particular, suggest a tractable numerical algorithm for implementing the new learning machine.

Submitted 9 March, 2022; v1 submitted 7 January, 2021; originally announced January 2021.

arXiv:2008.04477 [pdf, other]

doi 10.1109/CDC.2018.8619460

Security Versus Privacy

Authors: Farhad Farokhi, Peyman Mohajerin Esfahani

Abstract: Linear queries can be submitted to a server containing private data. The server provides a response to the queries systematically corrupted using additive noise to preserve the privacy of those whose data is stored on the server. The measure of privacy is inversely proportional to the trace of the Fisher information matrix. It is assumed that an adversary can inject a false bias to the responses. The measure of the security, capturing the ease of detecting the presence of the false data injection, is the sensitivity of the Kullback-Leibler divergence to the additive bias. An optimization problem for balancing privacy and security is proposed and subsequently solved. It is shown that the level of guaranteed privacy times the level of security equals a constant. Therefore, by increasing the level of privacy, the security guarantees can only be weakened and vice versa. Similar results are developed under the differential privacy framework.

Submitted 10 August, 2020; originally announced August 2020.

Journal ref: 2018 IEEE Conference on Decision and Control (CDC)

arXiv:2004.13927 [pdf, other]

Dynamic Anomaly Detection with High-fidelity Simulators: A Convex Optimization Approach

Authors: Kaikai Pan, Peter Palensky, Peyman Mohajerin Esfahani

Abstract: The main objective of this article is to develop scalable dynamic anomaly detectors when high-fidelity simulators of power systems are at our disposal. On the one hand, mathematical models of these high-fidelity simulators are typically "intractable" to apply existing model-based approaches. On the other hand, pure data-driven methods developed primarily in the machine learning literature neglect our knowledge about the underlying dynamics of the systems. In this study, we combine tools from these two mainstream approaches to develop a diagnosis filter that utilizes the knowledge of both the dynamical system and the simulation data of the high-fidelity simulators. The proposed diagnosis filter aims to achieve two desired features: (i) performance robustness with respect to model mismatch; (ii) high scalability. To this end, we propose a tractable (convex) optimization-based reformulation in which decisions are the filter parameters, the model-based information introduces feasible sets, and the data from the simulator forms the objective function to be minimized regarding the effect of model mismatch on the filter performance. To validate the theoretical results, we implement the developed diagnosis filter in DIgSILENT PowerFactory to detect false data injection attacks on the Automatic Generation Control measurements in the three-area IEEE 39-bus system.

Submitted 6 October, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

Comments: 19 pages

arXiv:1911.03539 [pdf, other]

Bridging Bayesian and Minimax Mean Square Error Estimation via Wasserstein Distributionally Robust Optimization

Authors: Viet Anh Nguyen, Soroosh Shafieezadeh-Abadeh, Daniel Kuhn, Peyman Mohajerin Esfahani

Abstract: We introduce a distributionally robust minimum mean square error estimation model with a Wasserstein ambiguity set to recover an unknown signal from a noisy observation. The proposed model can be viewed as a zero-sum game between a statistician choosing an estimator -- that is, a measurable function of the observation -- and a fictitious adversary choosing a prior -- that is, a pair of signal and noise distributions ranging over independent Wasserstein balls -- with the goal to minimize and maximize the expected squared estimation error, respectively. We show that if the Wasserstein balls are centered at normal distributions, then the zero-sum game admits a Nash equilibrium, where the players' optimal strategies are given by an affine estimator and a normal prior, respectively. We further prove that this Nash equilibrium can be computed by solving a tractable convex program. Finally, we develop a Frank-Wolfe algorithm that can solve this convex program orders of magnitude faster than state-of-the-art general purpose solvers. We show that this algorithm enjoys a linear convergence rate and that its direction-finding subproblems can be solved in quasi-closed form.

Submitted 27 January, 2021; v1 submitted 8 November, 2019; originally announced November 2019.

arXiv:1908.08729 [pdf, other]

Wasserstein Distributionally Robust Optimization: Theory and Applications in Machine Learning

Authors: Daniel Kuhn, Peyman Mohajerin Esfahani, Viet Anh Nguyen, Soroosh Shafieezadeh-Abadeh

Abstract: Many decision problems in science, engineering and economics are affected by uncertain parameters whose distribution is only indirectly observable through samples. The goal of data-driven decision-making is to learn a decision from finitely many training samples that will perform well on unseen test samples. This learning task is difficult even if all training and test samples are drawn from the same distribution, especially if the dimension of the uncertainty is large relative to the training sample size. Wasserstein distributionally robust optimization seeks data-driven decisions that perform well under the most adverse distribution within a certain Wasserstein distance from a nominal distribution constructed from the training samples. In this tutorial we will argue that this approach has many conceptual and computational benefits. Most prominently, the optimal decisions can often be computed by solving tractable convex optimization problems, and they enjoy rigorous out-of-sample and asymptotic consistency guarantees. We will also show that Wasserstein distributionally robust optimization has interesting ramifications for statistical learning and motivates new approaches for fundamental learning tasks such as classification, regression, maximum likelihood estimation or minimum mean square error estimation, among others.

Submitted 23 August, 2019; originally announced August 2019.

Comments: 36 pages

arXiv:1905.13547 [pdf, other]

Learning robust control for LQR systems with multiplicative noise via policy gradient

Authors: Benjamin Gravell, Peyman Mohajerin Esfahani, Tyler Summers

Abstract: The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for reinforcement learning-based control of complex dynamical systems with continuous state and action spaces. In contrast with nearly all recent work in this area, we consider multiplicative noise models, which are increasingly relevant because they explicitly incorporate inherent uncertainty and variation in the system dynamics and thereby improve robustness properties of the controller. Robustness is a critical and poorly understood issue in reinforcement learning; existing methods which do not account for uncertainty can converge to fragile policies or fail to converge at all. Additionally, intentional injection of multiplicative noise into learning algorithms can enhance robustness of policies, as observed in ad hoc work on domain randomization. Although policy gradient algorithms require optimization of a non-convex cost function, we show that the multiplicative noise LQR cost has a special property called gradient domination, which is exploited to prove global convergence of policy gradient algorithms to the globally optimal control policy with polynomial dependence on problem parameters. Results are provided both in the model-known and model-unknown settings where samples of system trajectories are used to estimate policy gradients.

Submitted 1 May, 2020; v1 submitted 28 May, 2019; originally announced May 2019.

arXiv:1809.08830 [pdf, other]

Wasserstein Distributionally Robust Kalman Filtering

Authors: Soroosh Shafieezadeh-Abadeh, Viet Anh Nguyen, Daniel Kuhn, Peyman Mohajerin Esfahani

Abstract: We study a distributionally robust mean square error estimation problem over a nonconvex Wasserstein ambiguity set containing only normal distributions. We show that the optimal estimator and the least favorable distribution form a Nash equilibrium. Despite the nonconvex nature of the ambiguity set, we prove that the estimation problem is equivalent to a tractable convex program. We further devise a Frank-Wolfe algorithm for this convex program whose direction-searching subproblem can be solved in a quasi-closed form. Using these ingredients, we introduce a distributionally robust Kalman filter that hedges against model risk.

Submitted 1 October, 2018; v1 submitted 24 September, 2018; originally announced September 2018.

arXiv:1808.09271 [pdf, other]

Distance Based Source Domain Selection for Sentiment Classification

Authors: Lex Razoux Schultz, Marco Loog, Peyman Mohajerin Esfahani

Abstract: Automated sentiment classification (SC) on short text fragments has received increasing attention in recent years. Performing SC on unseen domains with few or no labeled samples can significantly affect the classification performance due to different expressions of sentiment in the source and target domains. In this study, we aim to mitigate this undesired impact by proposing a methodology based on a predictive measure, which allows us to select an optimal source domain from a set of candidates. The proposed measure is a linear combination of well-known distance functions between probability distributions supported on the source and target domains (e.g. Earth Mover's distance and Kullback-Leibler divergence). The performance of the proposed methodology is validated through an SC case study in which our numerical experiments suggest a significant improvement in the cross-domain classification error in comparison with a randomly selected source domain for both a naive and an adaptive learning setting. In the case of more heterogeneous datasets, the predictability feature of the proposed model can be utilized to further select a subset of candidate domains, where the corresponding classifier outperforms the one trained on all available source domains. This observation reinforces a hypothesis that our proposed model may also be deployed as a means to filter out redundant information during a training phase of SC.

Submitted 28 August, 2018; originally announced August 2018.

arXiv:1710.10016 [pdf, other]

Regularization via Mass Transportation

Authors: Soroosh Shafieezadeh-Abadeh, Daniel Kuhn, Peyman Mohajerin Esfahani

Abstract: The goal of regression and classification methods in supervised learning is to minimize the empirical risk, that is, the expectation of some loss function quantifying the prediction error under the empirical distribution. When facing scarce training data, overfitting is typically mitigated by adding regularization terms to the objective that penalize hypothesis complexity. In this paper we introduce new regularization techniques using ideas from distributionally robust optimization, and we give new probabilistic interpretations to existing techniques. Specifically, we propose to minimize the worst-case expected loss, where the worst case is taken over the ball of all (continuous or discrete) distributions that have a bounded transportation distance from the (discrete) empirical distribution. By choosing the radius of this ball judiciously, we can guarantee that the worst-case expected loss provides an upper confidence bound on the loss on test data, thus offering new generalization bounds. We prove that the resulting regularized learning problems are tractable and can be tractably kernelized for many popular loss functions. We validate our theoretical out-of-sample guarantees through simulated and empirical experiments.

Submitted 12 July, 2019; v1 submitted 27 October, 2017; originally announced October 2017.

arXiv:1708.07311 [pdf, ps, other]

Generalized maximum entropy estimation

Authors: Tobias Sutter, David Sutter, Peyman Mohajerin Esfahani, John Lygeros

Abstract: We consider the problem of estimating a probability distribution that maximizes the entropy while satisfying a finite number of moment constraints, possibly corrupted by noise. Based on duality of convex programming, we present a novel approximation scheme using a smoothed fast gradient method that is equipped with explicit bounds on the approximation error. We further demonstrate how the presented scheme can be used for approximating the chemical master equation through the zero-information moment closure method, and for an approximate dynamic programming approach in the context of constrained Markov decision processes with uncountable state and action spaces.

Submitted 8 September, 2019; v1 submitted 24 August, 2017; originally announced August 2017.

Comments: 29 pages, 3 figures; v2: approximate dynamic programming section added, v3: published version

Report number: http://jmlr.org/papers/v20/17-486.html MSC Class: 94A17; 90C25; 90C34; 65K05

Journal ref: Journal of Machine Learning Research, vol 20, 2019

arXiv:1704.04118 [pdf, other]

From Data to Decisions: Distributionally Robust Optimization is Optimal

Authors: Bart P. G. Van Parys, Peyman Mohajerin Esfahani, Daniel Kuhn

Abstract: We study stochastic programs where the decision-maker cannot observe the distribution of the exogenous uncertainties but has access to a finite set of independent samples from this distribution. In this setting, the goal is to find a procedure that transforms the data to an estimate of the expected cost function under the unknown data-generating distribution, i.e., a predictor, and an optimizer of the estimated cost function that serves as a near-optimal candidate decision, i.e., a prescriptor. As functions of the data, predictors and prescriptors constitute statistical estimators. We propose a meta-optimization problem to find the least conservative predictors and prescriptors subject to constraints on their out-of-sample disappointment. The out-of-sample disappointment quantifies the probability that the actual expected cost of the candidate decision under the unknown true distribution exceeds its predicted cost. Leveraging tools from large deviations theory, we prove that this meta-optimization problem admits a unique solution: The best predictor-prescriptor pair is obtained by solving a distributionally robust optimization problem over all distributions within a given relative entropy distance from the empirical distribution of the data.

Submitted 22 December, 2019; v1 submitted 13 April, 2017; originally announced April 2017.

arXiv:1510.04214 [pdf, other]

LQG Control with Minimum Directed Information: Semidefinite Programming Approach

Authors: Takashi Tanaka, Peyman Mohajerin Esfahani, Sanjoy K. Mitter

Abstract: We consider a discrete-time Linear-Quadratic-Gaussian (LQG) control problem in which Massey's directed information from the observed output of the plant to the control input is minimized while required control performance is attainable. This problem arises in several different contexts, including joint encoder and controller design for data-rate minimization in networked control systems. We show that the optimal control law is a Linear-Gaussian randomized policy. We also identify the state space realization of the optimal policy, which can be synthesized by an efficient algorithm based on semidefinite programming. Our structural result indicates that the filter-controller separation principle from the LQG control theory, and the sensor-filter separation principle from the zero-delay rate-distortion theory for Gauss-Markov sources hold simultaneously in the considered problem. A connection to the data-rate theorem for mean-square stability by Nair and Evans is also established.

Submitted 10 June, 2017; v1 submitted 14 October, 2015; originally announced October 2015.

arXiv:1407.8202 [pdf, ps, other]

doi 10.1109/TIT.2015.2503755

Efficient Approximation of Quantum Channel Capacities

Authors: David Sutter, Tobias Sutter, Peyman Mohajerin Esfahani, Renato Renner

Abstract: We propose an iterative method for approximating the capacity of classical-quantum channels with a discrete input alphabet and a finite dimensional output, possibly under additional constraints on the input distribution. Based on duality of convex programming, we derive explicit upper and lower bounds for the capacity. To provide an $\varepsilon$-close estimate to the capacity, the presented algorithm requires $O(\tfrac{(N \vee M) M^3 \log(N)^{1/2}}{\varepsilon})$, where $N$ denotes the input alphabet size and $M$ the output dimension. We then generalize the method for the task of approximating the capacity of classical-quantum channels with a bounded continuous input alphabet and a finite dimensional output. For channels with a finite dimensional quantum mechanical input and output, the idea of a universal encoder allows us to approximate the Holevo capacity using the same method. In particular, we show that the problem of approximating the Holevo capacity can be reduced to a multidimensional integration problem. For families of quantum channels fulfilling a certain assumption we show that the complexity to derive an $\varepsilon$-close solution to the Holevo capacity is subexponential or even polynomial in the problem size. We provide several examples to illustrate the performance of the approximation scheme in practice.

Submitted 30 July, 2014; originally announced July 2014.

Comments: 36 pages, 1 figure

Journal ref: IEEE Transactions on Information Theory vol. 62, no 1, pages 578-598, 2016

arXiv:1407.7629 [pdf, other]

doi 10.1109/TIT.2015.2401002

Efficient Approximation of Channel Capacities

Authors: Tobias Sutter, David Sutter, Peyman Mohajerin Esfahani, John Lygeros

Abstract: We propose an iterative method for approximately computing the capacity of discrete memoryless channels, possibly under additional constraints on the input distribution. Based on duality of convex programming, we derive explicit upper and lower bounds for the capacity. The presented method requires $O(M^2 N \sqrt{\log N}/\varepsilon)$ to provide an estimate of the capacity to within $\varepsilon$, where $N$ and $M$ denote the input and output alphabet size; a single iteration has a complexity $O(M N)$. We also show how to approximately compute the capacity of memoryless channels having a bounded continuous input alphabet and a countable output alphabet under some mild assumptions on the decay rate of the channel's tail. It is shown that discrete-time Poisson channels fall into this problem class. As an example, we compute sharp upper and lower bounds for the capacity of a discrete-time Poisson channel with a peak-power input constraint.

Submitted 3 April, 2015; v1 submitted 29 July, 2014; originally announced July 2014.

Comments: 32 pages, 3 figures, revised version

MSC Class: 94A15; 90C25

Journal ref: IEEE Transactions on Information Theory vol. 61, no 4, pages 1649-1666, 2015

Showing 1–21 of 21 results for author: Esfahani, P M