Search | arXiv e-print repository

arXiv:2402.19212 [pdf, ps, other]

Deep Reinforcement Learning: A Convex Optimization Approach

Abstract: In this paper, we consider reinforcement learning of nonlinear systems with continuous state and action spaces. We present an episodic learning algorithm, where for each episode we use convex optimization to find a two-layer neural network approximation of the optimal $Q$-function. The convex optimization approach guarantees that the weights calculated at each episode are optimal with respect to the given sampled states and actions of the current episode. For stable nonlinear systems, we show that the algorithm converges and that the converging parameters of the trained neural network can be made arbitrarily close to the optimal neural network parameters. In particular, if the regularization parameter in the training phase is given by $ρ$, then the parameters of the trained neural network converge to $w$, where the distance between $w$ and the optimal parameters $w^\star$ is bounded by $\mathcal{O}(ρ)$. That is, when the number of episodes goes to infinity, there exists a constant $C$ such that \[ \|w-w^\star\| \le Cρ. \] In particular, our algorithm converges arbitrarily close to the optimal neural network parameters as the regularization parameter goes to zero. As a consequence, our algorithm converges fast due to the polynomial-time convergence of convex optimization algorithms.

Submitted 24 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.
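
The $\mathcal{O}(ρ)$ bound quoted in the abstract above can be illustrated on a much simpler problem. The sketch below is not the paper's algorithm: it assumes a scalar ridge regression (hypothetical helper `ridge_weight`, made-up data) whose regularized minimizer has a closed form, and checks that its distance to the unregularized minimizer is bounded by a constant times ρ.

```python
# Toy illustration only: scalar ridge regression, not the two-layer network
# training problem from the paper. All names and data here are hypothetical.

def ridge_weight(xs, ys, rho):
    """Minimizer of sum((x*w - y)**2) + rho * w**2 over a scalar weight w."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + rho)


xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

w_star = ridge_weight(xs, ys, 0.0)  # unregularized optimum
sxx = sum(x * x for x in xs)
# Closed form: |w_rho - w_star| = rho * |w_star| / (sxx + rho) <= C * rho
C = abs(w_star) / sxx

for rho in (1.0, 0.1, 0.01):
    gap = abs(ridge_weight(xs, ys, rho) - w_star)
    assert gap <= C * rho + 1e-12  # the O(rho) bound holds for this toy problem
    print(f"rho={rho:<5} gap={gap:.6f} bound={C * rho:.6f}")
```

In this toy setting the constant is $C = |w^\star| / \sum_i x_i^2$; the constant in the paper's result depends on the learning problem and is not given in the abstract.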

arXiv:2301.11802 [pdf, ps, other]

Decentralized Online Bandit Optimization on Directed Graphs with Regret Bounds

Authors: Johan Östman, Ather Gattami, Daniel Gillblad

Abstract: We consider a decentralized multiplayer game, played over $T$ rounds, with a leader-follower hierarchy described by a directed acyclic graph. For each round, the graph structure dictates the order of the players and how players observe the actions of one another. By the end of each round, all players receive a joint bandit-reward based on their joint action that is used to update the player strategies towards the goal of minimizing the joint pseudo-regret. We present a learning algorithm inspired by the single-player multi-armed bandit problem and show that it achieves sub-linear joint pseudo-regret in the number of rounds for both adversarial and stochastic bandit rewards. Furthermore, we quantify the cost incurred due to the decentralized nature of our problem compared to the centralized setting.

Submitted 27 January, 2023; originally announced January 2023.

arXiv:2212.11567 [pdf, other]

Learning Team Decisions

Authors: Olle Kjellqvist, Ather Gattami

Abstract: In this paper, we treat linear quadratic team decision problems, where a team of agents minimizes a convex quadratic cost function over $T$ time steps subject to possibly distinct linear measurements of the state of nature. We assume that the state of nature is a Gaussian random variable and that the agents know neither the cost function nor the linear functions mapping the state of nature to their measurements. We present a gradient-descent based algorithm with an expected regret of $O(\log(T))$ for full information gradient feedback and $O(\sqrt{T})$ for bandit feedback. In the case of bandit feedback, the expected regret has an additional multiplicative term $O(d)$ where $d$ reflects the number of learned parameters.

Submitted 22 December, 2022; originally announced December 2022.

Comments: Accepted and presented at IEEE CDC 2022. A few typos have been corrected

arXiv:2006.05961

Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints

Authors: Qinbo Bai, Vaneet Aggarwal, Ather Gattami

Abstract: In the optimization of dynamical systems, the variables typically have constraints. Such problems can be modeled as a constrained Markov Decision Process (CMDP). This paper considers a model-free approach to the problem, where the transition probabilities are not known. In the presence of long-term (or average) constraints, the agent has to choose a policy that maximizes the long-term average reward while satisfying the average constraints in each episode. The key challenge with the long-term constraints is that the optimal policy is not deterministic in general, and thus standard Q-learning approaches cannot be directly used. This paper uses concepts from constrained optimization and Q-learning to propose an algorithm for CMDP with long-term constraints. For any $γ\in(0,\frac{1}{2})$, the proposed algorithm is shown to achieve an $O(T^{1/2+γ})$ regret bound for the obtained reward and an $O(T^{1-γ/2})$ regret bound for the constraint violation, where $T$ is the total number of steps. We note that these are the first results on regret analysis for MDP with long-term constraints, where the transition probabilities are not known a priori.

Submitted 30 January, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

Comments: The result has error
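
The two regret exponents stated in the abstract above trade off against each other through $γ$. As a small arithmetic sketch (the helper name `regret_exponents` is hypothetical, and the exponents are taken from the abstract exactly as stated, which the listing itself flags as containing an error), the following evaluates both exponents for a few values of $γ$.

```python
def regret_exponents(gamma):
    """Exponents of T in the bounds as stated in the abstract:
    reward regret O(T^(1/2 + gamma)), constraint violation O(T^(1 - gamma/2)).
    Hypothetical helper for illustration only."""
    if not 0.0 < gamma < 0.5:
        raise ValueError("gamma must lie in the open interval (0, 1/2)")
    return 0.5 + gamma, 1.0 - gamma / 2.0


for gamma in (0.1, 0.25, 1.0 / 3.0, 0.4):
    reward_exp, violation_exp = regret_exponents(gamma)
    print(f"gamma={gamma:.3f}: reward ~ T^{reward_exp:.3f}, "
          f"violation ~ T^{violation_exp:.3f}")
```

Setting $1/2+γ = 1-γ/2$ gives $γ = 1/3$, at which point both exponents equal $5/6$; smaller $γ$ favors the reward bound and larger $γ$ favors the constraint bound.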

arXiv:2003.05555 [pdf, other]

Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints

Authors: Qinbo Bai, Vaneet Aggarwal, Ather Gattami

Abstract: In the optimization of dynamic systems, the variables typically have constraints. Such problems can be modeled as a Constrained Markov Decision Process (CMDP). This paper considers the peak Constrained Markov Decision Process (PCMDP), where the agent chooses the policy to maximize total reward in the finite horizon as well as satisfy constraints at each epoch with probability 1. We propose a model-free algorithm that converts the PCMDP problem to an unconstrained problem, to which a Q-learning based approach is applied. We define the concept of probably approximately correct (PAC) for the proposed PCMDP problem. The proposed algorithm is proved to achieve an $(ε,p)$-PAC policy when the number of episodes satisfies $K\geq Ω(\frac{I^2H^6SA\ell}{ε^2})$, where $S$ and $A$ are the number of states and actions, respectively, $H$ is the number of epochs per episode, $I$ is the number of constraint functions, and $\ell=\log(\frac{SAT}{p})$. We note that this is the first result on PAC-type analysis for PCMDP with peak constraints, where the transition dynamics are not known a priori. We demonstrate the proposed algorithm on an energy harvesting problem and a single machine scheduling problem, where it performs close to the theoretical upper bound of the studied optimization problem.

Submitted 13 June, 2022; v1 submitted 11 March, 2020; originally announced March 2020.
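
To get a feel for how the episode requirement in the abstract above scales, the sketch below evaluates the expression inside the $Ω(\cdot)$. It is only an order-of-magnitude aid under stated assumptions: the hidden constant is omitted, $T$ is not defined in the abstract and is treated here as a user-supplied input, and the function name `pac_episode_scale` and all parameter values are hypothetical.

```python
import math


def pac_episode_scale(I, H, S, A, T, eps, p):
    """Evaluate I^2 * H^6 * S * A * ell / eps^2 with ell = log(S*A*T/p).

    This is the quantity inside the Omega(.) from the abstract, without the
    hidden constant. T is an assumed input; the abstract does not define it.
    """
    ell = math.log(S * A * T / p)
    return I**2 * H**6 * S * A * ell / eps**2


# Made-up problem sizes, used only to compare ratios.
base = pac_episode_scale(I=2, H=5, S=10, A=4, T=10_000, eps=0.1, p=0.05)
half_eps = pac_episode_scale(I=2, H=5, S=10, A=4, T=10_000, eps=0.05, p=0.05)
double_h = pac_episode_scale(I=2, H=10, S=10, A=4, T=10_000, eps=0.1, p=0.05)

print(f"halving eps multiplies the scale by {half_eps / base:.1f}")
print(f"doubling H multiplies the scale by {double_h / base:.1f}")
```

With $T$ held fixed, the expression grows as $1/ε^2$ and as $H^6$, so the horizon length dominates the requirement far more than the accuracy target does.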

arXiv:2002.07638 [pdf, other]

Conditional Mutual information-based Contrastive Loss for Financial Time Series Forecasting

Authors: Hanwei Wu, Ather Gattami, Markus Flierl

Abstract: We present a representation learning framework for financial time series forecasting. One challenge of using deep learning models for finance forecasting is the shortage of available training data when using small datasets. Direct trend classification using deep neural networks trained on small datasets is susceptible to the overfitting problem. In this paper, we propose to first learn compact representations from time series data, then use the learned representations to train a simpler model for predicting time series movements. We consider a class-conditioned latent variable model. We train an encoder network to maximize the mutual information between the latent variables and the trend information conditioned on the encoded observed variables. We show that conditional mutual information maximization can be approximated by a contrastive loss. Then, the problem is transformed into a classification task of determining whether two encoded representations are sampled from the same class or not. This is equivalent to performing pairwise comparisons of the training datapoints, and thus, improves the generalization ability of the encoder network. We use deep autoregressive models as our encoder to capture long-term dependencies of the sequence data. Empirical experiments indicate that our proposed method has the potential to advance state-of-the-art performance.

Submitted 7 May, 2021; v1 submitted 18 February, 2020; originally announced February 2020.

Comments: Published in ICAIF 2020 : ACM International Conference on AI in Finance

arXiv:1901.08978 [pdf, other]

Reinforcement Learning for Multi-Objective and Constrained Markov Decision Processes

Authors: Ather Gattami, Qinbo Bai, Vaneet Aggarwal

Abstract: In this paper, we consider the problem of optimization and learning for constrained and multi-objective Markov decision processes, for both discounted rewards and expected average rewards. We formulate the problems as zero-sum games where one player (the agent) solves a Markov decision problem and its opponent solves a bandit optimization problem, which we here call Markov-Bandit games. We extend Q-learning to solve Markov-Bandit games and show that our new Q-learning algorithms converge to the optimal solutions of the zero-sum Markov-Bandit games, and hence converge to the optimal solutions of the constrained and multi-objective Markov decision problems. We provide a numerical example where we calculate the optimal policies and show by simulations that the algorithm converges to the calculated optimal policies. To the best of our knowledge, this is the first time learning algorithms guarantee convergence to optimal stationary policies for the constrained MDP problem with discounted and expected average rewards, respectively.

Submitted 4 March, 2021; v1 submitted 23 January, 2019; originally announced January 2019.

Comments: arXiv admin note: substantial text overlap with arXiv:1901.07839

arXiv:1901.07839 [pdf, ps, other]

Reinforcement Learning of Markov Decision Processes with Peak Constraints

Authors: Ather Gattami

Abstract: In this paper, we consider reinforcement learning of Markov Decision Processes (MDP) with peak constraints, where an agent chooses a policy to optimize an objective and at the same time satisfy additional constraints. The agent has to take actions based on the observed states, reward outputs, and constraint-outputs, without any knowledge about the dynamics, reward functions, and/or the knowledge of the constraint-functions. We introduce a game theoretic approach to construct reinforcement learning algorithms where the agent maximizes an unconstrained objective that depends on the simulated action of the minimizing opponent which acts on a finite set of actions and the output data of the constraint functions (rewards). We show that the policies obtained from maximin Q-learning converge to the optimal policies. To the best of our knowledge, this is the first time learning algorithms guarantee convergence to optimal stationary policies for the MDP problem with peak constraints for both discounted and expected average rewards.

Submitted 6 December, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

arXiv:1605.04579 [pdf, other]

Communicating One Bit over a Delay Constrained Gaussian MIMO Channel with Feedback

Authors: Bo Bernhardsson, Ather Gattami

Abstract: The energy-optimal scheme is found for communicating one bit over a memoryless Gaussian channel with an ideal feedback channel. It is assumed that the channel is allowed to be used at most N times before decoding. The optimal coding/decoding strategy is derived by dynamic programming. It is found that feedback gives a significant performance gain and that the optimal strategies are discontinuous. It is also shown that most of the performance increase can be obtained even with a one-bit feedback channel. The optimal scheme is compared with the strategy by Kailath-Schalkwijk and is found to be significantly more effective. For the case of a diagonal MIMO channel where measurement noise variances are equal along the sub channels we also show that the problem can be reduced to the previous case of transmitting one bit over a scalar feedback channel.

Submitted 15 May, 2016; originally announced May 2016.

Comments: Submitted for publication

arXiv:1511.06866 [pdf, other]

Feedback Capacity of Gaussian Channels Revisited

Authors: Ather Gattami

Abstract: In this paper, we revisit the problem of finding the average capacity of the Gaussian feedback channel. First, we consider the problem of finding the average capacity of the analog Gaussian noise channel where the noise has an arbitrary spectral density. We introduce a new approach to the problem where we solve the problem over a finite number of transmissions and then consider the limit of an infinite number of transmissions. Further, we consider the important special case of stationary Gaussian noise with finite memory. We show that the channel capacity at stationarity can be found by solving a semi-definite program, and is hence computationally tractable. We also give new proofs and results for the non-stationary solution, which bridge the gap between results in the literature for the stationary and non-stationary feedback channel capacities. It is shown that a linear communication feedback strategy is optimal. Similar to the solution of the stationary problem, it is shown that the optimal linear strategy is to transmit a linear combination of the information symbols to be communicated and the innovations for the estimation error of the state of the noise process.

Submitted 23 January, 2019; v1 submitted 21 November, 2015; originally announced November 2015.

arXiv:1506.00777 [pdf, other]

Team Decision Problems with Convex Quadratic Constraints

Authors: Ather Gattami

Abstract: In this paper, we consider linear quadratic team problems with an arbitrary number of quadratic constraints in both stochastic and deterministic settings. The team consists of players with different measurements about the state of nature. The objective of the team is to minimize a quadratic cost subject to an additional finite number of quadratic constraints. We first consider the problem of a countably infinite number of players in the team for a bounded state of nature with a Gaussian distribution and show that linear decisions are optimal. Then, we consider team decision problems with additional convex quadratic constraints and show that linear decisions are optimal for both the finite and infinite number of players in the team. For the finite player case, the optimal linear decisions can be found by solving a semidefinite program. Next, we consider the problem of minimizing a quadratic objective for the worst case scenario, subject to an arbitrary number of deterministic quadratic constraints. We show that linear decisions are optimal and can be found by solving a semidefinite program. Finally, we apply the developed theory on dynamic team decision problems in linear quadratic settings.

Submitted 2 June, 2015; originally announced June 2015.

Comments: arXiv admin note: substantial text overlap with arXiv:1209.2551

arXiv:1506.00484 [pdf, other]

Optimal Communication of States of Dynamical Systems over Gaussian Channels with Noisy Feedback: The Scalar Case

Authors: Ather Gattami

Abstract: We consider the problem of communicating the state of a dynamical system via a Shannon Gaussian channel. The receiver, which acts as both a decoder and estimator, observes the noisy measurement of the channel output and makes an optimal estimate of the state of the dynamical system in the minimum mean square sense. Noisy feedback from the receiver to the transmitter is present. The transmitter observes the noise-corrupted feedback message from the receiver together with a possibly noisy measurement of the state of the dynamical system. These measurements are then used to encode the message to be transmitted over a noisy Gaussian channel, where a per symbol power constraint is imposed on the transmitted message. Thus, we get a mixed problem of Shannon's source-channel coding problem and a sort of Kalman filtering problem. In particular, we consider two feedback instances, one being feedback of receiver measurements and the second being the receiver's state estimates. We show that optimal encoders and decoders are linear filters with a finite memory and we give explicitly the state space realizations of the optimal filters. For the case where the transmitter has access to noisy measurements of the state, we derive a separation principle for the optimal communication scheme. Furthermore, we investigate the presence of noiseless feedback or no feedback from the receiver to the transmitter. Necessary and sufficient conditions for the existence of a stationary solution are also given for the feedback cases considered.

Submitted 1 June, 2015; originally announced June 2015.

Comments: arXiv admin note: substantial text overlap with arXiv:1404.4350

arXiv:1505.03309 [pdf, other]

Time Localization and Capacity of Faster-Than-Nyquist Signaling

Authors: Ather Gattami, Emil Ringh, Johan Karlsson

Abstract: In this paper, we consider communication over the bandwidth limited analog white Gaussian noise channel using non-orthogonal pulses. In particular, we consider non-orthogonal transmission by signaling samples at a rate higher than the Nyquist rate. Using the faster-than-Nyquist (FTN) framework, Mazo showed that one may transmit symbols carried by sinc pulses at a higher rate than that dictated by Nyquist without losing bit error rate. However, as we will show in this paper, such pulses are not necessarily well localized in time. In fact, assuming that signals in the FTN framework are well localized in time, one can construct a signaling scheme that violates the Shannon capacity bound. We also show directly that FTN signals are in general not well localized in time. Therefore, the results of Mazo do not imply that one can transmit more data per time unit without degrading performance in terms of error probability. We also consider FTN signaling in the case of pulses that are different from the sinc pulses. We show that one can use a precoding scheme of low complexity to remove the inter-symbol interference. This leads to the possibility of increasing the number of transmitted samples per time unit and compensating for spectral inefficiency due to signaling at the Nyquist rate of the non-sinc pulses. We demonstrate the power of the precoding scheme by simulations.

Submitted 7 December, 2015; v1 submitted 13 May, 2015; originally announced May 2015.

arXiv:1505.02997 [pdf, other]

Optimal Data and Training Symbol Ratio for Communication over Uncertain Channels

Authors: Ather Gattami

Abstract: We consider the problem of determining the power ratio between the training symbols and data symbols in order to maximize the channel capacity for transmission over uncertain channels with a channel estimate available at both the transmitter and receiver. The receiver makes an estimate of the channel by using a known sequence of training symbols. This channel estimate is then transmitted back to the transmitter. The capacity that the transceiver maximizes is the worst case capacity, in the sense that given a noise covariance, the transceiver maximizes the minimal capacity over all distributions of the measurement noise under a fixed covariance matrix known at both the transmitter and receiver. We give an exact expression of the channel capacity as a function of the channel covariance matrix, and the number of training symbols used during a coherence time interval. This expression determines the number of training symbols that need to be used by finding the optimal integer number of training symbols that maximize the channel capacity. As a by-product, we show that linear filters are optimal at both the transmitter and receiver.

Submitted 12 May, 2015; originally announced May 2015.

arXiv:1503.07561 [pdf, ps, other]

Primal robustness and semidefinite cones

Authors: Seungil You, Ather Gattami, John C. Doyle

Abstract: This paper reformulates and streamlines the core tools of robust stability and performance for LTI systems using now-standard methods in convex optimization. In particular, robustness analysis can be formulated directly as a primal convex (semidefinite program or SDP) optimization problem using sets of Gramians whose closure is a semidefinite cone. This allows various constraints such as structured uncertainty to be included directly, and worst-case disturbances and perturbations constructed directly from the primal variables. Well known results such as the KYP lemma and various scaled small gain tests can also be obtained directly through standard SDP duality. To readers familiar with robustness and SDPs, the framework should appear obvious, if only in retrospect. But this is also part of its appeal and should enhance pedagogy, and we hope suggest new research. There is a key lemma proving closure of a Gramian that is also obvious but our current proof appears unnecessarily cumbersome, and a final aim of this paper is to enlist the help of experts in robust control and convex optimization in finding simpler alternatives.

Submitted 25 March, 2015; originally announced March 2015.

Comments: A shorter version submitted to CDC 15

arXiv:1412.6160 [pdf, ps, other]

H infinity Analysis Revisited

Authors: Seungil You, Ather Gattami

Abstract: This paper proposes a direct and simple approach to the H infinity norm calculation in more general settings. In contrast to the method based on the Kalman-Yakubovich-Popov lemma, our approach does not require a controllability assumption, and returns a sinusoidal input that achieves the H infinity norm of the system including its frequency. In addition, using a semidefinite programming duality, we present a new proof of the Kalman-Yakubovich-Popov lemma, and make a connection between strong duality and controllability. Finally, we generalize our approach towards the generalized Kalman-Yakubovich-Popov lemma, which considers input signals within a finite spectrum.

Submitted 15 December, 2014; originally announced December 2014.

Comments: Submitted to IEEE Transactions on Automatic Control

arXiv:1404.4350 [pdf, other]

Kalman meets Shannon

Authors: Ather Gattami

Abstract: We consider the problem of communicating the state of a dynamical system via a Shannon Gaussian channel. The receiver, which acts as both a decoder and estimator, observes the noisy measurement of the channel output and makes an optimal estimate of the state of the dynamical system in the minimum mean square sense. The transmitter observes a possibly noisy measurement of the state of the dynamical system. These measurements are then used to encode the message to be transmitted over a noisy Gaussian channel, where a per sample power constraint is imposed on the transmitted message. Thus, we get a mixed problem of Shannon's source-channel coding problem and a sort of Kalman filtering problem. We first consider the problem of communication with full state measurements at the transmitter and show that optimal linear encoders don't need to have memory and the optimal linear decoders have an order of at most that of the state dimension. We also give explicitly the structure of the optimal linear filters. For the case where the transmitter has access to noisy measurements of the state, we derive a separation principle for the optimal communication scheme, where the transmitter needs a filter with an order of at most the dimension of the state of the dynamical system. The results are derived for first order linear dynamical systems, but may be extended to MIMO systems with arbitrary order.

Submitted 12 May, 2015; v1 submitted 16 April, 2014; originally announced April 2014.

arXiv:1402.3402 [pdf, ps, other]

Multi-Objective Optimal Control with Arbitrary Additive and Multiplicative Noise

Authors: Ather Gattami

Abstract: In this paper, we consider the problem of multi-objective optimal control of a dynamical system with additive and multiplicative noises with given second moments and arbitrary probability distributions. The objectives are given by quadratic constraints in the state and controller, where the quadratic forms may be indefinite and thus not necessarily convex. We show that the problem can be transformed to a semidefinite program and is hence convex. The optimization problem is to be optimized with respect to a certain variable serving as the covariance matrix of the state and the controller. We show that affine controllers are optimal and depend on the optimal covariance matrix. Furthermore, we show that optimal controllers are linear if all the quadratic forms are convex in the control variable. The solutions are presented for both the finite and infinite horizon cases. We give a necessary and sufficient condition for mean square stabilizability of the dynamical system with additive and multiplicative noises. The condition is a Lyapunov-like condition whose solution is again given by the covariance matrix of the state and the control variable. The results are illustrated with an example.

Submitted 14 February, 2014; originally announced February 2014.

arXiv:1309.4251 [pdf, other]

doi 10.1109/CDC.2012.6426380

Optimal Distributed Controller Design with Communication Delays: Application to Vehicle Formations

Authors: Hamid Reza Feyzmahdavian, Assad Alam, Ather Gattami

Abstract: This paper develops a controller synthesis algorithm for distributed LQG control problems under output feedback. We consider a system consisting of three interconnected linear subsystems with a delayed information sharing structure. While the state-feedback case of this problem has previously been solved, the extension to output-feedback is nontrivial, as the classical separation principle fails. To find the optimal solution, the controller is decomposed into two independent components. One is delayed centralized LQR, and the other is the sum of correction terms based on additional local information. Explicit discrete-time equations are derived whose solutions are the gains of the optimal controller.

Submitted 17 September, 2013; originally announced September 2013.

Comments: Submitted to the 51st IEEE Conference on Decision and Control, 2012

arXiv:1209.3135 [pdf, ps, other]

Deterministic Team Problems with Signaling Incentive

Authors: Ather Gattami

Abstract: This paper considers linear quadratic team decision problems where the players in the team affect each other's information structure through their decisions. Whereas the stochastic version of the problem is well known to be complex with nonlinear optimal solutions that are hard to find, the deterministic counterpart is shown to be tractable. We show that under some assumptions on the weight matrix and the signaling channels, linear decisions are optimal and can be found efficiently by solving a semi-definite program.

Submitted 3 February, 2013; v1 submitted 14 September, 2012; originally announced September 2012.

Comments: Submitted for publication

arXiv:1209.2551 [pdf, ps, other]

Multi-Objective Linear Quadratic Team Optimization

Authors: Ather Gattami

Abstract: In this paper, we consider linear quadratic team problems with an arbitrary number of quadratic constraints in both stochastic and deterministic settings. The team consists of players with different measurements about the state of nature. The objective of the team is to minimize a quadratic cost subject to an additional finite number of quadratic constraints. We will first consider the Gaussian case, where the state of nature is assumed to have a Gaussian distribution, and show that the linear decisions are optimal and can be found by solving a semidefinite program. We then consider the problem of minimizing a quadratic objective for the worst case scenario, subject to an arbitrary number of deterministic quadratic constraints. We show that linear decisions can be found by solving a semidefinite program.

Submitted 12 September, 2012; originally announced September 2012.

Comments: Submitted for publication

arXiv:1205.4563 [pdf, ps, other]

Iterative Source-Channel Coding Approach to Witsenhausen's Counterexample

Authors: Johannes Kron, Ather Gattami, Tobias J. Oechtering, Mikael Skoglund

Abstract: In 1968, Witsenhausen introduced his famous counterexample where he showed that even in the simple linear quadratic static team decision problem, complex nonlinear decisions could outperform any given linear decision. This problem has served as a benchmark problem for decades where researchers try to achieve the optimal solution. This paper introduces a systematic iterative source-channel coding approach to solve problems of the Witsenhausen Counterexample-character. The advantage of the presented approach is its simplicity. Also, no assumptions are made about the shape of the space of policies. The minimal cost obtained using the introduced method is 0.16692462, which is the lowest known to date.

Submitted 21 May, 2012; originally announced May 2012.

arXiv:1205.1907 [pdf, ps, other]

Optimal Control and Estimation for Partially Nested Interconnected Systems

Authors: Ather Gattami, Sanjoy Mitter

Abstract: In this paper, we study distributed estimation and control problems over graphs under partially nested information patterns. We show a duality result that is very similar to the classical duality result between state estimation and state feedback control with a classical information pattern, under the condition that the disturbances entering different systems on the graph are uncorrelated. The distributed estimation problem decomposes into $N$ separate estimation problems, where $N$ is the number of interconnected subsystems over the graph, and the solution to each subproblem is simply the optimal Kalman filter. This also gives the solution to the distributed control problem due to the duality of distributed estimation and control under partially nested information pattern. We then consider a weighted distributed estimation problem, where we get coupling between the estimators, and separation between the estimators is not possible. We propose a solution based on linear quadratic team decision theory, which provides a generalized Riccati equation for teams. We show that the weighted estimation problem is the dual to a distributed state feedback problem, where the disturbances entering the interconnected systems are correlated.

Submitted 14 September, 2012; v1 submitted 9 May, 2012; originally announced May 2012.

Comments: Submitted for publication

arXiv:1204.6178 [pdf, other]

Distributed Output-Feedback LQG Control with Delayed Information Sharing

Authors: Hamid Reza Feyzmahdavian, Ather Gattami, Mikael Johansson

Abstract: This paper develops a controller synthesis method for distributed LQG control problems under output-feedback. We consider a system consisting of three interconnected linear subsystems with a delayed information sharing structure. While the state-feedback case has previously been solved, the extension to output-feedback is nontrivial as the classical separation principle fails. To find the optimal solution, the controller is decomposed into two independent components: a centralized LQG-optimal controller under delayed state observations, and a sum of correction terms based on additional local information available to decision makers. Explicit discrete-time equations are derived whose solutions are the gains of the optimal controller.

Submitted 17 September, 2013; v1 submitted 27 April, 2012; originally announced April 2012.

Comments: 25 pages, 3 figures

arXiv:1204.3876 [pdf, ps, other]

On Optimal Distributed Output-Feedback Control over Acyclic Graphs

Authors: Ather Gattami, Omid Khorsand

Abstract: In this paper, we consider the problem of distributed optimal control of linear dynamical systems with a quadratic cost criterion. We study the case of output feedback control for two interconnected dynamical systems, and show that the linear optimal solution can be obtained from a combination of two uncoupled Riccati equations and two coupled Riccati equations.

Submitted 17 April, 2012; originally announced April 2012.

arXiv:1204.1869 [pdf, other]

Optimal Distributed Controller Synthesis for Chain Structures: Applications to Vehicle Formations

Authors: Omid Khorsand, Assad Alam, Ather Gattami

Abstract: We consider optimal distributed controller synthesis for an interconnected system subject to communication constraints, in linear quadratic settings. Motivated by the problem of finite heavy duty vehicle platooning, we study systems composed of interconnected subsystems over a chain graph. By decomposing the system into orthogonal modes, the cost function can be separated into individual components. Thereby, derivation of the optimal controllers in state-space follows immediately. The optimal controllers are evaluated under the practical setting of heavy duty vehicle platooning with communication constraints. It is shown that the performance can be significantly improved by adding a few communication links. The results show that the proposed optimal distributed controller performs almost as well as the centralized linear quadratic Gaussian controller and outperforms a suboptimal controller in terms of control input. Furthermore, the control input energy can be reduced significantly with the proposed controller compared to the suboptimal controller, depending on the vehicle position in the platoon. Thus, the importance of considering preceding vehicles as well as the following vehicles in a platoon for fuel optimality is concluded.

Submitted 9 April, 2012; originally announced April 2012.

arXiv:1103.5678 [pdf, other]

doi 10.1109/CDC.2011.6161194

Converging an Overlay Network to a Gradient Topology

Authors: Håkan Terelius, Guodong Shi, Jim Dowling, Amir Payberah, Ather Gattami, Karl Henrik Johansson

Abstract: In this paper, we investigate the topology convergence problem for the gossip-based Gradient overlay network. In an overlay network where each node has a local utility value, a Gradient overlay network is characterized by the properties that each node has a set of neighbors with the same utility value (a similar view) and a set of neighbors containing higher utility values (gradient neighbor set), such that paths of increasing utilities emerge in the network topology. The Gradient overlay network is built using gossiping and a preference function that samples from nodes using a uniform random peer sampling service. We analyze it using tools from matrix analysis, and we prove both the necessary and sufficient conditions for convergence to a complete gradient structure, as well as estimating the convergence time and providing bounds on worst-case convergence time. Finally, we show in simulations the potential of the Gradient overlay, by building a more efficient live-streaming peer-to-peer (P2P) system than one built using uniform random peer sampling.

Submitted 29 March, 2011; originally announced March 2011.

Comments: Submitted to 50th IEEE Conference on Decision and Control (CDC 2011)

Showing 1–27 of 27 results for author: Gattami, A