-
Scaling Data Plane Verification with Intent-based Slicing
Authors:
Kuan-Yen Chou,
Santhosh Prabhu,
Giri Subramanian,
Wenxuan Zhou,
Aanand Nayyar,
Brighten Godfrey,
Matthew Caesar
Abstract:
Data plane verification has grown into a powerful tool to ensure network correctness. However, existing monolithic data plane models have high memory requirements with large networks, and the existing method of scaling out is too limited in expressiveness to capture practical network features. In this paper, we describe Scylla, a general data plane verifier that provides fine-grained scale-out without the need for a monolithic network model. Scylla creates models for what we call intent-based slices, each of which is constructed at a fine (rule-level) granularity with just enough to verify a given set of intents. The sliced models are retained in memory across a cluster and are incrementally updated in a distributed compute cluster in response to network updates. Our experiments show that Scylla makes the scaling problem more granular -- tied to the size of the intent-based slices rather than that of the overall network. This enables Scylla to verify large, complex networks in minimum units of work that are significantly smaller (in both memory and time) than past techniques, enabling fast scale-out verification with minimal resource requirement.
Submitted 31 May, 2024;
originally announced May 2024.
-
Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget
Authors:
Dengwang Tang,
Rahul Jain,
Ashutosh Nayyar,
Pierluigi Nuzzo
Abstract:
In this paper, we introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget. This is a pure exploration problem in a stochastic finite armed bandit model. Each arm is associated with a reward and multiple types of costs from unknown distributions. Unlike the unconstrained best arm identification problem, the optimal solution for the CBMAI problem may be a randomized mixture of multiple arms. The goal thus is to find the best mixed arm that maximizes the expected reward subject to constraints on the expected costs with a given learning budget $N$. We propose a novel, parameter-free algorithm, called the Score Function-based Successive Reject (SFSR) algorithm, that combines the classical successive reject framework with a novel score-function-based rejection criterion based on linear programming theory to identify the optimal support. We provide a theoretical upper bound on the mis-identification (of the support of the best mixed arm) probability and show that it decays exponentially in the budget $N$ and some constants that characterize the hardness of the problem instance. We also develop an information theoretic lower bound on the error probability that shows that these constants appropriately characterize the problem difficulty. We validate this empirically on a number of average and hard instances.
Submitted 23 May, 2024;
originally announced May 2024.
-
Model approximation in MDPs with unbounded per-step cost
Authors:
Berk Bozkurt,
Aditya Mahajan,
Ashutosh Nayyar,
Yi Ouyang
Abstract:
We consider the problem of designing a control policy for an infinite-horizon discounted cost Markov decision process $\mathcal{M}$ when we only have access to an approximate model $\hat{\mathcal{M}}$. How well does an optimal policy $\hat{\pi}^{\star}$ of the approximate model perform when used in the original model $\mathcal{M}$? We answer this question by bounding a weighted norm of the difference between the value function of $\hat{\pi}^{\star}$ when used in $\mathcal{M}$ and the optimal value function of $\mathcal{M}$. We then extend our results and obtain potentially tighter upper bounds by considering affine transformations of the per-step cost. We further provide upper bounds that explicitly depend on the weighted distance between cost functions and weighted distance between transition kernels of the original and approximate models. We present examples to illustrate our results.
Submitted 13 February, 2024;
originally announced February 2024.
-
Posterior Sampling-based Online Learning for Episodic POMDPs
Authors:
Dengwang Tang,
Dongze Ye,
Rahul Jain,
Ashutosh Nayyar,
Pierluigi Nuzzo
Abstract:
Learning in POMDPs is known to be significantly harder than MDPs. In this paper, we consider the online learning problem for episodic POMDPs with unknown transition and observation models. We propose a Posterior Sampling-based reinforcement learning algorithm for POMDPs (PS4POMDPs), which is much simpler and more implementable compared to state-of-the-art optimism-based online learning algorithms for POMDPs. We show that the Bayesian regret of the proposed algorithm scales as the square root of the number of episodes, matching the lower bound, and is polynomial in the other parameters. In a general setting, its regret scales exponentially in the horizon length $H$, and we show that this is inevitable by providing a lower bound. However, when the POMDP is undercomplete and weakly revealing (a common assumption in the recent literature), we establish a polynomial Bayesian regret bound. We finally propose a posterior sampling algorithm for multi-agent POMDPs, and show it too has sublinear regret.
Submitted 23 May, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Conditional Kernel Imitation Learning for Continuous State Environments
Authors:
Rishabh Agrawal,
Nathan Dahlin,
Rahul Jain,
Ashutosh Nayyar
Abstract:
Imitation Learning (IL) is an important paradigm within the broader reinforcement learning (RL) methodology. Unlike most of RL, it does not assume availability of reward-feedback. Reward inference and shaping are known to be difficult and error-prone methods, particularly when the demonstration data comes from human experts. Classical methods such as behavioral cloning and inverse reinforcement learning are highly sensitive to estimation errors, a problem that is particularly acute in continuous state space problems. Meanwhile, state-of-the-art IL algorithms convert behavioral policy learning problems into distribution-matching problems which often require additional online interaction data to be effective. In this paper, we consider the problem of imitation learning in continuous state space environments based solely on observed behavior, without access to transition dynamics information, reward structure, or, most importantly, any additional interactions with the environment. Our approach is based on the Markov balance equation and introduces a novel conditional kernel density estimation-based imitation learning framework. It involves estimating the environment's transition dynamics using conditional kernel density estimators and seeks to satisfy the probabilistic balance equations for the environment. We establish that our estimators satisfy basic asymptotic consistency requirements. Through a series of numerical experiments on continuous state benchmark environments, we show consistently superior empirical performance over many state-of-the-art IL algorithms.
Submitted 24 August, 2023;
originally announced August 2023.
-
Optimal Symmetric Strategies in Multi-Agent Systems with Decentralized Information
Authors:
Sagar Sudhakara,
Ashutosh Nayyar
Abstract:
We consider a cooperative multi-agent system consisting of a team of agents with decentralized information. Our focus is on the design of symmetric (i.e. identical) strategies for the agents in order to optimize a finite horizon team objective. We start with a general information structure and then consider some special cases. The constraint of using symmetric strategies introduces new features and complications in the team problem. For example, we show in a simple example that randomized symmetric strategies may outperform deterministic symmetric strategies. We also discuss why some of the known approaches for reducing agents' private information in teams may not work under the constraint of symmetric strategies. We then adopt the common information approach for our problem and modify it to accommodate the use of symmetric strategies. This results in a common information based dynamic program where each step involves minimization over a single function from the space of an agent's private information to the space of probability distributions over actions. We present specialized models where private information can be reduced using simple dynamic program based arguments.
Submitted 14 July, 2023;
originally announced July 2023.
-
Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes
Authors:
Krishna C. Kalagarla,
Dhruva Kartik,
Dongming Shen,
Rahul Jain,
Ashutosh Nayyar,
Pierluigi Nuzzo
Abstract:
Autonomous systems often have logical constraints arising, for example, from safety, operational, or regulatory requirements. Such constraints can be expressed using temporal logic specifications. The system state is often partially observable. Moreover, it could encompass a team of multiple agents with a common objective but disparate information structures and constraints. In this paper, we first introduce an optimal control theory for partially observable Markov decision processes (POMDPs) with finite linear temporal logic constraints. We provide a structured methodology for synthesizing policies that maximize a cumulative reward while ensuring that the probability of satisfying a temporal logic constraint is sufficiently high. Our approach comes with guarantees on approximate reward optimality and constraint satisfaction. We then build on this approach to design an optimal control framework for logically constrained multi-agent settings with information asymmetry. We illustrate the effectiveness of our approach by implementing it on several case studies.
Submitted 19 June, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
-
A Novel Point-based Algorithm for Multi-agent Control Using the Common Information Approach
Authors:
Dengwang Tang,
Ashutosh Nayyar,
Rahul Jain
Abstract:
The Common Information (CI) approach provides a systematic way to transform a multi-agent stochastic control problem to a single-agent partially observed Markov decision problem (POMDP) called the coordinator's POMDP. However, such a POMDP can be hard to solve due to its extraordinarily large action space. We propose a new algorithm for multi-agent stochastic control problems, called coordinator's heuristic search value iteration (CHSVI), that combines the CI approach and point-based POMDP algorithms for large action spaces. We demonstrate the algorithm through optimally solving several benchmark problems.
Submitted 9 April, 2023;
originally announced April 2023.
-
Optimal Communication and Control Strategies for a Multi-Agent System in the Presence of an Adversary
Authors:
Dhruva Kartik,
Sagar Sudhakara,
Rahul Jain,
Ashutosh Nayyar
Abstract:
We consider a multi-agent system in which a decentralized team of agents controls a stochastic system in the presence of an adversary. Instead of committing to a fixed information sharing protocol, the agents can strategically decide at each time whether to share their private information with each other or not. The agents incur a cost whenever they communicate with each other and the adversary may eavesdrop on their communication. Thus, the agents in the team must effectively coordinate with each other while being robust to the adversary's malicious actions. We model this interaction between the team and the adversary as a stochastic zero-sum game where the team aims to minimize a cost while the adversary aims to maximize it. Under some assumptions on the adversary's capabilities, we characterize a min-max control and communication strategy for the team. We supplement this characterization with several structural results that can make the computation of the min-max strategy more tractable.
Submitted 8 September, 2022;
originally announced September 2022.
-
Optimal Control of Partially Observable Markov Decision Processes with Finite Linear Temporal Logic Constraints
Authors:
Krishna C. Kalagarla,
Dhruva Kartik,
Dongming Shen,
Rahul Jain,
Ashutosh Nayyar,
Pierluigi Nuzzo
Abstract:
Autonomous agents often operate in scenarios where the state is partially observed. In addition to maximizing their cumulative reward, agents must execute complex tasks with rich temporal and logical structures. These tasks can be expressed using temporal logic languages like finite linear temporal logic (LTL_f). This paper, for the first time, provides a structured framework for designing agent policies that maximize the reward while ensuring that the probability of satisfying the temporal logic specification is sufficiently high. We reformulate the problem as a constrained partially observable Markov decision process (POMDP) and provide a novel approach that can leverage off-the-shelf unconstrained POMDP solvers for solving it. Our approach guarantees approximate optimality and constraint satisfaction with high probability. We demonstrate its effectiveness by implementing it on several models of interest.
Submitted 16 March, 2022;
originally announced March 2022.
-
Recent trends in Social Engineering Scams and Case study of Gift Card Scam
Authors:
Rajasekhar Chaganti,
Bharat Bhushan,
Anand Nayyar,
Azrour Mourade
Abstract:
Social engineering scams (SES) have existed since humankind adopted telecommunications. Earlier versions of these scams included, but were not limited to, leveraging premium phone services to charge consumers and service providers. Advancements in digital data access capabilities and Internet technology have given scammers a variety of techniques. A lot of research has been done to identify scammer methodologies and the characteristics of scams. However, scammers keep finding new ways to lure consumers and steal their financial assets. A recent example is Covid-19 unemployment, which was used as a weapon to scam US citizens. These scams will not stop here, and will keep appearing with new social engineering strategies in the near future. So, to better prepare for these kinds of scams in an ever-changing world, we describe the recent trends of various social engineering scams targeting innocent people all over the world who overlook the consequences of scams, and also give a detailed description of recent social engineering scams, including Covid scams. A social engineering scam threat model architecture is also proposed to map various scams. In addition, we discuss a case study of a real-time gift card scam that targets the customers of various enterprise organizations to steal their money and put the organizations' reputation at stake. We also provide recommendations to help internet users avoid falling victim to social engineering scams. In the end, we provide insights on how to prepare for and respond to social engineering scams by following the security incident detection and response life cycle in enterprises.
Submitted 13 October, 2021;
originally announced October 2021.
-
A Bayesian Learning Algorithm for Unknown Zero-sum Stochastic Games with an Arbitrary Opponent
Authors:
Mehdi Jafarnia-Jahromi,
Rahul Jain,
Ashutosh Nayyar
Abstract:
In this paper, we propose Posterior Sampling Reinforcement Learning for Zero-sum Stochastic Games (PSRL-ZSG), the first online learning algorithm that achieves a Bayesian regret bound of $O(HS\sqrt{AT})$ in infinite-horizon zero-sum stochastic games with the average-reward criterion. Here $H$ is an upper bound on the span of the bias function, $S$ is the number of states, $A$ is the number of joint actions and $T$ is the horizon. We consider the online setting where the opponent cannot be controlled and can take any arbitrary time-adaptive history-dependent strategy. Our regret bound improves on the best existing regret bound of $O(\sqrt[3]{DS^2AT^2})$ by Wei et al. (2017) under the same assumption and matches the theoretical lower bound in $T$.
Submitted 11 March, 2024; v1 submitted 7 September, 2021;
originally announced September 2021.
-
A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems
Authors:
Mukul Gagrani,
Sagar Sudhakara,
Aditya Mahajan,
Ashutosh Nayyar,
Yi Ouyang
Abstract:
We revisit the Thompson sampling algorithm to control an unknown linear quadratic (LQ) system recently proposed by Ouyang et al. (arXiv:1709.04047). The regret bound of the algorithm was derived under a technical assumption on the induced norm of the closed loop system. In this technical note, we show that by making a minor modification in the algorithm (in particular, ensuring that an episode does not end too soon), this technical assumption on the induced norm can be replaced by a milder assumption in terms of the spectral radius of the closed loop system. The modified algorithm has the same Bayesian regret of $\tilde{\mathcal{O}}(\sqrt{T})$, where $T$ is the time-horizon and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $T$.
Submitted 19 September, 2022; v1 submitted 19 August, 2021;
originally announced August 2021.
-
Scalable regret for learning to control network-coupled subsystems with unknown dynamics
Authors:
Sagar Sudhakara,
Aditya Mahajan,
Ashutosh Nayyar,
Yi Ouyang
Abstract:
We consider the problem of controlling an unknown linear quadratic Gaussian (LQG) system consisting of multiple subsystems connected over a network. Our goal is to minimize and quantify the regret (i.e. loss in performance) of our strategy with respect to an oracle who knows the system model. Viewing the interconnected subsystems globally and directly using existing LQG learning algorithms for the global system results in a regret that increases super-linearly with the number of subsystems. Instead, we propose a new Thompson sampling based learning algorithm which exploits the structure of the underlying network. We show that the expected regret of the proposed algorithm is bounded by $\tilde{\mathcal{O}} \big( n \sqrt{T} \big)$ where $n$ is the number of subsystems, $T$ is the time horizon and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $n$ and $T$. Thus, the regret scales linearly with the number of subsystems. We present numerical experiments to illustrate the salient features of the proposed algorithm.
Submitted 18 August, 2021;
originally announced August 2021.
-
Optimal communication and control strategies in a multi-agent MDP problem
Authors:
Sagar Sudhakara,
Dhruva Kartik,
Rahul Jain,
Ashutosh Nayyar
Abstract:
The problem of controlling multi-agent systems under different models of information sharing among agents has received significant attention in the recent literature. In this paper, we consider a setup where rather than committing to a fixed information sharing protocol (e.g., periodic sharing or no sharing), agents can dynamically decide at each time step whether to share information with each other and incur the resulting communication cost. This setup requires a joint design of agents' communication and control strategies in order to optimize the trade-off between communication costs and control objective. We first show that agents can ignore a big part of their private information without compromising the system performance. We then provide a common information approach based solution for the strategy optimization problem. This approach relies on constructing a fictitious POMDP whose solution (obtained via a dynamic program) characterizes the optimal strategies for the agents. We also show that our solution can be easily modified to incorporate constraints on when and how frequently agents can communicate.
Submitted 22 April, 2021;
originally announced April 2021.
-
Online Learning for Unknown Partially Observable MDPs
Authors:
Mehdi Jafarnia-Jahromi,
Rahul Jain,
Ashutosh Nayyar
Abstract:
Solving Partially Observable Markov Decision Processes (POMDPs) is hard. Learning optimal controllers for POMDPs when the model is unknown is harder. Online learning of optimal controllers for unknown POMDPs, which requires efficient learning using regret-minimizing algorithms that effectively trade off exploration and exploitation, is even harder, and no solution exists currently. In this paper, we consider infinite-horizon average-cost POMDPs with an unknown transition model but a known observation model. We propose a natural posterior sampling-based reinforcement learning algorithm (PSRL-POMDP) and show that it achieves a regret bound of $O(\log T)$, where $T$ is the time horizon, when the parameter set is finite. In the general case (continuous parameter set), we show that the algorithm achieves $O(T^{2/3})$ regret under two technical assumptions. To the best of our knowledge, this is the first online RL algorithm for POMDPs and has sub-linear regret.
Submitted 14 June, 2021; v1 submitted 24 February, 2021;
originally announced February 2021.
-
Dynamic Games among Teams with Delayed Intra-Team Information Sharing
Authors:
Dengwang Tang,
Hamidreza Tavafoghi,
Vijay Subramanian,
Ashutosh Nayyar,
Demosthenis Teneketzis
Abstract:
We analyze a class of stochastic dynamic games among teams with asymmetric information, where members of a team share their observations internally with a delay of $d$. Each team is associated with a controlled Markov Chain, whose dynamics are coupled through the players' actions. These games exhibit challenges in both theory and practice due to the presence of signaling and the increasing domain of information over time. We develop a general approach to characterize a subset of Nash Equilibria where the agents can use a compressed version of their information, instead of the full information, to choose their actions. We identify two subclasses of strategies: Sufficient Private Information Based (SPIB) strategies, which only compress private information, and Compressed Information Based (CIB) strategies, which compress both common and private information. We show that while SPIB-strategy-based equilibria always exist, the same is not true for CIB-strategy-based equilibria. We develop a backward inductive sequential procedure, whose solution (if it exists) provides a CIB strategy-based equilibrium. We identify some instances where we can guarantee the existence of a solution to the above procedure. Our results highlight the tension among compression of information, existence of (compression based) equilibria, and backward inductive sequential computation of such equilibria in stochastic dynamic games with asymmetric information.
Submitted 2 April, 2021; v1 submitted 23 February, 2021;
originally announced February 2021.
-
Common Information Belief based Dynamic Programs for Stochastic Zero-sum Games with Competing Teams
Authors:
Dhruva Kartik,
Ashutosh Nayyar,
Urbashi Mitra
Abstract:
Decentralized team problems where players have asymmetric information about the state of the underlying stochastic system have been actively studied, but \emph{games} between such teams are less understood. We consider a general model of zero-sum stochastic games between two competing teams. This model subsumes many previously considered team and zero-sum game models. For this general model, we provide bounds on the upper (min-max) and lower (max-min) values of the game. Furthermore, if the upper and lower values of the game are identical (i.e., if the game has a \emph{value}), our bounds coincide with the value of the game. Our bounds are obtained using two dynamic programs based on a sufficient statistic known as the common information belief (CIB). We also identify certain information structures in which only the minimizing team controls the evolution of the CIB. In these cases, we show that one of our CIB based dynamic programs can be used to find the min-max strategy (in addition to the min-max value). We propose an approximate dynamic programming approach for computing the values (and the strategy when applicable) and illustrate our results with the help of an example.
Submitted 27 September, 2021; v1 submitted 10 February, 2021;
originally announced February 2021.
-
Thompson sampling for linear quadratic mean-field teams
Authors:
Mukul Gagrani,
Sagar Sudhakara,
Aditya Mahajan,
Ashutosh Nayyar,
Yi Ouyang
Abstract:
We consider optimal control of an unknown multi-agent linear quadratic (LQ) system where the dynamics and the cost are coupled across the agents through the mean-field (i.e., empirical mean) of the states and controls. Directly using single-agent LQ learning algorithms in such models results in regret which increases polynomially with the number of agents. We propose a new Thompson sampling based learning algorithm which exploits the structure of the system model and show that the expected Bayesian regret of our proposed algorithm for a system with agents of $|M|$ different types at time horizon $T$ is $\tilde{\mathcal{O}} \big( |M|^{1.5} \sqrt{T} \big)$ irrespective of the total number of agents, where the $\tilde{\mathcal{O}}$ notation hides logarithmic factors in $T$. We present detailed numerical experiments to illustrate the salient features of the proposed algorithm.
Submitted 9 November, 2020;
originally announced November 2020.
-
Optimal Dynamic Mechanism Design with Stochastic Supply and Flexible Consumers
Authors:
Shiva Navabi,
Ashutosh Nayyar
Abstract:
We consider the problem of designing an expected-revenue maximizing mechanism for allocating multiple non-perishable goods of $k$ varieties to flexible consumers over $T$ time steps. In our model, a random number of goods of each variety may become available to the seller at each time and a random number of consumers may enter the market at each time. Each consumer is present in the market for one time step and wants to consume one good of one of its desired varieties. Each consumer is associated with a flexibility level that indicates the varieties of the goods it is equally interested in. A consumer's flexibility level and the utility it gets from consuming a good of its desired varieties are its private information. We characterize the allocation rule for a Bayesian incentive compatible, individually rational and expected revenue maximizing mechanism in terms of the solution to a dynamic program. The corresponding payment function is also specified in terms of the optimal allocation function. We leverage the structure of the consumers' flexibility model to simplify the dynamic program and provide an alternative description of the optimal mechanism in terms of thresholds computed by the dynamic program.
Submitted 6 July, 2020;
originally announced July 2020.
-
Testing for Anomalies: Active Strategies and Non-asymptotic Analysis
Authors:
Dhruva Kartik,
Ashutosh Nayyar,
Urbashi Mitra
Abstract:
The problem of verifying whether a multi-component system has anomalies or not is addressed. Each component can be probed over time in a data-driven manner to obtain noisy observations that indicate whether the selected component is anomalous or not. The aim is to minimize the probability of incorrectly declaring the system to be free of anomalies while ensuring that the probability of correctly declaring it to be safe is sufficiently large. This problem is modeled as an active hypothesis testing problem in the Neyman-Pearson setting. Component-selection and inference strategies are designed and analyzed in the non-asymptotic regime. For a specific class of homogeneous problems, stronger (with respect to prior work) non-asymptotic converse and achievability bounds are provided.
Submitted 14 May, 2020;
originally announced May 2020.
-
Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems
Authors:
Seyed Mohammad Asghari,
Yi Ouyang,
Ashutosh Nayyar
Abstract:
Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL) primarily due to the dynamical environments and the decentralized information among agents. We attempt to solve this challenge in the context of decentralized learning in multi-agent linear-quadratic (LQ) dynamical systems. We begin with a simple setup consisting of two agents and two dynamically decoupled stochastic linear systems, each system controlled by an agent. The systems are coupled through a quadratic cost function. When both systems' dynamics are unknown and there is no communication among the agents, we show that no learning policy can generate sub-linear in $T$ regret, where $T$ is the time horizon. When only one system's dynamics are unknown and there is one-directional communication from the agent controlling the unknown system to the other agent, we propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem. The auxiliary single-agent problem in the proposed MARL algorithm serves as an implicit coordination mechanism between the two learning agents. This allows the agents to achieve a regret within $O(\sqrt{T})$ of the regret of the auxiliary single-agent problem. Consequently, using existing results for single-agent LQ regret, our algorithm provides a $\tilde{O}(\sqrt{T})$ regret bound. (Here $\tilde{O}(\cdot)$ hides constants and logarithmic factors.) Our numerical experiments indicate that this bound is matched in practice. From the two-agent problem, we extend our results to multi-agent LQ systems with certain communication patterns.
Submitted 27 January, 2020;
originally announced January 2020.
-
Fixed-horizon Active Hypothesis Testing
Authors:
Dhruva Kartik,
Ashutosh Nayyar,
Urbashi Mitra
Abstract:
Two active hypothesis testing problems are formulated. In these problems, the agent can perform a fixed number of experiments and then decide on one of the hypotheses. The agent is also allowed to declare its experiments inconclusive if needed. The first problem is an asymmetric formulation in which the objective is to minimize the probability of incorrectly declaring a particular hypothesis to be true while ensuring that the probability of correctly declaring that hypothesis is moderately high. This formulation can be seen as a generalization of the formulation in the classical Chernoff-Stein lemma to an active setting. The second problem is a symmetric formulation in which the objective is to minimize the probability of making an incorrect inference (misclassification probability) while ensuring that the true hypothesis is declared conclusively with moderately high probability. For these problems, lower and upper bounds on the optimal misclassification probabilities are derived and these bounds are shown to be asymptotically tight. Classical approaches for experiment selection suggest use of randomized and, in some cases, open-loop strategies. As opposed to these classical approaches, fully deterministic and adaptive experiment selection strategies are provided. It is shown that these strategies are asymptotically optimal and further, using numerical experiments, it is demonstrated that these novel experiment selection strategies (coupled with appropriate inference strategies) have a significantly better performance in the non-asymptotic regime.
Submitted 15 November, 2019;
originally announced November 2019.
-
Zero-sum Stochastic Games with Asymmetric Information
Authors:
Dhruva Kartik,
Ashutosh Nayyar
Abstract:
A general model for zero-sum stochastic games with asymmetric information is considered. In this model, each player's information at each time can be divided into a common information part and a private information part. Under certain conditions on the evolution of the common and private information, a dynamic programming characterization of the value of the game (if it exists) is presented. If the value of the zero-sum game does not exist, then the dynamic program provides bounds on the upper and lower values of the game. This dynamic program is then used for a class of zero-sum stochastic games with complete information on one side and partial information on the other, that is, games where one player has complete information about state, actions and observation history while the other player may only have partial information about the state and action history. For such games, it is shown that the value exists and can be characterized using the dynamic program. It is further shown that for this class of games, the dynamic program can be used to compute an equilibrium strategy for the more informed player in which the player selects its action using its private information and the common information belief.
Submitted 24 December, 2019; v1 submitted 3 September, 2019;
originally announced September 2019.
-
Optimal scheduling strategy for networked estimation with energy harvesting
Authors:
Marcos M. Vasconcelos,
Mukul Gagrani,
Ashutosh Nayyar,
Urbashi Mitra
Abstract:
Joint optimization of scheduling and estimation policies is considered for a system with two sensors and two non-collocated estimators. Each sensor produces an independent and identically distributed sequence of random variables, and each estimator forms estimates of the corresponding sequence with respect to the mean-squared error sense. The data generated by the sensors is transmitted to the corresponding estimators, over a bandwidth-constrained wireless network that can support a single packet per time slot. The access to the limited communication resources is determined by a scheduler who decides which sensor measurement to transmit based on both observations. The scheduler has an energy-harvesting battery of limited capacity, which couples the decision-making problem in time. Despite the overall lack of convexity of the team decision problem, it is shown that this system admits globally optimal scheduling and estimation strategies under the assumption that the distributions of the random variables at the sensors are symmetric and unimodal. Additionally, the optimal scheduling policy has a structure characterized by a threshold function that depends on the time index and energy level. A recursive algorithm for threshold computation is provided.
Submitted 16 August, 2019;
originally announced August 2019.
-
Worst-case Guarantees for Remote Estimation of an Uncertain Source
Authors:
Mukul Gagrani,
Yi Ouyang,
Mohammad Rasouli,
Ashutosh Nayyar
Abstract:
Consider a remote estimation problem where a sensor wants to communicate the state of an uncertain source to a remote estimator over a finite time horizon. The uncertain source is modeled as an autoregressive process with bounded noise. Given that the sensor has a limited communication budget, the sensor must decide when to transmit the state to the estimator who has to produce real-time estimates of the source state. In this paper, we consider the problem of finding a scheduling strategy for the sensor and an estimation strategy for the estimator to jointly minimize the worst-case maximum instantaneous estimation error over the time horizon. This leads to a decentralized minimax decision-making problem. We obtain a complete characterization of optimal strategies for this decentralized minimax problem. In particular, we show that an open loop communication scheduling strategy is optimal and the optimal estimate depends only on the most recently received sensor observation.
Submitted 8 February, 2019;
originally announced February 2019.
-
Active Hypothesis Testing: Beyond Chernoff-Stein
Authors:
Dhruva Kartik,
Ashutosh Nayyar,
Urbashi Mitra
Abstract:
An active hypothesis testing problem is formulated. In this problem, the agent can perform a fixed number of experiments and then decide on one of the hypotheses. The agent is also allowed to declare its experiments inconclusive if needed. The objective is to minimize the probability of making an incorrect inference (misclassification probability) while ensuring that the true hypothesis is declared conclusively with moderately high probability. For this problem, lower and upper bounds on the optimal misclassification probability are derived and these bounds are shown to be asymptotically tight. In the analysis, a sub-problem, which can be viewed as a generalization of the Chernoff-Stein lemma, is formulated and analyzed. A heuristic approach to strategy design is proposed and its relationship with existing heuristic strategies is discussed.
Submitted 21 January, 2019;
originally announced January 2019.
-
Sequential Experiment Design for Hypothesis Verification
Authors:
Dhruva Kartik,
Ashutosh Nayyar,
Urbashi Mitra
Abstract:
Hypothesis testing is an important problem with applications in target localization, clinical trials, etc. Many active hypothesis testing strategies operate in two phases: an exploration phase and a verification phase. In the exploration phase, selection of experiments is such that a moderate level of confidence on the true hypothesis is achieved. Subsequent experiment design aims at improving the confidence level on this hypothesis to the desired level. In this paper, the focus is on the verification phase. A confidence measure is defined and active hypothesis testing is formulated as a confidence maximization problem in an infinite-horizon average-reward Partially Observable Markov Decision Process (POMDP) setting. The problem of maximizing confidence conditioned on a particular hypothesis is referred to as the hypothesis verification problem. The relationship between hypothesis testing and verification problems is established. The verification problem can be formulated as a Markov Decision Process (MDP). Optimal solutions for the verification MDP are characterized and a simple heuristic adaptive strategy for verification is proposed based on a zero-sum game interpretation of Kullback-Leibler divergences. It is demonstrated through numerical experiments that the heuristic performs better in some scenarios compared to existing methods in the literature.
Submitted 3 December, 2018;
originally announced December 2018.
-
Optimal Infinite Horizon Decentralized Networked Controllers with Unreliable Communication
Authors:
Yi Ouyang,
Seyed Mohammad Asghari,
Ashutosh Nayyar
Abstract:
We consider a decentralized networked control system (DNCS) consisting of a remote controller and a collection of linear plants, each associated with a local controller. Each local controller directly observes the state of its co-located plant and can inform the remote controller of the plant's state through an unreliable uplink channel. The downlink channels from the remote controller to local controllers are assumed to be perfect. The objective of the local controllers and the remote controller is to cooperatively minimize the infinite horizon time average of expected quadratic cost. The finite horizon version of this problem was solved in our prior work [2]. The optimal strategies in the finite horizon case were shown to be characterized by coupled Riccati recursions. In this paper, we show that if the link failure probabilities are below certain critical thresholds, then the coupled Riccati recursions of the finite horizon solution reach a steady state and the corresponding decentralized strategies are optimal. Above these thresholds, we show that no strategy can achieve finite cost. We exploit a connection between our DNCS Riccati recursions and the coupled Riccati recursions of an auxiliary Markov jump linear system to obtain our results. Our main results in Theorems 1 and 2 explicitly identify the critical thresholds for the link failure probabilities and the optimal decentralized control strategies when all link failure probabilities are below their thresholds.
Submitted 18 June, 2018;
originally announced June 2018.
-
Decentralized Control of Stochastically Switched Linear System with Unreliable Communication
Authors:
Seyed Mohammad Asghari,
Yi Ouyang,
Ashutosh Nayyar
Abstract:
We consider a networked control system (NCS) consisting of two plants, a global plant and a local plant, and two controllers, a global controller and a local controller. The global (resp. local) plant follows discrete-time stochastically switched linear dynamics with a continuous global (resp. local) state and a discrete global (resp. local) mode. We assume that the state and mode of the global plant are observed by both controllers while the state and mode of the local plant are only observed by the local controller. The local controller can inform the global controller of the local plant's state and mode through an unreliable TCP-like communication channel where successful transmissions are acknowledged. The objective of the controllers is to cooperatively minimize a modes-dependent quadratic cost over a finite time horizon. Following the method developed in [1] and [2], we construct a dynamic program based on common information and a decomposition of strategies, and use it to obtain explicit optimal strategies for the controllers. In the optimal strategies, both controllers compute a common estimate of the local plant's state. The global controller's action is linear in the state of the global plant and the common estimated state, and the local controller's action is linear in the actual states of both plants and the common estimated state. Furthermore, the gain matrices for the global controller depend on the global mode and its observation about the local mode, while the gain matrices for the local controller depend on the actual modes of both plants and the global controller's observation about the local mode.
Submitted 1 February, 2018;
originally announced February 2018.
-
An Approximation Algorithm for Optimal Clique Cover Delivery in Coded Caching
Authors:
Seyed Mohammad Asghari,
Yi Ouyang,
Ashutosh Nayyar,
A. Salman Avestimehr
Abstract:
Coded caching can significantly reduce the communication bandwidth requirement for satisfying users' demands by utilizing the multicasting gain among multiple users. Most existing works assume that the users follow the prescriptions for content placement made by the system. However, users may prefer to decide what files to cache. To address this issue, we consider a network consisting of a file server connected through a shared link to $K$ users, each equipped with a cache which has been already filled arbitrarily. Given an arbitrary content placement, the goal is to find a delivery strategy for the server that minimizes the load of the shared link. In this paper, we focus on a specific class of coded multicasting delivery schemes known as the "clique cover delivery scheme". We first formulate the optimal clique cover delivery problem as a combinatorial optimization problem. Using a connection with the weighted set cover problem, we propose an approximation algorithm and show that it provides an approximation ratio of $(1 + \log K)$, while the approximation ratio for the existing coded delivery schemes is linear in $K$. Numerical simulations show that our proposed algorithm provides a considerable bandwidth reduction over the existing coded delivery schemes for almost all content placement schemes.
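For readers unfamiliar with the weighted set cover connection mentioned in this abstract, the following is a minimal Python sketch of the textbook greedy heuristic for weighted set cover, which is known to give a logarithmic approximation ratio. It is not the paper's delivery algorithm: the function name `greedy_weighted_set_cover`, the dictionary input format, and the interpretation of elements and weights are illustrative assumptions.

```python
def greedy_weighted_set_cover(universe, subsets):
    """Textbook greedy heuristic for weighted set cover (illustrative only).

    universe: iterable of elements that must all be covered.
    subsets:  dict mapping a label to (set_of_elements, positive_weight).
    Returns the list of chosen labels in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best_name, best_ratio = None, float("inf")
        for name, (elements, weight) in subsets.items():
            newly_covered = len(elements & uncovered)
            if newly_covered == 0:
                continue  # this subset adds nothing new
            # Cost paid per newly covered element.
            ratio = weight / newly_covered
            if ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("some elements cannot be covered by any subset")
        chosen.append(best_name)
        uncovered -= subsets[best_name][0]
    return chosen
```

At every step the heuristic picks the subset with the lowest weight per newly covered element, which is the rule behind the classical logarithmic guarantee for weighted set cover.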
Submitted 28 March, 2019; v1 submitted 29 October, 2017;
originally announced October 2017.
-
Optimal Mechanism Design with Flexible Consumers and Costly Supply
Authors:
Shiva Navabi,
Ashutosh Nayyar
Abstract:
The problem of designing a profit-maximizing, Bayesian incentive compatible and individually rational mechanism with flexible consumers and costly heterogeneous supply is considered. In our setup, each consumer is associated with a flexibility set that describes the subset of goods the consumer is equally interested in. Each consumer wants to consume one good from its flexibility set. The flexibility set of a consumer and the utility it gets from consuming a good from its flexibility set are its private information. We adopt the flexibility model of [1] and focus on the case of nested flexibility sets -- each consumer's flexibility set can be one of k nested sets. Examples of settings with this inherent nested structure are provided. On the supply side, we assume that the seller has an initial stock of free supply but it can purchase more goods for each of the nested sets at fixed exogenous prices. We characterize the allocation and purchase rules for a profit-maximizing, Bayesian incentive compatible and individually rational mechanism as the solution to an integer program. The optimal payment function is pinned down by the optimal allocation rule in the form of an integral equation. We show that the nestedness of flexibility sets can be exploited to obtain a simple description of the optimal allocations, purchases and payments in terms of thresholds that can be computed through a straightforward iterative procedure.
Submitted 25 October, 2017;
originally announced October 2017.
-
Learning Unknown Markov Decision Processes: A Thompson Sampling Approach
Authors:
Yi Ouyang,
Mukul Gagrani,
Ashutosh Nayyar,
Rahul Jain
Abstract:
We consider the problem of learning an unknown Markov Decision Process (MDP) that is weakly communicating in the infinite horizon setting. We propose a Thompson Sampling-based reinforcement learning algorithm with dynamic episodes (TSDE). At the beginning of each episode, the algorithm generates a sample from the posterior distribution over the unknown model parameters. It then follows the optimal stationary policy for the sampled model for the rest of the episode. The duration of each episode is dynamically determined by two stopping criteria. The first stopping criterion controls the growth rate of episode length. The second stopping criterion happens when the number of visits to any state-action pair is doubled. We establish $\tilde O(HS\sqrt{AT})$ bounds on expected regret under a Bayesian setting, where $S$ and $A$ are the sizes of the state and action spaces, $T$ is time, and $H$ is the bound of the span. This regret bound matches the best available bound for weakly communicating MDPs. Numerical results show it to perform better than existing algorithms for infinite horizon MDPs.
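The two episode-ending rules described in this abstract can be sketched as a single check in Python. This is a sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact form of the first rule, so it is assumed here that an episode may last at most one step longer than the previous one, and the second rule is assumed to fire when a visit count strictly exceeds twice its value at the start of the episode. The function name `should_end_episode` and its arguments are hypothetical.

```python
def should_end_episode(steps_in_episode, prev_episode_length,
                       visits_now, visits_at_start):
    """Return True if either (assumed) stopping rule fires.

    steps_in_episode:    steps already taken in the current episode.
    prev_episode_length: length of the previous episode.
    visits_now:          dict (state, action) -> visit count so far.
    visits_at_start:     the same counts at the start of this episode.
    """
    # Rule 1 (assumed form): cap the growth of episode length, so the
    # current episode is at most one step longer than the previous one.
    grew_too_long = steps_in_episode > prev_episode_length

    # Rule 2 (assumed form): some state-action pair has been visited more
    # than twice as often as it had been when the episode started.
    doubled = any(
        count > 2 * visits_at_start.get(pair, 0)
        for pair, count in visits_now.items()
    )
    return grew_too_long or doubled
```

When the check returns True, a learner following the abstract's description would draw a fresh sample from the posterior and switch to the optimal stationary policy for that sample.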
Submitted 13 September, 2017;
originally announced September 2017.
-
Optimal Local and Remote Controllers with Unreliable Uplink Channels
Authors:
Seyed Mohammad Asghari,
Yi Ouyang,
Ashutosh Nayyar
Abstract:
We consider a networked control system consisting of a remote controller and a collection of linear plants, each associated with a local controller. Each local controller directly observes the state of its co-located plant and can inform the remote controller of the plant's state through an unreliable uplink channel. We assume that the downlink channels from the remote controller to local controllers are perfect. The objective of the local controllers and the remote controller is to cooperatively minimize a quadratic performance cost. We provide a dynamic program for this decentralized control problem using the common information approach. Although our problem is not a partially nested problem, we obtain explicit optimal strategies for all controllers. In the optimal strategies, all controllers compute common estimates of the states of the plants based on the common information obtained from the communication network. The remote controller's action is linear in the common state estimates, and the action of each local controller is linear in both the actual state of its co-located plant and the common state estimates. We illustrate our results with numerical experiments using randomly generated models.
Submitted 16 June, 2018; v1 submitted 22 November, 2016;
originally announced November 2016.
-
Dynamic Teams and Decentralized Control Problems with Substitutable Actions
Authors:
Seyed Mohammad Asghari,
Ashutosh Nayyar
Abstract:
This paper considers two problems -- a dynamic team problem and a decentralized control problem. The problems we consider do not belong to the known classes of "simpler" dynamic team/decentralized control problems such as partially nested or quadratically invariant problems. However, we show that our problems admit simple solutions under an assumption referred to as the substitutability assumption. Intuitively, substitutability in a team (resp. decentralized control) problem means that the effects of one team member's (resp. controller's) action on the cost function and the information (resp. state dynamics) can be achieved by an action of another member (resp. controller). For the non-partially-nested LQG dynamic team problem, it is shown that under certain conditions linear strategies are optimal. For the non-partially-nested decentralized LQG control problem, the state structure can be exploited to obtain optimal control strategies with recursively updatable sufficient statistics. These results suggest that substitutability can work as a counterpart of the information structure requirements that enable simplification of dynamic teams and decentralized control problems.
Submitted 11 November, 2016;
originally announced November 2016.
-
Optimal Auction Design for Flexible Consumers
Authors:
Shiva Navabi,
Ashutosh Nayyar
Abstract:
We study the problem of designing revenue-maximizing auctions for allocating multiple goods to flexible consumers. In our model, each consumer is interested in a subset of goods known as its flexibility set and wants to consume one good from this set. A consumer's flexibility set and its utility from consuming a good from its flexibility set are its private information. We focus on the case of nested flexibility sets --- each consumer's flexibility set can be one of $k$ nested sets. We provide several examples where such nested flexibility sets may arise. We characterize the allocation rule for an incentive compatible, individually rational and revenue-maximizing auction as the solution to an integer program. The corresponding payment rule is described by an integral equation. We then leverage the nestedness of flexibility sets to simplify the optimal auction and provide a complete characterization of allocations and payments in terms of simple thresholds.
Submitted 31 January, 2018; v1 submitted 8 July, 2016;
originally announced July 2016.
-
Estimating the Dissemination of Social and Mobile Search in Categories of Information Needs Using Websites as Proxies
Authors:
Christoph Fuchs,
Akash Nayyar,
Ruth Nussbaumer,
Georg Groh
Abstract:
With the increasing popularity of social means to satisfy information needs using Social Media (e.g., Social Media Question Asking, SMQA) or Social Information Retrieval approaches, this paper tries to identify types of information needs which are inherently social and therefore better suited for those techniques. We describe an experiment where prominent websites from various content categories are used to represent their respective content areas, which allows attributes of the content areas to be correlated. The underlying assumption is that successful websites for focused content areas perfectly align with the information seekers' requirements when satisfying information needs in the respective content areas. Based on a manually collected dataset of URLs from websites covering a broad range of topics taken from Alexa (http://www.alexa.com, retrieved 2015-11-04), a company that publishes statistics about web traffic, a crowdsourcing approach is employed to rate the information needs that could be satisfied by the respective URLs according to several dimensions (incl. sociality and mobility) to investigate possible correlations with other attributes. Our results suggest that information needs which do not require a certain formal expertise play an important role in social information retrieval and that some content areas are better suited for social information retrieval (e.g., Factual Knowledge & News, Games, Lifestyle) than others (e.g., Health & Lifestyle).
Submitted 7 July, 2016;
originally announced July 2016.
-
Optimal Local and Remote Controllers with Unreliable Communication
Authors:
Yi Ouyang,
Seyed Mohammad Asghari,
Ashutosh Nayyar
Abstract:
We consider a decentralized optimal control problem for a linear plant controlled by two controllers, a local controller and a remote controller. The local controller directly observes the state of the plant and can inform the remote controller of the plant state through a packet-drop channel. We assume that the remote controller is able to send acknowledgments to the local controller to signal the successful receipt of transmitted packets. The objective of the two controllers is to cooperatively minimize a quadratic performance cost. We provide a dynamic program for this decentralized control problem using the common information approach. Although our problem is not a partially nested LQG problem, we obtain explicit optimal strategies for the two controllers. In the optimal strategies, both controllers compute a common estimate of the plant state based on the common information. The remote controller's action is linear in the common estimated state, and the local controller's action is linear in both the actual state and the common estimated state.
Submitted 23 June, 2016;
originally announced June 2016.
-
Decentralized Control Problems with Substitutable Actions
Authors:
Seyed Mohammad Asghari,
Ashutosh Nayyar
Abstract:
We consider a decentralized system with multiple controllers and define substitutability of one controller by another in open-loop strategies. We explore the implications of this property on the optimization of closed-loop strategies. In particular, we focus on the decentralized LQG problem with substitutable actions. Even though the problem we formulate does not belong to the known classes of "simpler" decentralized problems such as partially nested or quadratically invariant problems, our results show that, under the substitutability assumption, linear strategies are optimal and we provide a complete state space characterization of optimal strategies. We also identify a family of information structures that all give the same optimal cost as the centralized information structure under the substitutability assumption. Our results suggest that open-loop substitutability can work as a counterpart of the information structure requirements that enable simplification of decentralized control problems.
Submitted 10 January, 2016;
originally announced January 2016.
-
Rate-constrained Energy Services: Allocation Policies and Market Decisions
Authors:
Ashutosh Nayyar,
Matias Negrete-Pincetic,
Kameshwar Poolla,
Pravin Varaiya
Abstract:
The integration of renewable generation poses operational and economic challenges for the electricity grid. For the core problem of power balance, the legacy paradigm of tailoring supply to follow random demand may be inappropriate under deep penetration of uncertain and intermittent renewable generation. In this situation, there is an emerging consensus that the alternative approach of controlling demand to follow random supply offers compelling economic benefits in terms of reduced regulation costs. This approach exploits the flexibility of demand side resources and requires sensing, actuation, and communication infrastructure; distributed control algorithms; and viable schemes to compensate participating loads. This paper considers rate-constrained energy services which are a specific paradigm for flexible demand. These services are characterized by a specified delivery window, the total amount of energy that must be supplied over this window, and the maximum rate at which this energy may be delivered. We consider a forward market where rate-constrained energy services are traded. We explore allocation policies and market decisions of a supplier in this market. The supplier owns a generation mix that includes some uncertain renewable generation and may also purchase energy in day-ahead and real-time markets to meet customer demand. The supplier must optimally select the portfolio of rate-constrained services to sell, the amount of day-ahead energy to buy, and the policies for making real-time energy purchases and allocations to customers to maximize its expected profit. We offer solutions to the supplier's decision and control problems to economically provide rate constrained energy services.
Submitted 24 September, 2014;
originally announced September 2014.
-
Duration-differentiated Energy Services with a Continuum of Loads
Authors:
Ashutosh Nayyar,
Matias Negrete-Pincetic,
Kameshwar Poolla,
Pravin Varaiya
Abstract:
As the proportion of total power supplied by renewable sources increases, it gets more costly to use reserve generation to compensate for the variability of renewables like solar and wind. Hence attention has been drawn to exploiting flexibility in demand as a substitute for reserve generation. Flexibility has different attributes. In this paper we consider loads requiring a constant power for a specified duration (within say one day), whose flexibility resides in the fact that power may be delivered at any time so long as the total duration of service equals the load's specified duration. We give conditions under which a variable power supply is adequate to meet these flexible loads, and describe how to allocate the power to the loads. We also characterize the additional power needed when the supply is inadequate. We study the problem of allocating the available power to loads to maximize welfare, and show that the welfare optimum can be sustained as a competitive equilibrium in a forward market in which electricity is sold as service contracts differentiated by the duration of service and power level. We compare this forward market with a spot market in their ability to capture the flexibility inherent in duration-differentiated loads.
Submitted 25 August, 2014;
originally announced August 2014.
-
Optimal Control for LQG Systems on Graphs---Part I: Structural Results
Authors:
Ashutosh Nayyar,
Laurent Lessard
Abstract:
In this two-part paper, we identify a broad class of decentralized output-feedback LQG systems for which the optimal control strategies have a simple intuitive estimation structure and can be computed efficiently. Roughly, we consider the class of systems for which the coupling of dynamics among subsystems and the inter-controller communication is characterized by the same directed graph. Furthermore, this graph is assumed to be a multitree, that is, its transitive reduction can have at most one directed path connecting each pair of nodes. In this first part, we derive sufficient statistics that may be used to aggregate each controller's growing available information. Each controller must estimate the states of the subsystems that it affects (its descendants) as well as the subsystems that it observes (its ancestors). The optimal control action for a controller is a linear function of the estimate it computes as well as the estimates computed by all of its ancestors. Moreover, these state estimates may be updated recursively, much like a Kalman filter.
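The multitree condition mentioned in this abstract (at most one directed path between each pair of nodes) can be checked mechanically. The sketch below is illustrative only and is not taken from the paper: the function name `is_multitree` is hypothetical, and it assumes the edge list passed in is already acyclic and already the transitive reduction of the graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def is_multitree(nodes, edges):
    """Return True if no ordered pair of nodes is joined by more than one directed path.

    Assumption (not from the abstract): `edges` describes a directed acyclic
    graph that is already transitively reduced.
    """
    succ = {n: [] for n in nodes}
    pred = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Topological order with predecessors first; raises CycleError on a cycle.
    order = list(TopologicalSorter(pred).static_order())

    for source in nodes:
        # paths[v] = number of directed paths from `source` to v found so far.
        paths = {n: 0 for n in nodes}
        paths[source] = 1
        for u in order:
            if paths[u]:
                for v in succ[u]:
                    paths[v] += paths[u]
                    if paths[v] > 1:
                        return False
    return True
```

A chain `a -> b -> c` passes the check, while a diamond (`a -> b -> d` and `a -> c -> d`) fails because `d` is reachable from `a` along two paths.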
Submitted 11 August, 2014;
originally announced August 2014.
-
Duration-Differentiated Services in Electricity
Authors:
Ashutosh Nayyar,
Matias Negrete-Pincetic,
Kameshwar Poolla,
Pravin Varaiya
Abstract:
The integration of renewable sources poses challenges at the operational and economic levels of the power grid. In terms of keeping the balance between supply and demand, the usual scheme of supply following load may not be appropriate for large penetration levels of uncertain and intermittent renewable supply. In this paper, we focus on an alternative scheme in which the load follows the supply, exploiting the flexibility associated with the demand side. We consider a model of flexible loads that are to be serviced by zero-marginal cost renewable power together with conventional generation if necessary. Each load demands 1 kW for a specified number of time slots within an operational period. The flexibility of a load resides in the fact that the service may be delivered over any slots within the operational period. Loads therefore require flexible energy services that are differentiated by the demanded duration. We focus on two problems associated with duration-differentiated loads. The first problem deals with the operational decisions that a supplier has to make to serve a given set of duration-differentiated loads. The second problem focuses on a market implementation for duration-differentiated services. We give necessary and sufficient conditions under which the available power can service the loads, and we describe an algorithm that constructs an appropriate allocation. In the event the available supply is inadequate, we characterize the minimum amount of power that must be purchased to service the loads. Next we consider a forward market where consumers can purchase duration-differentiated energy services. We first characterize social welfare maximizing allocations in this forward market and then show the existence of an efficient competitive equilibrium.
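To make the load model in this abstract concrete, here is a minimal sketch of one possible slot-by-slot allocation. It is an assumption for illustration, not the algorithm from the paper: the greedy rule (serve the loads with the most remaining slots first), the function name, and the integer-kW supply profile are all hypothetical choices. Each load draws 1 kW in at most one unit per slot until its requested duration is met.

```python
def allocate_longest_remaining_first(durations, supply):
    """Greedy allocation sketch for duration-differentiated loads.

    durations: list of ints, slots of 1 kW service each load requests.
    supply:    list of ints, whole kW available in each time slot.
    Returns (all_served, schedule) where schedule[t] lists the load
    indices served in slot t.

    Assumption: the longest-remaining-first rule is chosen here for
    illustration; the abstract does not specify its algorithm.
    """
    remaining = list(durations)
    schedule = []
    for units in supply:
        # Loads that still need service, most remaining slots first
        # (ties keep the original load order because the sort is stable).
        active = [i for i, r in enumerate(remaining) if r > 0]
        active.sort(key=lambda i: remaining[i], reverse=True)
        served = active[:units]  # each load can use at most 1 kW per slot
        for i in served:
            remaining[i] -= 1
        schedule.append(served)
    return all(r == 0 for r in remaining), schedule
```

For example, with `durations = [2, 1]` and `supply = [1, 1, 1]` every load is served, whereas a single load of duration 2 cannot be served from `supply = [2, 0]`, because the second kilowatt in the first slot cannot be used by the same load.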
Submitted 3 April, 2014;
originally announced April 2014.
-
Signaling in sensor networks for sequential detection
Authors:
Ashutosh Nayyar,
Demosthenis Teneketzis
Abstract:
Sequential detection problems in sensor networks are considered. The true state of nature/true hypothesis is modeled as a binary random variable $H$ with known prior distribution. There are $N$ sensors making noisy observations about the hypothesis; $\mathcal{N} =\{1,2,\ldots,N\}$ denotes the set of sensors. Sensor $i$ can receive messages from a subset $\mathcal{P}^i \subset \mathcal{N}$ of sensors and send a message to a subset $\mathcal{C}^i \subset \mathcal{N}$. Each sensor is faced with a stopping problem. At each time $t$, based on the observations it has taken so far and the messages it may have received, sensor $i$ can decide to stop and communicate a binary decision to the sensors in $\mathcal{C}^i$, or it can continue taking observations and receiving messages. After sensor $i$'s binary decision has been sent, it becomes inactive. Sensors incur operational costs (cost of taking observations, communication costs etc.) while they are active. In addition, the system incurs a terminal cost that depends on the true hypothesis $H$, the sensors' binary decisions and their stopping times. The objective is to determine decision strategies for all sensors to minimize the total expected cost.
Submitted 12 March, 2014;
originally announced March 2014.
-
Sufficient statistics for linear control strategies in decentralized systems with partial history sharing
Authors:
Aditya Mahajan,
Ashutosh Nayyar
Abstract:
In decentralized control systems with linear dynamics, quadratic cost, and Gaussian disturbance (also called decentralized LQG systems) linear control strategies are not always optimal. Nonetheless, linear control strategies are appealing due to analytic and implementation simplicity. In this paper, we investigate decentralized LQG systems with partial history sharing information structure and identify finite dimensional sufficient statistics for such systems. Unlike prior work on decentralized LQG systems, we do not assume partial nestedness or quadratic invariance. Our approach is based on the common information approach of Nayyar et al. (2013) and exploits the linearity of the system dynamics and control strategies. To illustrate our methodology, we identify sufficient statistics for linear strategies in decentralized systems where controllers communicate over a strongly connected graph with finite delays, and for decentralized systems consisting of coupled subsystems with control sharing or one-sided one step delay sharing information structures.
Submitted 11 March, 2014;
originally announced March 2014.
-
Common Information based Markov Perfect Equilibria for Linear-Gaussian Games with Asymmetric Information
Authors:
Abhishek Gupta,
Ashutosh Nayyar,
Cedric Langbort,
Tamer Basar
Abstract:
We consider a class of two-player dynamic stochastic nonzero-sum games where the state transition and observation equations are linear, and the primitive random variables are Gaussian. Each controller acquires possibly different dynamic information about the state process and the other controller's past actions and observations. This leads to a dynamic game of asymmetric information among the controllers. Building on our earlier work on finite games with asymmetric information, we devise an algorithm to compute a Nash equilibrium by using the common information among the controllers. We call such equilibria common information based Markov perfect equilibria of the game, which can be viewed as a refinement of Nash equilibrium in games with asymmetric information. If the players' cost functions are quadratic, then we show that under certain conditions a unique common information based Markov perfect equilibrium exists. Furthermore, this equilibrium can be computed by solving a sequence of linear equations. We also show through an example that there could be other Nash equilibria in a game of asymmetric information, not corresponding to common information based Markov perfect equilibria.
Submitted 19 January, 2014;
originally announced January 2014.
-
Structural Results and Explicit Solution for Two-Player LQG Systems on a Finite Time Horizon
Authors:
Laurent Lessard,
Ashutosh Nayyar
Abstract:
It is well known that linear dynamical systems with Gaussian noise and quadratic cost (LQG) satisfy a separation principle. Finding the optimal controller amounts to solving separate dual problems: one for control and one for estimation. For the discrete-time finite-horizon case, each problem is a simple forward or backward recursion. In this paper, we consider a generalization of the LQG problem in which there are two controllers. Each controller is responsible for one of two system inputs, but has access to different subsets of the available measurements. Our paper has three main contributions. First, we prove a fundamental structural result: sufficient statistics for the controllers can be expressed as conditional means of the global state. Second, we give explicit state-space formulae for the optimal controller. These formulae are reminiscent of the classical LQG solution with dual forward and backward recursions, but with the important difference that they are intricately coupled. Lastly, we show how these recursions can be solved efficiently, with computational complexity comparable to that of the centralized problem.
Submitted 6 September, 2013; v1 submitted 13 March, 2013;
originally announced March 2013.
-
Nash Equilibria for Stochastic Games with Asymmetric Information-Part 1: Finite Games
Authors:
Ashutosh Nayyar,
Abhishek Gupta,
Cédric Langbort,
Tamer Başar
Abstract:
A model of stochastic games where multiple controllers jointly control the evolution of the state of a dynamic system but have access to different information about the state and action processes is considered. The asymmetry of information among the controllers makes it difficult to compute or characterize Nash equilibria. Using common information among the controllers, the game with asymmetric information is shown to be equivalent to another game with symmetric information. Further, under certain conditions, a Markov state is identified for the equivalent symmetric information game and its Markov perfect equilibria are characterized. This characterization provides a backward induction algorithm to find Nash equilibria of the original game with asymmetric information in pure or behavioral strategies. Each step of this algorithm involves finding Bayesian Nash equilibria of a one-stage Bayesian game. The class of Nash equilibria of the original game that can be characterized in this backward manner are named common information based Markov perfect equilibria.
Submitted 17 September, 2012;
originally announced September 2012.
-
Decentralized Stochastic Control with Partial History Sharing: A Common Information Approach
Authors:
Ashutosh Nayyar,
Aditya Mahajan,
Demosthenis Teneketzis
Abstract:
A general model of decentralized stochastic control called partial history sharing information structure is presented. In this model, at each step the controllers share part of their observation and control history with each other. This general model subsumes several existing models of information sharing as special cases. Based on the information commonly known to all the controllers, the decentralized problem is reformulated as an equivalent centralized problem from the perspective of a coordinator. The coordinator knows the common information and selects prescriptions that map each controller's local information to its control actions. The optimal control problem at the coordinator is shown to be a partially observable Markov decision process (POMDP) which is solved using techniques from Markov decision theory. This approach provides (a) structural results for optimal strategies, and (b) a dynamic program for obtaining optimal strategies for all controllers in the original decentralized problem. Thus, this approach unifies the various ad-hoc approaches taken in the literature. In addition, the structural results on optimal control strategies obtained by the proposed approach cannot be obtained by the existing generic approach (the person-by-person approach) for obtaining structural results in decentralized problems; and the dynamic program obtained by the proposed approach is simpler than that obtained by the existing generic approach (the designer's approach) for obtaining dynamic programs in decentralized problems.
Submitted 8 September, 2012;
originally announced September 2012.
-
Optimal Strategies for Communication and Remote Estimation with an Energy Harvesting Sensor
Authors:
Ashutosh Nayyar,
Tamer Basar,
Demosthenis Teneketzis,
Venugopal V. Veeravalli
Abstract:
We consider a remote estimation problem with an energy harvesting sensor and a remote estimator. The sensor observes the state of a discrete-time source which may be a finite state Markov chain or a multi-dimensional linear Gaussian system. It harvests energy from its environment (say, for example, through a solar cell) and uses this energy for the purpose of communicating with the estimator. Due to the randomness of energy available for communication, the sensor may not be able to communicate all the time. The sensor may also want to save its energy for future communications. The estimator relies on messages communicated by the sensor to produce real-time estimates of the source state. We consider the problem of finding a communication scheduling strategy for the sensor and an estimation strategy for the estimator that jointly minimize an expected sum of communication and distortion costs over a finite time horizon. Our goal of joint optimization leads to a decentralized decision-making problem. By viewing the problem from the estimator's perspective, we obtain a dynamic programming characterization for the decentralized decision-making problem that involves optimization over functions. Under some symmetry assumptions on the source statistics and the distortion metric, we show that an optimal communication strategy is described by easily computable thresholds and that the optimal estimate is a simple function of the most recently received sensor observation.
Submitted 27 May, 2012;
originally announced May 2012.