
Optimal Control of Logically Constrained Partially Observable and Multi-Agent Markov Decision Processes

Authors: Krishna C. Kalagarla, Dhruva Kartik, Dongming Shen, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo

Abstract: Autonomous systems often have logical constraints arising, for example, from safety, operational, or regulatory requirements. Such constraints can be expressed using temporal logic specifications. The system state is often partially observable. Moreover, it could encompass a team of multiple agents with a common objective but disparate information structures and constraints. In this paper, we first introduce an optimal control theory for partially observable Markov decision processes (POMDPs) with finite linear temporal logic constraints. We provide a structured methodology for synthesizing policies that maximize a cumulative reward while ensuring that the probability of satisfying a temporal logic constraint is sufficiently high. Our approach comes with guarantees on approximate reward optimality and constraint satisfaction. We then build on this approach to design an optimal control framework for logically constrained multi-agent settings with information asymmetry. We illustrate the effectiveness of our approach by implementing it on several case studies.

Submitted 19 June, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2203.09038

arXiv:2305.02401

Synthetic DOmain-Targeted Augmentation (S-DOTA) Improves Model Generalization in Digital Pathology

Authors: Sai Chowdary Gullapally, Yibo Zhang, Nitin Kumar Mittal, Deeksha Kartik, Sandhya Srinivasan, Kevin Rose, Daniel Shenker, Dinkar Juyal, Harshith Padigela, Raymond Biju, Victor Minden, Chirag Maheshwari, Marc Thibault, Zvi Goldstein, Luke Novak, Nidhi Chandra, Justin Lee, Aaditya Prakash, Chintan Shah, John Abel, Darren Fahy, Amaro Taylor-Weiner, Anand Sampat

Abstract: Machine learning algorithms have the potential to improve patient outcomes in digital pathology. However, generalization of these tools is currently limited by sensitivity to variations in tissue preparation, staining procedures and scanning equipment that lead to domain shift in digitized slides. To overcome this limitation and improve model generalization, we studied the effectiveness of two Synthetic DOmain-Targeted Augmentation (S-DOTA) methods, namely CycleGAN-enabled Scanner Transform (ST) and targeted Stain Vector Augmentation (SVA), and compared them against the International Color Consortium (ICC) profile-based color calibration (ICC Cal) method and a baseline method using traditional brightness, color and noise augmentations. We evaluated the ability of these techniques to improve model generalization to various tasks and settings: four models, two model types (tissue segmentation and cell classification), two loss functions, six labs, six scanners, and three indications (hepatocellular carcinoma (HCC), nonalcoholic steatohepatitis (NASH), prostate adenocarcinoma). We compared these methods based on the macro-averaged F1 scores on in-distribution (ID) and out-of-distribution (OOD) test sets across multiple domains, and found that S-DOTA methods (i.e., ST and SVA) led to significant improvements over ICC Cal and baseline on OOD data while maintaining comparable performance on ID data. Thus, we demonstrate that S-DOTA may help address poor generalization caused by domain shift in real-world applications.

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2209.03888

Optimal Communication and Control Strategies for a Multi-Agent System in the Presence of an Adversary

Authors: Dhruva Kartik, Sagar Sudhakara, Rahul Jain, Ashutosh Nayyar

Abstract: We consider a multi-agent system in which a decentralized team of agents controls a stochastic system in the presence of an adversary. Instead of committing to a fixed information sharing protocol, the agents can strategically decide at each time whether to share their private information with each other or not. The agents incur a cost whenever they communicate with each other and the adversary may eavesdrop on their communication. Thus, the agents in the team must effectively coordinate with each other while being robust to the adversary's malicious actions. We model this interaction between the team and the adversary as a stochastic zero-sum game where the team aims to minimize a cost while the adversary aims to maximize it. Under some assumptions on the adversary's capabilities, we characterize a min-max control and communication strategy for the team. We supplement this characterization with several structural results that can make the computation of the min-max strategy more tractable.

Submitted 8 September, 2022; originally announced September 2022.

Comments: In Proceedings of the Conference on Decision and Control (CDC), 2022

arXiv:2203.09038

Optimal Control of Partially Observable Markov Decision Processes with Finite Linear Temporal Logic Constraints

Authors: Krishna C. Kalagarla, Dhruva Kartik, Dongming Shen, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo

Abstract: Autonomous agents often operate in scenarios where the state is partially observed. In addition to maximizing their cumulative reward, agents must execute complex tasks with rich temporal and logical structures. These tasks can be expressed using temporal logic languages like finite linear temporal logic (LTL_f). This paper, for the first time, provides a structured framework for designing agent policies that maximize the reward while ensuring that the probability of satisfying the temporal logic specification is sufficiently high. We reformulate the problem as a constrained partially observable Markov decision process (POMDP) and provide a novel approach that can leverage off-the-shelf unconstrained POMDP solvers for solving it. Our approach guarantees approximate optimality and constraint satisfaction with high probability. We demonstrate its effectiveness by implementing it on several models of interest.

Submitted 16 March, 2022; originally announced March 2022.

arXiv:2107.10140

AUGCO: Augmentation Consistency-guided Self-training for Source-free Domain Adaptive Semantic Segmentation

Authors: Viraj Prabhu, Shivam Khare, Deeksha Kartik, Judy Hoffman

Abstract: Most modern approaches for domain adaptive semantic segmentation rely on continued access to source data during adaptation, which may be infeasible due to computational or privacy constraints. We focus on source-free domain adaptation for semantic segmentation, wherein a source model must adapt itself to a new target domain given only unlabeled target data. We propose Augmentation Consistency-guided Self-training (AUGCO), a source-free adaptation algorithm that uses the model's pixel-level predictive consistency across diverse, automatically generated views of each target image along with model confidence to identify reliable pixel predictions, and selectively self-trains on those. AUGCO achieves state-of-the-art results for source-free adaptation on 3 standard benchmarks for semantic segmentation, with a method that is simple to implement and fast to converge.

Submitted 6 January, 2022; v1 submitted 21 July, 2021; originally announced July 2021.

arXiv:2104.10923

Optimal communication and control strategies in a multi-agent MDP problem

Authors: Sagar Sudhakara, Dhruva Kartik, Rahul Jain, Ashutosh Nayyar

Abstract: The problem of controlling multi-agent systems under different models of information sharing among agents has received significant attention in the recent literature. In this paper, we consider a setup where, rather than committing to a fixed information sharing protocol (e.g., periodic sharing or no sharing), agents can dynamically decide at each time step whether to share information with each other and incur the resulting communication cost. This setup requires a joint design of agents' communication and control strategies in order to optimize the trade-off between communication costs and the control objective. We first show that agents can ignore a large part of their private information without compromising the system performance. We then provide a solution to the strategy optimization problem based on the common information approach. This approach relies on constructing a fictitious POMDP whose solution (obtained via a dynamic program) characterizes the optimal strategies for the agents. We also show that our solution can be easily modified to incorporate constraints on when and how frequently agents can communicate.

Submitted 22 April, 2021; originally announced April 2021.

arXiv:2102.05838

Common Information Belief based Dynamic Programs for Stochastic Zero-sum Games with Competing Teams

Authors: Dhruva Kartik, Ashutosh Nayyar, Urbashi Mitra

Abstract: Decentralized team problems where players have asymmetric information about the state of the underlying stochastic system have been actively studied, but games between such teams are less understood. We consider a general model of zero-sum stochastic games between two competing teams. This model subsumes many previously considered team and zero-sum game models. For this general model, we provide bounds on the upper (min-max) and lower (max-min) values of the game. Furthermore, if the upper and lower values of the game are identical (i.e., if the game has a value), our bounds coincide with the value of the game. Our bounds are obtained using two dynamic programs based on a sufficient statistic known as the common information belief (CIB). We also identify certain information structures in which only the minimizing team controls the evolution of the CIB. In these cases, we show that one of our CIB-based dynamic programs can be used to find the min-max strategy (in addition to the min-max value). We propose an approximate dynamic programming approach for computing the values (and the strategy when applicable) and illustrate our results with the help of an example.

Submitted 27 September, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

Comments: arXiv admin note: text overlap with arXiv:1909.01445

arXiv:2012.11460

SENTRY: Selective Entropy Optimization via Committee Consistency for Unsupervised Domain Adaptation

Authors: Viraj Prabhu, Shivam Khare, Deeksha Kartik, Judy Hoffman

Abstract: Many existing approaches for unsupervised domain adaptation (UDA) focus on adapting under only data distribution shift and offer limited success under additional cross-domain label distribution shift. Recent work based on self-training using target pseudo-labels has shown promise, but on challenging shifts pseudo-labels may be highly unreliable, and using them for self-training may cause error accumulation and domain misalignment. We propose Selective Entropy Optimization via Committee Consistency (SENTRY), a UDA algorithm that judges the reliability of a target instance based on its predictive consistency under a committee of random image transformations. Our algorithm then selectively minimizes predictive entropy to increase confidence on highly consistent target instances, while maximizing predictive entropy to reduce confidence on highly inconsistent ones. In combination with pseudo-label based approximate target class balancing, our approach leads to significant improvements over the state-of-the-art on 27/31 domain shifts from standard UDA benchmarks as well as benchmarks designed to stress-test adaptation under label distribution shift.

Submitted 9 October, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

Comments: Published at ICCV 2021. Code available at https://github.com/virajprabhu/SENTRY
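The selective entropy rule described in the SENTRY abstract can be illustrated with a simplified, pure-Python sketch. It is not the released code linked in the comment above: the function names, the strict-majority consistency test, and computing entropy on the unaugmented prediction are assumptions made for illustration, and the class-balancing step is omitted.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def argmax(p: Sequence[float]) -> int:
    """Index of the most probable class."""
    return max(range(len(p)), key=lambda k: p[k])


def selective_entropy_loss(
    orig_probs: Sequence[float], committee_probs: Sequence[Sequence[float]]
) -> float:
    """Hypothetical per-instance loss following a selective entropy rule.

    orig_probs: class probabilities predicted for the unaugmented target image.
    committee_probs: predictions for randomly transformed views (the committee).
    If a strict majority of the committee agrees with the original prediction,
    the instance is treated as consistent and its entropy is minimized
    (loss = +H); otherwise its entropy is maximized (loss = -H).
    """
    pred = argmax(orig_probs)
    votes = sum(1 for q in committee_probs if argmax(q) == pred)
    h = entropy(orig_probs)
    return h if votes > len(committee_probs) / 2 else -h


# Usage: a consistent instance (2 of 3 views agree) and an inconsistent one (1 of 3).
consistent = selective_entropy_loss([0.9, 0.1], [[0.8, 0.2], [0.7, 0.3], [0.4, 0.6]])
inconsistent = selective_entropy_loss([0.9, 0.1], [[0.2, 0.8], [0.3, 0.7], [0.6, 0.4]])
```

Minimizing the returned value by gradient descent would sharpen predictions on consistent instances and flatten them on inconsistent ones, which is the qualitative behavior the abstract describes.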

arXiv:2012.04137

Adaptive Sampling for Estimating Distributions: A Bayesian Upper Confidence Bound Approach

Authors: Dhruva Kartik, Neeraj Sood, Urbashi Mitra, Tara Javidi

Abstract: The problem of adaptive sampling for estimating probability mass functions (pmfs) uniformly well is considered. Performance of the sampling strategy is measured in terms of the worst-case mean squared error. A Bayesian variant of the existing upper confidence bound (UCB) based approaches is proposed. It is shown analytically that the performance of this Bayesian variant is no worse than that of the existing approaches. The posterior distribution on the pmfs in the Bayesian setting allows for a tighter computation of upper confidence bounds, which leads to significant performance gains in practice. Using this approach, adaptive sampling protocols are proposed for estimating SARS-CoV-2 seroprevalence in various groups, such as those defined by location and ethnicity. The effectiveness of this strategy is discussed using data obtained from a seroprevalence survey in Los Angeles County.

Submitted 7 December, 2020; originally announced December 2020.
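To make the sampling problem concrete, the following sketch keeps a Dirichlet posterior over each group's pmf and always samples the group whose posterior-mean estimate currently has the largest posterior expected squared error, which targets the worst-case criterion. This greedy index is a simplified stand-in chosen for illustration; it is not the Bayesian UCB index proposed in the paper, and the group names and prior counts are hypothetical.

```python
from typing import Dict, List


def posterior_sq_error(alpha: List[float]) -> float:
    """Posterior expected squared L2 error of the posterior-mean pmf estimate.

    Under Dirichlet(alpha) with alpha0 = sum(alpha) and m_k = alpha_k / alpha0,
    sum_k Var(p_k) = (1 - sum_k m_k**2) / (alpha0 + 1).
    """
    a0 = sum(alpha)
    return (1.0 - sum((a / a0) ** 2 for a in alpha)) / (a0 + 1.0)


def choose_group(posteriors: Dict[str, List[float]]) -> str:
    """Sample next from the group whose estimate is currently worst."""
    return max(posteriors, key=lambda g: posterior_sq_error(posteriors[g]))


def update(posteriors: Dict[str, List[float]], group: str, category: int) -> None:
    """Conjugate update: observing `category` in `group` adds one to that Dirichlet count."""
    posteriors[group][category] += 1.0


# Usage with two hypothetical groups and binary outcomes (e.g., seropositive or not).
posteriors = {"group_a": [1.0, 1.0], "group_b": [10.0, 10.0]}
first_choice = choose_group(posteriors)  # group_a has far fewer effective samples
update(posteriors, first_choice, 0)
```

Each observation shrinks the chosen group's posterior variance, so repeated calls spread samples so as to keep the largest per-group error small.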

arXiv:2005.07696

Testing for Anomalies: Active Strategies and Non-asymptotic Analysis

Authors: Dhruva Kartik, Ashutosh Nayyar, Urbashi Mitra

Abstract: The problem of verifying whether a multi-component system has anomalies or not is addressed. Each component can be probed over time in a data-driven manner to obtain noisy observations that indicate whether the selected component is anomalous or not. The aim is to minimize the probability of incorrectly declaring the system to be free of anomalies while ensuring that the probability of correctly declaring it to be safe is sufficiently large. This problem is modeled as an active hypothesis testing problem in the Neyman-Pearson setting. Component-selection and inference strategies are designed and analyzed in the non-asymptotic regime. For a specific class of homogeneous problems, stronger (with respect to prior work) non-asymptotic converse and achievability bounds are provided.

Submitted 14 May, 2020; originally announced May 2020.

Comments: arXiv admin note: text overlap with arXiv:1911.06912

arXiv:1911.06912

Fixed-horizon Active Hypothesis Testing

Authors: Dhruva Kartik, Ashutosh Nayyar, Urbashi Mitra

Abstract: Two active hypothesis testing problems are formulated. In these problems, the agent can perform a fixed number of experiments and then decide on one of the hypotheses. The agent is also allowed to declare its experiments inconclusive if needed. The first problem is an asymmetric formulation in which the objective is to minimize the probability of incorrectly declaring a particular hypothesis to be true while ensuring that the probability of correctly declaring that hypothesis is moderately high. This formulation can be seen as a generalization of the formulation in the classical Chernoff-Stein lemma to an active setting. The second problem is a symmetric formulation in which the objective is to minimize the probability of making an incorrect inference (misclassification probability) while ensuring that the true hypothesis is declared conclusively with moderately high probability. For these problems, lower and upper bounds on the optimal misclassification probabilities are derived and these bounds are shown to be asymptotically tight. Classical approaches for experiment selection suggest the use of randomized and, in some cases, open-loop strategies. As opposed to these classical approaches, fully deterministic and adaptive experiment selection strategies are provided. It is shown that these strategies are asymptotically optimal and further, using numerical experiments, it is demonstrated that these novel experiment selection strategies (coupled with appropriate inference strategies) have significantly better performance in the non-asymptotic regime.

Submitted 15 November, 2019; originally announced November 2019.

Comments: Submitted to IEEE Transactions on Automatic Control
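For intuition on the asymmetric formulation, recall that in the classical (non-active) Chernoff-Stein lemma the best achievable exponent of the probability of incorrectly declaring a hypothesis H, under a constraint on correctly declaring it, is the KL divergence between the observation distributions under H and under the alternative. The sketch below assumes a single alternative and a finite set of experiments with known observation pmfs, and picks the experiment with the largest divergence; with only one alternative, mixing experiments cannot help, since a weighted average of divergences never exceeds their maximum. The function names and example pmfs are hypothetical, and the sketch does not reproduce the paper's bounds or strategies.

```python
import math
from typing import List, Sequence, Tuple


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence D(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def best_stein_experiment(
    obs_h: List[List[float]], obs_alt: List[List[float]]
) -> Tuple[int, float]:
    """Return (experiment index, exponent) maximizing D(obs_h[a] || obs_alt[a]).

    obs_h[a] and obs_alt[a] are the observation pmfs of experiment a under the
    hypothesis being declared and under the single alternative, respectively.
    """
    exponents = [kl(p, q) for p, q in zip(obs_h, obs_alt)]
    best = max(range(len(exponents)), key=lambda a: exponents[a])
    return best, exponents[best]


# Usage: experiment 1 separates the two hypotheses far better than experiment 0.
a_star, exponent = best_stein_experiment(
    obs_h=[[0.5, 0.5], [0.9, 0.1]],
    obs_alt=[[0.4, 0.6], [0.1, 0.9]],
)
```

With n experiments, the error probability then decays roughly like exp(-n * exponent), ignoring sub-exponential factors.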

arXiv:1909.01445

Zero-sum Stochastic Games with Asymmetric Information

Authors: Dhruva Kartik, Ashutosh Nayyar

Abstract: A general model for zero-sum stochastic games with asymmetric information is considered. In this model, each player's information at each time can be divided into a common information part and a private information part. Under certain conditions on the evolution of the common and private information, a dynamic programming characterization of the value of the game (if it exists) is presented. If the value of the zero-sum game does not exist, then the dynamic program provides bounds on the upper and lower values of the game. This dynamic program is then used for a class of zero-sum stochastic games with complete information on one side and partial information on the other, that is, games where one player has complete information about state, actions and observation history while the other player may only have partial information about the state and action history. For such games, it is shown that the value exists and can be characterized using the dynamic program. It is further shown that for this class of games, the dynamic program can be used to compute an equilibrium strategy for the more informed player in which the player selects its action using its private information and the common information belief.

Submitted 24 December, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

Comments: Accepted for presentation at the 58th Conference on Decision and Control (CDC), 2019. Submitted to Dynamic Games and Applications.

arXiv:1901.06795

Active Hypothesis Testing: Beyond Chernoff-Stein

Authors: Dhruva Kartik, Ashutosh Nayyar, Urbashi Mitra

Abstract: An active hypothesis testing problem is formulated. In this problem, the agent can perform a fixed number of experiments and then decide on one of the hypotheses. The agent is also allowed to declare its experiments inconclusive if needed. The objective is to minimize the probability of making an incorrect inference (misclassification probability) while ensuring that the true hypothesis is declared conclusively with moderately high probability. For this problem, lower and upper bounds on the optimal misclassification probability are derived and these bounds are shown to be asymptotically tight. In the analysis, a sub-problem, which can be viewed as a generalization of the Chernoff-Stein lemma, is formulated and analyzed. A heuristic approach to strategy design is proposed and its relationship with existing heuristic strategies is discussed.

Submitted 21 January, 2019; originally announced January 2019.

Comments: Submitted to 2019 IEEE International Symposium on Information Theory (ISIT)

arXiv:1812.01137

Sequential Experiment Design for Hypothesis Verification

Authors: Dhruva Kartik, Ashutosh Nayyar, Urbashi Mitra

Abstract: Hypothesis testing is an important problem with applications in target localization, clinical trials, etc. Many active hypothesis testing strategies operate in two phases: an exploration phase and a verification phase. In the exploration phase, experiments are selected so that a moderate level of confidence in the true hypothesis is achieved. Subsequent experiment design aims at improving the confidence level on this hypothesis to the desired level. In this paper, the focus is on the verification phase. A confidence measure is defined and active hypothesis testing is formulated as a confidence maximization problem in an infinite-horizon average-reward Partially Observable Markov Decision Process (POMDP) setting. The problem of maximizing confidence conditioned on a particular hypothesis is referred to as the hypothesis verification problem. The relationship between hypothesis testing and verification problems is established. The verification problem can be formulated as a Markov Decision Process (MDP). Optimal solutions for the verification MDP are characterized and a simple heuristic adaptive strategy for verification is proposed based on a zero-sum game interpretation of Kullback-Leibler divergences. It is demonstrated through numerical experiments that, in some scenarios, the heuristic performs better than existing methods in the literature.

Submitted 3 December, 2018; originally announced December 2018.

Comments: 52nd Annual Asilomar Conference on Signals, Systems, and Computers. arXiv admin note: text overlap with arXiv:1810.04859
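The zero-sum game view of KL divergences mentioned in this abstract can be illustrated under assumptions that go beyond the abstract: the experimenter randomizes between two experiments, choosing experiment 0 with probability lam, and an adversary then picks the alternative hypothesis that is hardest to rule out; the payoff is the expected KL divergence between the observation distributions under the true hypothesis and under that alternative. The sketch computes the max-min mixture exactly for two experiments. It is a hypothetical illustration of the game idea, not the heuristic strategy defined in the paper, and all names are made up for the example.

```python
import math
from typing import List, Sequence, Tuple


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence D(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def max_min_mixture(obs: List[List[List[float]]], true_h: int) -> Tuple[float, float]:
    """Max-min mixture over two experiments.

    obs[a][h] is the observation pmf of experiment a (a = 0 or 1) under hypothesis h.
    Returns (lam, value), where lam is the probability of running experiment 0 and
    value = max over lam of min over j != true_h of
            lam * D(obs[0][true_h] || obs[0][j]) + (1 - lam) * D(obs[1][true_h] || obs[1][j]).
    """
    alts = [j for j in range(len(obs[0])) if j != true_h]
    d0 = {j: kl(obs[0][true_h], obs[0][j]) for j in alts}
    d1 = {j: kl(obs[1][true_h], obs[1][j]) for j in alts}

    def worst_case(lam: float) -> float:
        return min(lam * d0[j] + (1 - lam) * d1[j] for j in alts)

    # The minimum of lines in lam is concave and piecewise linear, so its maximum
    # over [0, 1] occurs at an endpoint or where two of the lines cross.
    candidates = [0.0, 1.0]
    for j in alts:
        for k in alts:
            denom = (d0[j] - d1[j]) - (d0[k] - d1[k])
            if j < k and abs(denom) > 1e-12:
                lam = (d1[k] - d1[j]) / denom
                if 0.0 <= lam <= 1.0:
                    candidates.append(lam)
    best = max(candidates, key=worst_case)
    return best, worst_case(best)


# Usage: three hypotheses, binary observations. Experiment 0 separates H0 from H1 only,
# experiment 1 separates H0 from H2 only, so neither experiment alone suffices.
p_hi, p_lo = [0.8, 0.2], [0.2, 0.8]
obs = [
    [p_hi, p_lo, p_hi],  # experiment 0 under H0, H1, H2
    [p_hi, p_hi, p_lo],  # experiment 1 under H0, H1, H2
]
lam, value = max_min_mixture(obs, true_h=0)
```

In this symmetric example the max-min mixture splits evenly between the two experiments, whereas either pure experiment leaves one alternative with zero divergence.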

arXiv:1810.04859

Policy Design for Active Sequential Hypothesis Testing using Deep Learning

Authors: Dhruva Kartik, Ekraam Sabir, Urbashi Mitra, Prem Natarajan

Abstract: Information theory has been very successful in obtaining performance limits for various problems such as communication, compression and hypothesis testing. Likewise, stochastic control theory provides a characterization of optimal policies for Partially Observable Markov Decision Processes (POMDPs) using dynamic programming. However, finding optimal policies for these problems is computationally hard in general, and thus heuristic solutions are employed in practice. Deep learning can be used as a tool for designing better heuristics in such problems. In this paper, the problem of active sequential hypothesis testing is considered. The goal is to design a policy that can reliably infer the true hypothesis using as few samples as possible by adaptively selecting appropriate queries. This problem can be modeled as a POMDP, and bounds on its value function exist in the literature. However, optimal policies have not been identified and various heuristics are used. In this paper, two new heuristics are proposed: one based on deep reinforcement learning and another based on a KL-divergence zero-sum game. These heuristics are compared with state-of-the-art solutions and it is demonstrated using numerical experiments that the proposed heuristics can achieve significantly better performance than existing methods in some scenarios.

Submitted 11 October, 2018; originally announced October 2018.

Comments: Accepted at 56th Annual Allerton Conference on Communication, Control, and Computing

Showing 1–15 of 15 results for author: Kartik, D