Thompson Sampling for Stochastic Bandits with Noisy Contexts: An Information-Theoretic Regret Analysis

Authors: Sharu Theresa Jose, Shana Moothedath

Abstract: We explore a stochastic contextual linear bandit problem where the agent observes a noisy, corrupted version of the true context through a noise channel with an unknown noise parameter. Our objective is to design an action policy that can approximate that of an oracle, which has access to the reward model, the channel parameter, and the predictive distribution of the true context from the observed noisy context. In a Bayesian framework, we introduce a Thompson sampling algorithm for Gaussian bandits with Gaussian context noise. Adopting an information-theoretic analysis, we analyze the Bayesian regret of our algorithm relative to the oracle's action policy. We also extend this problem to a scenario where the agent observes the true context with some delay after receiving the reward and show that delayed true contexts lead to lower Bayesian regret. Finally, we empirically demonstrate the performance of the proposed algorithms against baselines.

Submitted 22 March, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

arXiv:2401.11563 [pdf, other]

Distributed Multi-Task Learning for Stochastic Bandits with Context Distribution and Stage-wise Constraints

Authors: Jiabin Lin, Shana Moothedath

Abstract: We present the problem of conservative distributed multi-task learning in stochastic linear contextual bandits with heterogeneous agents. This extends conservative linear bandits to a distributed setting where M agents tackle different but related tasks while adhering to stage-wise performance constraints. The exact context is unknown, and only a context distribution is available to the agents, as in many practical applications that involve a prediction mechanism to infer context, such as stock market prediction and weather forecasting. We propose a distributed upper confidence bound (UCB) algorithm, DiSC-UCB. Our algorithm constructs a pruned action set during each round to ensure the constraints are met. Additionally, it includes synchronized sharing of estimates among agents via a central server using well-structured synchronization steps. We prove the regret and communication bounds on the algorithm. We extend the problem to a setting where the agents are unaware of the baseline reward. For this setting, we provide a modified algorithm, DiSC-UCB2, and we show that the modified algorithm achieves the same regret and communication bounds. We empirically validated the performance of our algorithm on synthetic data and real-world MovieLens-100K data.

Submitted 9 April, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

arXiv:2303.17043 [pdf, other]

Federated Learning for Heterogeneous Bandits with Unobserved Contexts

Authors: Jiabin Lin, Shana Moothedath

Abstract: We study the problem of federated stochastic multi-armed contextual bandits with unknown contexts, in which M agents are faced with different bandits and collaborate to learn. The communication model consists of a central server, and the agents periodically share their estimates with the central server to learn to choose optimal actions that minimize the total regret. We assume that the exact contexts are not observable and the agents observe only a distribution of the contexts. Such a situation arises, for instance, when the context itself is a noisy measurement or based on a prediction mechanism. Our goal is to develop a distributed and federated algorithm that facilitates collaborative learning among the agents to select a sequence of optimal actions so as to maximize the cumulative reward. By performing a feature vector transformation, we propose an elimination-based algorithm and prove the regret bound for linearly parametrized reward functions. Finally, we validated the performance of our algorithm and compared it with another baseline approach using numerical simulations on synthetic data and on the real-world MovieLens dataset.

Submitted 29 January, 2024; v1 submitted 29 March, 2023; originally announced March 2023.

arXiv:2207.14391 [pdf, other]

Distributed Stochastic Bandit Learning with Delayed Context Observation

Authors: Jiabin Lin, Shana Moothedath

Abstract: We consider the problem where M agents collaboratively interact with an instance of a stochastic K-armed contextual bandit, where K >> M. The goal of the agents is to simultaneously minimize the cumulative regret over all the agents over a time horizon T. We consider a setting where the exact context is observed after a delay: at the time of choosing the action, the agents are unaware of the context and only a distribution on the set of contexts is available. Such a situation arises in different applications where at the time of the decision the context needs to be predicted (e.g., weather forecasting or stock market prediction), and the context can be estimated once the reward is obtained. We propose an Upper Confidence Bound (UCB)-based distributed algorithm and prove the regret and communication bounds for linearly parametrized reward functions. We validated the performance of our algorithm via numerical simulations on synthetic data and real-world MovieLens data.

Submitted 15 November, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

arXiv:2204.08117 [pdf, other]

Fast Decentralized Federated Low Rank Matrix Recovery from Column-wise Linear Projections

Authors: Shana Moothedath, Namrata Vaswani

Abstract: This work develops a provably accurate fully-decentralized alternating projected gradient descent (GD) algorithm for recovering a low rank (LR) matrix from mutually independent projections of each of its columns, in a fast and communication-efficient fashion. To the best of our knowledge, this work is the first attempt to develop a provably correct decentralized algorithm (i) for any problem involving the use of an alternating projected GD algorithm; and (ii) for any problem in which the constraint set to be projected to is a non-convex set.

Submitted 11 February, 2024; v1 submitted 17 April, 2022; originally announced April 2022.

arXiv:2203.15629 [pdf, other]

Stochastic Conservative Contextual Linear Bandits

Authors: Jiabin Lin, Xian Yeow Lee, Talukder Jubery, Shana Moothedath, Soumik Sarkar, Baskar Ganapathysubramanian

Abstract: Many physical systems have underlying safety considerations that require that the strategy deployed ensures the satisfaction of a set of constraints. Further, often we have only partial information on the state of the system. We study the problem of safe real-time decision making under uncertainty. In this paper, we propose a conservative stochastic contextual bandit formulation for real-time decision making when an adversary chooses a distribution on the set of possible contexts and the learner is subject to certain safety/performance constraints. The learner observes only the context distribution and the exact context is unknown, and the goal is to develop an algorithm that selects a sequence of optimal actions to maximize the cumulative reward without violating the safety constraints at any time step. By leveraging the UCB algorithm for this setting, we propose a conservative linear UCB algorithm for stochastic bandits with context distribution. We prove an upper bound on the regret of the algorithm and show that it can be decomposed into three terms: (i) an upper bound for the regret of the standard linear UCB algorithm, (ii) a constant term (independent of time horizon) that accounts for the loss of being conservative in order to satisfy the safety constraint, and (iii) a constant term (independent of time horizon) that accounts for the loss due to the contexts being unknown and only the distribution being known. To validate the performance of our approach we perform extensive simulations on synthetic data and on real-world maize data collected through the Genomes to Fields (G2F) initiative.

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2112.13412 [pdf, other]

Fully Decentralized and Federated Low Rank Compressive Sensing

Authors: Shana Moothedath, Namrata Vaswani

Abstract: In this work we develop a fully decentralized, federated, and fast solution to the recently studied Low Rank Compressive Sensing (LRCS) problem: recover an n × q low-rank matrix from column-wise linear projections. An important application where this problem occurs, and where a decentralized solution is desirable, is federated sketching: efficiently compressing the vast amounts of distributed images/videos generated by smartphones and various other devices while respecting the users' privacy. Images from different devices, once grouped by category, are similar and hence the matrix formed by the vectorized images of a certain category is well-modeled as being low rank. Suppose there are p nodes (say p smartphones), and each stores a subset of the sketches of its images. We develop a decentralized projected gradient descent (GD) based approach to jointly reconstruct the images of all the phones/users from their respective stored sketches. The algorithm is such that the phones/users never share their raw data but only summaries of this data with the other phones at each algorithm iteration. Also, the reconstructed images of user g are obtained only locally. Other users cannot reconstruct them. Only the column span of the matrix is reconstructed globally. By "decentralized" we mean that there is no central node to which all nodes are connected and thus the only way to aggregate the summaries from the various nodes is by use of an iterative consensus algorithm that eventually provides an estimate of the aggregate at each node, as long as the network is strongly connected. We validated the effectiveness of our algorithm via extensive simulation experiments.

Submitted 26 December, 2021; originally announced December 2021.
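
The iterative consensus step described in the abstract above can be illustrated with a minimal sketch of average consensus. This is not the authors' algorithm: it assumes an undirected, connected network and uses Metropolis weights, a standard choice that makes the mixing matrix doubly stochastic. The names `metropolis_weights`, `average_consensus`, and the four-node path graph are hypothetical.

```python
def metropolis_weights(neighbors):
    """Build a symmetric, doubly stochastic mixing matrix for an undirected graph.

    neighbors[i] is the list of nodes adjacent to node i (assumed symmetric).
    """
    n = len(neighbors)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            # Each edge weight depends only on the degrees of its two endpoints.
            weights[i][j] = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
        # The self-weight absorbs the remainder so that each row sums to one.
        weights[i][i] = 1.0 - sum(weights[i])
    return weights


def average_consensus(values, neighbors, iterations=200):
    """Each node repeatedly replaces its value by a weighted average of its
    own value and its neighbors' values; no central node is involved."""
    weights = metropolis_weights(neighbors)
    x = list(values)
    n = len(x)
    for _ in range(iterations):
        x = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


# Hypothetical network: a path graph 0 - 1 - 2 - 3, one local summary per node.
neighbors = [[1], [0, 2], [1, 3], [2]]
local_summaries = [1.0, 2.0, 3.0, 6.0]
estimates = average_consensus(local_summaries, neighbors)
```

After enough iterations every entry of `estimates` approaches the network-wide average of the local summaries (here 3.0), which each node can rescale by the number of nodes to obtain the aggregate.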

arXiv:2007.12327 [pdf, other]

Stochastic Dynamic Information Flow Tracking Game using Supervised Learning for Detecting Advanced Persistent Threats

Authors: Shana Moothedath, Dinuka Sahabandu, Joey Allen, Linda Bushnell, Wenke Lee, Radha Poovendran

Abstract: Advanced persistent threats (APTs) are organized prolonged cyberattacks by sophisticated attackers. Although APT activities are stealthy, they interact with the system components and these interactions lead to information flows. Dynamic Information Flow Tracking (DIFT) has been proposed as one of the effective ways to detect APTs using the information flows. However, wide-ranging security analysis using DIFT results in a significant increase in performance overhead and high rates of false-positives and false-negatives generated by DIFT. In this paper, we model the strategic interaction between APT and DIFT as a non-cooperative stochastic game. The game unfolds on a state space constructed from an information flow graph (IFG) that is extracted from the system log. The objective of the APT in the game is to choose transitions in the IFG to find an optimal path in the IFG from an entry point of the attack to an attack target. On the other hand, the objective of DIFT is to dynamically select nodes in the IFG to perform security analysis for detecting APT. Our game model has imperfect information as the players do not have information about the actions of the opponent. We consider two scenarios of the game: (i) when the false-positive and false-negative rates are known to both players and (ii) when the false-positive and false-negative rates are unknown to both players. Case (i) translates to a game model with complete information, for which we propose a value iteration-based algorithm and prove its convergence. Case (ii) translates to a game with unknown transition probabilities. In this case, we propose a Hierarchical Supervised Learning (HSL) algorithm that integrates a neural network, to predict the value vector of the game, with a policy iteration algorithm to compute an approximate equilibrium. We implemented our algorithms on real attack datasets and validated the performance of our approach.

Submitted 25 June, 2021; v1 submitted 23 July, 2020; originally announced July 2020.

arXiv:2007.00076 [pdf, other]

A Reinforcement Learning Approach for Dynamic Information Flow Tracking Games for Detecting Advanced Persistent Threats

Authors: Dinuka Sahabandu, Shana Moothedath, Joey Allen, Linda Bushnell, Wenke Lee, Radha Poovendran

Abstract: Advanced Persistent Threats (APTs) are stealthy attacks that threaten the security and privacy of sensitive information. Interactions of APTs with the victim system introduce information flows that are recorded in the system logs. Dynamic Information Flow Tracking (DIFT) is a promising detection mechanism for detecting APTs. DIFT taints information flows originating at system entities that are susceptible to an attack, tracks the propagation of the tainted flows, and authenticates the tainted flows at certain system components according to a pre-defined security policy. Deployment of DIFT to defend against APTs in cyber systems is limited by the heavy resource and performance overhead associated with DIFT. In this paper, we propose a resource-efficient model for DIFT by incorporating the security costs, false-positives, and false-negatives associated with DIFT. Specifically, we develop a game-theoretic framework and provide an analytical model of DIFT that enables the study of the trade-off between resource efficiency and the effectiveness of detection. Our game model is a nonzero-sum, infinite-horizon, average reward stochastic game. Our model incorporates the information asymmetry between players that arises from DIFT's inability to distinguish malicious flows from benign flows and APT's inability to know the locations where DIFT performs a security analysis. Additionally, the game has incomplete information as the transition probabilities (false-positive and false-negative rates) are unknown. We propose a multiple-time scale stochastic approximation algorithm to learn an equilibrium solution of the game. We prove that our algorithm converges to an average reward Nash equilibrium. We evaluate our proposed model and algorithm on a real-world ransomware dataset and validate the effectiveness of the proposed approach.

Submitted 28 June, 2021; v1 submitted 30 June, 2020; originally announced July 2020.

Comments: 15

arXiv:2006.12327 [pdf, other]

Dynamic Information Flow Tracking for Detection of Advanced Persistent Threats: A Stochastic Game Approach

Authors: Shana Moothedath, Dinuka Sahabandu, Joey Allen, Andrew Clark, Linda Bushnell, Wenke Lee, Radha Poovendran

Abstract: Advanced Persistent Threats (APTs) are stealthy customized attacks by intelligent adversaries. This paper deals with the detection of APTs that infiltrate cyber systems and compromise specifically targeted data and/or infrastructures. Dynamic information flow tracking is an information trace-based detection mechanism against APTs that taints suspicious information flows in the system and generates security analysis for unauthorized use of tainted data. In this paper, we develop an analytical model for resource-efficient detection of APTs using an information flow tracking game. The game is a nonzero-sum, turn-based, stochastic game with asymmetric information as the defender cannot distinguish whether an incoming flow is malicious or benign and hence has only partial state observation. We analyze the equilibrium of the game and prove that a Nash equilibrium is given by a solution to the minimum capacity cut set problem on a flow-network derived from the system, where the edge capacities are obtained from the cost of performing security analysis. Finally, we implement our algorithm on a real-world dataset for a data exfiltration attack augmented with false-negative and false-positive rates and compute an optimal defender strategy.

Submitted 25 June, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

arXiv:1811.05622 [pdf, other]

A Game Theoretic Approach for Dynamic Information Flow Tracking to Detect Multi-Stage Advanced Persistent Threats

Authors: Shana Moothedath, Dinuka Sahabandu, Joey Allen, Andrew Clark, Linda Bushnell, Wenke Lee, Radha Poovendran

Abstract: Advanced Persistent Threats (APTs) infiltrate cyber systems and compromise specifically targeted data and/or resources through a sequence of stealthy attacks consisting of multiple stages. Dynamic information flow tracking has been proposed to detect APTs. In this paper, we develop a dynamic information flow tracking game for resource-efficient detection of APTs via multi-stage dynamic games. The game evolves on an information flow graph, whose nodes are processes and objects (e.g., files, network endpoints) in the system and whose edges capture the interaction between different processes and objects. Each stage of the game has pre-specified targets, which are characterized by a set of nodes of the graph, and the goal of the APT is to evade detection and reach a target node of that stage. The goal of the defender is to maximize the detection probability while minimizing performance overhead on the system. The resource costs of the players are different and the information structure is asymmetric, resulting in a nonzero-sum imperfect information game. We first calculate the best responses of the players and characterize the set of Nash equilibria for single-stage attacks. Subsequently, we provide a polynomial-time algorithm to compute a correlated equilibrium for the multi-stage attack case. Finally, we evaluate our model and algorithms on real-world nation-state attack data obtained from the Refinable Attack Investigation system.

Submitted 13 November, 2018; originally announced November 2018.

Comments: 16

arXiv:1710.10037 [pdf, other]

Rapidly Mixing Markov Chain Monte Carlo Technique for Matching Problems with Global Utility Function

Authors: Shana Moothedath, Prasanna Chaporkar, Madhu N. Belur

Abstract: This paper deals with a complete bipartite matching problem with the objective of finding an optimal matching that maximizes a certain generic predefined utility function on the set of all matchings. After proving the NP-hardness of the problem using a reduction from the 3-SAT problem, we propose a randomized algorithm based on the Markov Chain Monte Carlo (MCMC) technique for solving it. We sample from the Gibbs distribution and construct a reversible, positive recurrent discrete-time Markov chain (DTMC) whose steady-state distribution is the Gibbs distribution. In one of our key contributions, we show that the constructed chain is rapidly mixing, i.e., the convergence time to reach within a specified distance of the desired distribution is polynomial in the problem size. The rapid mixing property is established by obtaining a lower bound on the conductance of the DTMC graph, and this result is of independent interest.

Submitted 27 October, 2017; originally announced October 2017.
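
The idea of sampling matchings from a Gibbs distribution, as summarized in the abstract above, can be sketched with a generic Metropolis chain. This is an illustration under stated assumptions, not the chain constructed in the paper: a matching of a complete bipartite graph is represented as a permutation, the proposal swaps two assignments, and the target distribution is proportional to exp(beta * utility). The function name `metropolis_matching`, the inverse temperature `beta`, and the additive example utility are all hypothetical.

```python
import math
import random


def metropolis_matching(n, utility, beta=2.0, steps=5000, seed=0):
    """Run a Metropolis chain over perfect matchings of K_{n,n}.

    sigma[i] = j means left node i is matched to right node j.
    The chain is reversible with respect to pi(sigma) ~ exp(beta * utility(sigma))
    because the swap proposal is symmetric.
    """
    rng = random.Random(seed)
    sigma = list(range(n))          # start from the identity matching
    current = utility(sigma)
    best, best_utility = sigma[:], current
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        sigma[i], sigma[j] = sigma[j], sigma[i]      # propose a swap
        proposed = utility(sigma)
        # Accept with probability min(1, exp(beta * (proposed - current))).
        if rng.random() < math.exp(min(0.0, beta * (proposed - current))):
            current = proposed
            if current > best_utility:
                best, best_utility = sigma[:], current
        else:
            sigma[i], sigma[j] = sigma[j], sigma[i]  # reject: undo the swap
    return best, best_utility


# Hypothetical utility: sum of edge weights (the paper allows a generic utility).
weights = [
    [1, 5, 2, 0],
    [4, 1, 0, 3],
    [0, 2, 6, 1],
    [3, 0, 1, 4],
]


def total_weight(sigma):
    return sum(weights[i][sigma[i]] for i in range(len(sigma)))


best_matching, best_value = metropolis_matching(4, total_weight)
```

A larger `beta` concentrates the Gibbs distribution on high-utility matchings but generally slows mixing, which is the trade-off a mixing-time analysis has to control.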

arXiv:1705.09600 [pdf, ps, other]

Approximating Constrained Minimum Cost Input-Output Selection for Generic Arbitrary Pole Placement in Structured Systems

Authors: Shana Moothedath, Prasanna Chaporkar, Madhu N. Belur

Abstract: This paper is about minimum cost constrained selection of inputs and outputs for generic arbitrary pole placement. The input-output set is constrained in the sense that the set of states that each input can influence and the set of states that each output can sense are pre-specified. Our goal is to optimally select an input-output set such that the system has no structurally fixed modes. Polynomial-time algorithms do not exist for solving this problem unless P=NP. To this end, we propose an approximation algorithm by splitting the problem into three sub-problems: a) minimum cost accessibility problem, b) minimum cost sensability problem and c) minimum cost disjoint cycle problem. We prove that problems a) and b) are equivalent to suitably defined weighted set cover problems. We also show that problem c) is equivalent to a minimum cost perfect matching problem. Using these, we give an approximation algorithm that solves the minimum cost generic arbitrary pole placement problem. The proposed algorithm incorporates an approximation algorithm for the weighted set cover problem to solve a) and b) and a minimum cost perfect matching algorithm to solve c). Further, we show that the algorithm runs in polynomial time and gives an order-optimal solution to the minimum cost input-output selection for generic arbitrary pole placement problem.

Submitted 9 January, 2018; v1 submitted 26 May, 2017; originally announced May 2017.

Comments: 11 pages, 2 figures
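
The abstract above relies on an approximation algorithm for weighted set cover without naming it. As a point of reference, the following sketch shows the classical greedy heuristic, which repeatedly picks the subset with the lowest cost per newly covered element and is known to achieve a logarithmic approximation factor. Whether the paper uses this exact routine is an assumption; the function name and the toy instance are hypothetical.

```python
def greedy_weighted_set_cover(universe, subsets, costs):
    """Greedy approximation for weighted set cover.

    universe: iterable of elements to cover
    subsets:  dict mapping a subset name to a set of elements
    costs:    dict mapping a subset name to a positive cost
    Returns the list of chosen subset names in selection order.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Only subsets that cover something new are candidates.
        candidates = [name for name in subsets if subsets[name] & uncovered]
        if not candidates:
            raise ValueError("the given subsets do not cover the universe")
        # Pick the subset with the smallest cost per newly covered element.
        best = min(candidates,
                   key=lambda name: costs[name] / len(subsets[name] & uncovered))
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Hypothetical instance: five elements, four candidate subsets.
universe = {1, 2, 3, 4, 5}
subsets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}, "D": {1, 2, 3, 4, 5}}
costs = {"A": 3.0, "B": 1.0, "C": 2.0, "D": 10.0}
cover = greedy_weighted_set_cover(universe, subsets, costs)
total_cost = sum(costs[name] for name in cover)
```

On this instance the greedy rule selects B, then A, then C, for a total cost of 6.0. The optimum here is A and C at cost 5.0, which shows that the heuristic is an approximation rather than an exact method.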

Showing 1–13 of 13 results for author: Moothedath, S