
Defense against Joint Poison and Evasion Attacks: A Case Study of DERMS

Authors: Zain ul Abdeen, Padmaksha Roy, Ahmad Al-Tawaha, Rouxi Jia, Laura Freeman, Peter Beling, Chen-Ching Liu, Alberto Sangiovanni-Vincentelli, Ming **

Abstract: There is an upward trend of deploying distributed energy resource management systems (DERMS) to control modern power grids. However, DERMS controller communication lines are vulnerable to cyberattacks that could potentially impact operational reliability. While a data-driven intrusion detection system (IDS) can potentially thwart attacks during deployment, also known as the evasion attack, the training of the detection algorithm may be corrupted by adversarial data injected into the database, also known as the poisoning attack. In this paper, we propose the first framework of IDS that is robust against joint poisoning and evasion attacks. We formulate the defense mechanism as a bilevel optimization, where the inner and outer levels deal with attacks that occur during training time and testing time, respectively. We verify the robustness of our method on the IEEE-13 bus feeder model against a diverse set of poisoning and evasion attack scenarios. The results indicate that our proposed method outperforms the baseline technique in terms of accuracy, precision, and recall for intrusion detection.

Submitted 5 May, 2024; originally announced May 2024.

arXiv:2404.03775 [pdf, ps, other]

A Systems Theoretic Approach to Online Machine Learning

Authors: Anli du Preez, Peter A. Beling, Tyler Cody

Abstract: The machine learning formulation of online learning is incomplete from a systems theoretic perspective. Typically, machine learning research emphasizes domains and tasks, and a problem solving worldview. It focuses on algorithm parameters, features, and samples, and neglects the perspective offered by considering system structure and system behavior or dynamics. Online learning is an active field of research and has been widely explored in terms of statistical theory and computational algorithms, however, in general, the literature still lacks formal system theoretical frameworks for modeling online learning systems and resolving systems-related concept drift issues. Furthermore, while the machine learning formulation serves to classify methods and literature, the systems theoretic formulation presented herein serves to provide a framework for the top-down design of online learning systems, including a novel definition of online learning and the identification of key design parameters. The framework is formulated in terms of input-output systems and is further divided into system structure and system behavior. Concept drift is a critical challenge faced in online learning, and this work formally approaches it as part of the system behavior characteristics. Healthcare provider fraud detection using machine learning is used as a case study throughout the paper to ground the discussion in a real-world online learning challenge.

Submitted 4 April, 2024; originally announced April 2024.

Comments: Accepted by the 18th Annual IEEE International Systems Conference (SysCon)

arXiv:2404.03769 [pdf, ps, other]

On Extending the Automatic Test Markup Language (ATML) for Machine Learning

Authors: Tyler Cody, Bingtong Li, Peter A. Beling

Abstract: This paper addresses the urgent need for messaging standards in the operational test and evaluation (T&E) of machine learning (ML) applications, particularly in edge ML applications embedded in systems like robots, satellites, and unmanned vehicles. It examines the suitability of the IEEE Standard 1671 (IEEE Std 1671), known as the Automatic Test Markup Language (ATML), an XML-based standard originally developed for electronic systems, for ML application testing. The paper explores extending IEEE Std 1671 to encompass the unique challenges of ML applications, including the use of datasets and dependencies on software. Through modeling various tests such as adversarial robustness and drift detection, this paper offers a framework adaptable to specific applications, suggesting that minor modifications to ATML might suffice to address the novelties of ML. This paper differentiates ATML's focus on testing from other ML standards like Predictive Model Markup Language (PMML) or Open Neural Network Exchange (ONNX), which concentrate on ML model specification. We conclude that ATML is a promising tool for effective, near real-time operational T&E of ML applications, an essential aspect of AI lifecycle management, safety, and governance.

Submitted 4 April, 2024; originally announced April 2024.

Comments: Accepted by the 18th Annual IEEE International Systems Conference (SysCon)

arXiv:2311.10786 [pdf, other]

A Systems-Theoretical Formalization of Closed Systems

Authors: Niloofar Shadab, Tyler Cody, Alejandro Salado, Peter Beling

Abstract: There is a lack of formalism for some key foundational concepts in systems engineering. One of the most recently acknowledged deficits is the inadequacy of systems engineering practices for engineering intelligent systems. In our previous works, we proposed that closed systems precepts could be used to accomplish a required paradigm shift for the systems engineering of intelligent systems. However, to enable such a shift, formal foundations for closed systems precepts that expand the theory of systems engineering are needed. The concept of closure is a critical concept in the formalism underlying closed systems precepts. In this paper, we provide formal, systems- and information-theoretic definitions of closure to identify and distinguish different types of closed systems. Then, we assert a mathematical framework to evaluate the subjective formation of the boundaries and constraints of such systems. Finally, we argue that engineering an intelligent system can benefit from appropriate closed and open systems paradigms on multiple levels of abstraction of the system. In the main, this framework will provide the necessary fundamentals to aid in systems engineering of intelligent systems.

Submitted 16 November, 2023; originally announced November 2023.

Comments: 11 pages, 3 figures

arXiv:2302.14567 [pdf, other]

Active Learning with Combinatorial Coverage

Authors: Sai Prathyush Katragadda, Tyler Cody, Peter Beling, Laura Freeman

Abstract: Active learning is a practical field of machine learning that automates the process of selecting which data to label. Current methods are effective in reducing the burden of data labeling but are heavily model-reliant. This has led to the inability of sampled data to be transferred to new models as well as issues with sampling bias. Both issues are of crucial concern in machine learning deployment. We propose active learning methods utilizing combinatorial coverage to overcome these issues. The proposed methods are data-centric, as opposed to model-centric, and through our experiments we show that the inclusion of coverage in active learning leads to sampling data that tends to be the best in transferring to better performing models and has a competitive sampling bias compared to benchmark methods.

Submitted 28 February, 2023; originally announced February 2023.

Comments: Accepted 2022 IEEE International Conference on Machine Learning and Applications (IEEE ICMLA)

arXiv:2211.03027 [pdf, other]

Exposing Surveillance Detection Routes via Reinforcement Learning, Attack Graphs, and Cyber Terrain

Authors: Lanxiao Huang, Tyler Cody, Christopher Redino, Abdul Rahman, Akshay Kakkar, Deepak Kushwaha, Cheng Wang, Ryan Clark, Daniel Radke, Peter Beling, Edward Bowen

Abstract: Reinforcement learning (RL) operating on attack graphs leveraging cyber terrain principles is used to develop reward and state associated with determination of surveillance detection routes (SDR). This work extends previous efforts on developing RL methods for path analysis within enterprise networks. This work focuses on building SDR where the routes focus on exploring the network services while trying to evade risk. RL is utilized to support the development of these routes by building a reward mechanism that would help in realization of these paths. The RL algorithm is modified to have a novel warm-up phase which decides in the initial exploration which areas of the network are safe to explore based on the rewards and penalty scale factor.

Submitted 6 November, 2022; originally announced November 2022.

arXiv:2208.02837 [pdf, other]

Core and Periphery as Closed-System Precepts for Engineering General Intelligence

Authors: Tyler Cody, Niloofar Shadab, Alejandro Salado, Peter Beling

Abstract: Engineering methods are centered around traditional notions of decomposition and recomposition that rely on partitioning the inputs and outputs of components to allow for component-level properties to hold after their composition. In artificial intelligence (AI), however, systems are often expected to influence their environments, and, by way of their environments, to influence themselves. Thus, it is unclear if an AI system's inputs will be independent of its outputs, and, therefore, if AI systems can be treated as traditional components. This paper posits that engineering general intelligence requires new general systems precepts, termed the core and periphery, and explores their theoretical uses. The new precepts are elaborated using abstract systems theory and the Law of Requisite Variety. By using the presented material, engineers can better understand the general character of regulating the outcomes of AI to achieve stakeholder needs and how the general systems nature of embodiment challenges traditional engineering practice.

Submitted 4 August, 2022; originally announced August 2022.

Comments: The 15th International Conference on Artificial General Intelligence (AGI-22)

arXiv:2203.11414 [pdf, other]

BESSIE: A Behavior and Epidemic Simulator for Use With Synthetic Populations

Authors: Henning S Mortveit, Stephen Adams, Faraz Dadgostari, Samarth Swarup, Peter Beling

Abstract: In this paper, we present BESSIE (Behavior and Epidemic Simulator for Synthetic Information Environments), an open source, agent-based simulator for COVID-type epidemics. BESSIE uses a synthetic population where each person has demographic attributes, belongs to a household, and has a base activity and visit schedule covering seven days. The simulated disease spreads through contacts that arise from joint visits to the locations where activities take place. The simulation model has a plugin-type programmable behavioral model where, based on the dynamics and observables tracked by the simulator, agents decide on actions such as wearing a mask, engaging in social distancing, or refraining from certain activity types by staying at home instead. The plugins are supplied as Python code. To the best of our knowledge, BESSIE is a unique simulator supporting this feature set, and most certainly as open software. To illustrate the use of BESSIE, we provide a COVID-relevant example demonstrating some of its capabilities. The example uses a synthetic population for the City of Charlottesville, Virginia. Both this population and the Python plugin modules used in the example are made available. The Python implementation, which can run on anything from a laptop to a cluster, is made available under the Apache 2.0 license (https://www.apache.org/licenses/LICENSE-2.0.html). The example population accompanying this publication is made available under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/).

Submitted 21 March, 2022; originally announced March 2022.

Comments: 24 pages; the git repository and accompanying data will be made openly available upon final journal publication

arXiv:2201.12416 [pdf, other]

Discovering Exfiltration Paths Using Reinforcement Learning with Attack Graphs

Authors: Tyler Cody, Abdul Rahman, Christopher Redino, Lanxiao Huang, Ryan Clark, Akshay Kakkar, Deepak Kushwaha, Paul Park, Peter Beling, Edward Bowen

Abstract: Reinforcement learning (RL), in conjunction with attack graphs and cyber terrain, is used to develop reward and state associated with determination of optimal paths for exfiltration of data in enterprise networks. This work builds on previous crown jewels (CJ) identification that focused on the target goal of computing optimal paths that adversaries may traverse toward compromising CJs or hosts within their proximity. This work inverts the previous CJ approach based on the assumption that data has been stolen and now must be quietly exfiltrated from the network. RL is utilized to support the development of a reward function based on the identification of those paths where adversaries desire reduced detection. Results demonstrate promising performance for a sizable network environment.

Submitted 25 April, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

Comments: The 5th IEEE Conference on Dependable and Secure Computing (IEEE DSC 2022)

arXiv:2108.00293 [pdf, other]

Inverse Reinforcement Learning for Strategy Identification

Authors: Mark Rucker, Stephen Adams, Roy Hayes, Peter A. Beling

Abstract: In adversarial environments, one side could gain an advantage by identifying the opponent's strategy. For example, in combat games, if an opponent's strategy is identified as overly aggressive, one could lay a trap that exploits the opponent's aggressive nature. However, an opponent's strategy is not always apparent and may need to be estimated from observations of their actions. This paper proposes to use inverse reinforcement learning (IRL) to identify strategies in adversarial environments. Specifically, the contributions of this work are 1) the demonstration of this concept on gaming combat data generated from three pre-defined strategies and 2) the framework for using IRL to achieve strategy identification. The numerical experiments demonstrate that the recovered rewards can be identified using a variety of techniques. In this paper, the recovered rewards are visually displayed, clustered using unsupervised learning, and classified using a supervised learner.

Submitted 31 July, 2021; originally announced August 2021.

Comments: The paper has been accepted as a regular paper in IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2021

arXiv:2107.01196 [pdf, other]

A Systems Theory of Transfer Learning

Authors: Tyler Cody, Peter A. Beling

Abstract: Existing frameworks for transfer learning are incomplete from a systems theoretic perspective. They place emphasis on notions of domain and task, and neglect notions of structure and behavior. In doing so, they limit the extent to which formalism can be carried through into the elaboration of their frameworks. Herein, we use Mesarovician systems theory to define transfer learning as a relation on sets and subsequently characterize the general nature of transfer learning as a mathematical construct. We interpret existing frameworks in terms of ours and go beyond existing frameworks to define notions of transferability, transfer roughness, and transfer distance. Importantly, despite its formalism, our framework avoids the detailed mathematics of learning theory or machine learning solution methods without excluding their consideration. As such, we provide a formal, general systems framework for modeling transfer learning that offers a rigorous foundation for system design and analysis.

Submitted 2 July, 2021; originally announced July 2021.

arXiv:2107.01184 [pdf, other]

doi 10.1109/JSYST.2022.3144837

Empirically Measuring Transfer Distance for System Design and Operation

Authors: Tyler Cody, Stephen Adams, Peter A. Beling

Abstract: Classical machine learning approaches are sensitive to non-stationarity. Transfer learning can address non-stationarity by sharing knowledge from one system to another; however, in areas like machine prognostics and defense, data is fundamentally limited. Therefore, transfer learning algorithms have few, if any, examples from which to learn. Herein, we suggest that these constraints on algorithmic learning can be addressed by systems engineering. We formally define transfer distance in general terms and demonstrate its use in empirically quantifying the transferability of models. We consider the use of transfer distance in the design of machine rebuild procedures to allow for transferable prognostic models. We also consider the use of transfer distance in predicting operational performance in computer vision. Practitioners can use the presented methodology to design and operate systems with consideration for the learning theoretic challenges faced by component learning systems.

Submitted 2 July, 2021; originally announced July 2021.

arXiv:2011.14469 [pdf, other]

doi 10.1109/MC.2020.3039491

Cyberphysical Security Through Resiliency: A Systems-centric Approach

Authors: Cody Fleming, Carl Elks, Georgios Bakirtzis, Stephen C. Adams, Bryan Carter, Peter A. Beling, Barry Horowitz

Abstract: Cyber-physical systems (CPS) are often defended in the same manner as information technology (IT) systems -- by using perimeter security. Multiple factors make such defenses insufficient for CPS. Resiliency shows potential in overcoming these shortfalls. Techniques for achieving resilience exist; however, methods and theory for evaluating resilience in CPS are lacking. We argue that such methods and theory should assist stakeholders in deciding where and how to apply design patterns for resilience. Such a problem potentially involves tradeoffs between different objectives and criteria, and such decisions need to be driven by traceable, defensible, repeatable engineering evidence. Multi-criteria resiliency problems require a system-oriented approach that evaluates systems in the presence of threats as well as potential design solutions once vulnerabilities have been identified. We present a systems-oriented view of cyber-physical security, termed Mission Aware, that is based on a holistic understanding of mission goals, system dynamics, and risk.

Submitted 9 October, 2021; v1 submitted 29 November, 2020; originally announced November 2020.

arXiv:2007.12306 [pdf, other]

Value-Decomposition Multi-Agent Actor-Critics

Authors: Jianyu Su, Stephen Adams, Peter A. Beling

Abstract: The exploitation of extra state information has been an active research area in multi-agent reinforcement learning (MARL). QMIX represents the joint action-value using a non-negative function approximator and achieves the best performance, by far, on multi-agent benchmarks, StarCraft II micromanagement tasks. However, our experiments show that, in some cases, QMIX is incompatible with A2C, a training paradigm that promotes algorithm training efficiency. To obtain a reasonable trade-off between training efficiency and algorithm performance, we extend value-decomposition to actor-critics that are compatible with A2C and propose a novel actor-critic framework, value-decomposition actor-critics (VDACs). We evaluate VDACs on the testbed of StarCraft II micromanagement tasks and demonstrate that the proposed framework improves median performance over other actor-critic methods. Furthermore, we use a set of ablation experiments to identify the key factors that contribute to the performance of VDACs.

Submitted 18 December, 2020; v1 submitted 23 July, 2020; originally announced July 2020.

Comments: Accepted by 35th AAAI Conference on Artificial Intelligence (AAAI 2021)

arXiv:2006.05304 [pdf, other]

doi 10.1007/s10270-021-00892-z

An Ontological Metamodel for Cyber-Physical System Safety, Security, and Resilience Coengineering

Authors: Georgios Bakirtzis, Tim Sherburne, Stephen Adams, Barry M. Horowitz, Peter A. Beling, Cody H. Fleming

Abstract: System complexity has become ubiquitous in the design, assessment, and implementation of practical and useful cyber-physical systems. This increased complexity is impacting the management of models necessary for designing cyber-physical systems that are able to take into account a number of "-ilities", such that they are safe and secure and ultimately resilient to disruption of service. We propose an ontological metamodel for system design that augments an already existing industry metamodel to capture the relationships between various model elements and safety, security, and resilient considerations. Employing this metamodel leads to more cohesive and structured modeling efforts with an overall increase in scalability, usability, and unification of already existing models. In turn, this leads to a mission-oriented perspective in designing security defenses and resilience mechanisms to combat undesirable behaviors. We illustrate this metamodel in an open-source GraphQL implementation, which can interface with a number of modeling languages. We support our proposed metamodel with a detailed demonstration using an oil and gas pipeline model.

Submitted 9 June, 2020; originally announced June 2020.

arXiv:2004.00470 [pdf, other]

Counterfactual Multi-Agent Reinforcement Learning with Graph Convolution Communication

Authors: Jianyu Su, Stephen Adams, Peter A. Beling

Abstract: We consider a fully cooperative multi-agent system where agents cooperate to maximize a system's utility in a partially observable environment. We propose that multi-agent systems must have the ability to (1) communicate and understand the inter-plays between agents and (2) correctly distribute rewards based on an individual agent's contribution. In contrast, most work in this setting considers only one of the above abilities. In this study, we develop an architecture that allows for communication among agents and tailors the system's reward for each individual agent. Our architecture represents agent communication through graph convolution and applies an existing credit assignment structure, counterfactual multi-agent policy gradient (COMA), to assist agents to learn communication by back-propagation. The flexibility of the graph structure enables our method to be applicable to a variety of multi-agent systems, e.g. dynamic systems that consist of varying numbers of agents and static systems with a fixed number of agents. We evaluate our method on a range of tasks, demonstrating the advantage of marrying communication with credit assignment. In the experiments, our proposed method yields better performance than the state-of-the-art methods, including COMA. Moreover, we show that the communication strategies offer us insights and interpretability of the system's cooperative policies.

Submitted 28 December, 2020; v1 submitted 1 April, 2020; originally announced April 2020.

arXiv:1911.09837 [pdf, other]

Graph Convolution Networks for Probabilistic Modeling of Driving Acceleration

Authors: Jianyu Su, Peter A. Beling, Rui Guo, Kyungtae Han

Abstract: The ability to model and predict an ego-vehicle's surrounding traffic is crucial for autonomous pilots and intelligent driver-assistance systems. Acceleration prediction is important as one of the major components of traffic prediction. This paper proposes novel approaches to the acceleration prediction problem. By representing spatial relationships between vehicles with a graph model, we build a generalized acceleration prediction framework. This paper studies the effectiveness of proposed Graph Convolution Networks, which operate on graphs predicting the acceleration distribution for vehicles driving on highways. We further investigate prediction improvement through the integration of Recurrent Neural Networks to disentangle the temporal complexity inherent in the traffic data. Results from simulation studies using comprehensive performance metrics support the conclusion that our proposed networks outperform state-of-the-art methods in generating realistic trajectories over a prediction horizon.

Submitted 7 May, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

Comments: Accepted by ITSC 2020

arXiv:1806.09795 [pdf, ps, other]

doi 10.1613/jair.1.11541

Multi-agent Inverse Reinforcement Learning for Certain General-sum Stochastic Games

Authors: Xiaomin Lin, Stephen C. Adams, Peter A. Beling

Abstract: This paper addresses the problem of multi-agent inverse reinforcement learning (MIRL) in a two-player general-sum stochastic game framework. Five variants of MIRL are considered: uCS-MIRL, advE-MIRL, cooE-MIRL, uCE-MIRL, and uNE-MIRL, each distinguished by its solution concept. Problem uCS-MIRL is a cooperative game in which the agents employ cooperative strategies that aim to maximize the total game value. In problem uCE-MIRL, agents are assumed to follow strategies that constitute a correlated equilibrium while maximizing total game value. Problem uNE-MIRL is similar to uCE-MIRL in total game value maximization, but it is assumed that the agents are playing a Nash equilibrium. Problems advE-MIRL and cooE-MIRL assume agents are playing an adversarial equilibrium and a coordination equilibrium, respectively. We propose novel approaches to address these five problems under the assumption that the game observer either knows or is able to accurately estimate the policies and solution concepts for players. For uCS-MIRL, we first develop a characteristic set of solutions ensuring that the observed bi-policy is a uCS and then apply a Bayesian inverse learning method. For uCE-MIRL, we develop a linear programming problem subject to constraints that define necessary and sufficient conditions for the observed policies to be correlated equilibria. The objective is to choose a solution that not only minimizes the total game value difference between the observed bi-policy and a local uCS, but also maximizes the scale of the solution. We apply a similar treatment to the problem of uNE-MIRL. The remaining two problems can be solved efficiently by taking advantage of solution uniqueness and setting up a convex optimization problem. Results are validated on various benchmark grid-world games.

Submitted 10 October, 2019; v1 submitted 26 June, 2018; originally announced June 2018.

Comments: 30 pages

Journal ref: Journal of Artificial Intelligence Research 66 (2019), pp 473-502

arXiv:1403.6822 [pdf, ps, other]

Comparison of Multi-agent and Single-agent Inverse Learning on a Simulated Soccer Example

Authors: Xiaomin Lin, Peter A. Beling, Randy Cogill

Abstract: We compare the performance of Inverse Reinforcement Learning (IRL) with the relatively new model of Multi-agent Inverse Reinforcement Learning (MIRL). Before comparing the methods, we extend a published Bayesian IRL approach, which applies only when the reward depends on state alone, to a general one that handles rewards depending on both state and action. The comparison between IRL and MIRL is made in the context of an abstract soccer game, using both a game model in which the reward depends only on state and one in which it depends on both state and action. Results suggest that the IRL approach performs much worse than the MIRL approach. We speculate that IRL underperforms because it fails to capture equilibrium information in the manner possible in MIRL.

Submitted 26 March, 2014; originally announced March 2014.

Comments: arXiv admin note: text overlap with arXiv:1403.6508

arXiv:1403.6508 [pdf, ps, other]

doi 10.1109/TCIAIG.2017.2679115

Multi-agent Inverse Reinforcement Learning for Two-person Zero-sum Games

Authors: Xiaomin Lin, Peter A. Beling, Randy Cogill

Abstract: The focus of this paper is a Bayesian framework for solving a class of problems termed multi-agent inverse reinforcement learning (MIRL). Compared to the well-known inverse reinforcement learning (IRL) problem, MIRL is formalized in the context of stochastic games, which generalize Markov decision processes to game theoretic scenarios. We establish a theoretical foundation for competitive two-agent zero-sum MIRL problems and propose a Bayesian solution approach in which the generative model is based on an assumption that the two agents follow a minimax bi-policy. Numerical results are presented comparing the Bayesian MIRL method with two existing methods in the context of an abstract soccer game. Investigation centers on relationships between the extent of prior information and the quality of learned rewards. Results suggest that covariance structure is more important than mean value in reward priors.

Submitted 29 July, 2019; v1 submitted 25 March, 2014; originally announced March 2014.

Journal ref: IEEE Transactions on Games 10(1) (2018) 56-68

arXiv:1403.6248 [pdf, ps, other]

Classroom Video Assessment and Retrieval via Multiple Instance Learning

Authors: Qifeng Qiao, Peter A. Beling

Abstract: We propose a multiple instance learning approach to content-based retrieval of classroom video for the purpose of supporting human assessment of the learning environment. The key element of our approach is a mapping between the semantic concepts of the assessment system and features of the video that can be measured using techniques from the fields of computer vision and speech analysis. We report on a formative experiment in content-based video retrieval involving trained experts in the Classroom Assessment Scoring System, a widely used framework for assessment and improvement of learning environments. The results of this experiment suggest that our approach has potential application to productivity enhancement in assessment and to broader retrieval tasks.

Submitted 25 March, 2014; originally announced March 2014.

Journal ref: The 14th International Conference on Artificial Intelligence in Education 2011

arXiv:1401.5424 [pdf]

Real Time Strategy Language

Authors: Roy Hayes, Peter Beling, William Scherer

Abstract: Real Time Strategy (RTS) games provide a complex domain in which to test the latest artificial intelligence (AI) research. In much of the literature, AI systems have been limited to playing one game. Although this specialization has resulted in stronger AI gaming systems, it does not address the key concerns of AI researchers, who seek to develop AI agents that can autonomously interpret, learn, and apply new knowledge. To achieve human-level performance, current AI systems rely on the game-specific knowledge of an expert. The paper presents the full RTS language in hopes of shifting the current research focus to the development of general RTS agents. General RTS agents are AI gaming systems that can play any RTS game defined in the RTS language. This prevents game-specific knowledge from being hard-coded into the system, thereby facilitating research that addresses the fundamental concerns of artificial intelligence.

Submitted 21 January, 2014; originally announced January 2014.

arXiv:1309.5946 [pdf, ps, other]

A Complexity of double dummy bridge

Authors: Piotr Beling

Abstract: This paper presents an analysis of the complexity of double dummy bridge, using the measures proposed by Louis Victor Allis. Both the state-space (search-space) complexity and the game-tree complexity of the game are estimated.

Submitted 23 September, 2013; originally announced September 2013.

Comments: 10 pages, in Polish

arXiv:1309.1307 [pdf, ps, other]

C++11 -- idea r-wartości i przenoszenia

Authors: Piotr Beling

Abstract: This paper is one of a series of articles reviewing new features introduced to the C++ language by the ISO/IEC 14882:2011 standard (known as C++11). It presents the new possibilities related to passing parameters and writing constructors, and discusses in detail the ideas of r-values and moving objects (move constructors).

Submitted 5 September, 2013; originally announced September 2013.

Comments: 7 pages, in Polish

arXiv:1304.7600 [pdf, ps, other]

C++11 - określanie typów

Authors: Piotr Beling

Abstract: This paper is one of a series of articles reviewing new features introduced to the C++ language by the ISO/IEC 14882:2011 standard (known as C++11). It describes new language elements that make it easier to express the types of variables: the auto and decltype keywords, the new function/method declaration syntax, and the tools included in the <type_traits> header.

Submitted 29 April, 2013; originally announced April 2013.

Comments: 6 pages, in Polish

arXiv:1301.3630 [pdf, ps, other]

Behavior Pattern Recognition using A New Representation Model

Authors: Qifeng Qiao, Peter A. Beling

Abstract: We study the use of inverse reinforcement learning (IRL) as a tool for the recognition of agents' behavior on the basis of observation of their sequential decision behavior interacting with the environment. We model the problem faced by the agents as a Markov decision process (MDP) and model the observed behavior of the agents in terms of forward planning for the MDP. We use IRL to learn reward functions and then use these reward functions as the basis for clustering or classification models. Experimental studies with GridWorld, a navigation problem, and the secretary problem, an optimal stopping problem, suggest reward vectors found from IRL can be a good basis for behavior pattern recognition problems. Empirical comparisons of our method with several existing IRL algorithms and with direct methods that use feature statistics observed in state-action space suggest it may be superior for recognition problems.

Submitted 20 March, 2013; v1 submitted 16 January, 2013; originally announced January 2013.

arXiv:1209.4771 [pdf, ps, other]

Counting common substrings effectively

Authors: Stanisław Goldstein, Piotr Beling

Abstract: This article presents an effective (dynamic) algorithm for counting the substrings of a given string that are also substrings of a second string. The algorithm can be used, for example, to quickly calculate a string similarity measure using the generalized $n$-gram method (Niewiadomski measure), which is shown. Correctness and complexity analyses are included.

Submitted 21 September, 2012; originally announced September 2012.

Comments: 6 pages, in Polish
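
The counting problem described in this abstract can be illustrated with a small dynamic-programming sketch. This is not taken from the paper: the function name `count_common_substrings` is hypothetical, and the sketch assumes that substrings are counted by occurrence (start position and length in the first string) rather than as distinct strings, which may differ from the paper's definition. It relies on the fact that if a substring starting at position `i` occurs in the second string, so does every shorter substring starting at `i`.

```python
def count_common_substrings(s: str, t: str) -> int:
    """Count (start, length) occurrences in s whose text also occurs in t.

    Illustrative sketch only; the counting convention is an assumption.
    Runs in O(len(s) * len(t)) time and O(len(t)) extra space.
    """
    m = len(t)
    # prev[j] holds the longest common prefix of s[i+1:] and t[j:]
    prev = [0] * (m + 1)
    total = 0
    for i in range(len(s) - 1, -1, -1):
        cur = [0] * (m + 1)
        best = 0  # longest prefix of s[i:] that occurs anywhere in t
        for j in range(m - 1, -1, -1):
            if s[i] == t[j]:
                cur[j] = prev[j + 1] + 1
                best = max(best, cur[j])
        # every prefix of s[i:] of length 1..best is a substring of t
        total += best
        prev = cur
    return total


print(count_common_substrings("abc", "abd"))  # 3: "a", "b", "ab"
```

Summing the longest match per start position avoids enumerating substrings explicitly, which is what keeps the sketch quadratic rather than cubic.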

arXiv:1208.2112 [pdf, ps, other]

Inverse Reinforcement Learning with Gaussian Process

Authors: Qifeng Qiao, Peter A. Beling

Abstract: We present new algorithms for inverse reinforcement learning (IRL, or inverse optimal control) in convex optimization settings. We argue that finite-space IRL can be posed as a convex quadratic program under a Bayesian inference framework with the objective of maximum a posteriori estimation. To deal with problems in large or even infinite state space, we propose a Gaussian process model and use preference graphs to represent observations of decision trajectories. Our method is distinguished from other approaches to IRL in that it makes no assumptions about the form of the reward function and yet it retains the promise of computationally manageable implementations for potential real-world applications. In comparison with an established algorithm on small-scale numerical problems, our method demonstrated better accuracy in apprenticeship learning and a more robust dependence on the number of observations.

Submitted 21 January, 2013; v1 submitted 10 August, 2012; originally announced August 2012.

Comments: conference: American Control Conference 2011

Showing 1–28 of 28 results for author: Beling, P