Search | arXiv e-print repository

Learning Environment Models with Continuous Stochastic Dynamics

Authors: Martin Tappler, Edi Muškardin, Bernhard K. Aichernig, Bettina Könighofer

Abstract: Solving control tasks in complex environments automatically through learning offers great potential. While contemporary techniques from deep reinforcement learning (DRL) provide effective solutions, their decision-making is not transparent. We aim to provide insights into the decisions faced by the agent by learning an automaton model of environmental behavior under the control of an agent. Howeve… ▽ More Solving control tasks in complex environments automatically through learning offers great potential. While contemporary techniques from deep reinforcement learning (DRL) provide effective solutions, their decision-making is not transparent. We aim to provide insights into the decisions faced by the agent by learning an automaton model of environmental behavior under the control of an agent. However, for most control problems, automata learning is not scalable enough to learn a useful model. In this work, we raise the capabilities of automata learning such that it is possible to learn models for environments that have complex and continuous dynamics. The core of the scalability of our method lies in the computation of an abstract state-space representation, by applying dimensionality reduction and clustering on the observed environmental state space. The stochastic transitions are learned via passive automata learning from observed interactions of the agent and the environment. In an iterative model-based RL process, we sample additional trajectories to learn an accurate environment model in the form of a discrete-state Markov decision process (MDP). We apply our automata learning framework on popular RL benchmarking environments in the OpenAI Gym, including LunarLander, CartPole, Mountain Car, and Acrobot. Our results show that the learned models are so precise that they enable the computation of policies solving the respective control tasks. Yet the models are more concise and more general than neural-network-based policies and by using MDPs we benefit from a wealth of tools available for analyzing them. When solving the task of LunarLander, the learned model even achieved similar or higher rewards than deep RL policies learned with stable-baselines3. △ Less

Submitted 29 June, 2023; originally announced June 2023.

arXiv:2306.16854 [pdf, other]

On the Relationship Between RNN Hidden State Vectors and Semantic Ground Truth

Authors: Edi Muškardin, Martin Tappler, Ingo Pill, Bernhard K. Aichernig, Thomas Pock

Abstract: We examine the assumption that the hidden-state vectors of recurrent neural networks (RNNs) tend to form clusters of semantically similar vectors, which we dub the clustering hypothesis. While this hypothesis has been assumed in the analysis of RNNs in recent years, its validity has not been studied thoroughly on modern neural network architectures. We examine the clustering hypothesis in the cont… ▽ More We examine the assumption that the hidden-state vectors of recurrent neural networks (RNNs) tend to form clusters of semantically similar vectors, which we dub the clustering hypothesis. While this hypothesis has been assumed in the analysis of RNNs in recent years, its validity has not been studied thoroughly on modern neural network architectures. We examine the clustering hypothesis in the context of RNNs that were trained to recognize regular languages. This enables us to draw on perfect ground-truth automata in our evaluation, against which we can compare the RNN's accuracy and the distribution of the hidden-state vectors. We start with examining the (piecewise linear) separability of an RNN's hidden-state vectors into semantically different classes. We continue the analysis by computing clusters over the hidden-state vector space with multiple state-of-the-art unsupervised clustering approaches. We formally analyze the accuracy of computed clustering functions and the validity of the clustering hypothesis by determining whether clusters group semantically similar vectors to the same state in the ground-truth model. Our evaluation supports the validity of the clustering hypothesis in the majority of examined cases. We observed that the hidden-state vectors of well-trained RNNs are separable, and that the unsupervised clustering techniques succeed in finding clusters of similar state vectors. △ Less

Submitted 29 June, 2023; originally announced June 2023.

arXiv:2211.16074 [pdf, other]

doi 10.1007/s10703-023-00425-y

Fingerprinting and Analysis of Bluetooth Devices with Automata Learning

Authors: Andrea Pferscher, Bernhard K. Aichernig

Abstract: Automata learning is a technique to automatically infer behavioral models of black-box systems. Today's learning algorithms enable the deduction of models that describe complex system properties, e.g., timed or stochastic behavior. Despite recent improvements in the scalability of learning algorithms, their practical applicability is still an open issue. Little work exists that actually learns mod… ▽ More Automata learning is a technique to automatically infer behavioral models of black-box systems. Today's learning algorithms enable the deduction of models that describe complex system properties, e.g., timed or stochastic behavior. Despite recent improvements in the scalability of learning algorithms, their practical applicability is still an open issue. Little work exists that actually learns models of physical black-box systems. To fill this gap in the literature, we present a case study on applying automata learning on the Bluetooth Low Energy (BLE) protocol. It shows that not only the size of the system limits the applicability of automata learning. Also, the interaction with the system under learning creates a major bottleneck that is rarely discussed. In this article, we propose a general automata learning architecture for learning a behavioral model of the BLE protocol implemented by a physical device. With this framework, we can successfully learn the behavior of six investigated BLE devices. Furthermore, we extended the learning technique to learn security critical behavior, e.g., key-exchange procedures for encrypted communication. The learned models depict several behavioral differences and inconsistencies to the BLE specification. This shows that automata learning can be used for fingerprinting black-box devices, i.e., characterizing systems via their specific learned models. Moreover, learning revealed a crashing scenario for one device. △ Less

Submitted 24 May, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: This version of the article has been accepted for publication, after peer review but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at https://doi.org/10.1007/s10703-023-00425-y

arXiv:2209.14031 [pdf, other]

doi 10.4204/EPTCS.371.1

Active vs. Passive: A Comparison of Automata Learning Paradigms for Network Protocols

Authors: Bernhard K. Aichernig, Edi Muškardin, Andrea Pferscher

Abstract: Active automata learning became a popular tool for the behavioral analysis of communication protocols. The main advantage is that no manual modeling effort is required since a behavioral model is automatically inferred from a black-box system. However, several real-world applications of this technique show that the overhead for the establishment of an active interface might hamper the practical ap… ▽ More Active automata learning became a popular tool for the behavioral analysis of communication protocols. The main advantage is that no manual modeling effort is required since a behavioral model is automatically inferred from a black-box system. However, several real-world applications of this technique show that the overhead for the establishment of an active interface might hamper the practical applicability. Our recent work on the active learning of Bluetooth Low Energy (BLE) protocol found that the active interaction creates a bottleneck during learning. Considering the automata learning toolset, passive learning techniques appear as a promising solution since they do not require an active interface to the system under learning. Instead, models are learned based on a given data set. In this paper, we evaluate passive learning for two network protocols: BLE and Message Queuing Telemetry Transport (MQTT). Our results show that passive techniques can correctly learn with less data than required by active learning. However, a general random data generation for passive learning is more expensive compared to the costs of active learning. △ Less

Submitted 28 September, 2022; originally announced September 2022.

Comments: In Proceedings FMAS2022 ASYDE2022, arXiv:2209.13181

Journal ref: EPTCS 371, 2022, pp. 1-19

arXiv:2206.11708 [pdf, other]

Reinforcement Learning under Partial Observability Guided by Learned Environment Models

Authors: Edi Muskardin, Martin Tappler, Bernhard K. Aichernig, Ingo Pill

Abstract: In practical applications, we can rarely assume full observability of a system's environment, despite such knowledge being important for determining a reactive control system's precise interaction with its environment. Therefore, we propose an approach for reinforcement learning (RL) in partially observable environments. While assuming that the environment behaves like a partially observable Marko… ▽ More In practical applications, we can rarely assume full observability of a system's environment, despite such knowledge being important for determining a reactive control system's precise interaction with its environment. Therefore, we propose an approach for reinforcement learning (RL) in partially observable environments. While assuming that the environment behaves like a partially observable Markov decision process with known discrete actions, we assume no knowledge about its structure or transition probabilities. Our approach combines Q-learning with IoAlergia, a method for learning Markov decision processes (MDP). By learning MDP models of the environment from episodes of the RL agent, we enable RL in partially observable domains without explicit, additional memory to track previous interactions for dealing with ambiguities stemming from partial observability. We instead provide RL with additional observations in the form of abstract environment states by simulating new experiences on learned environment models to track the explored states. In our evaluation, we report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques with recurrent neural networks and fixed memory. △ Less

Submitted 23 June, 2022; originally announced June 2022.

arXiv:2205.04887 [pdf, other]

Search-Based Testing of Reinforcement Learning

Authors: Martin Tappler, Filip Cano Córdoba, Bernhard K. Aichernig, Bettina Könighofer

Abstract: Evaluation of deep reinforcement learning (RL) is inherently challenging. Especially the opaqueness of learned policies and the stochastic nature of both agents and environments make testing the behavior of deep RL agents difficult. We present a search-based testing framework that enables a wide range of novel analysis capabilities for evaluating the safety and performance of deep RL agents. For s… ▽ More Evaluation of deep reinforcement learning (RL) is inherently challenging. Especially the opaqueness of learned policies and the stochastic nature of both agents and environments make testing the behavior of deep RL agents difficult. We present a search-based testing framework that enables a wide range of novel analysis capabilities for evaluating the safety and performance of deep RL agents. For safety testing, our framework utilizes a search algorithm that searches for a reference trace that solves the RL task. The backtracking states of the search, called boundary states, pose safety-critical situations. We create safety test-suites that evaluate how well the RL agent escapes safety-critical situations near these boundary states. For robust performance testing, we create a diverse set of traces via fuzz testing. These fuzz traces are used to bring the agent into a wide variety of potentially unknown states from which the average performance of the agent is compared to the average performance of the fuzz traces. We apply our search-based testing approach on RL for Nintendo's Super Mario Bros. △ Less

Submitted 14 May, 2022; v1 submitted 7 May, 2022; originally announced May 2022.

Comments: 11 pages, 15 figures, Accepted at IJCAI-ECAI 2022 (Main Track)

arXiv:1907.04708 [pdf, other]

Learning a Behavior Model of Hybrid Systems Through Combining Model-Based Testing and Machine Learning (Full Version)

Authors: Bernhard K. Aichernig, Roderick Bloem, Masoud Ebrahimi, Martin Horn, Franz Pernkopf, Wolfgang Roth, Astrid Rupp, Martin Tappler, Markus Tranninger

Abstract: Models play an essential role in the design process of cyber-physical systems. They form the basis for simulation and analysis and help in identifying design problems as early as possible. However, the construction of models that comprise physical and digital behavior is challenging. Therefore, there is considerable interest in learning such hybrid behavior by means of machine learning which requi… ▽ More Models play an essential role in the design process of cyber-physical systems. They form the basis for simulation and analysis and help in identifying design problems as early as possible. However, the construction of models that comprise physical and digital behavior is challenging. Therefore, there is considerable interest in learning such hybrid behavior by means of machine learning which requires sufficient and representative training data covering the behavior of the physical system adequately. In this work, we exploit a combination of automata learning and model-based testing to generate sufficient training data fully automatically. Experimental results on a platooning scenario show that recurrent neural networks learned with this data achieved significantly better results compared to models learned from randomly generated data. In particular, the classification error for crash detection is reduced by a factor of five and a similar F1-score is obtained with up to three orders of magnitude fewer training samples. △ Less

Submitted 10 July, 2019; originally announced July 2019.

Comments: This is an extended version of the conference paper "Learning a Behavior Model of Hybrid Systems Through Combining Model-Based Testing and Machine Learning" accepted for presentation at IFIP-ICTSS 2019, the 31st International Conference on Testing Software and Systems in Paris, France

arXiv:1906.12239 [pdf, ps, other]

L*-Based Learning of Markov Decision Processes (Extended Version)

Authors: Martin Tappler, Bernhard K. Aichernig, Giovanni Bacci, Maria Eichlseder, Kim G. Larsen

Abstract: Automata learning techniques automatically generate system models from test observations. These techniques usually fall into two categories: passive and active. Passive learning uses a predetermined data set, e.g., system logs. In contrast, active learning actively queries the system under learning, which is considered more efficient. An influential active learning technique is Angluin's L* algo… ▽ More Automata learning techniques automatically generate system models from test observations. These techniques usually fall into two categories: passive and active. Passive learning uses a predetermined data set, e.g., system logs. In contrast, active learning actively queries the system under learning, which is considered more efficient. An influential active learning technique is Angluin's L* algorithm for regular languages which inspired several generalisations from DFAs to other automata-based modelling formalisms. In this work, we study L*-based learning of deterministic Markov decision processes, first assuming an ideal setting with perfect information. Then, we relax this assumption and present a novel learning algorithm that collects information by sampling system traces via testing. Experiments with the implementation of our sampling-based algorithm suggest that it achieves better accuracy than state-of-the-art passive learning techniques with the same amount of test data. Unlike existing learning algorithms with predefined states, our algorithm learns the complete model structure including the states. △ Less

Submitted 28 June, 2019; originally announced June 2019.

Comments: an extended version of a conference paper accepted for presentation at FM 2019, the 23rd international symposium on formal methods

arXiv:1904.07075 [pdf, other]

doi 10.1109/ICST.2017.32

Model-Based Testing IoT Communication via Active Automata Learning

Authors: Martin Tappler, Bernhard K. Aichernig, Roderick Bloem

Abstract: This paper presents a learning-based approach to detecting failures in reactive systems. The technique is based on inferring models of multiple implementations of a common specification which are pair-wise cross-checked for equivalence. Any counterexample to equivalence is flagged as suspicious and has to be analysed manually. Hence, it is possible to find possible failures in a semi-automatic way… ▽ More This paper presents a learning-based approach to detecting failures in reactive systems. The technique is based on inferring models of multiple implementations of a common specification which are pair-wise cross-checked for equivalence. Any counterexample to equivalence is flagged as suspicious and has to be analysed manually. Hence, it is possible to find possible failures in a semi-automatic way without prior modelling. We show that the approach is effective by means of a case study. For this case study, we carried out experiments in which we learned models of five implementations of MQTT brokers/servers, a protocol used in the Internet of Things. Examining these models, we found several violations of the MQTT specification. All but one of the considered implementations showed faulty behaviour. In the analysis, we discuss effectiveness and also issues we faced. △ Less

Submitted 15 April, 2019; originally announced April 2019.

arXiv:1808.07744 [pdf, other]

Learning Timed Automata via Genetic Programming

Authors: Martin Tappler, Bernhard K. Aichernig, Kim Guldstrand Larsen, Florian Lorber

Abstract: Model learning has gained increasing interest in recent years. It derives behavioural models from test data of black-box systems. The main advantage offered by such techniques is that they enable model-based analysis without access to the internals of a system. Applications range from fully automated testing over model checking to system understanding. Current work focuses on learning variations o… ▽ More Model learning has gained increasing interest in recent years. It derives behavioural models from test data of black-box systems. The main advantage offered by such techniques is that they enable model-based analysis without access to the internals of a system. Applications range from fully automated testing over model checking to system understanding. Current work focuses on learning variations of finite state machines. However, most techniques consider discrete time. In this paper, we present a method for learning timed automata, finite state machines extended with real-valued clocks. The learning method generates a model consistent with a set of timed traces collected by testing. This generation is based on genetic programming, a search-based technique for automatic program creation. We evaluate our approach on 44 timed systems, comprising four systems from the literature and 40 randomly generated examples. △ Less

Submitted 15 February, 2019; v1 submitted 23 August, 2018; originally announced August 2018.

Comments: added missing link

arXiv:1508.03575 [pdf, other]

Bounded Determinization of Timed Automata with Silent Transitions

Authors: Florian Lorber, Amnon Rosenmann, Dejan Nickovic, Bernhard Aichernig

Abstract: Deterministic timed automata are strictly less expressive than their non-deterministic counterparts, which are again less expressive than those with silent transitions. As a consequence, timed automata are in general non-determinizable. This is unfortunate since deterministic automata play a major role in model-based testing, observability and implementability. However, by bounding the length of t… ▽ More Deterministic timed automata are strictly less expressive than their non-deterministic counterparts, which are again less expressive than those with silent transitions. As a consequence, timed automata are in general non-determinizable. This is unfortunate since deterministic automata play a major role in model-based testing, observability and implementability. However, by bounding the length of the traces in the automaton, effective determinization becomes possible. We propose a novel procedure for bounded determinization of timed automata. The procedure unfolds the automata to bounded trees, removes all silent transitions and determinizes via disjunction of guards. The proposed algorithms are optimized to the bounded setting and thus are more efficient and can handle a larger class of timed automata than the general algorithms. The approach is implemented in a prototype tool and evaluated on several examples. To our best knowledge, this is the first implementation of this type of procedure for timed automata. △ Less

Submitted 14 August, 2015; originally announced August 2015.

Comments: 25 pages

arXiv:1202.6123 [pdf, other]

doi 10.4204/EPTCS.80.7

Towards Symbolic Model-Based Mutation Testing: Combining Reachability and Refinement Checking

Authors: Bernhard K. Aichernig, Elisabeth Jöbstl

Abstract: Model-based mutation testing uses altered test models to derive test cases that are able to reveal whether a modelled fault has been implemented. This requires conformance checking between the original and the mutated model. This paper presents an approach for symbolic conformance checking of action systems, which are well-suited to specify reactive systems. We also consider nondeterminism in our… ▽ More Model-based mutation testing uses altered test models to derive test cases that are able to reveal whether a modelled fault has been implemented. This requires conformance checking between the original and the mutated model. This paper presents an approach for symbolic conformance checking of action systems, which are well-suited to specify reactive systems. We also consider nondeterminism in our models. Hence, we do not check for equivalence, but for refinement. We encode the transition relation as well as the conformance relation as a constraint satisfaction problem and use a constraint solver in our reachability and refinement checking algorithms. Explicit conformance checking techniques often face state space explosion. First experimental evaluations show that our approach has potential to outperform explicit conformance checkers. △ Less

Submitted 28 February, 2012; originally announced February 2012.

Comments: In Proceedings MBT 2012, arXiv:1202.5826

Journal ref: EPTCS 80, 2012, pp. 88-102

arXiv:cs/0407050 [pdf, ps, other]

Modeling and Validating Hybrid Systems Using VDM and Mathematica

Authors: Bernhard K. Aichernig, Reinhold Kainhofer

Abstract: Hybrid systems are characterized by the hybrid evolution of their state: A part of the state changes discretely, the other part changes continuously over time. Typically, modern control applications belong to this class of systems, where a digital controller interacts with a physical environment. In this article we illustrate how a combination of the formal method VDM and the computer algebra sy… ▽ More Hybrid systems are characterized by the hybrid evolution of their state: A part of the state changes discretely, the other part changes continuously over time. Typically, modern control applications belong to this class of systems, where a digital controller interacts with a physical environment. In this article we illustrate how a combination of the formal method VDM and the computer algebra system Mathematica can be used to model and simulate both aspects: the control logic and the physics involved. A new Mathematica package emulating VDM-SL has been developed that allows the integration of differential equation systems into formal specifications. The SAFER example from Kelly (1997) serves to demonstrate the new simulation capabilities Mathematica adds: After the thruster selection process, the astronaut's actual position and velocity is calculated by numerically solving Euler's and Newton's equations for rotation and translation. Furthermore, interactive validation is supported by a graphical user interface and data animation. △ Less

Submitted 20 July, 2004; originally announced July 2004.

Comments: Presented at LfM2000, published in the proceedings

ACM Class: D.2.4

Journal ref: In C.Michael Holloway, editor, Lfm2000, Fifth NASA Langley Formal Methods Workshop, Williamsburg, Virginia, June 2000, number CP-2000-210100, pages 35-46. NASA, June 2000

Showing 1–13 of 13 results for author: Aichernig, B