
PDFA Distillation via String Probability Queries

Authors: Robert Baumgartner, Sicco Verwer

Abstract: Probabilistic deterministic finite automata (PDFA) are discrete event systems modeling conditional probabilities over languages: Given an already seen sequence of tokens they return the probability of tokens of interest to appear next. These types of models have gained interest in the domain of explainable machine learning, where they are used as surrogate models for neural networks trained as language models. In this work we present an algorithm to distill PDFA from neural networks. Our algorithm is a derivative of the L# algorithm and capable of learning PDFA from a new type of query, in which the algorithm infers conditional probabilities from the probability of the queried string to occur. We show its effectiveness on a recent public dataset by distilling PDFA from a set of trained neural networks.

Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

Comments: LearnAUT 2024
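
The string probability query described in the abstract above can be illustrated with a small sketch. It assumes, as one possible reading, that the oracle returns the probability that a sequence starts with a given prefix, so that the conditional probability of a next token is a ratio of two such queries. The toy Markov chain and the names `toy_prefix_prob` and `next_token_distribution` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical toy model standing in for a neural language model:
# a first-order Markov chain over the alphabet {"a", "b"}.
FIRST = {"a": 0.5, "b": 0.5}
STEP = {"a": {"a": 0.75, "b": 0.25}, "b": {"a": 0.5, "b": 0.5}}


def toy_prefix_prob(tokens: Tuple[str, ...]) -> float:
    """Probability that a sequence starts with `tokens` (empty prefix gives 1.0)."""
    prob = 1.0
    previous = None
    for token in tokens:
        prob *= FIRST[token] if previous is None else STEP[previous][token]
        previous = token
    return prob


def next_token_distribution(
    prefix_prob: Callable[[Tuple[str, ...]], float],
    prefix: Sequence[str],
    alphabet: Sequence[str],
) -> Dict[str, float]:
    """Recover P(symbol | prefix) as P(prefix + symbol) / P(prefix)."""
    base = prefix_prob(tuple(prefix))
    if base == 0.0:
        raise ValueError("prefix has probability zero; the conditional is undefined")
    return {s: prefix_prob(tuple(prefix) + (s,)) / base for s in alphabet}


if __name__ == "__main__":
    print(next_token_distribution(toy_prefix_prob, ("a",), ("a", "b")))
```

A real PDFA would typically also assign probability to an end-of-sequence symbol, which this sketch leaves out for brevity.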

arXiv:2406.07208 [pdf, other]

Database-assisted automata learning

Authors: Hielke Walinga, Robert Baumgartner, Sicco Verwer

Abstract: This paper presents DAALder (Database-Assisted Automata Learning, with Dutch suffix from leerder), a new algorithm for learning state machines, or automata, specifically deterministic finite-state automata (DFA). When learning state machines from log data originating from software systems, the large amount of log data can pose a challenge. Conventional state merging algorithms cannot efficiently deal with this, as they require a large amount of memory. To solve this, we utilized database technologies to efficiently query a big trace dataset and construct a state machine from it, as databases allow large amounts of data to be stored on disk while still being queried efficiently. Building on research in both active learning and passive learning, the proposed algorithm is a combination of the two. It can quickly find a characteristic set of traces from a database using heuristics from a state merging algorithm. Experiments show that our algorithm has similar performance to conventional state merging algorithms on large datasets, but requires far less memory.

Submitted 11 June, 2024; originally announced June 2024.

Comments: 8 pages body, 12 pages total, LearnAut 2024 Keywords: Active/Passive state machine learning, Incomplete Minimally Adequate Teacher

arXiv:2401.09838 [pdf, other]

doi 10.1145/3639478.3640022

CATMA: Conformance Analysis Tool For Microservice Applications

Authors: Clinton Cao, Simon Schneider, Nicolás E. Díaz Ferreyra, Sicco Verwer, Annibale Panichella, Riccardo Scandariato

Abstract: The microservice architecture allows developers to divide the core functionality of their software system into multiple smaller services. However, this architectural style also makes it harder for them to debug and assess whether the system's deployment conforms to its implementation. We present CATMA, an automated tool that detects non-conformances between the system's deployment and implementation. It automatically visualizes and generates potential interpretations for the detected discrepancies. Our evaluation of CATMA shows promising results in terms of performance and providing useful insights. CATMA is available at https://cyber-analytics.nl/catma.github.io/, and a demonstration video is available at https://youtu.be/WKP1hG-TDKc.

Submitted 23 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: 5 pages, 5 figures, ICSE '24 Demonstration Track

arXiv:2305.15394 [pdf, other]

Differentially-Private Decision Trees and Provable Robustness to Data Poisoning

Authors: Daniël Vos, Jelle Vos, Tianyu Li, Zekeriya Erkin, Sicco Verwer

Abstract: Decision trees are interpretable models that are well-suited to non-linear learning problems. Much work has been done on extending decision tree learning algorithms with differential privacy, a system that guarantees the privacy of samples within the training data. However, current state-of-the-art algorithms for this purpose sacrifice much utility for a small privacy benefit. These solutions create random decision nodes that reduce decision tree accuracy or spend an excessive share of the privacy budget on labeling leaves. Moreover, many works do not support continuous features or leak information about them. We propose a new method called PrivaTree based on private histograms that chooses good splits while consuming a small privacy budget. The resulting trees provide a significantly better privacy-utility trade-off and accept mixed numerical and categorical data without leaking information about numerical features. Finally, while it is notoriously hard to give robustness guarantees against data poisoning attacks, we demonstrate bounds for the expected accuracy and success rates of backdoor attacks against differentially-private learners. By leveraging the better privacy-utility trade-off of PrivaTree we are able to train decision trees with significantly better robustness against backdoor attacks compared to regular decision trees and with meaningful theoretical guarantees.

Submitted 12 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

arXiv:2301.13185 [pdf, other]

doi 10.24963/ijcai.2023/606

Optimal Decision Tree Policies for Markov Decision Processes

Authors: Daniël Vos, Sicco Verwer

Abstract: Interpretability of reinforcement learning policies is essential for many real-world tasks but learning such interpretable policies is a hard problem. Particularly rule-based policies such as decision trees and rule lists are difficult to optimize due to their non-differentiability. While existing techniques can learn verifiable decision tree policies there is no guarantee that the learners generate a decision that performs optimally. In this work, we study the optimization of size-limited decision trees for Markov Decision Processes (MDPs) and propose OMDTs: Optimal MDP Decision Trees. Given a user-defined size limit and MDP formulation OMDT directly maximizes the expected discounted return for the decision tree using Mixed-Integer Linear Programming. By training optimal decision tree policies for different MDPs we empirically study the optimality gap for existing imitation learning techniques and find that they perform sub-optimally. We show that this is due to an inherent shortcoming of imitation learning, namely that complex policies cannot be represented using size-limited trees. In such cases, it is better to directly optimize the tree for expected return. While there is generally a trade-off between the performance and interpretability of machine learning models, we find that OMDTs limited to a depth of 3 often perform close to the optimal limit.

Submitted 13 February, 2024; v1 submitted 30 January, 2023; originally announced January 2023.

Journal ref: Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence Main Track, 5457-5465 (2023)

arXiv:2208.10605 [pdf, other]

SoK: Explainable Machine Learning for Computer Security Applications

Authors: Azqa Nadeem, Daniël Vos, Clinton Cao, Luca Pajola, Simon Dieck, Robert Baumgartner, Sicco Verwer

Abstract: Explainable Artificial Intelligence (XAI) aims to improve the transparency of machine learning (ML) pipelines. We systematize the increasingly growing (but fragmented) microcosm of studies that develop and utilize XAI methods for defensive and offensive cybersecurity tasks. We identify 3 cybersecurity stakeholders, i.e., model users, designers, and adversaries, who utilize XAI for 4 distinct objectives within an ML pipeline, namely 1) XAI-enabled user assistance, 2) XAI-enabled model verification, 3) explanation verification & robustness, and 4) offensive use of explanations. Our analysis of the literature indicates that many of the XAI applications are designed with little understanding of how they might be integrated into analyst workflows -- user studies for explanation evaluation are conducted in only 14% of the cases. The security literature sometimes also fails to disentangle the role of the various stakeholders, e.g., by providing explanations to model users and designers while also exposing them to adversaries. Additionally, the role of model designers is particularly minimized in the security literature. To this end, we present an illustrative tutorial for model designers, demonstrating how XAI can help with model verification. We also discuss scenarios where interpretability by design may be a better alternative. The systematization and the tutorial enable us to challenge several assumptions, and present open problems that can help shape the future of XAI research within cybersecurity.

Submitted 3 March, 2023; v1 submitted 22 August, 2022; originally announced August 2022.

Comments: 13 pages. Accepted at Euro S&P

arXiv:2207.12087 [pdf, other]

doi 10.1145/3538969.3543810

Learning State Machines to Monitor and Detect Anomalies on a Kubernetes Cluster

Authors: Clinton Cao, Agathe Blaise, Sicco Verwer, Filippo Rebecchi

Abstract: These days more companies are shifting towards using cloud environments to provide their services to their clients. While it is easy to set up a cloud environment, it is equally important to monitor the system's runtime behaviour and identify anomalous behaviours that occur during its operation. In recent years, the utilisation of recurrent neural networks (RNNs) and deep neural networks (DNNs) to detect anomalies that might occur during runtime has been a trending approach. However, it is unclear how to explain the decisions made by these networks and how these networks should be interpreted to understand the runtime behaviour that they model. On the contrary, state machine models provide an easier manner to interpret and understand the behaviour that they model. In this work, we propose an approach that learns state machine models to model the runtime behaviour of a cloud environment that runs multiple microservice applications. To the best of our knowledge, this is the first work that tries to apply state machine models to microservice architectures. The state machine model is used to detect the different types of attacks that we launch on the cloud environment. From our experiment results, our approach can detect the attacks very well, achieving a balanced accuracy of 99.2% and an F1 score of 0.982.

Submitted 28 June, 2022; originally announced July 2022.

Comments: 9 pages, 12 figures, workshop paper

arXiv:2207.03890 [pdf, other]

ENCODE: Encoding NetFlows for Network Anomaly Detection

Authors: Clinton Cao, Annibale Panichella, Sicco Verwer, Agathe Blaise, Filippo Rebecchi

Abstract: NetFlow data is a popular network log format used by many network analysts and researchers. The advantages of using NetFlow over deep packet inspection are that it is easier to collect and process, and it is less privacy intrusive. Many works have used machine learning to detect network attacks using NetFlow data. The first step for these machine learning pipelines is to pre-process the data before it is given to the machine learning algorithm. Many approaches exist to pre-process NetFlow data; however, these simply apply existing methods to the data, not considering the specific properties of network data. We argue that for data originating from software systems, such as NetFlow or software logs, similarities in frequency and contexts of feature values are more important than similarities in the value itself. In this work, we propose an encoding algorithm that directly takes the frequency and the context of the feature values into account when the data is being processed. Different types of network behaviours can be clustered using this encoding, thus aiding the process of detecting anomalies within the network. We train several machine learning models for anomaly detection using the data that has been encoded with our encoding algorithm. We evaluate the effectiveness of our encoding on a new dataset that we created for network attacks on Kubernetes clusters and two well-known public NetFlow datasets. We empirically demonstrate that the machine learning models benefit from using our encoding for anomaly detection.

Submitted 4 August, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

Comments: 11 pages, 17 figures

arXiv:2207.01516 [pdf, other]

Learning state machines via efficient hashing of future traces

Authors: Robert Baumgartner, Sicco Verwer

Abstract: State machines are popular models to model and visualize discrete systems such as software systems, and to represent regular grammars. Most algorithms that passively learn state machines from data assume all the data to be available from the beginning and they load this data into memory. This makes it hard to apply them to continuously streaming data and results in large memory requirements when dealing with large datasets. In this paper we propose a method to learn state machines from data streams using the count-min-sketch data structure to reduce memory requirements. We apply state merging using the well-known red-blue-framework to reduce the search space. We implemented our approach in an established framework for learning state machines, and evaluated it on a well-known dataset to provide experimental data, showing the effectiveness of our approach with respect to quality of the results and run-time.

Submitted 4 July, 2022; originally announced July 2022.
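
For readers unfamiliar with the count-min-sketch mentioned in the abstract above, the following is a minimal generic sketch of that data structure, here counting string-encoded traces. It is not the paper's implementation; the class name, the default `width` and `depth`, and the use of salted BLAKE2b hashes are assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency summary; estimates never undercount."""

    def __init__(self, width: int = 64, depth: int = 4) -> None:
        self.width = width
        self.depth = depth
        # One row of counters per hash function.
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row: int, item: str) -> int:
        # Salting BLAKE2b with the row number yields one hash function per row.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        """Increment the item's counter in every row."""
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item: str) -> int:
        """Return the minimum counter over all rows (an upper bound on the true count)."""
        return min(
            self.table[row][self._index(row, item)] for row in range(self.depth)
        )


if __name__ == "__main__":
    sketch = CountMinSketch()
    for trace in ["a b", "a b", "a b", "c"]:
        sketch.add(trace)
    print(sketch.estimate("a b"), sketch.estimate("c"))
```

Memory use depends only on `width` and `depth`, not on the number of items seen, which is what makes the structure attractive for streams; the price is that hash collisions can inflate estimates.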

arXiv:2206.12190 [pdf, other]

SECLEDS: Sequence Clustering in Evolving Data Streams via Multiple Medoids and Medoid Voting

Authors: Azqa Nadeem, Sicco Verwer

Abstract: Sequence clustering in a streaming environment is challenging because it is computationally expensive, and the sequences may evolve over time. K-medoids or Partitioning Around Medoids (PAM) is commonly used to cluster sequences since it supports alignment-based distances, and the k-centers being actual data items helps with cluster interpretability. However, offline k-medoids has no support for concept drift, while also being prohibitively expensive for clustering data streams. We therefore propose SECLEDS, a streaming variant of the k-medoids algorithm with constant memory footprint. SECLEDS has two unique properties: i) it uses multiple medoids per cluster, producing stable high-quality clusters, and ii) it handles concept drift using an intuitive Medoid Voting scheme for approximating cluster distances. Unlike existing adaptive algorithms that create new clusters for new concepts, SECLEDS follows a fundamentally different approach, where the clusters themselves evolve with an evolving stream. Using real and synthetic datasets, we empirically demonstrate that SECLEDS produces high-quality clusters regardless of drift, stream size, data dimensionality, and number of clusters. We compare against three popular stream and batch clustering algorithms. The state-of-the-art BanditPAM is used as an offline benchmark. SECLEDS achieves comparable F1 score to BanditPAM while reducing the number of required distance computations by 83.7%. Importantly, SECLEDS outperforms all baselines by 138.7% when the stream contains drift. We also cluster real network traffic, and provide evidence that SECLEDS can support network bandwidths of up to 1.08 Gbps while using the (expensive) dynamic time warping distance.

Submitted 24 June, 2022; originally announced June 2022.

Comments: Accepted to appear in ECML/PKDD 2022

arXiv:2206.07776 [pdf, other]

Robust Attack Graph Generation

Authors: Dennis Mouwen, Sicco Verwer, Azqa Nadeem

Abstract: We present a method to learn automaton models that are more robust to input modifications. It iteratively aligns sequences to a learned model, modifies the sequences to their aligned versions, and re-learns the model. Automaton learning algorithms are typically very good at modeling the frequent behavior of a software system. Our solution can be used to also learn the behavior present in infrequent sequences, as these will be aligned to the frequent ones represented by the model. We apply our method to the SAGE tool for modeling attacker behavior from intrusion alerts. In experiments, we demonstrate that our algorithm learns models that can handle noise such as added and removed symbols from sequences. Furthermore, it learns more concise models that fit better to the training data.

Submitted 15 June, 2022; originally announced June 2022.

Comments: Appeared at LearnAut '22

arXiv:2203.16331 [pdf, other]

FlexFringe: Modeling Software Behavior by Learning Probabilistic Automata

Authors: Sicco Verwer, Christian Hammerschmidt

Abstract: We present the efficient implementations of probabilistic deterministic finite automaton learning methods available in FlexFringe. These implement well-known strategies for state-merging including several modifications to improve their performance in practice. We show experimentally that these algorithms obtain competitive results and significant improvements over a default implementation. We also demonstrate how to use FlexFringe to learn interpretable models from software logs and use these for anomaly detection. Although less interpretable, we show that learning smaller more convoluted models improves the performance of FlexFringe on anomaly detection, outperforming an existing solution based on neural nets.

Submitted 24 August, 2023; v1 submitted 28 March, 2022; originally announced March 2022.

arXiv:2201.10453 [pdf, other]

The First AI4TSP Competition: Learning to Solve Stochastic Routing Problems

Authors: Laurens Bliek, Paulo da Costa, Reza Refaei Afshar, Yingqian Zhang, Tom Catshoek, Daniël Vos, Sicco Verwer, Fynn Schmitt-Ulms, André Hottung, Tapan Shah, Meinolf Sellmann, Kevin Tierney, Carl Perreault-Lafleur, Caroline Leboeuf, Federico Bobbio, Justine Pepin, Warley Almeida Silva, Ricardo Gama, Hugo L. Fernandes, Martin Zaefferer, Manuel López-Ibáñez, Ekhine Irurozki

Abstract: This paper reports on the first international competition on AI for the traveling salesman problem (TSP) at the International Joint Conference on Artificial Intelligence 2021 (IJCAI-21). The TSP is one of the classical combinatorial optimization problems, with many variants inspired by real-world applications. This first competition asked the participants to develop algorithms to solve a time-dependent orienteering problem with stochastic weights and time windows (TD-OPSWTW). It focused on two types of learning approaches: surrogate-based optimization and deep reinforcement learning. In this paper, we describe the problem, the setup of the competition, the winning methods, and give an overview of the results. The winning methods described in this work have advanced the state-of-the-art in using AI for stochastic routing problems. Overall, by organizing this competition we have introduced routing problems as an interesting problem setting for AI researchers. The simulator of the problem has been made open-source and can be used by other researchers as a benchmark for new AI methods.

Submitted 25 January, 2022; originally announced January 2022.

Comments: 21 pages

MSC Class: 68T05

arXiv:2109.03857 [pdf, other]

Robust Optimal Classification Trees Against Adversarial Examples

Authors: Daniël Vos, Sicco Verwer

Abstract: Decision trees are a popular choice of explainable model, but just like neural networks, they suffer from adversarial examples. Existing algorithms for fitting decision trees robust against adversarial examples are greedy heuristics and lack approximation guarantees. In this paper we propose ROCT, a collection of methods to train decision trees that are optimally robust against user-specified attack models. We show that the min-max optimization problem that arises in adversarial learning can be solved using a single minimization formulation for decision trees with 0-1 loss. We propose such formulations in Mixed-Integer Linear Programming and Maximum Satisfiability, which widely available solvers can optimize. We also present a method that determines the upper bound on adversarial accuracy for any model using bipartite matching. Our experimental results demonstrate that the existing heuristics achieve close to optimal scores while ROCT achieves state-of-the-art scores.

Submitted 8 September, 2021; originally announced September 2021.

arXiv:2107.02783 [pdf, other]

SAGE: Intrusion Alert-driven Attack Graph Extractor

Authors: Azqa Nadeem, Sicco Verwer, Shanchieh Jay Yang

Abstract: Attack graphs (AG) are used to assess pathways availed by cyber adversaries to penetrate a network. State-of-the-art approaches for AG generation focus mostly on deriving dependencies between system vulnerabilities based on network scans and expert knowledge. In real-world operations however, it is costly and ineffective to rely on constant vulnerability scanning and expert-crafted AGs. We propose to automatically learn AGs based on actions observed through intrusion alerts, without prior expert knowledge. Specifically, we develop an unsupervised sequence learning system, SAGE, that leverages the temporal and probabilistic dependence between alerts in a suffix-based probabilistic deterministic finite automaton (S-PDFA) -- a model that accentuates infrequent severe alerts and summarizes paths leading to them. AGs are then derived from the S-PDFA on a per-objective, per-victim basis. Tested with intrusion alerts collected through Collegiate Penetration Testing Competition, SAGE compresses over 330k alerts into 93 AGs. These AGs reflect the strategies used by the participating teams. The AGs are succinct, interpretable, and capture behavioral dynamics, e.g., that attackers will often follow shorter paths to re-exploit objectives.

Submitted 14 October, 2021; v1 submitted 6 July, 2021; originally announced July 2021.

Comments: Appeared at VizSec '21 (proceedings) and KDD AI4Cyber '21 (without proceedings)

arXiv:2106.04618 [pdf, other]

doi 10.1016/j.asoc.2023.110744

EXPObench: Benchmarking Surrogate-based Optimisation Algorithms on Expensive Black-box Functions

Authors: Laurens Bliek, Arthur Guijt, Rickard Karlsson, Sicco Verwer, Mathijs de Weerdt

Abstract: Surrogate algorithms such as Bayesian optimisation are especially designed for black-box optimisation problems with expensive objectives, such as hyperparameter tuning or simulation-based optimisation. In the literature, these algorithms are usually evaluated with synthetic benchmarks which are well established but have no expensive objective, and only on one or two real-life applications which vary wildly between papers. There is a clear lack of standardisation when it comes to benchmarking surrogate algorithms on real-life, expensive, black-box objective functions. This makes it very difficult to draw conclusions on the effect of algorithmic contributions and to give substantial advice on which method to use when. A new benchmark library, EXPObench, provides first steps towards such a standardisation. The library is used to provide an extensive comparison of six different surrogate algorithms on four expensive optimisation problems from different real-life applications. This has led to new insights regarding the relative importance of exploration, the evaluation time of the objective, and the used model. We also provide rules of thumb for which surrogate algorithm to use in which situation. A further contribution is that we make the algorithms and benchmark problem instances publicly available, contributing to more uniform analysis of surrogate algorithms. Most importantly, we include the performance of the six algorithms on all evaluated problem instances. This results in a unique new dataset that lowers the bar for researching new methods as the number of expensive evaluations required for comparison is significantly reduced.

Submitted 1 December, 2022; v1 submitted 8 June, 2021; originally announced June 2021.

Comments: 33 pages

arXiv:2012.10438 [pdf, other]

Efficient Training of Robust Decision Trees Against Adversarial Examples

Authors: Daniël Vos, Sicco Verwer

Abstract: In the present day we use machine learning for sensitive tasks that require models to be both understandable and robust. Although traditional models such as decision trees are understandable, they suffer from adversarial attacks. When a decision tree is used to differentiate between a user's benign and malicious behavior, an adversarial attack allows the user to effectively evade the model by perturbing the inputs the model receives. We can use algorithms that take adversarial attacks into account to fit trees that are more robust. In this work we propose an algorithm, GROOT, that is two orders of magnitude faster than the state-of-the-art work while scoring competitively on accuracy against adversaries. GROOT accepts an intuitive and permissible threat model. Where previous threat models were limited to distance norms, we allow each feature to be perturbed with a user-specified parameter: either a maximum distance or constraints on the direction of perturbation. Previous works assumed that both benign and malicious users attempt model evasion but we allow the user to select which classes perform adversarial attacks. Additionally, we introduce a hyperparameter rho that allows GROOT to trade off performance in the regular and adversarial settings.

Submitted 18 December, 2020; originally announced December 2020.

arXiv:2011.03431 [pdf, other]

doi 10.1007/978-3-030-76640-5_4

Continuous surrogate-based optimization algorithms are well-suited for expensive discrete problems

Authors: Rickard Karlsson, Laurens Bliek, Sicco Verwer, Mathijs de Weerdt

Abstract: One method to solve expensive black-box optimization problems is to use a surrogate model that approximates the objective based on previous observed evaluations. The surrogate, which is cheaper to evaluate, is optimized instead to find an approximate solution to the original problem. In the case of discrete problems, recent research has revolved around surrogate models that are specifically constructed to deal with discrete structures. A main motivation is that literature considers continuous methods, such as Bayesian optimization with Gaussian processes as the surrogate, to be sub-optimal (especially in higher dimensions) because they ignore the discrete structure by, e.g., rounding off real-valued solutions to integers. However, we claim that this is not true. In fact, we present empirical evidence showing that the use of continuous surrogate models displays competitive performance on a set of high-dimensional discrete benchmark problems, including a real-life application, against state-of-the-art discrete surrogate-based methods. Our experiments on different discrete structures and time constraints also give more insight into which algorithms work well on which type of problem.

Submitted 30 November, 2020; v1 submitted 6 November, 2020; originally announced November 2020.

Comments: 16 pages, 3 figures; added keywords, typos corrected, additional information in Figure 3 but results unchanged

Journal ref: Proceedings of BNAIC/BeneLearn (2020), 88-102

arXiv:2006.04508 [pdf, other]

doi 10.1145/3449726.3463136

Black-box Mixed-Variable Optimisation using a Surrogate Model that Satisfies Integer Constraints

Authors: Laurens Bliek, Arthur Guijt, Sicco Verwer, Mathijs de Weerdt

Abstract: A challenging problem in both engineering and computer science is that of minimising a function for which we have no mathematical formulation available, that is expensive to evaluate, and that contains continuous and integer variables, for example in automatic algorithm configuration. Surrogate-based algorithms are very suitable for this type of problem, but most existing techniques are designed with only continuous or only discrete variables in mind. Mixed-Variable ReLU-based Surrogate Modelling (MVRSM) is a surrogate-based algorithm that uses a linear combination of rectified linear units, defined in such a way that (local) optima satisfy the integer constraints. This method outperforms the state of the art on several synthetic benchmarks with up to 238 continuous and integer variables, and achieves competitive performance on two real-life benchmarks: XGBoost hyperparameter tuning and Electrostatic Precipitator optimisation.

Submitted 15 September, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

Comments: Ann Math Artif Intell (2020)

Journal ref: Proceedings of the Genetic and Evolutionary Computation Conference Companion 2021

arXiv:1911.08817 [pdf, other]

doi 10.1007/s10472-020-09712-4

Black-box Combinatorial Optimization using Models with Integer-valued Minima

Authors: Laurens Bliek, Sicco Verwer, Mathijs de Weerdt

Abstract: When a black-box optimization objective can only be evaluated with costly or noisy measurements, most standard optimization algorithms are unsuited to find the optimal solution. Specialized algorithms that deal with exactly this situation make use of surrogate models. These models are usually continuous and smooth, which is beneficial for continuous optimization problems, but not necessarily for combinatorial problems. However, by choosing the basis functions of the surrogate model in a certain way, we show that it can be guaranteed that the optimal solution of the surrogate model is integer. This approach outperforms random search, simulated annealing and one Bayesian optimization algorithm on the problem of finding robust routes for a noise-perturbed traveling salesman benchmark problem, with similar performance as another Bayesian optimization algorithm, and outperforms all compared algorithms on a convex binary optimization problem with a large number of variables.

Submitted 20 November, 2019; originally announced November 2019.

Journal ref: Annals of Mathematics and Artificial Intelligence 89, 639-653 (2021)

arXiv:1910.13526 [pdf, other]

Learning a Safety Verifiable Adaptive Cruise Controller from Human Driving Data

Authors: Qin Lin, Sicco Verwer, John Dolan

Abstract: Imitation learning provides a way to automatically construct a controller by mimicking human behavior from data. For safety-critical systems such as autonomous vehicles, it can be problematic to use controllers learned from data because they cannot be guaranteed to be collision-free. Recently, a method has been proposed for learning a multi-mode hybrid automaton cruise controller (MOHA). Besides being accurate, the logical nature of this model makes it suitable for formal verification. In this paper, we demonstrate this capability using the SpaceEx hybrid model checker as follows. After learning, we translate the automaton model into constraints and equations required by SpaceEx. We then verify that a pure MOHA controller is not collision-free. By adding a safety state based on headway in time, a rule that human drivers should follow anyway, we do obtain a provably safe cruise control. Moreover, the safe controller remains more human-like than existing cruise controllers.

Submitted 29 October, 2019; originally announced October 2019.

arXiv:1904.01371 [pdf, other]

doi 10.1007/978-3-030-62582-5_15

Beyond Labeling: Using Clustering to Build Network Behavioral Profiles of Malware Families

Authors: Azqa Nadeem, Christian Hammerschmidt, Carlos H. Gañán, Sicco Verwer

Abstract: Malware family labels are known to be inconsistent. They are also black-box since they do not represent the capabilities of malware. The current state-of-the-art in malware capability assessment includes mostly manual approaches, which are infeasible due to the ever-increasing volume of discovered malware samples. We propose a novel unsupervised machine learning-based method called MalPaCA, which automates capability assessment by clustering the temporal behavior in malware's network traces. MalPaCA provides meaningful behavioral clusters using only 20 packet headers. Behavioral profiles are generated based on the cluster membership of malware's network traces. A Directed Acyclic Graph shows the relationship between malwares according to their overlapping behaviors. The behavioral profiles together with the DAG provide more insightful characterization of malware than current family designations. We also propose a visualization-based evaluation method for the obtained clusters to assist practitioners in understanding the clustering results. We apply MalPaCA on a financial malware dataset collected in the wild that comprises 1.1k malware samples resulting in 3.6M packets.
Our experiments show that (i) MalPaCA successfully identifies capabilities, such as port scans and reuse of Command and Control servers; (ii) It uncovers multiple discrepancies between behavioral clusters and malware family labels; and (iii) It demonstrates the effectiveness of clustering traces using temporal features by producing an error rate of 8.3%, compared to 57.5% obtained from statistical features.

Submitted 13 November, 2020; v1 submitted 2 April, 2019; originally announced April 2019.

Comments: Accepted as a chapter in Springer MAAIDL 2020

arXiv:1707.09430 [pdf, ps, other]

Human in the Loop: Interactive Passive Automata Learning via Evidence-Driven State-Merging Algorithms

Authors: Christian A. Hammerschmidt, Radu State, Sicco Verwer

Abstract: We present an interactive version of an evidence-driven state-merging (EDSM) algorithm for learning variants of finite state automata. Learning these automata often amounts to recovering or reverse engineering the model generating the data despite noisy, incomplete, or imperfectly sampled data sources rather than optimizing a purely numeric target function. Domain expertise and human knowledge about the target domain can guide this process, and typically is captured in parameter settings. Often, domain expertise is subconscious and not expressed explicitly. Directly interacting with the learning algorithm makes it easier to utilize this knowledge effectively.

Submitted 28 July, 2017; originally announced July 2017.

Comments: 4 pages, presented at the Human in the Loop workshop at ICML 2017

arXiv:1706.01663 [pdf, other]

Learning Pairwise Disjoint Simple Languages from Positive Examples

Authors: Alexis Linard, Rick Smetsers, Frits Vaandrager, Umar Waqas, Joost van Pinxten, Sicco Verwer

Abstract: A classical problem in grammatical inference is to identify a deterministic finite automaton (DFA) from a set of positive and negative examples. In this paper, we address the related - yet seemingly novel - problem of identifying a set of DFAs from examples that belong to different unknown simple regular languages. We propose two methods based on compression for clustering the observed positive examples. We apply our methods to a set of print jobs submitted to large industrial printers.

Submitted 6 June, 2017; originally announced June 2017.

Comments: This paper has been accepted at the Learning and Automata (LearnAut) Workshop, LICS 2017 (Reykjavik, Iceland)

arXiv:1705.09650 [pdf, other]

Anomaly Detection in a Digital Video Broadcasting System Using Timed Automata

Authors: Xiaoran Liu, Qin Lin, Sicco Verwer, Dmitri Jarnikov

Abstract: This paper focuses on detecting anomalies in a digital video broadcasting (DVB) system from providers' perspective. We learn a probabilistic deterministic real timed automaton profiling benign behavior of encryption control in the DVB control access system. This profile is used as a one-class classifier. Anomalous items in a testing sequence are detected when the sequence is not accepted by the learned model.

Submitted 24 May, 2017; originally announced May 2017.

Comments: This paper has been accepted by the Thirty-Second Annual ACM/IEEE Symposium on Logic in Computer Science (LICS) Workshop on Learning and Automata (LearnAut)

arXiv:1611.07100 [pdf, other]

Interpreting Finite Automata for Sequential Data

Authors: Christian Albert Hammerschmidt, Sicco Verwer, Qin Lin, Radu State

Abstract: Automaton models are often seen as interpretable models. Interpretability itself is not well defined: it remains unclear what interpretability means without first explicitly specifying objectives or desired attributes. In this paper, we identify the key properties used to interpret automata and propose a modification of a state-merging approach to learn variants of finite state automata. We apply the approach to problems beyond typical grammar inference tasks. Additionally, we cover several use-cases for prediction, classification, and clustering on sequential data in both supervised and unsupervised scenarios to show how the identified key properties are applicable in a wide range of contexts.

Submitted 24 November, 2016; v1 submitted 21 November, 2016; originally announced November 2016.

Comments: Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

ACM Class: I.2.6

arXiv:1611.02429 [pdf, ps, other]

Complementing Model Learning with Mutation-Based Fuzzing

Authors: Rick Smetsers, Joshua Moerman, Mark Janssen, Sicco Verwer

Abstract: An ongoing challenge for learning algorithms formulated in the Minimally Adequate Teacher framework is to efficiently obtain counterexamples. In this paper we compare and combine conformance testing and mutation-based fuzzing methods for obtaining counterexamples when learning finite state machine models for the reactive software systems of the Rigorous Examination of Reactive Systems (RERS) challenge. We have found that for the LTL problems of the challenge the fuzzer provided an independent confirmation that the learning process had been successful, since no additional counterexamples were found. For the reachability problems of the challenge, however, the fuzzer discovered more reachable error states than the learner and tester, albeit in some cases the learner and tester found some that were not discovered by the fuzzer. This leads us to believe that these orthogonal approaches are complementary in the context of model learning.

Submitted 8 November, 2016; originally announced November 2016.

Comments: Submitted to the RERS challenge 2016

arXiv:1401.1061 [pdf, other]

doi 10.1016/j.artint.2015.05.004

Learning optimization models in the presence of unknown relations

Authors: Sicco Verwer, Yingqian Zhang, Qing Chuan Ye

Abstract: In a sequential auction with multiple bidding agents, it is highly challenging to determine the ordering of the items to sell in order to maximize the revenue due to the fact that the autonomy and private information of the agents heavily influence the outcome of the auction. The main contribution of this paper is two-fold. First, we demonstrate how to apply machine learning techniques to solve the optimal ordering problem in sequential auctions. We learn regression models from historical auctions, which are subsequently used to predict the expected value of orderings for new auctions. Given the learned models, we propose two types of optimization methods: a black-box best-first search approach, and a novel white-box approach that maps learned models to integer linear programs (ILP) which can then be solved by any ILP-solver. Although the studied auction design problem is hard, our proposed optimization methods obtain good orderings with high revenues. Our second main contribution is the insight that the internal structure of regression models can be efficiently evaluated inside an ILP solver for optimization purposes. To this end, we provide efficient encodings of regression trees and linear regression models as ILP constraints. This new way of using learned models for optimization is promising. As the experimental results show, it significantly outperforms the black-box best-first search in nearly all settings.

Submitted 15 April, 2014; v1 submitted 6 January, 2014; originally announced January 2014.

Comments: 37 pages. Working paper

ACM Class: F.5.3; K.3; K.4

arXiv:1309.6883 [pdf, other]

doi 10.1017/S147106841400009X

Predicate Logic as a Modeling Language: Modeling and Solving some Machine Learning and Data Mining Problems with IDP3

Authors: Maurice Bruynooghe, Hendrik Blockeel, Bart Bogaerts, Broes De Cat, Stef De Pooter, Joachim Jansen, Anthony Labarre, Jan Ramon, Marc Denecker, Sicco Verwer

Abstract: This paper provides a gentle introduction to problem solving with the IDP3 system. The core of IDP3 is a finite model generator that supports first order logic enriched with types, inductive definitions, aggregates and partial functions. It offers its users a modeling language that is a slight extension of predicate logic and allows them to solve a wide range of search problems. Apart from a small introductory example, applications are selected from problems that arose within machine learning and data mining research. These research areas have recently shown a strong interest in declarative modeling and constraint solving as opposed to algorithmic approaches. The paper illustrates that the IDP3 system can be a valuable tool for researchers with such an interest. The first problem is in the domain of stemmatology, a domain of philology concerned with the relationship between surviving variant versions of text. The second problem is about a somewhat related problem within biology where phylogenetic trees are used to represent the evolution of species. The third and final problem concerns the classical problem of learning a minimal automaton consistent with a given set of strings. For this last problem, we show that the performance of our solution comes very close to that of a state-of-the-art solution. For each of these applications, we analyze the problem, illustrate the development of a logic-based model and explore how alternatives can affect the performance.

Submitted 28 March, 2014; v1 submitted 26 September, 2013; originally announced September 2013.

Comments: To appear in Theory and Practice of Logic Programming (TPLP)

Journal ref: Theory and Practice of Logic Programming 15 (2014) 783-817

Showing 1–29 of 29 results for author: Verwer, S