
Radboud University, Nijmegen, the Netherlands
Email: {loes.kruger,sebastian.junges,jurriaan.rot}@ru.nl

State Matching and Multiple References in Adaptive Active Automata Learning

This research is partially supported by the NWO grant No. VI.Vidi.223.096.

Loes Kruger✉

Sebastian Junges

Jurriaan Rot

Abstract

Active automata learning (AAL) is a method to infer state machines by interacting with black-box systems. Adaptive AAL aims to reduce the sample complexity of AAL by incorporating domain specific knowledge in the form of (similar) reference models. Such reference models appear naturally when learning multiple versions or variants of a software system. In this paper, we present state matching, which allows flexible use of the structure of these reference models by the learner. State matching is the main ingredient of adaptive $L^{\#}$ , a novel framework for adaptive learning, built on top of $L^{\#}$ . Our empirical evaluation shows that adaptive $L^{\#}$ improves the state of the art by up to two orders of magnitude.

1 Introduction

Automata learning aims to extract state machines from observed input-output sequences of some system-under-learning (SUL). Active automata learning (AAL) assumes that one has black-box access to this SUL, allowing the learner to incrementally choose inputs and observe the outputs. The models learned by AAL can serve as documentation, but are more typically used as a basis for testing, verification, conformance checking, and fingerprinting; see [22, 11] for an overview of applications. The classical algorithm for AAL is $L^{*}$ , introduced by Angluin [2]; state-of-the-art algorithms are, e.g., $L^{\#}$ [23] and TTT [13], which are available in toolboxes such as LearnLib [14] and AALpy [17].

The primary challenge in AAL is to reduce the number of inputs sent to the SUL, referred to as the sample complexity. To learn a 31-state machine with 22 inputs, state-of-the-art learners may send several million inputs to the SUL [23]. This is not necessarily unexpected: the underlying space of 31-state state machines is huge and it is nontrivial how to maximise information gain. The literature has investigated several approaches to accelerate learners, see the overview of [22]. Nevertheless, scalability remains a core challenge for AAL.

We study adaptive AAL [10], which aims to improve the sample efficiency by utilizing expert knowledge already given to the learner. In (regular) AAL, a learner commonly starts learning from scratch. In adaptive AAL, however, the learner is given a reference model, which ought to be similar to the SUL. Reference models occur naturally in many applications of AAL. For instance: (1) Systems evolve over time due to, e.g., bug fixes or new functionalities, and we may have learned the previous system; (2) Standard protocols may be implemented by a variety of tools; (3) The SUL may be a variant of other systems, e.g., being the same system executing in another environment, or a system configured differently.

Several algorithms for adaptive AAL have been proposed [10, 24, 5, 6, 9]. Intuitively, the idea is that these methods try to rebuild the part of the SUL which is similar to the reference model. This is achieved by deriving suitable queries from the reference model, using so-called access sequences to reach states, and so-called separating sequences to distinguish these from other states. These algorithms rely on a rather strict notion of similarity that depends on the way we reach these states. In particular, existing rebuilding algorithms cannot effectively learn an SUL from a reference model that has a different initial state, see Sec. 2.

We propose an approach to adaptive AAL based on state matching, which allows flexibly identifying parts of the unknown SUL where the reference model may be an informative guide. More specifically, in this approach, we match states in the model that we have learned so far (captured as a tree-shaped automaton) with states in the reference model such that the outputs agree on all enabled input sequences. This matching allows for targeted re-use of separating sequences from the reference model and is independent of the access sequences. We refine the approach by using approximate state matching, where we match a current state with one from the reference model that agrees on most inputs.

Approximate state matching is the essential ingredient for the novel $AL^{\#}$ algorithm. This algorithm is a conservative extension of the recent $L^{\#}$ [23]. Along with approximate state matching, $AL^{\#}$ includes rebuilding steps, which are similar to existing methods, but tightly integrated in $L^{\#}$ . Finally, $AL^{\#}$ is the first approach with dedicated support to use more than one reference model.

Contributions. We make the following contributions to the state-of-the-art in adaptive AAL. First, we present state matching and its generalization to approximate state matching, which allows flexible re-use of separating sequences from the reference model. Second, we include state matching and rebuilding in a unifying approach, called $AL^{\#}$ , which generalizes the $L^{\#}$ algorithm for non-adaptive automata learning. We analyse the resulting framework in terms of termination and complexity. This framework naturally supports using multiple reference models as well as removing and adding inputs to the alphabet. Our empirical results show the efficacy of $AL^{\#}$ . In particular, $AL^{\#}$ may reduce the number of inputs to the SUL by two orders of magnitude.

Related work. Adaptive AAL goes back to [10]. That paper, and many of the follow-up approaches [5, 4, 6, 9], re-use access sequences and separating sequences from the reference model (or from the data structures constructed when learning that model). The recent approach in [6] removes redundant access sequences during rebuilding and continues learning with informative separating sequences. In [24], an $L^{*}$ -based adaptive AAL approach is proposed where the algorithm starts by including all separating sequences that arise when learning the reference model with $L^{*}$ , ignoring access sequences. This algorithm is used in [12] for a general study of the usefulness of adaptive AAL: Among others, the authors suggest using more advanced data structures than the observation tables in $L^{*}$ . Indeed, in [4] the internal data structure of the TTT algorithm [13] is used in the context of lifelong learning; the precise rebuilding approach is not described. The recent [9] proposes an adaptive AAL method based on discrimination trees as used in the Kearns-Vazirani algorithm [15]. We consider the algorithms proposed in [6, 9] the state-of-the-art and compare $AL^{\#}$ against them experimentally in Sec. 8.

2 Overview

We illustrate (1) how adaptive AAL uses a reference model to help learn a system and (2) how this may reduce the sample complexity of the learner.

MAT framework. We recall the standard setting for AAL: Angluin’s MAT framework, cf. [22, 11]. Here, the learner has no direct access to the SUL, but may ask output queries (OQs): these return, for a given input sequence, the sequence of outputs from the SUL; and equivalence queries (EQs): these take a Mealy machine $\mathcal{H}$ as input, and return whether or not $\mathcal{H}$ is equivalent to the SUL. In case it is not, a counterexample is provided in the form of a sequence of inputs for which $\mathcal{H}$ and the SUL return different outputs. EQs are expensive [25, 3, 8, 21], therefore, we aim to learn the SUL using primarily OQs.
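The two query types can be mimicked against a simulated SUL, which is convenient when experimenting with learners. The following Python sketch is only an illustration: the class names `Mealy` and `SimulatedTeacher`, the dictionary encoding of transitions, the toy machines, and the bounded exhaustive search that stands in for an EQ are all assumptions made here. They are not part of the MAT framework or of any learning toolbox.

```python
from itertools import product


class Mealy:
    """Complete Mealy machine: trans[(state, input)] = (next_state, output)."""

    def __init__(self, initial, trans):
        self.initial = initial
        self.trans = trans

    def run(self, word):
        """Return the output sequence produced for an input word."""
        state, outputs = self.initial, []
        for symbol in word:
            state, out = self.trans[(state, symbol)]
            outputs.append(out)
        return tuple(outputs)


class SimulatedTeacher:
    """Hypothetical teacher that answers OQs and approximates EQs by testing."""

    def __init__(self, sul, inputs, max_len=4):
        self.sul, self.inputs, self.max_len = sul, inputs, max_len
        self.symbols_sent = 0  # number of inputs sent to the SUL so far

    def output_query(self, word):
        self.symbols_sent += len(word)
        return self.sul.run(word)

    def equivalence_query(self, hypothesis):
        """Return a counterexample of length <= max_len, or None if none is found."""
        for n in range(1, self.max_len + 1):
            for word in product(self.inputs, repeat=n):
                if self.output_query(word) != hypothesis.run(word):
                    return word
        return None


# Toy SUL with two states, and a one-state hypothesis that is too coarse.
sul = Mealy("s0", {("s0", "a"): ("s1", 0), ("s1", "a"): ("s0", 1)})
hyp = Mealy("h0", {("h0", "a"): ("h0", 0)})
teacher = SimulatedTeacher(sul, ["a"], max_len=3)
counterexample = teacher.equivalence_query(hyp)
```

In this sketch the approximate EQ itself consumes inputs (`symbols_sent` grows while searching), which illustrates why EQs are considered expensive.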

Apartness. Learning algorithms in the MAT framework typically assume that two states are equivalent as long as their known residual languages are equivalent. To discover a new state, we must therefore (1) access it by an input sequence and (2) prove this state distinct (apart) from the other states that we already know. Consider the SUL $\mathcal{S}$ in Fig. 1(a). The access sequences $c$ , $ca$ access $q_{4}$ and $q_{5}$ , respectively, from the initial state. These states are different because the response to executing $c$ from $q_{4}$ and $q_{5}$ is distinct: We say $c$ is a separating sequence for $q_{4}$ and $q_{5}$ . This difference can be observed by posing OQs for $cc$ and $cac$ , consisting of the access sequences for $q_{4}$ and $q_{5}$ followed by their separating sequence $c$ .

Aim. The aim of adaptive AAL is to learn SULs with fewer inputs, using knowledge in the form of a reference model, known to the learner and preferably similar to the SUL. The discovery of states is accelerated by extracting candidates for both (1) access sequences and (2) separating sequences from the reference model.

Rebuilding. The state-of-the-art in adaptive AAL uses access sequences and separating sequences from the reference model [6, 9] in an initial phase. Consider the Mealy machine $\mathcal{R}_{1}$ in Fig. 1(b) as a reference model for the SUL $\mathcal{S}$ in Fig. 1(a). The sequences $\varepsilon$ , $c$ , $ca$ can be used to access all orange states in both $\mathcal{S}$ and $\mathcal{R}_{1}$ . The separating sequences $c$ and $ac$ for these states in $\mathcal{R}_{1}$ also separate the orange states in $\mathcal{S}$ . By asking OQs combining the access sequences and separating sequences, we discover all orange states for $\mathcal{S}$ .

Limits of rebuilding. However, these rebuilding approaches have limitations. Consider $\mathcal{R}_{2}$ in Fig. 1(c). The sequences $\varepsilon$ , $b$ , $bb$ and $bbb$ can be used to access all states in $\mathcal{R}_{2}$ . Concatenating these with any separating sequences from $\mathcal{R}_{2}$ will not be helpful to learn SUL $\mathcal{S}$ , because in $\mathcal{S}$ these sequences all access $q_{0}$ . However, the separating sequences from $\mathcal{R}_{2}$ are useful if executed in the right state of $\mathcal{S}$ . For instance, the sequence $bb$ separates all states in $\mathcal{R}_{2}$ , and the blue states in $\mathcal{S}$ . Thus, rebuilding does not realise the potential of reusing the separating sequences from $\mathcal{R}_{2}$ , since the access sequences for the relevant states are different.

State Matching. We extend adaptive AAL with state matching. State matching overcomes the strong dependency on the access sequences and allows the efficient usage of reference models where the residual languages of the individual states are similar. Suppose that while learning, we have not yet separated $q_{0}$ and $q_{1}$ in $\mathcal{S}$ , but we do know the output of the $b$ -transition from $q_{0}$ . We may use that output to match $q_{0}$ with $p_{3}$ in $\mathcal{R}_{2}$ : these two states agree on input sequences where both are defined. Subsequently, we can use the separating sequence $bb$ between $p_{3}$ and $p_{0}$ to separate $q_{0}$ and $q_{1}$ , through OQs $bb$ and $abb$ .

Approximate State Matching. It rarely happens that states in the SUL exactly match states in the reference model: Consider the scenario where we want to learn $\mathcal{S}$ with reference model $\mathcal{R}_{3}$ from Fig. 1(d). States $q_{0}$ and $s_{3}$ do not match because they have different outputs for input $b$ but are still similar. This motivates an approximate version of matching, where a state is matched to the reference state which maximises the number of inputs with the same output.
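The informal description above, matching a state to the reference state that agrees on the most inputs, can be sketched in a few lines of Python. This is a simplified illustration of the idea only, not the definition used later in the paper: it compares single inputs rather than input sequences, and the function names, state names, and output values are hypothetical (they are not taken from Fig. 1).

```python
def agreement(observed, ref_outputs):
    """Count the inputs whose observed output equals the reference output."""
    return sum(1 for i, out in observed.items() if ref_outputs.get(i) == out)


def best_match(observed, reference):
    """Pick the reference state that agrees with the observations on most inputs.

    Ties are broken by state name so that the result is deterministic.
    """
    return max(sorted(reference), key=lambda p: agreement(observed, reference[p]))


# Hypothetical outputs observed from one state of the learned model.
observed = {"a": 0, "b": 1, "c": 0}
# Hypothetical one-step outputs of two reference states.
reference = {
    "s0": {"a": 1, "b": 0, "c": 1},
    "s3": {"a": 0, "b": 0, "c": 0},  # differs only on input b
}
match = best_match(observed, reference)
```

Here `s3` is selected although it disagrees on input `b`, which is the situation that exact matching would reject.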

Outline. After the preliminaries (Sec. 3), we recall the $L^{\#}$ algorithm and extend it with rebuilding (Sec. 4). We then introduce adaptive AAL with state matching and its approximate variant (Sec. 5). Together with rebuilding, this results in the $AL^{\#}$ algorithm (Sec. 6). We proceed to define a variant that allows the use of multiple reference models (Sec. 7). This is helpful already in the example discussed in this section: given both $\mathcal{R}_{1}$ and $\mathcal{R}_{2}$ , $AL^{\#}$ with multiple reference models discovers all states in $\mathcal{S}$ without any EQs, see App. 0.F.

3 Preliminaries

For a partial map $f\colon X\rightharpoonup Y$ , we write $f(x)\mathord{\downarrow}$ if $f(x)$ is defined and $f(x)\mathord{\uparrow}$ otherwise.

Definition 3.1

A partial Mealy machine is a tuple $\mathcal{M}=(Q,I,O,q_{0},\delta,\lambda)$ , where $Q$ , $I$ and $O$ are finite sets of states, inputs and outputs respectively; $q_{0}\in Q$ an initial state, $\delta\colon Q\times I\rightharpoonup Q$ a transition function, and $\lambda\colon Q\times I\rightharpoonup O$ an output function such that $\delta$ and $\lambda$ have the same domain. A (complete) Mealy machine is a partial Mealy machine where $\delta$ and $\lambda$ are total. If not specified otherwise, a Mealy machine is assumed to be complete.

We write $\mathcal{M}|_{I}$ to denote $\mathcal{M}$ restricted to alphabet $I$ . We use the superscript $\mathcal{M}$ to indicate to which Mealy machine we refer, e.g. $Q^{\mathcal{M}}$ and $\delta^{\mathcal{M}}$ . The transition and output functions are naturally extended to input sequences of length $n\in\mathbb{N}$ as functions $\delta\colon Q\times I^{n}\rightharpoonup Q$ and $\lambda\colon Q\times I^{n}\rightharpoonup O^{n}$ . We abbreviate $\delta(q_{0},w)$ by $\delta(w)$ .
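As an illustration of Definition 3.1 and the extension of $\delta$ and $\lambda$ to input sequences, the following Python sketch stores both functions in one dictionary, so that they share the same domain by construction. The class name `PartialMealy`, the use of `None` to encode "undefined", and the toy machine are assumptions made for this example.

```python
class PartialMealy:
    """Partial Mealy machine with trans[(state, input)] = (next_state, output)."""

    def __init__(self, initial, trans):
        self.initial = initial
        self.trans = dict(trans)

    def delta(self, state, word):
        """Extended transition function; returns None if undefined on word."""
        for symbol in word:
            if (state, symbol) not in self.trans:
                return None
            state = self.trans[(state, symbol)][0]
        return state

    def lam(self, state, word):
        """Extended output function; returns None if undefined on word."""
        outputs = []
        for symbol in word:
            if (state, symbol) not in self.trans:
                return None
            state, out = self.trans[(state, symbol)]
            outputs.append(out)
        return tuple(outputs)


# Toy partial machine: only the a-transition from q0 is defined.
m = PartialMealy("q0", {("q0", "a"): ("q1", "x")})
reached = m.delta(m.initial, ("a",))        # state reached after reading a
undefined = m.delta(m.initial, ("a", "a"))  # no transition from q1
```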

Definition 3.2

Let $\mathcal{M}_{1}$ , $\mathcal{M}_{2}$ be partial Mealy machines. States $p\in Q^{\mathcal{M}_{1}}$ and $q\in Q^{\mathcal{M}_{2}}$ match, written $p\scalebox{0.65}{${}\overset{\surd}{=}{}$}q$ , if $\lambda(p,\sigma)=\lambda(q,\sigma)$ for all $\sigma\in(I^{\mathcal{M}_{1}}\cap I^{\mathcal{M}_{2}})^{*}$ with $\delta(p,\sigma){\mathord{\downarrow}}$ and $\delta(q,\sigma){\mathord{\downarrow}}$ . If $p$ and $q$ do not match, they are apart, written $p\mathrel{\#}q$ .

If $p\mathrel{\#}q$ , then there is a separating sequence, i.e., a sequence $\sigma$ such that $\lambda(p,\sigma)\neq\lambda(q,\sigma)$ ; this situation is denoted by $\sigma\vdash p\mathrel{\#}q$ . The definition of matching allows the input (and output) alphabets of the underlying Mealy machines to differ; it requires that they agree on all commonly defined input sequences. If $\mathcal{M}_{1}$ and $\mathcal{M}_{2}$ are complete and have the same alphabet, then the matching of states is referred to as language equivalence. Two complete Mealy machines are equivalent if their initial states are language equivalent.
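Matching and apartness can be decided by exploring pairs of states jointly. The sketch below is one straightforward way to do this, a breadth-first search over the product of the two machines, and is not claimed to be the procedure used by the authors. It returns a shortest separating sequence if the states are apart and `None` if they match. The dictionary encoding and all names are assumptions for this example.

```python
from collections import deque


def separating_sequence(trans1, p, trans2, q, inputs):
    """Return a shortest word sigma with lambda(p, sigma) != lambda(q, sigma).

    trans1 and trans2 map (state, input) to (next_state, output).
    Only words defined from both p and q are considered, as in Def. 3.2.
    Returns None when p and q match.
    """
    seen = {(p, q)}
    queue = deque([(p, q, ())])
    while queue:
        s1, s2, word = queue.popleft()
        for i in inputs:
            if (s1, i) not in trans1 or (s2, i) not in trans2:
                continue  # undefined in one machine: no constraint
            n1, o1 = trans1[(s1, i)]
            n2, o2 = trans2[(s2, i)]
            if o1 != o2:
                return word + (i,)
            if (n1, n2) not in seen:
                seen.add((n1, n2))
                queue.append((n1, n2, word + (i,)))
    return None


# Toy machines over the common input a.
m1 = {("x0", "a"): ("x1", 0), ("x1", "a"): ("x0", 1)}
m2 = {("y0", "a"): ("y0", 0)}
m3 = {}  # a state without any defined transitions

sigma = separating_sequence(m1, "x0", m2, "y0", ["a"])  # x0 and y0 are apart
none_found = separating_sequence(m1, "x0", m3, "z0", ["a"])  # x0 matches z0
```

The last call shows why matching is weaker than language equivalence on partial machines: a state with no defined behaviour matches every state.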

Let $\mathcal{M}$ be a partial Mealy machine. A state $q\in Q^{\mathcal{M}}$ is reachable if there exists $\sigma\in I^{*}$ such that $\delta^{\mathcal{M}}(q_{0},\sigma)=q$ . The reachable part of $\mathcal{M}$ contains all reachable states in $Q^{\mathcal{M}}$ . A sequence $\sigma$ is an access sequence for $q\in Q^{\mathcal{M}}$ if $\delta^{\mathcal{M}}(\sigma)=q$ . A set $P\subseteq I^{*}$ is a state cover for $\mathcal{M}$ if $P$ contains an access sequence for every reachable state in $\mathcal{M}$ . In this paper, a tree $\mathcal{T}$ is a partial Mealy machine where every state $q$ has a unique access sequence, denoted by $\mathsf{access}(q)$ .

Definition 3.3

Let $\mathcal{M}$ be a complete Mealy machine. A set $W_{q}\subseteq(I^{\mathcal{M}})^{*}$ is a state identifier for $q\in Q^{\mathcal{M}}$ if for all $p\in Q^{\mathcal{M}}$ with $p\mathrel{\#}q$ there exists $\sigma\in W_{q}$ such that $\sigma\vdash p\mathrel{\#}q$ . A separating family is a collection of state identifiers $\{W_{p}\}_{p\in Q^{\mathcal{M}}}$ such that for all $p,q\in Q^{\mathcal{M}}$ with $p\mathrel{\#}q$ there exists $\sigma\in W_{p}\cap W_{q}$ with $\sigma\vdash p\mathrel{\#}q$ .

We use $P^{\mathcal{M}}$ and $\{W_{q}\}^{\mathcal{M}}$ to refer to a minimal state cover and a separating family for $\mathcal{M}$ respectively. State covers and separating families can be constructed for every Mealy machine, but are not necessarily unique.
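One simple way to obtain a state cover with exactly one access sequence per reachable state is a breadth-first traversal from the initial state. The Python sketch below illustrates this under assumed names and encodings; the resulting cover is prefix-closed because every stored sequence extends a previously stored one by a single input.

```python
from collections import deque


def state_cover(initial, trans, inputs):
    """Map every reachable state to one shortest access sequence.

    trans maps (state, input) to (next_state, output).
    """
    access = {initial: ()}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for i in inputs:
            if (state, i) in trans:
                nxt = trans[(state, i)][0]
                if nxt not in access:
                    access[nxt] = access[state] + (i,)
                    queue.append(nxt)
    return access


# Toy machine; state s3 is not reachable from s0.
trans = {
    ("s0", "a"): ("s1", 0), ("s0", "b"): ("s0", 0),
    ("s1", "a"): ("s2", 1), ("s1", "b"): ("s0", 0),
    ("s2", "a"): ("s2", 0), ("s2", "b"): ("s0", 1),
    ("s3", "a"): ("s0", 0), ("s3", "b"): ("s3", 0),
}
cover = state_cover("s0", trans, ["a", "b"])
```

A different input order may yield a different cover of the same size, which reflects the remark that state covers are not necessarily unique.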

4 $L^{\#}$ with Rebuilding

We first recall the $L^{\#}$ algorithm for (standard) AAL [23]. Then, we consider adaptive learning by presenting an $L^{\#}$ -compatible variant of rebuilding.

4.1 Observation Trees

$L^{\#}$ uses an observation tree as data structure to store the observed traces of $\mathcal{M}$ .

Definition 4.1

A tree $\mathcal{T}$ is an observation tree if there exists a mapping $f\colon Q^{\mathcal{T}}\to Q^{\mathcal{M}}$ such that $f(q_{0}^{\mathcal{T}})=q_{0}^{\mathcal{M}}$ and $q\xrightarrow[]{i/o}q^{\prime}$ implies $f(q)\xrightarrow[]{i/o}f(q^{\prime})$ .

In an observation tree, a basis is a subtree that describes unique behaviour present in the SUL. Initially, a basis $B\subseteq Q^{\mathcal{T}}$ contains the root state. All states in the basis are pairwise apart, i.e., for all $q\neq q^{\prime}\in B$ it holds that $q\mathrel{\#}q^{\prime}$ . For a fixed basis, its frontier is the set of states $F\subseteq Q^{\mathcal{T}}$ which are immediate successors of basis states but which are not in the basis themselves.

Example 4.1

Fig. 2 shows an observation tree $\mathcal{T}^{\prime}$ for the Mealy machine $\mathcal{H}^{\prime}$ from Fig. 2. The separating sequences $c$ and $ac$ show that the states in basis $B=\{t_{0},t_{2},t_{3}\}$ are all pairwise apart. The frontier $F$ is $\{t_{1},t_{4},t_{5},t_{6}\}$ .

We say that a frontier state is isolated if it is apart from all basis states. A frontier state is identified with a basis state $q$ if it is apart from all basis states except $q$ . We say the observation tree is adequate if all frontier states are identified, no frontier states are isolated and each basis state has a transition with every input. If every frontier state is identified and each basis state has a transition for every input, the observation tree can be folded to create a complete Mealy machine (formalized in Def. 0.A.1). The Mealy machine has the same states as the basis. The transitions between basis states are the same as in the observation tree. Transitions from basis states to frontier states are folded back to the basis state the frontier state is identified with. We call the resulting complete Mealy machine a hypothesis whenever this canonical transformation is used.
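The notions of frontier, isolated, and identified states can be made concrete on a small observation tree. The following Python sketch uses a hypothetical tree (not the one from Fig. 2) and assumed function names. Apartness is checked recursively, which terminates because the tree is acyclic.

```python
def apart(tree, inputs, s, t):
    """True if tree states s and t differ in output on some commonly defined word."""
    for i in inputs:
        if (s, i) in tree and (t, i) in tree:
            (s_next, s_out), (t_next, t_out) = tree[(s, i)], tree[(t, i)]
            if s_out != t_out or apart(tree, inputs, s_next, t_next):
                return True
    return False


def frontier(tree, basis):
    """Immediate successors of basis states that are not basis states themselves."""
    return {child for (state, _), (child, _) in tree.items()
            if state in basis and child not in basis}


def classify(tree, inputs, basis, r):
    """Classify frontier state r relative to the basis."""
    candidates = sorted(q for q in basis if not apart(tree, inputs, r, q))
    if not candidates:
        return ("isolated", None)
    if len(candidates) == 1:
        return ("identified", candidates[0])
    return ("unidentified", None)


# Hypothetical observation tree: (state, input) -> (child, output).
tree = {
    ("t0", "a"): ("t1", 0), ("t0", "b"): ("t2", 1),
    ("t1", "a"): ("t3", 1), ("t2", "a"): ("t4", 0),
}
inputs = ["a", "b"]
basis = {"t0", "t1"}  # t0 and t1 are apart: they differ on input a
F = frontier(tree, basis)
```

In this toy tree, `t2` is identified with `t0`, whereas `t3` has no observed behaviour yet and is therefore not apart from any basis state.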

Example 4.2

In $\mathcal{T}^{\prime}$ (Fig. 2) the frontier states are identified as follows: $t_{1}\mapsto t_{2},t_{4}\mapsto t_{3},t_{5}\mapsto t_{0}$ and $t_{6}\mapsto t_{2}$ . Hypothesis $\mathcal{H}^{\prime}$ (Fig. 2) can be folded back from $\mathcal{T}^{\prime}$ . The dashed transitions in Fig. 2 represent the folded transitions.
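Folding can be sketched as a redirect of every transition that leaves the basis. The Python fragment below assumes that every frontier state is identified and that the identification is given as a dictionary; the function name `fold` and the toy tree are hypothetical and do not reproduce $\mathcal{T}^{\prime}$ from Fig. 2.

```python
def fold(tree, basis, identified):
    """Fold an observation tree into a hypothesis over the basis states.

    tree maps (state, input) to (child, output).
    identified maps each frontier state to the basis state it is identified with.
    Transitions between basis states are kept; transitions into the frontier
    are redirected to the identified basis state.
    """
    hypothesis = {}
    for (state, i), (child, out) in tree.items():
        if state in basis:
            target = child if child in basis else identified[child]
            hypothesis[(state, i)] = (target, out)
    return hypothesis


# Hypothetical tree with basis {t0, t1} and frontier {t2, t3, t4}.
tree = {
    ("t0", "a"): ("t1", 0), ("t0", "b"): ("t2", 1),
    ("t1", "a"): ("t3", 1), ("t1", "b"): ("t4", 1),
    ("t2", "a"): ("t5", 0),  # observation below the frontier, not part of the fold
}
basis = {"t0", "t1"}
identified = {"t2": "t0", "t3": "t0", "t4": "t1"}
hyp = fold(tree, basis, identified)
```

The result is complete only if each basis state has a transition for every input, which is the second requirement stated above.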

4.2 The $L^{\#}$ Algorithm

The $L^{\#}$ algorithm maintains an observation tree $\mathcal{T}$ and a basis $B$ . Initially, $\mathcal{T}$ consists of just a root node $q_{0}$ and $B=\{q_{0}\}$ . We denote the frontier of $B$ by $F$ . The $L^{\#}$ algorithm then repeatedly applies the following four rules.

•

The promotion rule (P) extends $B$ by $r\in F$ when $r$ is isolated.
•

The extension rule (Ex) poses OQ $\mathsf{access}(q)i$ for $q\in B,i\in I$ with $\delta(q,i)\mathord{\uparrow}$ .
•

The separation rule (S) takes a state $r\in F$ that is not apart from $q,q^{\prime}\in B$ and poses OQ $\mathsf{access}(r)\sigma$ with $\sigma\vdash q\mathrel{\#}q^{\prime}$ that shows $r$ is apart from $q$ or $q^{\prime}$ .
•

The equivalence rule (Eq) folds $\mathcal{T}$ into hypothesis $\mathcal{H}$ , checks whether $\mathcal{H}$ and $\mathcal{T}$ agree on all sequences in $\mathcal{T}$ and poses an EQ. If $\mathcal{H}$ and the SUL are not equivalent, counterexample processing isolates a frontier state.

The pre- and postconditions of the rules are summarized in (the top rows of) Table 1. A detailed account is given in the paper introducing $L^{\#}$ [23].

Table 1: Extended $L^{\#}$ rules with parameters, preconditions and postconditions.

Section	Rule	Parameters	Precondition	Postcondition
Sec. 4.2	promotion	$r\in F$	$\forall q\in B,q\mathrel{\#}r$	$r\in B$
	extension	$q\in B$ , $i\in I$	$\delta^{\mathcal{T}}(q,i){\uparrow}$	$\delta^{\mathcal{T}}(q,i){\mathord{\downarrow}}$
	separation	$r\in F$ , $q,q^{\prime}\in B$	$\neg(r\mathrel{\#}q),\neg(r\mathrel{\#}q^{\prime}),q\neq q^{\prime}$	$r\mathrel{\#}q\lor r\mathrel{\#}q^{\prime}$
	equivalence	-	$\forall q\in B.~\forall i\in I.~\delta^{\mathcal{T}}(q,i){\mathord{\downarrow}},~\forall r\in F.~\exists q\in B.~(\neg(r\mathrel{\#}q)\land\forall q^{\prime}\in B\setminus\{q\}.~r\mathrel{\#}q^{\prime})$	$\exists r\in F$ s.t. $\forall q\in B.~r\mathrel{\#}q$
Sec. 4.3	rebuilding	$q,q^{\prime}\in B$ , $i\in I$	$\delta^{\mathcal{T}}(q,i)\notin B,\neg(q^{\prime}\mathrel{\#}\delta^{\mathcal{T}}(q,i))$ , $\mathsf{access}^{\mathcal{T}}(q)i,\mathsf{access}^{\mathcal{T}}(q^{\prime})\in P^{\mathcal{R}}$ , $\sigma=\mathsf{sep}\bm{(}\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q)i),\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q^{\prime}))\bm{)}$ , $(\delta^{\mathcal{T}}(q,i\sigma){\uparrow}\lor\delta^{\mathcal{T}}(q^{\prime},\sigma){\uparrow})$	$\delta^{\mathcal{T}}(q,i\sigma){\mathord{\downarrow}}$ , $\delta^{\mathcal{T}}(q^{\prime},\sigma){\mathord{\downarrow}}$
	prioritized promotion	$r\in F$	$\mathsf{access}^{\mathcal{T}}(r)\in P^{\mathcal{R}},\forall q\in B.~q\mathrel{\#}r$	$r\in B$
Sec. 3, 5.2	match separation	$q,q^{\prime}\in B$ , $p\in Q^{\mathcal{R}},i\in I$	$\delta^{\mathcal{T}}(q,i)=r\in F,\neg(r\mathrel{\#}q^{\prime})$ , $\delta^{\mathcal{R}}(p,i)=p^{\prime}$ , $\neg(\exists q^{\prime\prime}\in B$ s.t. $p^{\prime}\scalebox{0.65}{${}\overset{\surd}{=}{}$}q^{\prime\prime}),p\scalebox{0.65}{${}\overset{\surd}{=}{}$}q$	$r\mathrel{\#}q^{\prime}\lor(p\scalebox{0.65}{${}\overset{\surd}{\neq}{}$}q\land r\mathrel{\#}p^{\prime})$
	match refinement	$q\in B$ , $p,p^{\prime}\in Q^{\mathcal{R}}$	$p\scalebox{0.65}{${}\overset{\surd}{=}{}$}q,p^{\prime}\scalebox{0.65}{${}\overset{\surd}{=}{}$}q,$ $\sigma=\mathsf{sep}(p,p^{\prime})$	$p\scalebox{0.65}{${}\overset{\surd}{\neq}{}$}q\lor p^{\prime}\scalebox{0.65}{${}\overset{\surd}{\neq}{}$}q$
	prioritized separation	$r\in F$ , $q^{\prime},q^{\prime\prime}\in B$	$\neg(r\mathrel{\#}q^{\prime}),\neg(r\mathrel{\#}q^{\prime\prime}),\exists i\in I$ s.t. $\delta^{\mathcal{T}}(q,i)=r,$ $\sigma\vdash q^{\prime}\mathrel{\#}q^{\prime\prime},\sigma\in\cup_{p\scalebox{0.65}{${}\overset{\surd}{=}{}$}q}W_{\delta^{\mathcal{R}}(p,i)}$	$r\mathrel{\#}q^{\prime\prime}\lor r\mathrel{\#}q^{\prime}$

Example 4.3

Suppose we learn $\mathcal{R}_{1}$ from Fig. 1. $L^{\#}$ applies the extension rule twice, resulting in $\mathcal{T}$ as in Fig. 2. States $t_{1}$ and $t_{2}$ are identified with $t_{0}$ because there is only one basis state. Next, $L^{\#}$ applies the equivalence rule using hypothesis $\mathcal{H}$ (Fig. 2). Counterexample $aac$ distinguishes $\mathcal{H}$ from $\mathcal{R}_{1}$ . This sequence is added to $\mathcal{T}$ and processed further by posing OQ $ac$ in the equivalence rule. Observations $ac$ and $aac$ show that the states accessed with $\varepsilon$ , $a$ and $aa$ are pairwise apart. States $t_{2}$ and $t_{3}$ are added to the basis using the promotion rule. Next, $L^{\#}$ poses OQ $aaa$ during the extension rule. To identify all frontier states, $L^{\#}$ may use $ac\vdash t_{2}\mathrel{\#}t_{3}$ , $ac\vdash t_{0}\mathrel{\#}t_{2}$ and $c\vdash t_{0}\mathrel{\#}t_{3}$ . Fig. 2 shows one possible observation tree $\mathcal{T}^{\prime}$ after applying the separation rule multiple times. Next, the equivalence rule constructs hypothesis $\mathcal{H}^{\prime}$ (Fig. 2) from $\mathcal{T}^{\prime}$ and $L^{\#}$ terminates because $\mathcal{H}^{\prime}$ and $\mathcal{R}_{1}$ are equivalent.

4.3 Rebuilding in $L^{\#}$

In this subsection, we combine rebuilding from [6, 9] with $L^{\#}$ and implement this using two rules: rebuilding and prioritized promotion, see also Table 1. Both rules depend on a reference model $\mathcal{R}$ , which is a complete Mealy machine, with a possibly different alphabet than the SUL $\mathcal{S}$ . More precisely, these rules depend on a prefix-closed and minimal state cover $P^{\mathcal{R}}$ and a separating family $\{W_{q}\}^{\mathcal{R}}$ computed on $\mathcal{R}|_{I^{\mathcal{S}}}$ for maximal overlap with $\mathcal{S}$ . The separating family can be computed with partition refinement [20]. We fix $\mathsf{sep}(p,p^{\prime})$ with $p,p^{\prime}\in Q^{\mathcal{R}}$ to be a unique sequence from $W_{p}\cap W_{p^{\prime}}$ such that $\mathsf{sep}(p,p^{\prime})\vdash p\mathrel{\#}p^{\prime}$ . Below, we use $q$ for states in $B$ , $r$ for states in $F$ and $p$ for states in $Q^{\mathcal{R}}$ . In App. 0.A, we depict the scenarios in the observation tree and reference model required for the new rules to be applicable.

Rule (R): Rebuilding. Let $q\in B$ , $i\in I$ and suppose $\delta^{\mathcal{T}}(q,i)\notin B$ . The aim of the rebuilding rule is to show apartness between $\delta^{\mathcal{T}}(q,i)$ and a basis state $q^{\prime}$ , using the state cover and separating family from $\mathcal{R}$ . The rebuilding rule is applicable when $\mathsf{access}^{\mathcal{T}}(q)$ and $\mathsf{access}^{\mathcal{T}}(q)i$ are in $P^{\mathcal{R}}$ . If $\mathsf{access}^{\mathcal{T}}(q^{\prime})\in P^{\mathcal{R}}$ then there exists a sequence $\sigma$ such that $\sigma=\mathsf{sep}\bm{(}\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q)% i),\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q^{\prime}))\bm{)}$ . We pose OQs $\mathsf{access}^{\mathcal{T}}(q)i\sigma$ and $\mathsf{access}^{\mathcal{T}}(q^{\prime})\sigma$ .

Lemma 1

Suppose $\mathsf{access}^{\mathcal{T}}(q^{\prime})\in P^{\mathcal{R}}$ for all $q^{\prime}\in B$ . Consider $q\in B$ , $i\in I$ such that $\delta^{\mathcal{T}}(q,i)\notin B$ and $\mathsf{access}^{\mathcal{T}}(q)i\in P^{\mathcal{R}}$ . If for all $q^{\prime}\in B$ it holds that $\mathsf{sep}\bm{(}\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q)i),% \delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q^{\prime}))\bm{)}\vdash% \delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q)i)\mathrel{\#}\delta^{% \mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q^{\prime}))$ , then after applying the rebuilding rule for $q$ , $i$ and all $q^{\prime}\in B$ with $\neg(q^{\prime}\mathrel{\#}\delta^{\mathcal{T}}(q,i))$ , state $\delta^{\mathcal{T}}(q,i)$ is isolated.

If a state is isolated, it can be added to the basis using the promotion rule.

Rule (PP): Prioritized promotion. Like (regular) promotion, prioritized promotion extends the basis. However, prioritized promotion only applies to states $r$ with $\mathsf{access}^{\mathcal{T}}(r)\in P^{\mathcal{R}}$ . This enforces that the access sequences for basis states are in $P^{\mathcal{R}}$ as often as possible, enabling the use of the rebuilding rule.

Example 4.4

Consider reference $\mathcal{R}_{1}$ and SUL $\mathcal{S}$ from Fig. 1. We learn the orange states similarly as described in Sec. 2: We apply the rebuilding rule with $\mathsf{access}^{\mathcal{T}}(q)=\varepsilon,\mathsf{access}^{\mathcal{T}}(q^{% \prime})=\varepsilon,i=c$ which results in OQs $cac$ and $ac$ . Next, we promote $\delta^{\mathcal{T}}(c)$ with the prioritized promotion rule. We apply the rebuilding rule with $\mathsf{access}^{\mathcal{T}}(q)=c,\mathsf{access}^{\mathcal{T}}(q^{\prime})=c$ and $i=a$ which results in OQs $cac$ (already present in $\mathcal{T}$ ) and $cc$ . Lastly, we promote $\delta^{\mathcal{T}}(ca)$ with prioritized promotion.
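The two applications of the rebuilding rule in Example 4.4 can be replayed with a small Python sketch that only assembles the OQs. The access sequences and separating sequences are taken from the example; the state names `r_eps`, `r_c`, `r_ca`, the lookup tables, and the function name are placeholders introduced here, since the sketch does not encode $\mathcal{R}_{1}$ itself.

```python
def rebuilding_queries(access_q, i, access_q2, ref_state, sep):
    """Return the OQs access(q) i sigma and access(q') sigma of the rebuilding rule.

    ref_state maps an access sequence to the reference state it reaches.
    sep maps a pair of reference states to their fixed separating sequence.
    """
    p = ref_state[access_q + (i,)]
    p2 = ref_state[access_q2]
    sigma = sep[(p, p2)]
    return access_q + (i,) + sigma, access_q2 + sigma


# Placeholder names for the reference states reached by the state cover.
ref_state = {(): "r_eps", ("c",): "r_c", ("c", "a"): "r_ca"}
# Separating sequences as used in Example 4.4.
sep = {("r_c", "r_eps"): ("a", "c"), ("r_ca", "r_c"): ("c",)}

first = rebuilding_queries((), "c", (), ref_state, sep)
second = rebuilding_queries(("c",), "a", ("c",), ref_state, sep)
```

The first call yields the OQs $cac$ and $ac$, the second yields $cac$ and $cc$, in line with the example. A learner would skip the repeated query $cac$ because its outcome is already stored in $\mathcal{T}$.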

The overlap between $\mathcal{S}$ and $P^{\mathcal{R}}$ and $\{W_{q}\}^{\mathcal{R}}$ determines how many states of $\mathcal{S}$ can be discovered via rebuilding. The following theorem is a consequence of Lemma 1.

Theorem 4.1

If $q_{0}^{\mathcal{R}}$ matches $q_{0}^{\mathcal{S}}$ and $\mathcal{T}$ only contains a root $q_{0}^{\mathcal{T}}$ , then after applying only the rebuilding and prioritized promotion rules until they are no longer applicable, the basis consists of $n$ states where $n$ is the number of equivalence classes (w.r.t. language equivalence) in the reachable part of $\mathcal{S}|_{I^{\mathcal{R}}}$ .

Corollary 1

Suppose we learn SUL $\mathcal{S}$ with reference $\mathcal{S}$ . Using the rebuilding and prioritized promotion rules, we can add all reachable states in $\mathcal{S}$ to the basis.

5 $L^{\#}$ using State Matching

In this section, we describe another way to reuse information from references, called state matching, which is independent of the state cover. First, we present a version of state matching using the matching relation ( ${}\overset{\surd}{=}{}$ ) from Def. 3.2 and then we weaken this notion to approximate state matching.

5.1 State Matching

With state matching, the learner maintains the matching relation ${}\overset{\surd}{=}{}$ between basis states and reference model states during learning. In the implementation, before applying a matching rule, the matching is updated based on the OQs asked since the previous match computation. We present two key rules here and an optimisation in the next subsection.

Rule (MS): Match separation. This rule aims to show apartness between the frontier and a basis state using separating sequences from the reference separating family. Let $q$ , $q^{\prime}\in B$ , $r\in F$ with $\delta^{\mathcal{T}}(q,i)=r$ for some $i\in I$ , and $p,p^{\prime}\in Q^{\mathcal{R}}$ . Suppose that $\delta^{\mathcal{R}}(p,i)=p^{\prime}$ , $\neg(r\mathrel{\#}q^{\prime})$ , $p\scalebox{0.65}{${}\overset{\surd}{=}{}$}q$ and $p^{\prime}$ does not match any basis state. In particular, there exists some separating sequence $\sigma$ for $p^{\prime}\mathrel{\#}q^{\prime}$ . The match separation rule poses OQ $\mathsf{access}(q)i\sigma$ to either show $r\mathrel{\#}q^{\prime}$ or $q\scalebox{0.65}{${}\overset{\surd}{\neq}{}$}p$ and $r\mathrel{\#}p^{\prime}$ .

Example 5.1

Suppose we learn $\mathcal{S}$ using $\mathcal{R}_{2}$ from Fig. 1. After applying the extension rule three times, we get $\mathcal{T}_{0}$ (Fig. 3). State $t_{0}$ matches $p_{3}$ as their outputs coincide on sequences from alphabet $I^{\mathcal{S}}\cap I^{\mathcal{R}_{2}}=\{a,b\}$ . State $p_{3}$ transitions to the unmatched state $p_{0}$ with input $a$ . The match separation rule conjectures $t_{1}$ may match $p_{0}$ which implies $t_{1}\mathrel{\#}t_{0}$ . We use OQ $\mathsf{access}(t_{1})a$ to test this conjecture and indeed find that $t_{1}$ can be added to the basis using promotion.

Lemma 2

We fix $p\in Q^{\mathcal{R}}$ , $q\in B$ , $i\in I$ and $\delta^{\mathcal{T}}(q,i)=r\in F$ . Suppose $\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q))\scalebox{0.65}{${}% \overset{\surd}{=}{}$}p$ . If $\delta^{\mathcal{R}}(p,i)\scalebox{0.65}{${}\overset{\surd}{\neq}{}$}q^{\prime}$ for all $q^{\prime}\in B$ , then after applying the match separation rule with $q,p,i$ for all $q^{\prime}\in B$ with $\neg(q^{\prime}\mathrel{\#}r)$ , state $r$ is isolated.

Rule (MR): Match refinement. Let $q\in B$ and $p,p^{\prime}\in Q^{\mathcal{R}}$ . Suppose $q$ matches both $p$ and $p^{\prime}$ and let $\sigma=\mathsf{sep}(p,p^{\prime})$ . The match refinement rule poses OQ $\mathsf{access}(q)\sigma$ resulting in $q$ no longer being matched to $p$ or $p^{\prime}$ .

Example 5.2

Suppose we continue learning $\mathcal{S}$ using $\mathcal{R}_{2}$ from observation tree $\mathcal{T}_{1}$ (Fig. 3). State $t_{1}$ matches both $p_{0}$ and $p_{1}$ . After posing OQ $\mathsf{access}(t_{1})bb$ where $bb\vdash p_{0}\mathrel{\#}p_{1}$ , $t_{1}$ no longer matches $p_{1}$ .

If the initial state of SUL $\mathcal{S}$ is language equivalent to some state in the reference model, then we can discover all reachable states in $\mathcal{S}$ via state matching and $L^{\#}$ rules. The statement uses Lemma 2 above.

Theorem 5.1

Suppose we have reference $\mathcal{R}$ and SUL $\mathcal{S}$ equivalent to $\mathcal{R}$ but with a possibly different initial state. Using only the match refinement, match separation, promotion and extension rules, we can add $n$ states to the basis where $n$ is the number of equivalence classes (w.r.t. language equivalence) in the reachable part of $\mathcal{S}$ .

5.2 Optimised Separation using State Matching

In this subsection, we add an optimisation rule prioritized separation that uses the matching to guide the identification of frontier states. First, we highlight the differences between prioritized separation and the previous separation rules. Both match separation and prioritized separation require that $r\scalebox{0.65}{${}\overset{\surd}{=}{}$}p$ for $r\in F$ and $p\in Q^{\mathcal{R}}$ . The aim of match separation is to isolate $r$ and requires that $p$ does not match any basis state. Instead, the aim of prioritized separation is to guide the identification of $r$ using the state identifier for a $p$ matched with a basis state. The prioritized separation rule is also different from the separation rule (Sec. 4.2) which randomly selects $q,q^{\prime}\in B$ to separate $r$ from $q$ or $q^{\prime}$ .

Rule (PS): Prioritized separation. The prioritized separation rule uses the matching to find a separating sequence from the reference model that is expected to separate a frontier state from a basis state. Let $q^{\prime},q^{\prime\prime}\in B$ and $r\in F$ . Suppose $r$ is not apart from $q^{\prime}$ and $q^{\prime\prime}$ and $\sigma\vdash q^{\prime}\mathrel{\#}q^{\prime\prime}$ . If $\sigma$ is in $\{W_{p}\}^{\mathcal{R}}$ of a reference model state $p$ that matches $r$ , the prioritized separation rule poses OQ $\mathsf{access}(r)\sigma$ resulting in $r$ being apart from $q^{\prime}$ or $q^{\prime\prime}$ .\footnote{The precise specification is more involved, as the learner only keeps track of the match relation on $B\times Q^{\mathcal{R}}$ .}

Example 5.3

Suppose we learn $\mathcal{S}$ using $\mathcal{R}_{1}$ from Fig. 1. Assume we have discovered all states in $\mathcal{S}$ and want to identify $\delta^{\mathcal{T}}(ca,c)\in F$ , which is currently not apart from any basis state. The prioritized separation rule can only be applied with basis states $q^{\prime},q^{\prime\prime}\in B$ such that $c\vdash q^{\prime}\mathrel{\#}q^{\prime\prime}$ , as $c$ is the only sequence in the state identifier of $r_{2}$ which is the state that matches $\delta^{\mathcal{T}}(ca,c)$ . From the sequences $\{bb,ac,c\}$ possibly used by $L^{\#}$ , only $c$ immediately identifies $\delta^{\mathcal{T}}(ca,c)$ .
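The selection step of prioritized separation can be sketched as follows. This is a simplified illustration, not the specification used by $AL^{\#}$: it assumes one stored witness per pair of basis states and that the state identifier of a reference state matching $r$ is already given. All names (`prioritized_separation`, `BASIS`, `WITNESS`, `IDENTIFIER`) and the toy data are hypothetical; the sequences only loosely mirror $\{bb,ac,c\}$ from Example 5.3.

```python
def prioritized_separation(r, basis, apart, witness, identifier):
    """Return (q1, q2, sigma) for one applicable rule instance, or None.

    apart:      set of frozenset({x, y}) pairs already known to be apart
    witness:    frozenset({q1, q2}) -> one separating sequence for that basis pair
    identifier: sequences in the state identifier of a reference state matching r
    """
    # Only basis states that r is not yet apart from are worth separating.
    candidates = [q for q in basis if frozenset((r, q)) not in apart]
    for index, q1 in enumerate(candidates):
        for q2 in candidates[index + 1:]:
            sigma = witness.get(frozenset((q1, q2)))
            # The rule only fires if the witness also occurs in the identifier.
            if sigma is not None and sigma in identifier:
                return q1, q2, sigma
    return None


# Hypothetical toy data: three basis states and one witness per pair.
BASIS = ["q0", "q1", "q2"]
WITNESS = {
    frozenset(("q0", "q1")): ("b", "b"),
    frozenset(("q0", "q2")): ("c",),
    frozenset(("q1", "q2")): ("a", "c"),
}
IDENTIFIER = {("c",)}  # state identifier of the matched reference state
```

With these inputs, only the pair whose witness is $c$ qualifies, so the output query would be $\mathsf{access}(r)c$; the plain separation rule could instead have picked any of the three witnesses.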

5.3 Approximate State Matching

In this subsection, we introduce an approximate version of matching, by quantifying matching via a matching degree. Let $\mathcal{T}$ be a tree and $\mathcal{R}$ be a (partial) Mealy machine. Let $I=I^{\mathcal{T}}\cap I^{\mathcal{R}}$ . We define $\mathsf{WI}(q)=\{(w,i)\in I^{*}\times I\mid\delta^{\mathcal{T}}(q,wi)\mathord{\downarrow}\}$ as prefix-suffix pairs that are defined from $q\in Q^{\mathcal{T}}$ onwards. Then, we define the matching degree $\mathsf{mdeg}:Q^{\mathcal{T}}\times Q^{\mathcal{R}}\to\mathbb{R}$ as

\mathsf{mdeg}(q,p)=\frac{\left|\{(w,i)\in\mathsf{WI}(q)\mid\lambda^{\mathcal{T}}\big(\delta^{\mathcal{T}}(q,w),i\big)=\lambda^{\mathcal{R}}\big(\delta^{\mathcal{R}}(p,w),i\big)\}\right|}{\left|\mathsf{WI}(q)\right|}.

Example 5.4

Consider $t_{1}$ from $\mathcal{T}_{2}$ (Fig. 3) and $p_{0}$ , $p_{1}$ from $\mathcal{R}_{2}$ (Fig. 1). We derive $\mathsf{WI}(t_{1})=\{(\varepsilon,a),(\varepsilon,b),(b,a),(b,b),(bb,b)\}$ from $\mathcal{T}_{2}$ where $I=I^{\mathcal{T}_{2}}\cap I^{\mathcal{R}_{2}}=\{a,b\}$ . On these pairs, all the suffix outputs for $p_{0}$ and $t_{1}$ are equivalent, $\mathsf{mdeg}(t_{1},p_{0})=\nicefrac{{5}}{{5}}=1$ . The matching degree between $t_{1}$ and $p_{1}$ is only $\nicefrac{{3}}{{5}}$ because $\lambda^{\mathcal{R}_{2}}(p_{1},bbb)=120\neq 112=\lambda^{\mathcal{T}}(t_{1},bbb)$ which impacts pairs $(b,b)$ and $(bb,b)$ .

A state $q$ in an observation tree $\mathcal{T}$ approximately matches a state $p\in Q^{\mathcal{R}}$ , written $q\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}p$ , if there does not exist a $p^{\prime}\in Q^{\mathcal{R}}$ such that $\mathsf{mdeg}(q,p^{\prime})>\mathsf{mdeg}(q,p)$ .
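The matching degree and approximate matching can be computed directly from these definitions. The following is a minimal sketch under stated assumptions: machines and trees are encoded as dictionaries from (state, input) to (next state, output), an undefined reference transition counts as a disagreement, and an empty $\mathsf{WI}(q)$ is given degree 1 by convention (the definition leaves this case open). The function names and the toy data are hypothetical and are not the models of Fig. 1 or Fig. 3.

```python
from fractions import Fraction

# Assumed encoding: a (partial) Mealy machine or observation tree is a dict
#   (state, input) -> (next_state, output)


def run(machine, state, word):
    """Follow `word` from `state`; return the reached state or None if undefined."""
    for symbol in word:
        step = machine.get((state, symbol))
        if step is None:
            return None
        state = step[0]
    return state


def wi_pairs(tree, q, alphabet):
    """Collect WI(q): all (w, i) such that the tree defines the word w.i from q."""
    pairs, stack = [], [(q, ())]
    while stack:
        state, w = stack.pop()
        for i in alphabet:
            step = tree.get((state, i))
            if step is not None:
                pairs.append((w, i))
                stack.append((step[0], w + (i,)))
    return pairs


def mdeg(tree, ref, q, p, alphabet):
    """Fraction of pairs in WI(q) on which tree and reference outputs agree."""
    pairs = wi_pairs(tree, q, alphabet)
    if not pairs:
        return Fraction(1)  # assumption: no observations means no disagreement
    agree = 0
    for w, i in pairs:
        tree_output = tree[(run(tree, q, w), i)][1]
        ref_state = run(ref, p, w)
        ref_step = ref.get((ref_state, i)) if ref_state is not None else None
        if ref_step is not None and ref_step[1] == tree_output:
            agree += 1
    return Fraction(agree, len(pairs))


def approximate_matches(tree, ref, q, ref_states, alphabet):
    """Reference states whose matching degree with q is maximal."""
    degrees = {p: mdeg(tree, ref, q, p, alphabet) for p in ref_states}
    best = max(degrees.values())
    return {p for p, d in degrees.items() if d == best}


# Hypothetical toy data.
TREE = {("t0", "a"): ("t1", 0), ("t0", "b"): ("t2", 1), ("t2", "b"): ("t3", 1)}
REF = {("p0", "a"): ("p0", 0), ("p0", "b"): ("p1", 1),
       ("p1", "a"): ("p0", 0), ("p1", "b"): ("p1", 0)}
ALPHABET = ["a", "b"]
```

On this toy data, `t0` has three defined pairs; it agrees with `p0` on two of them and with `p1` on one, so `t0` approximately matches only `p0` even though it matches neither exactly.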

Lemma 3

For any $q\in Q^{\mathcal{T}},p\in Q^{\mathcal{R}}$ : $\mathsf{mdeg}(q,p)=1$ implies $q\scalebox{0.65}{${}\overset{\surd}{=}{}$}p$ .

We define rules approximate match separation (AMS), approximate match refinement (AMR) and approximate prioritized separation (APS) that represent the approximate matching variations of match separation, match refinement and prioritized separation respectively. These rules have weaker preconditions and postconditions, see Table 3 in App 0.A.

6 Adaptive $L^{\#}$

The rebuilding, state matching and $L^{\#}$ rules described in Table 1 are ordered and combined into one adaptive learning algorithm called adaptive $L^{\#}$ (written $AL^{\#}$ ). A non-ordered listing of the rules can be found in Algorithm 1 in App. 0.A. We use the abbreviations for the rules defined in previous sections.

Definition 6.1

The $AL^{\#}$ algorithm repeatedly applies the rules from Table 1 (see Algorithm 1), with the following ordering: Ex, APS, (S if APS was not applicable), P, if the observation tree is adequate we try AMR, AMS, Eq. The algorithm starts by applying R and PP until they are no longer applicable; these rules are not applied anymore afterwards.

Similar to $L^{\#}$ , the correctness of $AL^{\#}$ amounts to showing termination because the algorithm can only terminate when the teacher indicates that the SUL and hypothesis are equivalent. We prove termination of $AL^{\#}$ by proving that each rule application lowers a ranking function. The necessary ingredients for the ranking function are derived from the post-conditions of Table 1.

Theorem 6.1

$AL^{\#}$ learns the correct Mealy machine within $\mathcal{O}(kn^{2}+kno+no^{2}+n\log m)$ output queries and at most $n-1$ equivalence queries where $n$ is the number of equivalence classes for $\mathcal{S}$ , $o$ is the number of equivalence classes for $\mathcal{R}$ , $k$ is the number of input symbols and $m$ the length of the longest counterexample.
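To get a feeling for which term of the bound dominates, the following worked instance plugs in illustrative values. The choices $n=31$ and $k=22$ echo the machine size mentioned in the introduction; $o=31$, $m=1024$ and the base-2 logarithm are assumptions made only for this illustration. The $\mathcal{O}$-notation hides constant factors, so the numbers indicate relative magnitudes, not actual query counts.

```latex
\begin{align*}
kn^{2} &= 22\cdot 31^{2} = 21142, \\
kno &= 22\cdot 31\cdot 31 = 21142, \\
no^{2} &= 31\cdot 31^{2} = 29791, \\
n\log_{2} m &= 31\cdot 10 = 310, \\
kn^{2}+kno+no^{2}+n\log_{2} m &= 72385.
\end{align*}
```

Under these assumptions the counterexample term $n\log m$ is negligible, and the reference-dependent terms $kno$ and $no^{2}$ are of the same order as the $kn^{2}$ term as long as $o$ is comparable to $n$.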

7 Adaptive Learning with Multiple References

Let $\mathcal{X}$ be a finite set of complete reference models with possibly different alphabets. Assume each reference model $\mathcal{R}\in\mathcal{X}$ has a state cover $P^{\mathcal{R}}$ and separating family $\{W_{q}\}^{\mathcal{R}}$ . We adapt the arguments for the $AL^{\#}$ algorithm to represent the state cover and separating family for the set of reference models.

State cover. We initialize the $AL^{\#}$ algorithm with the union of the state cover of each reference model, i.e., $\cup_{\mathcal{R}\in\mathcal{X}}P^{\mathcal{R}}$ . To reduce the size of $P^{\mathcal{X}}$ , the state cover for each reference model is computed using a fixed ordering on inputs.
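The effect of a fixed input ordering on the size of the combined state cover can be illustrated with a breadth-first computation. This is a sketch under assumptions: each reference is a minimal Mealy machine encoded as a dictionary from (state, input) to (next state, output), and the fixed ordering is simply the sorted order of input symbols. The function `state_cover` and the machines `R1` and `R2` are hypothetical.

```python
from collections import deque


def state_cover(machine, initial, alphabet):
    """One access sequence per reachable state, found by BFS with sorted inputs."""
    cover = {initial: ()}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for i in sorted(alphabet):  # fixed ordering shared by all references
            step = machine.get((state, i))
            if step is not None and step[0] not in cover:
                cover[step[0]] = cover[state] + (i,)
                queue.append(step[0])
    return set(cover.values())


# Hypothetical reference models over the alphabet {a, b}.
R1 = {("s0", "a"): ("s1", 0), ("s0", "b"): ("s0", 0),
      ("s1", "a"): ("s0", 1), ("s1", "b"): ("s1", 0)}
R2 = {("u0", "a"): ("u1", 0), ("u0", "b"): ("u2", 1),
      ("u1", "a"): ("u1", 0), ("u1", "b"): ("u1", 0),
      ("u2", "a"): ("u2", 1), ("u2", "b"): ("u2", 1)}

COVER_1 = state_cover(R1, "s0", {"a", "b"})
COVER_2 = state_cover(R2, "u0", {"a", "b"})
COMBINED = COVER_1 | COVER_2  # union of the per-reference state covers
```

Because both covers are built with the same ordering, the access sequences of `R1` reappear in the cover of `R2`, and the union contains three sequences instead of five.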

Separating family. We combine the separating families for multiple reference models using a stronger notion of apartness, called total apartness, which also separates states based on whether inputs are defined. When changing the alphabet of a reference model to the alphabet of the SUL, as is done when computing the separating family, the reference model may become partial. If states from different reference models behave the same on their common alphabet but their alphabets contain different inputs from the SUL, we still want to distinguish the reference models based on which inputs they enable.

Definition 7.1

Let $\mathcal{M}_{1},\mathcal{M}_{2}$ be partial Mealy machines and $p\in Q^{\mathcal{M}_{1}},q\in Q^{\mathcal{M}_{2}}$ . We say $p$ and $q$ are total apart, written $p\mathrel{\#}_{\uparrow}q$ , if $p\mathrel{\#}q$ or there exists $\sigma\in(I^{\mathcal{M}_{1}}\cap I^{\mathcal{M}_{2}})^{*}$ such that either $\delta^{\mathcal{M}_{1}}(p,\sigma){\uparrow}$ or $\delta^{\mathcal{M}_{2}}(q,\sigma){\uparrow}$ but not both.
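Total apartness can be decided by a breadth-first search over pairs of states. The sketch below is one possible check, assuming partial Mealy machines encoded as dictionaries from (state, input) to (next state, output) and a caller-supplied common alphabet; the function `totally_apart` and the machines `M1` and `M2` are hypothetical. It returns a witness as soon as the outputs differ (ordinary apartness) or exactly one of the two machines has the next input undefined.

```python
from collections import deque


def totally_apart(m1, p, m2, q, alphabet):
    """Return a witness word if p and q are totally apart, otherwise None."""
    seen = {(p, q)}
    queue = deque([(p, q, ())])
    while queue:
        s1, s2, word = queue.popleft()
        for i in alphabet:
            step1, step2 = m1.get((s1, i)), m2.get((s2, i))
            if step1 is None and step2 is None:
                continue  # undefined in both machines: no information
            if step1 is None or step2 is None:
                return word + (i,)  # defined in exactly one machine
            if step1[1] != step2[1]:
                return word + (i,)  # ordinary apartness: outputs differ
            successor = (step1[0], step2[0])
            if successor not in seen:
                seen.add(successor)
                queue.append((successor[0], successor[1], word + (i,)))
    return None


# Hypothetical machines: M2 does not enable input b.
M1 = {("x", "a"): ("x", 0), ("x", "b"): ("x", 1)}
M2 = {("y", "a"): ("z", 0), ("z", "a"): ("z", 0)}
```

Here `x` and `y` produce the same outputs on every word over `a`, so they are not apart in the ordinary sense; they only become totally apart once `b` is part of the common alphabet.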

We use total apartness to define a total state identifier and a total separating family. This definition is similar to Def. 3.3 but $\mathrel{\#}$ is replaced by $\mathrel{\#}_{\uparrow}$ . We combine the multiple reference models into a single one with an arbitrary initial state, compute the total separating family and use this to initialize $AL^{\#}$ .

Example 7.1

A total separating family for $\mathcal{X}=\{\mathcal{R}_{1},\mathcal{R}_{2}\}$ and alphabet $I^{\mathcal{S}}$ is $W_{p_{0}}=W_{p_{1}}=\{c,b,bb\},W_{p_{2}}=W_{p_{3}}=\{c,b\},W_{r_{0}}=W_{r_{1}}=\{c,ac\},W_{r_{2}}=\{c\}$ .

We add an optimisation to $AL^{\#}$ that only chooses $p$ and $p^{\prime}$ from the same reference model during rebuilding. Theorem 6.1 can be generalized to this setting where $o$ represents the number of equivalence classes across the reference models.

8 Experimental Evaluation

In this section, we empirically investigate the performance of our implementation of $AL^{\#}$ . The source code and all benchmarks are available online [16].\footnote{https://gitlab.science.ru.nl/lkruger/adaptive-lsharp-learnlib/} We present four experiments to answer the following research questions:

R1

What is the performance of adaptive AAL algorithms, when …

Exp 1: …learning models from a similar reference model?
Exp 2: …applied to benchmarks from the literature?

R2

Can multiple references help $AL^{\#}$ , when learning …

Exp 3: …a model from similar reference models?
Exp 4: …a protocol implementation from reference implementations?

Setup. We implement $AL^{\#}$ on top of the $L^{\#}$ LearnLib implementation [9].\footnote{Obtained from https://github.com/UCL-PPLV/learnlib.git} We invoke conformance testing for the EQs, using the random Wp method from LearnLib with minimal size ${=}3$ and random length ${=}3$ .\footnote{These hyperparameters are discussed in the LearnLib documentation, learnlib.de.} We run all experiments with 30 seeds. We measure the performance of the algorithms based on the number of inputs sent to the SUL during both OQs and EQs: fewer is better.
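The performance metric can be made concrete with a small counting wrapper around a simulated SUL. This is only an illustration of the metric, not the instrumentation used in the LearnLib-based implementation; the class `CountingSUL`, the dictionary encoding from (state, input) to (next state, output), and the toy machine are hypothetical.

```python
class CountingSUL:
    """Simulated SUL that counts every input symbol it receives."""

    def __init__(self, machine, initial):
        self.machine = machine
        self.initial = initial
        self.inputs_sent = 0

    def query(self, word):
        """Reset to the initial state, run `word`, and return the output word."""
        state, outputs = self.initial, []
        for symbol in word:
            state, output = self.machine[(state, symbol)]
            outputs.append(output)
            self.inputs_sent += 1  # counted for output and equivalence queries alike
        return tuple(outputs)


# Hypothetical two-state SUL over the alphabet {a, b}.
SUL_MACHINE = {("s0", "a"): ("s1", 0), ("s0", "b"): ("s0", 0),
               ("s1", "a"): ("s0", 1), ("s1", "b"): ("s1", 0)}
```

Routing both output queries and the test words of conformance testing through the same `query` method yields a single total, which is the quantity compared across algorithms.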

Table 2: Summed inputs in millions for learning the mutated models with the original models.

Algorithm	$\textit{mut}_{1}$	$\textit{mut}_{2}$	$\textit{mut}_{3}$	$\textit{mut}_{4}$	$\textit{mut}_{5}$	$\textit{mut}_{6}$	$\textit{mut}_{7}$	$\textit{mut}_{8}$	$\textit{mut}_{9}$	$\textit{mut}_{10}$	$\textit{mut}_{11}$	$\textit{mut}_{12}$	$\textit{mut}_{13}$	$\textit{mut}_{14}$
$L^{*}$	115.2	24.2	49.4	69.7	78.7	60.5	50.7	132.9	294.2	36.8	52.5	38.0	18.3	301.9
KV	123.5	17.8	49.6	60.1	68.9	58.7	44.9	103.7	244.3	25.5	28.7	28.0	7.5	253.6
$L^{\#}$	101.7	14.3	50.0	49.2	73.0	58.7	39.9	100.1	313.9	25.4	38.9	28.0	8.0	234.9
$\partial L^{*}_{M}$ [6]	132.7	19.8	22.5	25.0	32.7	26.0	-	178.0	375.0	24.7	25.4	44.1	8.9	256.3
IKV [9]	114.8	18.6	1.6	2.4	0.9	0.8	-	56.6	373.9	11.0	2.1	1.1	5.8	7.0
$AL^{\#}$ (new!)	1.2	0.5	1.5	0.8	0.8	0.8	0.6	68.1	141.1	1.4	1.3	0.8	1.9	7.2
$L^{\#}_{\scalebox{0.7}{R}}$ (new!)	101.7	12.3	1.7	9.4	1.1	7.9	0.7	68.2	306.1	12.6	2.8	1.7	6.4	7.9
$L^{\#}_{\scalebox{0.7}{\scalebox{0.65}{${}\overset{\surd}{=}{}$}}}$ (new!)	1.2	0.5	3.5	5.2	9.1	7.2	0.7	63.0	36.8	8.7	9.8	10.8	5.7	7.1
$L^{\#}_{\scalebox{0.7}{\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}}}$ (new!)	1.2	0.5	1.7	2.7	2.0	2.1	0.7	70.6	186.5	6.0	6.1	1.7	4.8	7.4
$L^{\#}_{\scalebox{0.7}{R},\scalebox{0.7}{\scalebox{0.65}{${}\overset{\surd}{=}{}$}}}$ (new!)	1.2	0.5	1.5	0.8	1.0	0.8	0.6	69.3	38.7	3.1	2.0	1.0	4.5	7.3

Experiment 1. We evaluate the performance of $AL^{\#}$ against non-adaptive and adaptive algorithms from the literature, in particular $L^{*}$ [2], KV [15], and $L^{\#}$ [23] as well as $\partial L^{*}_{M}$ [6] and (a Mealy machine adaptation of) IKV [9]. As part of an ablation study, we compare $AL^{\#}$ with simpler variations which we refer to as $L^{\#}_{\scalebox{0.7}{R}}$ , $L^{\#}_{\scalebox{0.7}{\scalebox{0.65}{${}\overset{\surd}{=}{}$}}}$ , $L^{\#}_{\scalebox{0.7}{\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}}}$ , $L^{\#}_{\scalebox{0.7}{R},\scalebox{0.7}{\scalebox{0.65}{${}\overset{\surd}{=}{}$}}}$ . The subscripts indicate which rules are added:
$R$ : R + PP, ${}\overset{\surd}{=}{}$ : MS + MR + PS, ${}\overset{\surd}{\simeq}{}$ : AMS + AMR + APS.

We learn six models from the AutomataWiki benchmarks [18] also used in [23]. We limit ourselves to six models because we mutate every model in 14 different ways (and for 30 seeds). The chosen models represent different types of protocols with varying number of states. We learn the mutated models using the original models, referred to as $\mathcal{S}$ , as a reference. The mutations may add states, divert transitions, remove inputs, perform multiple mutations, or compose the model with a mutated version of the model. We provide details on the used models and mutations in App. 0.E.

Results. Table 2 shows for an algorithm (rows) and a mutation (columns) the total number of inputs ( $\cdot 10^{6}$ ) necessary to learn all models, summed over all seeds.\footnote{$\partial L^{*}_{M}$ and IKV do not support removing inputs, relevant for mutation M7.} The highlighted values indicate the best performing algorithm. We provide detailed pairwise comparisons between algorithms in App. 0.E.

Discussion. First, we observe that $AL^{\#}$ always outperforms non-adaptive learning algorithms, as is expected. By combining state matching and rebuilding, $AL^{\#}$ mostly outperforms algorithms from the literature, with IKV being competitive on some types of mutations. In $\textit{mut}_{9}(\mathcal{S})$ , where we append $\mathcal{S}$ to $\textit{mut}_{13}(\mathcal{S})$ , $L^{\#}_{\scalebox{0.7}{\scalebox{0.65}{${}\overset{\surd}{=}{}$}}}$ outperforms $L^{\#}_{\scalebox{0.7}{\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}}}$ because $L^{\#}_{\scalebox{0.7}{\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}}}$ incorrectly matches $\textit{mut}_{13}(\mathcal{S})$ states with states in $\mathcal{S}$ , making it harder to learn the $\mathcal{S}$ fragment. In the pairwise comparisons in App. 0.E, we see that $AL^{\#}$ performs much better on the GnuTLS and OpenSSH models than other adaptive approaches. We conjecture that this effect occurs because these models are hard to learn in general (high number of total inputs), so the potential benefit of $AL^{\#}$ is higher.

Experiment 2. We evaluate $L^{\#}$ , $\partial L^{*}_{M}$ , IKV and $AL^{\#}$ on benchmarks that contain reference models. Adaptive-OpenSSL [7], used in [6], contains models learned from different git development branches for the OpenSSL server side. Adaptive-Philips [19] contains models representing some legacy code which evolved over time due to bug fixes and allowing more inputs.

Results. Fig. 4(a) shows the mean total number of inputs required for learning a model from the associated reference model, depicting the $5^{\text{th}}-95^{\text{th}}$ percentile (line) and average (mark) over the seeds.

Discussion. We observe that $L^{\#}$ and $\partial L^{*}_{M}$ perform worse than $AL^{\#}$ . $AL^{\#}$ often outperforms IKV by a factor of 2-4, even though these models are relatively small and thus easy to learn.

Experiment 3. We evaluate $AL^{\#}$ with one or multiple references on the models used in Experiment 1. We either (1) learn $\mathcal{S}$ using several mutations of $\mathcal{S}$ or (2) learn a mutation that represents a combination of the $\mathcal{S}$ and $\textit{mut}_{13}(\mathcal{S})$ .

Results. Tables 4, 4 show for every type of SUL (rows) and every set of references (columns) the total number of inputs ( $\cdot 10^{6}$ ) necessary to learn all models, summed over all seeds. Highlighted values indicate the best performing set of references. Column $\{\mathcal{S}\}$ in Table 4 corresponds to values in row $AL^{\#}$ of Table 2; they are added in Table 4 for clarity.

Discussion. We observe that using multiple references outperforms using one reference, as is expected. We hypothesize that learning with reference $\textit{mut}_{13}(\mathcal{S})$ instead of $\mathcal{S}$ often leads to an increase in total inputs because $\textit{mut}_{13}(\mathcal{S})$ is less complex due to the random transitions. Therefore, discovering states belonging to the $\mathcal{S}$ fragment in $\textit{mut}_{8}(\mathcal{S})$ , $\textit{mut}_{9}(\mathcal{S})$ and $\textit{mut}_{14}(\mathcal{S})$ becomes more difficult.

Experiment 4. We evaluate the performance of $AL^{\#}$ with one or multiple references on learning DTLS and TCP models from AutomataWiki.\footnote{References represent related models instead of previous models as in Experiment 2.} We consider seven DTLS implementations selected to have the same key exchange algorithm and certification requirement. We consider three TCP client implementations.

Results. Fig. 5 shows the required inputs for learning $\mathcal{S}$ (x-axis) with only the reference model indicated by the colored data point, averaged over the seeds. For each DTLS model, we included learning $\mathcal{S}$ with $\mathcal{S}$ itself as a reference model. The $*$ mark indicates using all models except $\mathcal{S}$ as references; the $\times$ mark indicates using no references, i.e., non-adaptive $L^{\#}$ .

Discussion. We observe that using all references except $\mathcal{S}$ usually performs as well as the best performing reference model that is distinct from $\mathcal{S}$ . In scand-lat, using a set of references outperforms single reference models, almost matching the performance of learning $\mathcal{S}$ with $\mathcal{S}$ as a reference.

9 Conclusion

We introduced the adaptive $L^{\#}$ algorithm ( $AL^{\#}$ ), a new algorithm for adaptive active automata learning that allows the flexible use of domain knowledge in the form of (preferably similar) reference models and thereby aims to reduce the sample complexity for learning new models. Experiments show that the algorithm can lead to significant improvements over the state of the art (Sec. 8).

9.0.1 Future work.

Approximate state matching is sometimes too eager and may mislead the learner, as happens for $\textit{mut}_{9}$ in Experiment 1 (Sec. 8). This may be addressed by only applying matching rules when the matching degree is above some threshold. It is currently unclear how to determine an appropriate threshold.

Further, adaptive methods typically perform well when the reference model and SUL are similar [12]. We would like to dynamically determine which (parts of) reference models are similar, and incorporate this in the rebuilding rule.

Adaptive AAL allows the re-use of information in the form of a Mealy machine. Other sources of information that can be re-used in AAL are, for instance, system logs, realised by combining active and passive learning [25, 1]. An interesting direction of research is the development of a more general methodology that allows the re-use of various forms of previous knowledge.

References

[1] Bernhard K. Aichernig, Edi Muskardin, and Andrea Pferscher. Active vs. passive: A comparison of automata learning paradigms for network protocols. In FMAS/ASYDE@SEFM, volume 371 of EPTCS, pages 1–19, 2022.
[2] Dana Angluin. Learning regular sets from queries and counterexamples. Inf. Comput., 75(2):87–106, 1987.
[3] Kousar Aslam, Loek Cleophas, Ramon R. H. Schiffelers, and Mark van den Brand. Interface protocol inference to aid understanding legacy software components. Softw. Syst. Model., 19(6):1519–1540, 2020.
[4] Alexander Bainczyk, Bernhard Steffen, and Falk Howar. Lifelong learning of reactive systems in practice. In The Logic of Software. A Tasting Menu of Formal Methods, volume 13360 of LNCS, pages 38–53. Springer, 2022.
[5] Sagar Chaki, Edmund M. Clarke, Natasha Sharygina, and Nishant Sinha. Verification of evolving software via component substitutability analysis. Formal Methods Syst. Des., 32(3):235–266, 2008.
[6] Carlos Diego Nascimento Damasceno, Mohammad Reza Mousavi, and Adenilso da Silva Simão. Learning to reuse: Adaptive model learning for evolving systems. In IFM, volume 11918 of LNCS, pages 138–156. Springer, 2019.
[7] Joeri de Ruiter. A tale of the OpenSSL state machine: A large-scale black-box analysis. In NordSec, volume 10014 of LNCS, pages 169–184, 2016.
[8] Joeri de Ruiter and Erik Poll. Protocol state fuzzing of TLS implementations. In USENIX Security Symposium, pages 193–206. USENIX Association, 2015.
[9] Tiago Ferreira, Gerco van Heerdt, and Alexandra Silva. Tree-based adaptive model learning. In A Journey from Process Algebra via Timed Automata to Model Learning, volume 13560 of LNCS, pages 164–179. Springer, 2022.
[10] Alex Groce, Doron A. Peled, and Mihalis Yannakakis. Adaptive model checking. Log. J. IGPL, 14(5):729–744, 2006.
[11] Falk Howar and Bernhard Steffen. Active automata learning in practice - an annotated bibliography of the years 2011 to 2016. In Machine Learning for Dynamic Software Analysis, volume 11026 of LNCS, pages 123–148. Springer, 2018.
[12] David Huistra, Jeroen Meijer, and Jaco van de Pol. Adaptive learning for learn-based regression testing. In FMICS, volume 11119 of LNCS, pages 162–177. Springer, 2018.
[13] Malte Isberner, Falk Howar, and Bernhard Steffen. The TTT algorithm: A redundancy-free approach to active automata learning. In RV, volume 8734 of LNCS, pages 307–322. Springer, 2014.
[14] Malte Isberner, Falk Howar, and Bernhard Steffen. The open-source learnlib - A framework for active automata learning. In CAV (1), volume 9206 of LNCS, pages 487–495. Springer, 2015.
[15] Michael J. Kearns and Umesh V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994. URL: https://mitpress.mit.edu/books/introduction-computational-learning-theory.
[16] Loes Kruger, Sebastian Junges, and Jurriaan Rot. State Matching and Multiple References in Adaptive Active Automata Learning: Supplementary Material, June 2024. doi:10.5281/zenodo.12517574.
[17] Edi Muskardin, Bernhard K. Aichernig, Ingo Pill, Andrea Pferscher, and Martin Tappler. AALpy: An active automata learning library. In ATVA, volume 12971 of LNCS, pages 67–73. Springer, 2021.
[18] Daniel Neider, Rick Smetsers, Frits W. Vaandrager, and Harco Kuppens. Benchmarks for automata learning and conformance testing. In Models, Mindsets, Meta, volume 11200 of LNCS, pages 390–416. Springer, 2018.
[19] Mathijs Schuts, Jozef Hooman, and Frits W. Vaandrager. Refactoring of legacy software using model learning and equivalence checking: An industrial experience report. In IFM, volume 9681 of LNCS, pages 311–325. Springer, 2016.
[20] Rick Smetsers, Joshua Moerman, and David N. Jansen. Minimal separating sequences for all pairs of states. In LATA, volume 9618 of LNCS, pages 181–193. Springer, 2016.
[21] Martin Tappler, Bernhard K. Aichernig, and Roderick Bloem. Model-based testing IoT communication via active automata learning. CoRR, abs/1904.07075, 2019.
[22] Frits W. Vaandrager. Model learning. Commun. ACM, 60(2):86–95, 2017.
[23] Frits W. Vaandrager, Bharat Garhewal, Jurriaan Rot, and Thorsten Wißmann. A new approach for active automata learning based on apartness. In TACAS (1), volume 13243 of LNCS, pages 223–243. Springer, 2022.
[24] Stephan Windmüller, Johannes Neubauer, Bernhard Steffen, Falk Howar, and Oliver Bauer. Active continuous quality control. In CBSE, pages 111–120. ACM, 2013.
[25] Nan Yang, Kousar Aslam, Ramon R. H. Schiffelers, Leonard Lensink, Dennis Hendriks, Loek Cleophas, and Alexander Serebrenik. Improving model inference in industry by combining active and passive learning. In SANER, pages 253–263. IEEE, 2019.

Appendix 0.A Additional Definition, Figure, Table and Algorithm

We define how to fold back an observation tree to a complete Mealy machine.

Definition 0.A.1

Let $\mathcal{T}$ be an observation tree for SUL $\mathcal{S}$ . If each basis state has a transition for every input and each frontier state is identified with a basis state, then $\mathcal{T}$ is folded back to complete Mealy machine $\mathcal{H}=(B,I,O,q_{0}^{\mathcal{T}},\delta^{\mathcal{H}},\lambda^{\mathcal{T}})$ where for all $q\in B$ and $i\in I$ :

\delta^{\mathcal{H}}(q,i)=\begin{cases}\delta^{\mathcal{T}}(q,i)&\text{ if }\delta^{\mathcal{T}}(q,i)\in B\\ q^{\prime}&\text{ if }\delta^{\mathcal{T}}(q,i)=r\in F\text{ and }r\text{ is identified with }q^{\prime}\in B\end{cases}
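Folding back amounts to redirecting every transition that leaves the basis to the basis state its target is identified with. The following sketch assumes the observation tree is encoded as a dictionary from (state, input) to (next state, output) and that the identification of frontier states is given as a dictionary; the function `fold_back` and the toy tree are hypothetical.

```python
def fold_back(tree, basis, identified, alphabet, initial):
    """Build the hypothesis transition map from a complete, fully identified tree."""
    hypothesis = {}
    for q in basis:
        for i in alphabet:
            # Raises KeyError if a basis transition is missing, i.e. the
            # precondition of the definition is violated.
            target, output = tree[(q, i)]
            if target not in basis:
                target = identified[target]  # frontier state -> its basis state
            hypothesis[(q, i)] = (target, output)
    return initial, hypothesis


# Hypothetical observation tree with basis {q0, q1} and frontier {f0, f1, f2}.
TREE = {("q0", "a"): ("q1", 0), ("q0", "b"): ("f0", 1),
        ("q1", "a"): ("f1", 1), ("q1", "b"): ("f2", 0)}
BASIS = {"q0", "q1"}
IDENTIFIED = {"f0": "q0", "f1": "q0", "f2": "q1"}

INITIAL, HYPOTHESIS = fold_back(TREE, BASIS, IDENTIFIED, ["a", "b"], "q0")
```

Outputs are copied from the tree unchanged; only the targets of frontier transitions are rewritten, which mirrors the two cases in the definition.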

In Fig. 6, we show the scenarios in the observation tree and the reference model necessary to apply the rebuilding, match refinement, match separation and prioritized separation rules.

In Algorithm 1, we list the rules used for $AL^{\#}$ in a non-deterministic ordering.

Algorithm 1: Extended $L^{\#}$ algorithm

procedure ExtendedLSharp( $P^{\mathcal{R}},\{W_{q}\}^{\mathcal{R}}$ )

$\mathcal{T}\leftarrow\{q_{0}\}$ s.t. $\delta^{\mathcal{T}}(\varepsilon)=q_{0}$

$B\leftarrow\{q_{0}\}$

$\delta^{\mathcal{T}}(q,i)\notin B$ and $\neg(q^{\prime}\mathrel{\#}\delta^{\mathcal{T}}(q,i))$ for $q,q^{\prime}\in B$ , $i\in I$ s.t. $\mathsf{access}(q)i,\mathsf{access}(q^{\prime})\in P^{\mathcal{R}}$ and ( $\delta^{\mathcal{T}}(q,i\sigma){\uparrow}$ or $\delta^{\mathcal{T}}(q^{\prime},\sigma){\uparrow}$ ) where $\sigma=\mathsf{sep}\bm{(}\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q)i),\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q^{\prime}))\bm{)}$ $\rightarrow$ $\triangleright$ rebuilding
    OutputQuery( $\mathsf{access}(q)i\sigma$ )
    OutputQuery( $\mathsf{access}(q^{\prime})\sigma$ )

$r\in F$ is isolated and $\mathsf{access}(r)\in P^{\mathcal{R}}$ $\rightarrow$ $\triangleright$ prefix promotion
    $B\leftarrow B\cup\{r\}$

$r\in F$ is isolated $\rightarrow$ $\triangleright$ promotion
    $B\leftarrow B\cup\{r\}$

$\delta^{\mathcal{T}}(q,i){\uparrow}$ , for some $q\in B,i\in I$ $\rightarrow$ $\triangleright$ extension
    OutputQuery( $\mathsf{access}(q)i$ )

$\neg(r\mathrel{\#}q)$ , $\neg(r\mathrel{\#}q^{\prime})$ , for some $r\in F$ , $q,q^{\prime}\in B$ , $q\neq q^{\prime}$ $\rightarrow$ $\triangleright$ separation
    $\sigma\leftarrow$ witness of $q\mathrel{\#}q^{\prime}$
    OutputQuery( $\mathsf{access}(r)\sigma$ )

$p\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}q$ for some $q\in B$ and there is some $i\in I$ s.t. $\delta^{\mathcal{T}}(q,i)=r\in F$ , $\neg(r\mathrel{\#}q^{\prime})$ for some $q^{\prime}\in B$ and $\neg(r\mathrel{\#}p^{\prime})$ for $\delta^{\mathcal{R}}(p,i)=p^{\prime}$ and $p^{\prime}\scalebox{0.65}{${}\overset{\surd}{\not\simeq}{}$}q^{\prime\prime}$ for any $q^{\prime\prime}\in B$ $\rightarrow$ $\triangleright$ match separation
    $\sigma\leftarrow$ witness for $q^{\prime}\mathrel{\#}p^{\prime}$
    OutputQuery( $\mathsf{access}(q)i\sigma$ )

$p\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}q$ and $p^{\prime}\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}q$ for some $q\in B$ and $p,p^{\prime}\in Q^{\mathcal{R}}$ with $\sigma=\mathsf{sep}(p,p^{\prime})$ and $\delta^{\mathcal{T}}(q,\sigma){\uparrow}$ $\rightarrow$ $\triangleright$ match refinement
    OutputQuery( $\mathsf{access}(q)\sigma$ )

$\neg(r\mathrel{\#}q^{\prime})$ , $\neg(r\mathrel{\#}q^{\prime\prime})$ , for some $r\in F$ , $q,q^{\prime},q^{\prime\prime}\in B$ s.t. $\delta^{\mathcal{T}}(q,i)=r$ for some $i\in I$ , $\sigma\vdash q\mathrel{\#}q^{\prime}$ , $\sigma\in\cup_{p\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}q}W_{\delta^{\mathcal{R}}(p,i)}$ $\rightarrow$ $\triangleright$ prioritized separation
    OutputQuery( $\mathsf{access}(r)\sigma$ )

All $r\in F$ are identified and $\delta^{\mathcal{T}}(q,i){\mathord{\downarrow}}$ for all $q\in B,i\in I$ $\rightarrow$ $\triangleright$ equivalence
    $\mathcal{H}\leftarrow$ BuildHypothesis
    $(b,\sigma)\leftarrow$ CheckConsistency( $\mathcal{H}$ )
    if $b=\texttt{yes}$ then
        $(b,\rho)\leftarrow$ EquivalenceQuery( $\mathcal{H}$ )
        if $b=\texttt{yes}$ then: return $\mathcal{H}$
        else: $\sigma\leftarrow$ shortest prefix of $\rho$ such that $\delta^{\mathcal{H}}(q_{0}^{\mathcal{H}},\sigma)\mathrel{\#}\delta^{\mathcal{T}}(q_{0}^{\mathcal{T}},\sigma)$ (in $\mathcal{T}$ )
    ProcCounterEx( $\mathcal{H},\sigma$ )

Table 3 shows the pre- and postconditions of the approximate matching variations of the state matching rules.

Table 3: Approximate state matching rules with parameters, preconditions and postconditions.

Section	Rule	Parameters	Precondition	Postcondition
5.3	approximate match separation	$q,q^{\prime}\in B$ , $p\in Q^{\mathcal{R}}$ , $i\in I$	$\delta^{\mathcal{T}}(q,i)=r\in F$ , $\neg(r\mathrel{\#}q^{\prime})$ , $\neg(r\mathrel{\#}p^{\prime})$ , $p\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}q$ , $\delta^{\mathcal{R}}(p,i)=p^{\prime}$ , $\neg(\exists q^{\prime\prime}\in B$ s.t. $p^{\prime}\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}q^{\prime\prime})$	$r\mathrel{\#}q^{\prime}\lor r\mathrel{\#}p^{\prime}$
5.3	approximate match refinement	$q\in B$ , $p,p^{\prime}\in Q^{\mathcal{R}}$	$p\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}q$ , $p^{\prime}\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}q$ , $\sigma=\mathsf{sep}(p,p^{\prime})$ , $\delta^{\mathcal{T}}(q,\sigma){\uparrow}$	$\delta^{\mathcal{T}}(q,\sigma){\mathord{\downarrow}}$
5.3	approximate prioritized separation	$r\in F$ , $q^{\prime},q^{\prime\prime}\in B$	$\neg(r\mathrel{\#}q^{\prime})$ , $\neg(r\mathrel{\#}q^{\prime\prime})$ , $\exists i\in I$ s.t. $\delta^{\mathcal{T}}(q,i)=r$ , $\sigma\vdash q^{\prime}\mathrel{\#}q^{\prime\prime}$ , $\sigma\in\cup_{p\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}q}W_{\delta^{\mathcal{R}}(p,i)}$	$r\mathrel{\#}q^{\prime\prime}\lor r\mathrel{\#}q^{\prime}$

Appendix 0.B Proofs of Section 4

Proof of Lemma 1

Proof

Let $q\in B$ , $i\in I$ and $\sigma\in I^{*}$ . Suppose

(1) $\delta^{\mathcal{T}}(q,i)\notin B$ ,
(2) $\mathsf{access}^{\mathcal{T}}(q)i\in P^{\mathcal{R}}$ ,
(3) For all $q^{\prime}\in B$ , $\mathsf{access}^{\mathcal{T}}(q^{\prime})\in P^{\mathcal{R}}$ ,
(4) For all $q^{\prime}\in B$ , $\sigma\vdash\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q)i)\mathrel{\#}\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q^{\prime}))$ , where we write $\sigma=\mathsf{sep}\bm{(}\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q)i),\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q^{\prime}))\bm{)}$ for conciseness.

We prove for all $q^{\prime}\in B$ , $\delta^{\mathcal{T}}(q,i)\mathrel{\#}q^{\prime}$ holds from either assumptions (1)-(4) or because the assumptions validate preconditions for the rebuilding rule and after applying the rule we find the required result. Suppose we have a specific $q^{\prime}\in B$ . If $\delta^{\mathcal{T}}(q,i)\mathrel{\#}q^{\prime}$ holds, we are done. From now, assume (5) $\neg(\delta^{\mathcal{T}}(q,i)\mathrel{\#}q^{\prime})$ .

From (4) we derive that (6) $\delta^{\mathcal{T}}(q,i\sigma){\uparrow}$ or $\delta^{\mathcal{T}}(q^{\prime},\sigma){\uparrow}$ . Otherwise, $\delta^{\mathcal{T}}(q,i\sigma){\mathord{\downarrow}}$ and $\delta^{\mathcal{T}}(q^{\prime},\sigma){\mathord{\downarrow}}$ which implies $\delta^{\mathcal{T}}(q,i)\mathrel{\#}q^{\prime}$ under assumption $\sigma\vdash\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q)i)\mathrel{\#}\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q^{\prime}))$ . However, $\delta^{\mathcal{T}}(q,i)\mathrel{\#}q^{\prime}$ contradicts (5).

From assumptions (1)-(3),(5),(6), we know rebuilding can be applied which leads to OQ $\mathsf{access}^{\mathcal{T}}(q)i\sigma$ and $\mathsf{access}^{\mathcal{T}}(q^{\prime})\sigma$ . After the OQs, we know $\delta^{\mathcal{T}}(q,i\sigma){\mathord{\downarrow}}$ and $\delta^{\mathcal{T}}(q^{\prime},\sigma){\mathord{\downarrow}}$ , combining this with $\sigma\vdash\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q)i)\mathrel{\#}\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q^{\prime}))$ proves that $\delta^{\mathcal{T}}(q,i)\mathrel{\#}q^{\prime}$ . Thus, for every $q^{\prime}\in B$ , $\delta^{\mathcal{T}}(q,i)\mathrel{\#}q^{\prime}$ , which is exactly the definition of isolated.

Proof of Theorem 4.1

Proof

Let $n$ be the number of equivalence classes (w.r.t. language equivalence) in the reachable part of $\mathcal{S}|_{I^{\mathcal{R}}}$ . We prove that whenever the basis contains fewer than $n$ elements, there always exists an access sequence in $P^{\mathcal{R}}$ that leads to a state that can be isolated using the rebuilding rule. Applying this argument repeatedly proves that the basis contains $n$ states whenever prioritized promotion and rebuilding are not applicable anymore. Let $B,F,\mathcal{T}$ denote the current basis, frontier and observation tree. From the theorem statement we know:

(1) $q_{0}^{\mathcal{R}}$ matches $q_{0}^{\mathcal{S}}$ ,

(2) States can only be promoted using prioritized promotion.

We also use the following general assumptions from the paper:

(3) $P^{\mathcal{R}}$ is minimal,

(4) $P^{\mathcal{R}}$ is prefix-closed,

(5) $\mathcal{R}$ and $\mathcal{S}$ are complete.

First, we note that the state cover and separating family are computed on $\mathcal{R}|_{I^{\mathcal{S}}}$ , which means that both only contain sequences in the alphabet $I^{\mathcal{R}}\cap I^{\mathcal{S}}$ . Because of (3), we know there are $|P^{\mathcal{R}}|$ equivalence classes in the reachable part of $\mathcal{R}|_{I^{\mathcal{S}}}$ . From (1) and (5), we derive that for all $w\in(I^{\mathcal{R}}\cap I^{\mathcal{S}})^{*}$ , $\lambda^{\mathcal{R}}(w)=\lambda^{\mathcal{S}}(w)$ . This implies that $|P^{\mathcal{R}}|=n$ .

If $|B|{}={}n$ , we are done. Otherwise, $|B|{}<n$ . From (1), (3) and $|B|{}<n$ , we know that some state in $Q^{\mathcal{S}}$ , reachable with a sequence from $P^{\mathcal{R}}$ , has not been discovered yet. Because of (4), this state must be reachable from the basis with one input symbol. In other words, there must exist a basis state $q\in B$ and $i\in I$ such that $\mathsf{access}^{\mathcal{T}}(q)i\in P^{\mathcal{R}}$ and $\delta^{\mathcal{T}}(q,i){\uparrow}$ .

From (1), we know that for $\sigma=\mathsf{sep}\bm{(}\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q)% i),\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q^{\prime}))\bm{)}$ it must hold that $\sigma\vdash\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q)i)\mathrel{\#% }\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q^{\prime}))$ because $P^{\mathcal{R}}$ and $\{W_{q}\}^{\mathcal{R}}$ are computed on $\mathcal{R}|_{I^{\mathcal{S}}}$ . From (2), we know that for each $q^{\prime}\in B$ , $\mathsf{access}^{\mathcal{T}}(q^{\prime})\in P^{\mathcal{R}}$ .

Therefore, we can apply Lemma 1 and this will lead to $\delta^{\mathcal{T}}(q,i)$ being isolated. Using the prioritized promotion rule, we can add $\delta^{\mathcal{T}}(q,i)$ to the basis, which increases $|B|$ by one. We then apply the same reasoning again, either to find a new state to promote or, once $|B|=n$ , to terminate with the required result.

Note that the precise ordering of prioritized promotion and rebuilding is irrelevant. We can never promote states that we do not want to promote. Moreover, when a state is isolated, it can never be un-isolated. Therefore, applying the rebuilding rule while prioritized promotion can be applied never leads to problems. Finally, rebuilding cannot be applied forever (see the termination proof of Theorem 6.1), so prioritized promotion has to be used at some point.

Appendix 0.C Proofs of Section 5

Proof of Lemma 2

Proof

Let $p\in Q^{\mathcal{R}},q\in B,i\in I$ and $\sigma\in I^{*}$ . Suppose

(1) $\delta^{\mathcal{T}}(q,i)=r\in F$ ,

(2) $\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q))\scalebox{0.65}{${}\overset{\surd}{=}{}$}p$ ,

(3) For all $q^{\prime}\in B$ , $\delta^{\mathcal{R}}(p,i)\scalebox{0.65}{${}\overset{\surd}{\neq}{}$}q^{\prime}$ .

We prove that for all $q^{\prime}\in B$ , $r\mathrel{\#}q^{\prime}$ holds. Suppose we have a specific $q^{\prime}\in B$ . If $r\mathrel{\#}q^{\prime}$ already holds, we are done. From now, assume (4) $\neg(r\mathrel{\#}q^{\prime})$ .

In general, ${}\overset{\surd}{=}{}$ is not a transitive relation. However, because $\mathcal{T}$ is an observation tree for $\mathcal{S}$ , $\delta^{\mathcal{T}}(q,w)=\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q),w)$ for all $w\in I^{*}$ . Therefore, we can derive $q\scalebox{0.65}{${}\overset{\surd}{=}{}$}p$ from (2). From (1)-(4), we know all preconditions required for match separation hold. We apply the rule and execute OQ $\mathsf{access}^{\mathcal{T}}(q)i\sigma$ with $\sigma\vdash q^{\prime}\mathrel{\#}\delta^{\mathcal{R}}(p,i)$ . Note that such a $\sigma$ must exist due to (3).

After the OQ, we have (5) $\delta^{\mathcal{T}}(q,i\sigma){\mathord{\downarrow}}$ and from (3) we derive (6) $\delta^{\mathcal{T}}(q^{\prime},\sigma){\mathord{\downarrow}}$ . Because $\sigma\vdash q^{\prime}\mathrel{\#}\delta^{\mathcal{R}}(p,i)$ and $\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q))\scalebox{0.65}{${}\overset{\surd}{=}{}$}p$ , it must be that (7) $\sigma\vdash q^{\prime}\mathrel{\#}\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q),i)$ . Combining (5), (6) and (7) proves $r\mathrel{\#}q^{\prime}$ .

Proof of Theorem 5.1

Proof

Let $\mathcal{S}$ be the SUL and $\mathcal{R}$ the reference with $\mathcal{S}$ and $\mathcal{R}$ both complete Mealy machines. Moreover, let $\mathcal{S}$ be equivalent to $\mathcal{R}$ but $\mathcal{S}$ possibly has a different initial state. From this, we derive that (1) there exists a state $p\in Q^{\mathcal{R}}$ such that $q_{0}^{\mathcal{S}}$ is language equivalent to $p$ . Let $n$ be the number of states in the reachable part of $\mathcal{S}$ . Let $B,F,\mathcal{T}$ denote the current basis, frontier and observation tree.

We prove that if $|B|{}<n$ , then we can add some state to the basis after applying match refinement, match separation, promotion, extension until none of them are applicable anymore. This trivially terminates when we reach $|B|{}=n$ .

Suppose $|B|{}<n$ . There must be some $q\in B$ and $i\in I$ such that (2) $\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q),i)$ represents an equivalence class that is different from the equivalence classes $\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q^{\prime}))$ for all $q^{\prime}\in B$ . We perform a case distinction on the location of $\delta^{\mathcal{T}}(q,i)$ in the current observation tree.

•

Suppose $\delta^{\mathcal{T}}(q,i)\in B$ , this immediately contradicts (2).
•

Suppose $\delta^{\mathcal{T}}(q,i){\uparrow}$ , then we can apply the extension rule resulting in $\delta^{\mathcal{T}}(q,i){\mathord{\downarrow}}$ .
•

Suppose $\delta^{\mathcal{T}}(q,i){\mathord{\downarrow}}$ and $\delta^{\mathcal{T}}(q,i)$ is isolated, then we can apply promotion.
•
Suppose (3) $\delta^{\mathcal{T}}(q,i){\mathord{\downarrow}}$ and $\delta^{\mathcal{T}}(q,i)$ is not isolated. Moreover, from (1) we derive that there exists a state $p^{\prime}\in Q^{\mathcal{R}}$ such that (4) $\delta^{\mathcal{R}}(p,\mathsf{access}^{\mathcal{T}}(q))=p^{\prime}$ and (5) $p^{\prime}$ is language equivalent to $\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q))$ . We perform a case distinction based on whether $\delta^{\mathcal{R}}(p^{\prime},i)\scalebox{0.65}{${}\overset{\surd}{=}{}$}q^{\prime}$ for some $q^{\prime}\in B$ and show that for each case we can derive a contradiction or apply a rule to make progress.
- –
  
  Suppose (6) there exists a $q^{\prime}\in B$ such that $\delta^{\mathcal{R}}(p^{\prime},i)\scalebox{0.65}{${}\overset{\surd}{=}{}$}q^{\prime}$ . From (1) we derive that (7) there must exist some state $p^{\prime\prime}\in Q^{\mathcal{R}}$ that is language equivalent to $\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q^{\prime}))$ .
  
  We derive that state (8) $\delta^{\mathcal{R}}(p^{\prime},i)$ is language equivalent to an equivalence class that is different from the equivalence classes $\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q^{\prime}))$ for all $q^{\prime}\in B$ because $\delta^{\mathcal{R}}(p^{\prime},i)$ is language equivalent to $\delta^{\mathcal{S}}(\mathsf{access}^{\mathcal{T}}(q),i)$ (derived from (5)) and (2). Moreover, (9) $p^{\prime\prime}$ is language equivalent to a state already in the basis (7). By combining (8) and (9), we find that $\delta^{\mathcal{R}}(p^{\prime},i)\mathrel{\#}p^{\prime\prime}$ . Moreover, because $\delta^{\mathcal{R}}(p^{\prime},i)$ and $p^{\prime\prime}$ are in $Q^{\mathcal{R}}$ and they represent different equivalence classes, sequence $\sigma=\mathsf{sep}(\delta^{\mathcal{R}}(p^{\prime},i),p^{\prime\prime})$ exists. This means we can apply match refinement with $\delta^{\mathcal{R}}(p^{\prime},i)$ and $p^{\prime\prime}$ , resulting in $\delta^{\mathcal{R}}(p^{\prime},i)\scalebox{0.65}{${}\overset{\surd}{\neq}{}$}% q^{\prime}$ because otherwise (7) leads to a contradiction.
  This reasoning can be applied for any $q^{\prime}\in B$ such that $\delta^{\mathcal{R}}(p^{\prime},i)\scalebox{0.65}{${}\overset{\surd}{=}{}$}q^{\prime}$ , resulting in $\delta^{\mathcal{R}}(p^{\prime},i)\scalebox{0.65}{${}\overset{\surd}{\neq}{}$}% q^{\prime}$ for all $q^{\prime}\in B$ after multiple applications of match refinement. In this case, we can continue with the case below.
- –
  
  Suppose there does not exist a $q^{\prime}\in B$ such that $\delta^{\mathcal{R}}(p^{\prime},i)\scalebox{0.65}{${}\overset{\surd}{=}{}$}q^{\prime}$ . In this case, we can apply Lemma 2 with $p^{\prime},q,i$ . This results in $\delta^{\mathcal{T}}(q,i)$ being isolated. We can apply promotion which increases the size of the basis.
•

Suppose $\delta^{\mathcal{T}}(q,i)\notin B$ and $\delta^{\mathcal{T}}(q,i)\notin F$ , this contradicts the assumption that $q\in B$ and $i\in I$ .

Note that the precise ordering of the promotion, extension, match refinement and match separation is irrelevant. We discuss the reasoning for each rule.

Promotion: States that are isolated can never become un-isolated, therefore, applying other rules before promotion can never lead to problems.
Extension: If we apply rules before applying extension then either extension is not necessary anymore or we can still apply it but both lead to $\delta^{\mathcal{T}}(q,i){\mathord{\downarrow}}$ .
Match refinement: The only goal of match refinement is to refine the matching. If two reference states match a basis state, we can perform an OQ that leads to one of the reference states no longer being a match. If we apply one of the other rules before match refinement which already leads to this result, then we do not have to perform match refinement but obtain the same result.
Match separation: In this theorem, the match separation rule always leads to a new apartness pair between the frontier state and the basis. If some other rule already establishes the required apartness pair, we do not have to apply match separation but obtain the same result.

Proof of Lemma 3

Proof

Let $\mathcal{T}$ be an observation tree, $\mathcal{R}$ a reference model, $q\in Q^{\mathcal{T}}$ and $p\in Q^{\mathcal{R}}$ . Suppose $\mathsf{mdeg}(q,p)=1$ . In particular, for all $w\in(I^{\mathcal{T}}\cap I^{\mathcal{R}})^{*}$ and $i\in I^{\mathcal{R}}\cap I^{\mathcal{T}}$ such that $\delta^{\mathcal{T}}(q,wi){\mathord{\downarrow}}$ ,

\lambda^{\mathcal{T}}(\delta^{\mathcal{T}}(q,w),i)=\lambda^{\mathcal{R}}(% \delta^{\mathcal{R}}(p,w),i).

This is equivalent to the following: for all $v\in(I^{\mathcal{T}}\cap I^{\mathcal{R}})^{*}$ such that $\delta^{\mathcal{T}}(q,v){\mathord{\downarrow}}$ ,

\lambda^{\mathcal{T}}(q,v)=\lambda^{\mathcal{R}}(p,v)

Because we assume reference models are complete w.r.t. their own alphabet $I^{\mathcal{R}}$ , the reference is also complete w.r.t. $I^{\mathcal{R}}\cap I^{\mathcal{T}}$ . This implies that for all $v\in(I^{\mathcal{T}}\cap I^{\mathcal{R}})^{*}$ such that $\delta^{\mathcal{T}}(q,v){\mathord{\downarrow}}$ and $\delta^{\mathcal{R}}(p,v){\mathord{\downarrow}}$ ,

\lambda^{\mathcal{T}}(q,v)=\lambda^{\mathcal{R}}(p,v)

which is precisely $q\scalebox{0.65}{${}\overset{\surd}{=}{}$}p$ .
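The condition derived in this proof can be checked mechanically on a finite observation tree. The following is a minimal Python sketch and not the authors' implementation: the dictionary encoding of the transition and output functions and the names `matches`, `tree_delta`, `tree_lam`, `ref_delta` and `ref_lam` are assumptions made purely for illustration. It walks the tree from $q$ in lockstep with the reference from $p$ and reports a match when every output on a shared input agrees.

```python
def matches(q, p, tree_delta, tree_lam, ref_delta, ref_lam):
    """Return True if tree state q matches reference state p.

    tree_delta / tree_lam: partial maps (state, input) -> state / output
    describing the observation tree (hypothetical encoding).
    ref_delta / ref_lam: maps (state, input) -> state / output for the
    reference. Inputs that the reference does not know are skipped, so
    only words over the shared alphabet are compared.
    """
    stack = [(q, p)]
    while stack:
        tq, rp = stack.pop()
        for (state, i), target in tree_delta.items():
            if state != tq or (rp, i) not in ref_delta:
                continue
            if tree_lam[(tq, i)] != ref_lam[(rp, i)]:
                return False  # a defined trace disagrees on an output
            stack.append((target, ref_delta[(rp, i)]))
    return True  # all defined traces over the shared alphabet agree


# Tiny usage example with a two-state reference.
tree_delta = {("t0", "a"): "t1", ("t1", "b"): "t2"}
tree_lam = {("t0", "a"): 0, ("t1", "b"): 1}
ref_delta = {("p0", "a"): "p1", ("p0", "b"): "p0",
             ("p1", "a"): "p1", ("p1", "b"): "p0"}
ref_lam = {("p0", "a"): 0, ("p0", "b"): 0,
           ("p1", "a"): 0, ("p1", "b"): 1}
```

Because the observation tree is finite and acyclic, the traversal terminates. The scan over all tree transitions per state is kept for brevity; an adjacency index would be the natural optimisation.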

Appendix 0.D Proofs of Section 6

Before we prove the complexity for $AL^{\#}$ , we state and prove an additional termination theorem. Termination could be shown by proving that each rule lowers a ranking function. To stay consistent with the $L^{\#}$ complexity proof [23], we instead prove that each rule increases some norm and that the norm is bounded in terms of the SUL. Specifically, we use norm $N(\mathcal{T})$ :

N(\mathcal{T})=N_{L^{\#}}(\mathcal{T})+|N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R}})\mathord{\downarrow}}(\mathcal{T})|~{}+~{}|N_{(B\cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T})|~{}+~{}|N_{(B\times F)\mathord{\downarrow}}(\mathcal{T})|

(1)

where $N_{L^{\#}}(\mathcal{T})$ indicates the slightly adapted norm from [23]. The abbreviations for the summands are defined as follows.

	$\displaystyle N_{L^{\#}}(\mathcal{T})=\|B\|(\|B\|+1)+\|\{(q,i)\in B\times I\mid% \delta(q,i)\mathord{\downarrow}\}\|+\|\{(q,r)\in B\times F\mid q\mathrel{\#}r\}\|$
	$\displaystyle N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R}})\mathord{% \downarrow}}(\mathcal{T})=\{(q,p,p^{\prime})\in B\times Q^{\mathcal{R}}\times Q% ^{\mathcal{R}}\mid\delta(q,\sigma)\mathord{\downarrow}\text{ with }\sigma=% \mathsf{sep}(p,p^{\prime})\}$
	$\displaystyle N_{(B\cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T})=\{(q,p)\in% (B\cup F)\times Q^{\mathcal{R}}\mid q\mathrel{\#}p\}$
	$\displaystyle N_{(B\times F)\mathord{\downarrow}}(\mathcal{T})=\{(q,r)\in B% \times F\mid~{}\delta(q,\sigma)\mathord{\downarrow}\land\delta(r,\sigma)% \mathord{\downarrow}\text{ with }$
	$\displaystyle\qquad\qquad\qquad\qquad\qquad\sigma=\mathsf{sep}\bm{(}\delta^{% \mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q)),\delta^{\mathcal{R}}(\mathsf{% access}^{\mathcal{T}}(r))\bm{)}\}$

The summand $N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R}})\mathord{\downarrow}}(\mathcal{T})$ keeps track of which separating sequences of the reference model have been applied to basis states in the new observation tree. The summand $N_{(B\cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T})$ keeps track of the pairs of a basis or frontier state and a reference state that are apart. The summand $N_{(B\times F)\mathord{\downarrow}}(\mathcal{T})$ keeps track of separating sequences from the reference model applied to pairs of basis and frontier states. These summands are motivated by the postconditions in Table 1.

Theorem 0.D.1

Every rule application in $AL^{\#}$ increases norm $N(\mathcal{T})$ .

Proof

Let $B,F,\mathcal{T}$ denote the values before and $B^{\prime},F^{\prime},\mathcal{T}^{\prime}$ denote the values after the respective rule application. Let $\mathcal{R}$ denote the reference model. We reuse the following abbreviations from [23] together with the norm definition above:

	$\displaystyle N_{Q}(\mathcal{T})=\|B\|\cdot(\|B\|+1)$
	$\displaystyle N_{\mathord{\downarrow}}(\mathcal{T})=\{(q,i)\in B\times I\mid% \delta(q,i)\mathord{\downarrow}\}$
	$\displaystyle N_{\mathrel{\#}}(\mathcal{T})=\{(q,r)\in B\times F\mid q\mathrel% {\#}r\}$

The proof that the rules promotion, extension, separation and equivalence increase $N_{Q}(\mathcal{T})+|N_{\mathord{\downarrow}}(\mathcal{T})|+|N_{\mathrel{\#}}(% \mathcal{T})|$ is similar to the proof in [23]. However, we slightly adapted $N_{Q}(\mathcal{T})$ which has influence on the proof for promotion and separation and we assume stronger guarantees for equivalence. It remains to show that combined with the new summands the total norm still increases. Therefore, we include the proofs for the $L^{\#}$ rules here.

Rebuilding

Let $q,q^{\prime}\in B$ , $i\in I$ and $\sigma\in I^{*}$ . We assume

1. $\delta^{\mathcal{T}}(q,i)\notin B$ ,

2. $\neg(q^{\prime}\mathrel{\#}\delta^{\mathcal{T}}(q,i))$ ,

3. $\mathsf{access}^{\mathcal{T}}(q)$ , $\mathsf{access}^{\mathcal{T}}(q)i\in P^{\mathcal{R}}$ ,

4. $\delta^{\mathcal{T}}(q,i\sigma){\uparrow}$ or $\delta^{\mathcal{T}}(q^{\prime},\sigma){\uparrow}$ ,

5. $\sigma=\mathsf{sep}\bm{(}\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q)i),\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q^{\prime}))\bm{)}$ .

The algorithm performs the two OQs $\mathsf{access}^{\mathcal{T}}(q)i\sigma$ and $\mathsf{access}^{\mathcal{T}}(q^{\prime})\sigma$ . After these OQs, $\delta^{\mathcal{T}}(q,i\sigma)$ and $\delta^{\mathcal{T}}(q^{\prime},\sigma)$ are defined. In particular, because of assumption (1) and $q\in B$ , we know $\delta^{\mathcal{T}}(q,i)\in F$ after the OQs. Combining this with (4), we find

N_{(B\times F)\mathord{\downarrow}}(\mathcal{T}^{\prime})\supseteq N_{(B\times F% )\mathord{\downarrow}}(\mathcal{T})\cup\{(q^{\prime},\delta^{\mathcal{T}}(q,i))\}

Note that we implicitly use (3) and (5) to ensure that $\sigma$ exists. In some cases we might find that $\delta^{\mathcal{T}}(q,i)\mathrel{\#}q^{\prime}$ which, together with (2), indicates

N_{\mathrel{\#}}(\mathcal{T}^{\prime})\supseteq N_{\mathrel{\#}}(\mathcal{T})% \cup\{(q^{\prime},\delta^{\mathcal{T}}(q,i))\}

Otherwise $N_{\mathrel{\#}}(\mathcal{T}^{\prime})\supseteq N_{\mathrel{\#}}(\mathcal{T})$ . Additionally,

N_{Q}(\mathcal{T}^{\prime})=N_{Q}(\mathcal{T})

N_{\downarrow}(\mathcal{T}^{\prime})\supseteq N_{\downarrow}(\mathcal{T})

N_{(B\cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T}^{\prime})\supseteq N_{(B% \cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T})

N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R}})\mathord{\downarrow}}(% \mathcal{T}^{\prime})\supseteq N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R% }})\mathord{\downarrow}}(\mathcal{T})

Thus, $N(\mathcal{T}^{\prime})\geq N(\mathcal{T})+1$ .

Prioritized promotion

Let $r\in F$ . Suppose (1) $r$ is isolated and suppose (2) $\mathsf{access}^{\mathcal{T}}(r)\in P^{\mathcal{R}}$ . State $r$ is moved from $F$ to $B$ , i.e. $B^{\prime}:=B\cup\{r\}$ , then we have

	$\displaystyle N_{Q}(\mathcal{T}^{\prime})$	$\displaystyle=\|B^{\prime}\|\cdot(\|B^{\prime}\|+1)=(\|B\|+1)\cdot(\|B\|+1+1)$
		$\displaystyle=(\|B\|+1)\cdot\|B\|+2(\|B\|+1)=N_{Q}(\mathcal{T})+2\|B\|+2$

Because we move something from the frontier to the basis, we find

N_{\mathrel{\#}}(\mathcal{T}^{\prime})~{}~{}\supseteq~{}~{}N_{\mathrel{\#}}(% \mathcal{T})\setminus(B\times\{r\})

N_{(B\times F)\mathord{\downarrow}}(\mathcal{T}^{\prime})~{}~{}\supseteq~{}~{}N_{(B\times F)\mathord{\downarrow}}(\mathcal{T})\setminus(B\times\{r\})

and thus

|N_{\mathrel{\#}}(\mathcal{T}^{\prime})|~{}~{}\geq~{}~{}|N_{\mathrel{\#}}(% \mathcal{T})|-|B|

|N_{(B\times F)\mathord{\downarrow}}(\mathcal{T}^{\prime})|~{}~{}\geq~{}~{}|N_% {(B\times F)\mathord{\downarrow}}(\mathcal{T})|-|B|

Finally,

N_{\downarrow}(\mathcal{T}^{\prime})\supseteq N_{\downarrow}(\mathcal{T})

N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R}})\mathord{\downarrow}}(\mathcal{T}^{\prime})\supseteq N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R}})\mathord{\downarrow}}(\mathcal{T})

N_{(B\cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T}^{\prime})\supseteq N_{(B% \cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T})

The total norm increases because

N(\mathcal{T}^{\prime})\geq N(\mathcal{T})+2|B|+2-|B|-|B|=N(\mathcal{T})+2

Promotion

Analogous to the proof for prioritized promotion.

Extension

Let $\delta^{\mathcal{T}}(q,i){\uparrow}$ for some $q\in B$ , $i\in I$ . After OQ $\mathsf{access}^{\mathcal{T}}(q)i$ , we get $N(\mathcal{T}^{\prime})\geq N(\mathcal{T})+1$ from [23]. Additionally,

N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R}})\mathord{\downarrow}}(% \mathcal{T}^{\prime})\supseteq N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R% }})\mathord{\downarrow}}(\mathcal{T})

N_{(B\times F)\mathord{\downarrow}}(\mathcal{T}^{\prime})\supseteq N_{(B\times F% )\mathord{\downarrow}}(\mathcal{T})

N_{(B\cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T}^{\prime})\supseteq N_{(B% \cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T})

and thus $N(\mathcal{T}^{\prime})\geq N(\mathcal{T})+1$ .

Separation

Consider a state $r\in F$ and distinct $q,q^{\prime}\in B$ with $\neg(r\mathrel{\#}q)$ and $\neg(r\mathrel{\#}q^{\prime})$ . After OQ $\mathsf{access}^{\mathcal{T}}(q)\sigma$ , we have $N(\mathcal{T}^{\prime})\geq N(\mathcal{T})+1$ from [23]. Additionally,

N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R}})\mathord{\downarrow}}(% \mathcal{T}^{\prime})\supseteq N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R% }})\mathord{\downarrow}}(\mathcal{T})

N_{(B\times F)\mathord{\downarrow}}(\mathcal{T}^{\prime})\supseteq N_{(B\times F% )\mathord{\downarrow}}(\mathcal{T})

N_{(B\cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T}^{\prime})\supseteq N_{(B% \cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T})

Thus, $N(\mathcal{T}^{\prime})\geq N(\mathcal{T})+1$ .

Match separation

Let $q\in B$ , $p\in Q^{\mathcal{R}},i\in I$ , $\sigma\in I^{*}$ , $\delta^{\mathcal{T}}(q,i)=r\in F$ and $\delta^{\mathcal{R}}(p,i)=p^{\prime}$ . Suppose

1. $q\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}p$ ,

2. $\neg(r\mathrel{\#}p^{\prime})$ ,

3. There is no $q^{\prime\prime}\in B$ such that $p^{\prime}\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}q^{\prime\prime}$ ,

4. There exists $q^{\prime}\in B$ such that $\neg(r\mathrel{\#}q^{\prime})$ and $\sigma\vdash p^{\prime}\mathrel{\#}q^{\prime}$ .

After OQ $\mathsf{access}^{\mathcal{T}}(q)i\sigma$ we find either

r\mathrel{\#}q^{\prime}\quad\text{or}\quad r\mathrel{\#}p^{\prime}

If $r\mathrel{\#}q^{\prime}$ , then

N_{\mathrel{\#}}(\mathcal{T}^{\prime})\supseteq N_{\mathrel{\#}}(\mathcal{T})% \cup\{(q^{\prime},r)\}\qquad N_{(B\cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal% {T}^{\prime})\supseteq N_{(B\cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T})

If $r\mathrel{\#}p^{\prime}$ , then

N_{\mathrel{\#}}(\mathcal{T}^{\prime})\supseteq N_{\mathrel{\#}}(\mathcal{T})% \qquad N_{(B\cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T}^{\prime})\supseteq N% _{(B\cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T})\cup\{(r,p^{\prime})\}

Additionally,

N_{Q}(\mathcal{T}^{\prime})=N_{Q}(\mathcal{T})

N_{\downarrow}(\mathcal{T}^{\prime})\supseteq N_{\downarrow}(\mathcal{T})

N_{(B\times F)\mathord{\downarrow}}(\mathcal{T}^{\prime})\supseteq N_{(B\times F% )\mathord{\downarrow}}(\mathcal{T})

N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R}})\mathord{\downarrow}}(\mathcal{T}^{\prime})\supseteq N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R}})\mathord{\downarrow}}(\mathcal{T})

Thus, $N(\mathcal{T}^{\prime})\geq N(\mathcal{T})+1$ .

Match refinement

Let $q\in B$ and $p,p^{\prime}\in Q^{\mathcal{R}}$ . Suppose $q\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}p$ and $q\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}p^{\prime}$ . Note that when using approximate matching this does not imply that $\neg(q\mathrel{\#}p)$ and $\neg(q\mathrel{\#}p^{\prime})$ . After OQ $\mathsf{access}^{\mathcal{T}}(q)\sigma$ with $\sigma=\mathsf{sep}(p,p^{\prime})$ , we find

N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R}})\mathord{\downarrow}}(\mathcal{T}^{\prime})\supseteq N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R}})\mathord{\downarrow}}(\mathcal{T})\cup\{(q,p,p^{\prime})\}

Additionally,

N_{\downarrow}(\mathcal{T}^{\prime})\supseteq N_{\downarrow}(\mathcal{T})

N_{\mathrel{\#}}(\mathcal{T}^{\prime})\supseteq N_{\mathrel{\#}}(\mathcal{T})

N_{(B\times F)\mathord{\downarrow}}(\mathcal{T}^{\prime})\supseteq N_{(B\times F% )\mathord{\downarrow}}(\mathcal{T})

N_{(B\cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T}^{\prime})\supseteq N_{(B% \cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T})

Because $N_{Q}$ remains unchanged, we have $N(\mathcal{T}^{\prime})\geq N(\mathcal{T})+1$ .

Prioritized separation

Analogous to the proof for separation; the additional condition on $\sigma$ does not change the postcondition.

Equivalence

Suppose all $r\in F$ are identified and for all $q\in B$ and $i\in I$ , $\delta^{\mathcal{T}}(q,i){\downarrow}$ . These conditions are stronger than the conditions from [23]. Therefore, we know at least $N_{L^{\#}}(\mathcal{T}^{\prime})\geq N_{L^{\#}}(\mathcal{T})+1$ holds. Additionally,

N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R}})\mathord{\downarrow}}(% \mathcal{T}^{\prime})\supseteq N_{(B\times Q^{\mathcal{R}}\times Q^{\mathcal{R% }})\mathord{\downarrow}}(\mathcal{T})

N_{(B\times F)\mathord{\downarrow}}(\mathcal{T}^{\prime})\supseteq N_{(B\times F% )\mathord{\downarrow}}(\mathcal{T})

N_{(B\cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T}^{\prime})\supseteq N_{(B% \cup F)\mathrel{\#}Q^{\mathcal{R}}}(\mathcal{T})

Thus, $N(\mathcal{T}^{\prime})\geq N(\mathcal{T})+1$ .

Proof of Theorem 6.1

Proof

First, we prove that if $\mathcal{T}$ is an observation tree for $\mathcal{S}$ , then

	$\displaystyle N(\mathcal{T})$	$\displaystyle\leq n(n+1)+kn+(n-1)(kn+1)+no^{2}+(kn+1)o+n(kn+1)$
		$\displaystyle\in\mathcal{O}(kn^{2}+kno+no^{2})$

The first part $n(n+1)+kn+(n-1)(kn+1)$ follows from Theorem 3.9 in [23] with some minor adjustments. The set $B$ contains at most $n$ elements and $Q^{\mathcal{R}}$ contains at most $o$ elements. Each state in $Q^{\mathcal{R}}$ can be apart from at most $|Q^{\mathcal{R}}|-1$ other states in $Q^{\mathcal{R}}$ . Therefore,

|\{(q,p,p^{\prime})\in B\times Q^{\mathcal{R}}\times Q^{\mathcal{R}}\mid\delta% (q,\sigma)\mathord{\downarrow}\text{ with }\sigma=\mathsf{sep}(p,p^{\prime})\}% |{}\leq no(o-1)\leq no^{2}

Since the set $B\cup F$ contains at most $kn+1$ elements and each state in $B\cup F$ can be apart from at most $o$ states from $Q^{\mathcal{R}}$ , we have

|\{(q,p)\in(B\cup F)\times Q^{\mathcal{R}}\mid q\mathrel{\#}p\}|\leq o(kn+1)

The set $F$ contains at most $kn$ elements and each pair $(q,r)\in B\times F$ has at most one $\sigma=\mathsf{sep}(\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q)),\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(r)))$ , thus we have

|N_{(B\times F)\mathord{\downarrow}}|{}\leq kn^{2}

Combining everything and simplifying it leads to

N(\mathcal{T})\in\mathcal{O}(kn^{2}+kno+no^{2})

The ordering on the rules never blocks the algorithm, and when the norm $N(\mathcal{T})$ cannot be increased further, the only applicable rule is the equivalence rule, which is guaranteed to lead to the teacher accepting the hypothesis. Therefore, the correct Mealy machine is learned within $\mathcal{O}(kn^{2}+kno+no^{2})$ rule applications.

In $AL^{\#}$ , every (non-terminating) application of the equivalence rule leads to a new basis state. Since the basis is bounded by the number of states in the SUL, which is $n$ , there can be at most $n-1$ applications of the equivalence rule. Each call to ProcCounterEx requires at most $\log m$ output queries (see Theorem 3.11 of [23]).

All rules, except for the equivalence rule, require at most two OQs per rule application. Therefore, the application of these rules requires $\mathcal{O}(kn^{2}+kno+no^{2})$ OQs. Combining everything, we find that $AL^{\#}$ requires $\mathcal{O}(kn^{2}+kno+no^{2}+n\log m)$ OQs and at most $n-1$ EQs.
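To get a feeling for how the individual summands contribute, the explicit bound on $N(\mathcal{T})$ from the start of this proof can be evaluated numerically. The snippet below is only an illustrative sketch: the function name `norm_bound` and the variable names for the summands are ours, and the parameters $n$, $k$ and $o$ are used exactly as in the inequality above.

```python
def norm_bound(n, k, o):
    """Evaluate n(n+1) + kn + (n-1)(kn+1) + n*o^2 + (kn+1)*o + n(kn+1).

    n, k and o are the parameters of the bound in the proof of
    Theorem 6.1; the summand names below are illustrative only.
    """
    l_sharp_part = n * (n + 1) + k * n + (n - 1) * (k * n + 1)
    refinement_part = n * o ** 2          # triples in B x Q^R x Q^R
    apartness_part = (k * n + 1) * o      # pairs in (B u F) x Q^R
    basis_frontier_part = n * (k * n + 1) # pairs in B x F
    return (l_sharp_part + refinement_part
            + apartness_part + basis_frontier_part)


# Usage: doubling o while keeping n and k fixed shows how strongly the
# size of the reference influences the bound.
small_reference = norm_bound(n=10, k=5, o=10)
large_reference = norm_bound(n=10, k=5, o=20)
```

Since each rule application other than equivalence needs at most two OQs, twice this value bounds the OQs spent outside counterexample processing.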

Appendix 0.E Additional Experiment Information

0.E.1 Experiment Models

In Experiments 1 and 3, we use the following six models, available here under Mealy machine benchmarks.

• learnresult_fix

• DropBear

• OpenSSH

• model1

• NSS_3.17.4_server_regular

• GnuTLS_3.3.8_client_full

Due to the mutations, this means that the largest model that we can learn has 62 states ( $\textit{mut}_{8}(\mathcal{S})$ ). In Experiment 2, we use the ordering for the Adaptive-OpenSSL models as implied by Fig. 5 in [6]. The ordering taken for the Adaptive-Philips is chronological. In Experiment 4, we use the client TCP models. Additionally, we use the following DTLS models.

• ctinydtls_ecdhe_cert_req.dot

• etinydtls_ecdhe_cert_req.dot

• gnutls-3.6.7_all_cert_req.dot

• mbedtls_all_cert_req.dot

• scandium-2.0.0_ecdhe_cert_req.dot

• scandium_latest_ecdhe_cert_req.dot

• wolfssl-4.0.0_dhe_ecdhe_rsa_cert_req.dot

0.E.2 Mutation Explanations

In this section, we call the input Mealy machine $\mathcal{S}=(Q,I,O,q_{0},\delta,\lambda)$ . Every mutation is applied exactly once to generate the mutated model.

$\textit{mut}_{1}$ : New initial state. This mutation adds a new initial state called dummy as well as a fresh input symbol to $\mathcal{S}$ . From state dummy, all $i\in I$ self loop with the output from $q_{0}$ (the previous initial state). The fresh symbol transitions from dummy to $q_{0}$ . In all other states $q\in Q$ , the fresh symbol self loops with the output $\lambda(q_{0},i_{0})$ , where $i_{0}$ is the first input in the alphabet.
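A possible reading of $\textit{mut}_{1}$ is sketched below in Python. This is not the code used in the experiments. The dictionary encoding, the function name `add_initial_state` and the default names `dummy` and `fresh` are assumptions. Two details are interpretations rather than facts from the text: the self loops in dummy reuse $\lambda(q_{0},i)$ , and the output on the fresh transition leaving dummy, which the description leaves open, is set to $\lambda(q_{0},i_{0})$ .

```python
def add_initial_state(states, inputs, q0, delta, lam,
                      dummy="dummy", fresh="fresh"):
    """Sketch of mut_1 on a Mealy machine given as dictionaries.

    delta / lam map (state, input) to the target state / output.
    Returns (states, inputs, initial state, delta, lam) of the mutant.
    """
    new_delta, new_lam = dict(delta), dict(lam)
    default_out = lam[(q0, inputs[0])]  # lambda(q0, i0)

    # In dummy, every original input self loops with q0's output.
    for i in inputs:
        new_delta[(dummy, i)] = dummy
        new_lam[(dummy, i)] = lam[(q0, i)]

    # The fresh symbol leads from dummy to the old initial state.
    new_delta[(dummy, fresh)] = q0
    new_lam[(dummy, fresh)] = default_out  # assumption, see lead-in

    # In all original states the fresh symbol is a self loop.
    for q in states:
        new_delta[(q, fresh)] = q
        new_lam[(q, fresh)] = default_out

    return ([dummy] + list(states), list(inputs) + [fresh],
            dummy, new_delta, new_lam)
```

The mutant has one state and one input symbol more than the original, and stays complete if the original was complete.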

$\textit{mut}_{2}$ : Change the initial state. This mutation randomly selects one of the states in $Q$ to pick as the new initial state. Because $\mathcal{S}$ is not necessarily strongly connected, the number of states in the resulting Mealy machine might be lower.
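The remark that the mutant may have fewer states can be made concrete with a small Python sketch. It is an illustration under assumptions, not the experimental code: the dictionary encoding and the names `restrict_to_reachable` and `change_initial_state` are hypothetical, and the random choice is delegated to a `random.Random` instance supplied by the caller.

```python
import random
from collections import deque


def restrict_to_reachable(states, inputs, q0, delta, lam):
    """Keep only the states and transitions reachable from q0."""
    reachable = {q0}
    queue = deque([q0])
    while queue:
        q = queue.popleft()
        for i in inputs:
            target = delta.get((q, i))
            if target is not None and target not in reachable:
                reachable.add(target)
                queue.append(target)
    kept = [q for q in states if q in reachable]
    new_delta = {(q, i): t for (q, i), t in delta.items() if q in reachable}
    new_lam = {(q, i): o for (q, i), o in lam.items() if q in reachable}
    return kept, new_delta, new_lam


def change_initial_state(states, inputs, delta, lam, rng):
    """Sketch of mut_2: pick a random initial state, drop unreachable states."""
    new_q0 = rng.choice(list(states))
    kept, new_delta, new_lam = restrict_to_reachable(
        states, inputs, new_q0, delta, lam)
    return new_q0, kept, new_delta, new_lam
```

If the chosen state lies in a part of the machine from which the old initial state cannot be reached, the breadth-first search drops the unreachable states, which is why the state count can shrink.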

$\textit{mut}_{3}$ : Add a state. This mutation adds a new state $q$ to $Q$ . We randomly select a state $q^{\prime}\in Q$ and an input $i\in I$ and change the destination of this transition to $q$ , which ensures that $q$ is reachable. For all $i\in I$ , we randomly select a destination state $p$ for the outgoing transition of $q$ and use the output $\lambda(p,i)$ $80\%$ of the time or a random output $20\%$ of the time.

$\textit{mut}_{4}$ : Remove a state. This mutation removes a non-initial state $q$ from $\mathcal{S}$ . Every transition that leads from a state $p$ to $q$ with input $i$ is redirected to $\delta(q,i)$ and keeps its output $\lambda(p,i)$ . If $\delta(q,i)=q$ , the transition becomes a self loop in $p$ .
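The redirection in $\textit{mut}_{4}$ can be sketched as follows. The Python code is a minimal illustration under assumptions (dictionary encoding, a complete machine, the hypothetical function name `remove_state`) and not the implementation used for the experiments.

```python
def remove_state(states, q0, delta, lam, q):
    """Sketch of mut_4: remove the non-initial state q.

    delta / lam map (state, input) to the target state / output and
    are assumed to be defined for every state and input.
    """
    if q == q0:
        raise ValueError("the initial state cannot be removed")
    new_delta, new_lam = {}, {}
    for (p, i), target in delta.items():
        if p == q:
            continue  # outgoing transitions of q disappear with q
        if target == q:
            target = delta[(q, i)]  # skip over q
            if target == q:
                target = p          # q looped on i, so loop in p
        new_delta[(p, i)] = target
        new_lam[(p, i)] = lam[(p, i)]  # outputs are kept
    return [s for s in states if s != q], new_delta, new_lam
```

Only the transition targets change; the output function of the remaining states is untouched, in line with the description above.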

$\textit{mut}_{5}$ : Divert a transition. This mutation randomly selects $q,q^{\prime}\in Q$ and $i\in I$ and sets $\delta(q,i)=q^{\prime}$ . As long as the resulting Mealy machine is equivalent to $\mathcal{S}$ , we choose new $q,q^{\prime},i$ and set $\delta(q,i)=q^{\prime}$ again.

$\textit{mut}_{6}$ : Change transition output. This mutation randomly selects $q\in Q$ , $i\in I$ and $o\in O$ . We set $\lambda(q,i)=o$ such that $o$ is distinct from the original $\lambda(q,i)$ .
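A minimal Python sketch of $\textit{mut}_{6}$ is given below. The function name `change_output`, the dictionary encoding of $\lambda$ and the use of a caller-supplied `random.Random` instance are assumptions for illustration and do not describe the original tooling.

```python
import random


def change_output(lam, outputs, rng):
    """Sketch of mut_6: give one random transition a different output.

    lam maps (state, input) to an output; outputs is the output
    alphabet O. The input dictionary is left unmodified.
    """
    q, i = rng.choice(sorted(lam))  # random transition (q, i)
    candidates = [o for o in outputs if o != lam[(q, i)]]
    if not candidates:
        raise ValueError("need at least two distinct outputs")
    new_lam = dict(lam)
    new_lam[(q, i)] = rng.choice(candidates)  # distinct from the original
    return new_lam
```

Passing a seeded generator, for example `random.Random(42)`, makes the mutation reproducible across runs.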

$\textit{mut}_{7}$ : Remove a symbol. This mutation removes a symbol $i$ from the input alphabet. Consequently, all the transitions with $i$ are not contained in the resulting Mealy machine.

$\textit{mut}_{8}$ : Appending a mutated model. This mutation takes $\mathcal{S}$ and a natural number $n$ . It first makes a second Mealy machine $\mathcal{S}^{\prime}$ by applying $\textit{mut}_{13}$ to $\mathcal{S}$ . Then it appends $\mathcal{S}^{\prime}$ to $\mathcal{S}$ at the $n^{\text{th}}$ state of $\mathcal{S}$ which we call $q$ , i.e., $\delta(q,i)=q_{0}^{\mathcal{S}^{\prime}}$ for some random $i$ . The natural numbers are chosen based on visual inspection of the models; we consistently choose a state that represents the end of a model. This end state is either the sink state or a state at the end of a very long trace in the model which transitions to the sink state.

$\textit{mut}_{9}$ : Prepending a mutated model. This mutation takes $\mathcal{S}$ and a natural number $n$ . It first makes a second Mealy machine $\mathcal{S}^{\prime}$ by applying $\textit{mut}_{13}$ to $\mathcal{S}$ . Then it appends $\mathcal{S}$ to $\mathcal{S}^{\prime}$ at the $n^{\text{th}}$ state of $\mathcal{S}^{\prime}$ which we call $q$ , i.e., $\delta(q,i)=q_{0}^{\mathcal{S}}$ for some random $i$ .

$\textit{mut}_{10}$ : Several mutations. This mutation applies mutations $\textit{mut}_{3}$ , $\textit{mut}_{4}$ , $\textit{mut}_{5}$ and $\textit{mut}_{6}$ to $\mathcal{S}$ in this particular order.

$\textit{mut}_{11}$ : Several mutations with different initial state. This mutation applies $\textit{mut}_{2}$ , $\textit{mut}_{3}$ , $\textit{mut}_{4}$ , $\textit{mut}_{5}$ and $\textit{mut}_{6}$ to $\mathcal{S}$ in this particular order.

$\textit{mut}_{12}$ : Changing many transitions. This mutation applies $\textit{mut}_{5}$ , $\textit{mut}_{6}$ , $\textit{mut}_{5}$ , $\textit{mut}_{6}$ , $\textit{mut}_{5}$ , $\textit{mut}_{6}$ to $\mathcal{S}$ .

$\textit{mut}_{13}$ : Many mutations. This mutation applies $\textit{mut}_{10}$ three times to $\mathcal{S}$ .

$\textit{mut}_{14}$ : Union. This mutation takes $\mathcal{S}$ and makes a second Mealy machine $\mathcal{S}^{\prime}$ by applying $\textit{mut}_{13}$ to $\mathcal{S}$ . We combine $\mathcal{S}$ and $\mathcal{S}^{\prime}$ by creating one new dummy initial state with two fresh symbols for which one goes to $q_{0}^{\mathcal{S}}$ and the other to $q_{0}^{\mathcal{S}^{\prime}}$ . The fresh symbols and other transitions are handled in the same way as in $\textit{mut}_{1}$ .

0.E.3 Additional Figure Experiment 1

In Fig. 7, we show additional pairwise comparison plots from Experiment 1. Each plot compares a pair of algorithms per model and mutation, where a point $(x,y)$ represents that the algorithm on the x-axis required $x$ symbols over all seeds and the algorithm on the y-axis required $y$ symbols. Points below the diagonal indicate that the y-algorithm outperforms the x-algorithm; points below the dashed (dotted) line indicate an improvement by a factor of two (ten).
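To make the reading of these plots concrete, the sketch below classifies a single point $(x,y)$ . It assumes that the dashed line is $y=x/2$ and the dotted line is $y=x/10$ , and that points exactly on a line count as reaching that factor; the function name and the example values are hypothetical.

```python
def improvement_band(x, y):
    """Classify a point (x, y) of a pairwise comparison plot.

    x: symbols used by the algorithm on the x-axis,
    y: symbols used by the algorithm on the y-axis.
    """
    if y * 10 <= x:
        return "at least factor ten"
    if y * 2 <= x:
        return "at least factor two"
    if y < x:
        return "below diagonal"
    return "on or above diagonal"

# Hypothetical symbol counts, not taken from the experiments.
bands = [improvement_band(1000, y) for y in (90, 400, 900, 1000)]
```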

0.E.4 Additional Tables Experiment 2

Tables 4 and 5 display the mean number of inputs per model and algorithm in the same style as Table 2. The reference row indicates which reference model was used for the adaptive algorithms. The teal values indicate the lowest, and therefore best, score.

Table 4: Mean inputs for learning a Philips model with a reference.

Algorithm	model2	model3	model4	model5	model6
Reference	model1	model2	model3	model4	model5
$L^{*}$	657	2196	2196	5340	5340
KV	256	1671	1672	2128	2128
$L^{\#}$	212	862	862	1730	1730
$\partial L^{*}_{M}$	657	2325	2650	2997	4520
IKV	160	814	458	770	841
$AL^{\#}$	146	918	580	1592	1043
$L^{\#}_{\scalebox{0.7}{R}}$	161	954	749	1765	1382
$L^{\#}_{\scalebox{0.7}{\scalebox{0.65}{${}\overset{\surd}{=}{}$}}}$	167	956	590	1775	1178
$L^{\#}_{\scalebox{0.7}{\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}}}$	152	920	590	1602	1178
$L^{\#}_{\scalebox{0.7}{R},\scalebox{0.7}{\scalebox{0.65}{${}\overset{\surd}{=}{}$}}}$	161	954	580	1765	1043

Table 5: Mean inputs for learning an OpenSSL model with a reference.

Algorithm	097c	097e	098f	098l	098m	098s	098u	098za	100	100f	100h	100m	101	101h	101k	102	110pre1
Reference	097	097c	097e	098f	098l	098m	098s	098u	098m	100	100f	100h	100h	100	101h	101k	102
$L^{*}$	21273	21273	31608	2408	2065	2065	2506	1820	2065	2065	2506	1820	3143	1820	1477	1281	1134
KV	22434	19376	24754	6103	4267	4504	5634	3686	4267	4659	6492	3545	8423	3545	2764	2239	1178
$L^{\#}$	23512	23305	25412	6786	5145	5934	6562	3684	5145	5819	6283	3983	8938	3983	3075	2293	1452
$\partial L^{*}_{M}$	5155	5155	5155	3203	2065	2065	2317	2363	2065	2065	2317	2430	3820	2430	2115	1281	1134
IKV	3290	4872	1506	2945	2326	2875	3789	792	876	3153	3398	792	2033	831	636	977	376
$AL^{\#}$	3808	1391	1391	861	756	737	2100	638	751	737	1953	638	1778	638	514	461	665
$L^{\#}_{\scalebox{0.7}{R}}$	19632	1397	1397	843	756	737	2109	642	751	737	1963	642	1791	642	518	1545	1538
$L^{\#}_{\scalebox{0.7}{\scalebox{0.65}{${}\overset{\surd}{=}{}$}}}$	23346	17503	1417	7358	5458	6437	6441	5081	768	6619	7054	4804	1800	4804	532	2606	601
$L^{\#}_{\scalebox{0.7}{\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}}}$	3558	17201	1417	2764	3856	2641	3365	657	768	3110	3974	657	1800	664	532	464	667
$L^{\#}_{\scalebox{0.7}{R},\scalebox{0.7}{\scalebox{0.65}{${}\overset{\surd}{=}{}$}}}$	19683	1397	1391	843	756	737	2109	638	751	737	1963	638	1782	638	520	937	663
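Since the highlighting of the best score may be lost in some renderings, the lowest value per column can be recomputed from the numbers. The sketch below does this for Table 4; the plain-text algorithm labels (for example `L#_sqrt=`) are shorthand chosen here and are not notation from the paper.

```python
# Mean inputs from Table 4, one list per algorithm (columns model2..model6).
columns = ["model2", "model3", "model4", "model5", "model6"]
table4 = {
    "L*":         [657, 2196, 2196, 5340, 5340],
    "KV":         [256, 1671, 1672, 2128, 2128],
    "L#":         [212, 862, 862, 1730, 1730],
    "dL*_M":      [657, 2325, 2650, 2997, 4520],
    "IKV":        [160, 814, 458, 770, 841],
    "AL#":        [146, 918, 580, 1592, 1043],
    "L#_R":       [161, 954, 749, 1765, 1382],
    "L#_sqrt=":   [167, 956, 590, 1775, 1178],
    "L#_sqrt~":   [152, 920, 590, 1602, 1178],
    "L#_R,sqrt=": [161, 954, 580, 1765, 1043],
}

def best_per_column(table, column_names):
    """Return, per column, the lowest value and every algorithm attaining it."""
    best = {}
    for j, name in enumerate(column_names):
        low = min(row[j] for row in table.values())
        best[name] = (low, [alg for alg, row in table.items() if row[j] == low])
    return best

best = best_per_column(table4, columns)
```

For Table 4 this yields $AL^{\#}$ for model2 and IKV for the remaining four models.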

Appendix 0.F Detailed Example Run of $AL^{\#}$

In this section, we give a detailed explanation of how $\mathcal{S}$ can be learned with references $\mathcal{R}_{1}$ and $\mathcal{R}_{3}$ using $AL^{\#}$ . From the references, we derive the following state cover and separating family:

$$P=P^{\mathcal{R}_{1}}\cup P^{\mathcal{R}_{3}}=\{\varepsilon,c,ca\}\cup\{\varepsilon,b,bb,bbb\}=\{\varepsilon,c,ca,b,bb,bbb\}$$

$$W_{r_{0}}=W_{r_{1}}=\{c,ac\},\quad W_{r_{2}}=\{c\},\quad W_{s_{0}}=W_{s_{1}}=W_{s_{2}}=W_{s_{3}}=\{c,b,bb\}$$
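The combination of the two state covers is a plain set union. A minimal sketch, with words encoded as Python strings and $\varepsilon$ as the empty string (an encoding chosen here for illustration), is given below. The prefix-closure check is only a sanity test that happens to hold for this example; it is not a condition stated in this section.

```python
def union_state_cover(*covers):
    """Merge state covers, keeping the first occurrence of every word."""
    merged = []
    for cover in covers:
        for word in cover:
            if word not in merged:
                merged.append(word)
    return merged

def is_prefix_closed(words):
    """True if every proper prefix of every word is also in the collection."""
    return all(w[:k] in words for w in words for k in range(len(w)))

p_r1 = ["", "c", "ca"]         # state cover of reference R_1
p_r3 = ["", "b", "bb", "bbb"]  # state cover of reference R_3
p = union_state_cover(p_r1, p_r3)
```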

1.

$AL^{\#}$ always starts with an observation tree containing only the root node.
2.
The first rule we apply is the rebuilding rule. We apply this rule with $q=q^{\prime}=q_{0}$ (the root state) and $i=c$ because the conditions hold:
- •
  
  $\delta^{\mathcal{T}}(q_{0},c)\notin B$ because $\delta^{\mathcal{T}}(q_{0},c){\uparrow}$ ,
- •
  
  $\neg(q^{\prime}\mathrel{\#}\delta^{\mathcal{T}}(q,i))=\neg(q_{0}\mathrel{\#}\delta^{\mathcal{T}}(q_{0},c))$ because $\delta^{\mathcal{T}}(q_{0},c){\uparrow}$ ,
- •
  
  $\mathsf{access}^{\mathcal{T}}(q,i)=\mathsf{access}^{\mathcal{T}}(q_{0},c)=c\in P$ ,
- •
  
  $\mathsf{access}^{\mathcal{T}}(q^{\prime})=\mathsf{access}^{\mathcal{T}}(q_{0})=\varepsilon\in P$ ,
- •
  
  $\delta^{\mathcal{T}}(q_{0},cac){\uparrow}\land\delta^{\mathcal{T}}(q_{0},ac){\uparrow}$ with $\mathsf{sep}(\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q)i),\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q^{\prime})))=\mathsf{sep}(\delta^{\mathcal{R}}(c),\delta^{\mathcal{R}}(\varepsilon))=ac=\sigma$ .
We execute $OQ(cac)$ and $OQ(ac)$ .
3.

We can now apply prioritized promotion with $q_{1}$ because $ac\vdash q_{0}\mathrel{\#}q_{1}$ . The resulting observation tree looks as follows:
4.

Next, we try to promote the state reached by $ca$ in $\mathcal{R}_{1}$ . Note that we cannot apply rebuilding with $q=q_{1}$ , $q^{\prime}=q_{0}$ and $i=a$ because $\mathsf{sep}(\delta^{\mathcal{R}}(ca),\delta^{\mathcal{R}}(\varepsilon))=c$ and $\delta^{\mathcal{T}}(c){\mathord{\downarrow}}$ and $\delta^{\mathcal{T}}(cac){\mathord{\downarrow}}$ .
We do apply rebuilding with $q=q_{1}$ , $q^{\prime}=q_{1}$ and $i=a$ . All the conditions hold and $\sigma=c$ . This leads to output queries $OQ(cac)$ and $OQ(cc)$ .
5.

We can apply prioritized promotion with $q_{2}$ because $c\vdash q_{1}\mathrel{\#}q_{2}$ and $c\vdash q_{0}\mathrel{\#}q_{2}$ .
6.

We again use the rebuilding rule for $q=q_{0},q^{\prime}=q_{0}$ and $i=b$ . All the conditions hold and we use $\sigma=\mathsf{sep}(\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q)i),\delta^{\mathcal{R}}(\mathsf{access}^{\mathcal{T}}(q^{\prime})))=\mathsf{sep}(\delta^{\mathcal{R}}(b),\delta^{\mathcal{R}}(\varepsilon))=bb$ . We execute the queries $OQ(bbb)$ and $OQ(bb)$ . This leads to the following observation tree. Note that we cannot promote $q_{7}$ .

As stated in Section 7, we only apply the rebuilding rule with $\mathsf{access}^{\mathcal{T}}(q,i)$ and $\mathsf{access}^{\mathcal{T}}(q^{\prime})$ from the same reference model. We have not explored $bb,bbb\in P$ yet, but because these access sequences do not reach frontier states, we cannot apply the rebuilding rule any further. The prioritized promotion rule cannot be applied either. Therefore, we move on to the other $AL^{\#}$ rules.
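The output queries issued by the three rebuilding steps so far follow one pattern: $OQ(\mathsf{access}(q)\,i\,\sigma)$ and $OQ(\mathsf{access}(q^{\prime})\,\sigma)$ . The sketch below reproduces them and also checks the last condition as it is used in steps 2 and 4, namely that at least one of the two words is not yet defined in the observation tree. Words are encoded as strings, the function names are hypothetical, and the remaining conditions of the rule are not modelled.

```python
def rebuilding_queries(access_q, i, access_q_prime, sigma):
    """Return the two words queried by one application of rebuilding."""
    return access_q + i + sigma, access_q_prime + sigma

def is_defined(observed, word):
    """A word is defined in the tree if it is a prefix of an observed word."""
    return any(o.startswith(word) for o in observed)

def adds_information(observed, queries):
    """At least one of the two queries must leave the observation tree."""
    return any(not is_defined(observed, w) for w in queries)

step2 = rebuilding_queries("", "c", "", "ac")    # q = q' = q0, i = c
step4 = rebuilding_queries("c", "a", "c", "c")   # q = q' = q1, i = a
step6 = rebuilding_queries("", "b", "", "bb")    # q = q' = q0, i = b

observed_after_step2 = ["cac", "ac"]
# Rejected attempt in step 4: q = q1, q' = q0, i = a, sigma = c.
rejected = rebuilding_queries("c", "a", "", "c")
```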
7.

Next, we apply the extension rule for all the current basis states. This means the following output queries are performed: $OQ(cb)$ , $OQ(caa)$ , $OQ(cab)$ . This results in the following observation tree.
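The three queries of this step can be recomputed from the words observed so far. The sketch below assumes the input alphabet $\{a,b,c\}$ (the inputs that occur in this run), encodes words as strings, and identifies basis states with their access sequences; the function name is hypothetical.

```python
def extension_queries(observed, basis, inputs):
    """List access(q) + i for every basis state q and input i whose
    transition is not yet present in the observation tree."""
    defined = {w[:k] for w in observed for k in range(len(w) + 1)}
    return [q + i for q in basis for i in inputs if q + i not in defined]

# Words queried in steps 2, 4 and 6.
observed = ["cac", "ac", "cc", "bbb", "bb"]
# Access sequences of the basis states q0, q1 and q2.
basis = ["", "c", "ca"]
queries = extension_queries(observed, basis, "abc")
```

The same computation with basis state $q_{4}$ (access sequence $a$ ) added, and the later queries included, yields the single query $OQ(ab)$ of step 10.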

8.

We compute the matching table and then apply prioritized separation:

state	match	$\mathsf{mdeg}$	$r_{0}$	$r_{1}$	$r_{2}$	$s_{0}$	$s_{1}$	$s_{2}$	$s_{3}$
$q_{0}$	$r_{0}$	1.0	7/7	5/7	4/7	0/4	0/4	0/4	1/4
$q_{1}$	$r_{1}$	1.0	3/4	4/4	2/4	0/4	0/4	0/4	1/4
$q_{2}$	$r_{2}$	1.0	1/2	1/2	2/2	0/2	0/2	0/2	1/2

•

To separate $q_{4}$ from the basis, we can use the separating sequence $ac$ because $ac\in W_{r_{1}}$ and $r_{1}$ is the expected matching reference state of $q_{4}$ because $\delta^{\mathcal{T}}(q_{0},a)=q_{4}$ , $q_{0}\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}r_{0}$ and $\delta^{\mathcal{R}}(r_{0},a)=r_{1}$ . Therefore, we execute $OQ(aac)$ .
•

To separate $q_{6}$ from the basis, we can use the separating sequence $ac$ because $ac\in W_{r_{0}}$ and $r_{0}$ is the expected matching reference state of $q_{6}$ because $\delta^{\mathcal{T}}(q_{1},c)=q_{6}$ , $q_{1}\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}r_{1}$ and $\delta^{\mathcal{R}}(r_{1},c)=r_{0}$ . Therefore, we execute $OQ(ccac)$ .
•

To separate $q_{11}$ from the basis, we can use the separating sequence $ac$ because $ac\in W_{r_{1}}$ and $r_{1}$ is the expected matching reference state of $q_{11}$ because $\delta^{\mathcal{T}}(q_{2},a)=q_{11}$ , $q_{2}\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}r_{2}$ and $\delta^{\mathcal{R}}(r_{2},a)=r_{1}$ . Therefore, we execute $OQ(caaac)$ .
•

To separate $q_{3}$ from the basis, we can use the separating sequence $c$ because $c\in W_{r_{2}}$ and $r_{2}$ is the expected matching reference state of $q_{3}$ because $\delta^{\mathcal{T}}(q_{2},c)=q_{3}$ , $q_{2}\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}r_{2}$ and $\delta^{\mathcal{R}}(r_{2},c)=r_{2}$ . Therefore, we execute $OQ(cacc)$ .
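The pattern behind these queries can be sketched as follows: look up the reference state matched with the basis state, follow the reference transition for the input to obtain the expected match of the frontier state, and append a separating sequence from that state's set $W$ . The sketch only encodes the reference transitions and matches that are stated explicitly in this step; the dictionary encoding and the function name are assumptions, and how a sequence is picked from $W$ is left to the caller.

```python
# Reference transitions of R_1 that are stated explicitly in this step.
delta_r = {("r0", "a"): "r1", ("r1", "c"): "r0", ("r2", "c"): "r2"}
# Separating family for R_1 from the start of this appendix.
w_family = {"r0": {"c", "ac"}, "r1": {"c", "ac"}, "r2": {"c"}}
# Matches of the basis states, keyed by access sequence (q0, q1, q2).
match = {"": "r0", "c": "r1", "ca": "r2"}

def prioritized_query(basis_access, i, sep):
    """Build the output query that separates the frontier state reached
    from a basis state via input i, using a sequence from the separating
    set of its expected matching reference state."""
    expected = delta_r[(match[basis_access], i)]
    if sep not in w_family[expected]:
        raise ValueError(f"{sep!r} is not in W of {expected}")
    return basis_access + i + sep

query_q4 = prioritized_query("", "a", "ac")
query_q6 = prioritized_query("c", "c", "ac")
query_q3 = prioritized_query("ca", "c", "c")
```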

9.

Next, we apply the promotion rule for $q_{4}$ because $a\vdash q_{4}\mathrel{\#}q_{0}$ , $a\vdash q_{4}\mathrel{\#}q_{1}$ and $a\vdash q_{4}\mathrel{\#}q_{2}$ . The resulting observation tree looks as follows:
10.

Next, we apply extension with $q_{4}$ and input $b$ , resulting in $OQ(ab)$ .

11.

We perform another round of prioritized separation with the following matching table:

state	match	$\mathsf{mdeg}$	$r_{0}$	$r_{1}$	$r_{2}$	$s_{0}$	$s_{1}$	$s_{2}$	$s_{3}$
$q_{0}$	$r_{0}$	0.857	12/14	8/14	7/14	2/6	2/6	1/6	3/6
$q_{1}$	$r_{1}$	1.0	5/9	9/9	5/9	0/5	0/5	0/5	1/5
$q_{2}$	$r_{2}$	1.0	3/5	2/5	5/5	0/3	0/3	0/3	1/3
$q_{4}$	$s_{0},s_{1}$	1.0	2/3	1/3	1/3	2/2	2/2	1/2	0/2

•

To separate $q_{6}$ further, we execute $OQ(ccc)$ .
•

To separate $q_{11}$ further, we execute $OQ(caac)$ .
•

To separate $q_{20}$ , we can use the separating sequence $b$ because $b\in W_{s_{1}}\cup W_{s_{2}}$ and $s_{1}$ and $s_{2}$ are the expected matching reference states of $q_{20}$ because $\delta^{\mathcal{T}}(q_{4},b)=q_{20}$ , $q_{4}\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}s_{0}$ and $\delta^{\mathcal{R}}(s_{0},b)=s_{1}$ , $q_{4}\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}s_{1}$ and $\delta^{\mathcal{R}}(s_{1},b)=s_{2}$ . Therefore, we execute $OQ(abb)$ .
•

To separate $q_{13}$ , we can use the separating sequence $b$ because $b\in W_{s_{0}}\cup W_{s_{1}}$ and $s_{0}$ and $s_{1}$ are the expected matching reference states of $q_{13}$ because $\delta^{\mathcal{T}}(q_{4},a)=q_{13}$ , $q_{4}\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}s_{0}$ and $\delta^{\mathcal{R}}(s_{0},a)=s_{0}$ , $q_{4}\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}s_{1}$ and $\delta^{\mathcal{R}}(s_{1},a)=s_{1}$ . Therefore, we execute $OQ(aab)$ .

The resulting observation tree looks as follows:

12.
We can no longer apply prioritized separation so we continue with standard separation. We use the separating sequences $b,c$ and $ac$ to separate states. Specifically, we perform the following output queries:
- •
  
  $OQ(bac)$ and $OQ(bc)$ to separate $q_{7}$ from the basis.
- •
  
  $OQ(cbac)$ to separate $q_{10}$ from the basis.
- •
  
  $OQ(acac)$ to separate $q_{5}$ from the basis.
- •
  
  $OQ(cabac)$ and $OQ(cabc)$ to separate $q_{12}$ from the basis.
The resulting observation tree looks as follows:

13.

All frontier states are identified and no frontier states are isolated. This means we can possibly perform match refinement or match separation. The matching table looks as follows:

state	match	$\mathsf{mdeg}$	$r_{0}$	$r_{1}$	$r_{2}$	$s_{0}$	$s_{1}$	$s_{2}$	$s_{3}$
$q_{0}$	$r_{0}$	0.834	15/18	10/18	8/18	4/9	3/9	2/9	6/9
$q_{1}$	$r_{1}$	1.0	6/11	11/11	5/11	0/7	0/7	2/7	2/7
$q_{2}$	$r_{2}$	1.0	4/6	2/6	6/6	0/4	0/4	1/4	2/4
$q_{4}$	$s_{0}$	1.0	2/5	2/5	2/5	4/4	3/4	1/4	1/4

Since every basis state is matched with exactly one reference state, match refinement is not applicable. However, we can apply match separation with $q=q_{4}$ , $q^{\prime}=q_{4}$ , $p=s_{0}$ and $i=b$ because $\delta^{\mathcal{T}}(q_{4},b)=q_{20}\in F$ , $\neg(q_{20}\mathrel{\#}q_{4})$ , $\delta^{\mathcal{R}}(s_{0},b)=s_{1}$ , $s_{0}\scalebox{0.65}{${}\overset{\surd}{\simeq}{}$}q_{4}$ , and for all $q\in B$ , $s_{1}$ is not the match. We use $\sigma=bb$ because $bb\vdash s_{1}\mathrel{\#}q_{4}$ and execute $OQ(abbb)$ . This leads to the following observation tree:

14.

The match separation led to isolation of $q_{20}$ which we can now add to the basis with the promotion rule.
15.

Additionally, we can apply promotion for state $q_{23}$ because it is the only state with output 2 for input $b$ . Notice that we have found all the states in $\mathcal{S}$ after promoting $q_{23}$ .
16.
Next, we apply the extension rule several times:
- •
  
  $OQ(aba)$ and $OQ(abc)$ for $q_{20}$ ,
- •
  
  $OQ(abba)$ and $OQ(abbc)$ for $q_{23}$ .
17.
We apply the prioritized separation rule several times:
- •
  
  $OQ(abac)$ and $OQ(ababb)$ for $q_{36}$ ,
- •
  
  $OQ(aabb)$ for $q_{13}$ ,
- •
  
  $OQ(abbbc)$ and $OQ(abbbb)$ for $q_{35}$ ,
- •
  
  $OQ(abbac)$ and $OQ(abbab)$ for $q_{38}$ .

18.

Next, we apply separation a few more times.

•

$OQ(abcac)$ and $OQ(abcabb)$ for $q_{37}$ ,
•

$OQ(abbcac)$ and $OQ(abbcabb)$ for $q_{39}$ ,
•

$OQ(abbbac)$ for $q_{35}$ ,
•

$OQ(acabb)$ for $q_{5}$ .

This leads to the following final observation tree and matching table:

state	match	$\mathsf{mdeg}$	$r_{0}$	$r_{1}$	$r_{2}$	$s_{0}$	$s_{1}$	$s_{2}$	$s_{3}$
$q_{0}$	$r_{0}$	0.834	15/18	10/18	8/18	12/18	7/18	5/18	14/18
$q_{1}$	$r_{1}$	1.0	6/11	11/11	5/11	0/7	0/7	2/7	2/7
$q_{2}$	$r_{2}$	1.0	4/6	2/6	6/6	0/4	0/4	1/4	2/4
$q_{4}$	$s_{0}$	0.924	2/5	2/5	2/5	12/13	7/13	4/13	5/13
$q_{20}$	$s_{1}$	0.889	2/5	2/5	2/5	4/9	8/9	4/9	3/9
$q_{23}$	$s_{2}$	0.8	2/5	2/5	2/5	1/5	1/5	4/5	2/5
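The match and $\mathsf{mdeg}$ columns of the matching tables can be read off the per-reference-state scores: the match column lists the reference states with the highest score, and $\mathsf{mdeg}$ is that score. The sketch below follows this reading of the tables; it is not a restatement of the formal definition of approximate matching, and the function name is hypothetical. It uses the row of $q_{20}$ from the final table and the row of $q_{4}$ from the table in step 11, where two reference states tie.

```python
from fractions import Fraction as F

def best_matches(row):
    """Return the reference states with the highest score and that score."""
    mdeg = max(row.values())
    return sorted(p for p, score in row.items() if score == mdeg), mdeg

# Row of q20 in the final matching table.
row_q20 = {"r0": F(2, 5), "r1": F(2, 5), "r2": F(2, 5),
           "s0": F(4, 9), "s1": F(8, 9), "s2": F(4, 9), "s3": F(3, 9)}
# Row of q4 in the matching table of step 11.
row_q4 = {"r0": F(2, 3), "r1": F(1, 3), "r2": F(1, 3),
          "s0": F(2, 2), "s1": F(2, 2), "s2": F(1, 2), "s3": F(0, 2)}

match_q20, mdeg_q20 = best_matches(row_q20)
match_q4, mdeg_q4 = best_matches(row_q4)
```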

19.

Next, the equivalence rule is used to construct a hypothesis from the observation tree. This hypothesis is correct so the algorithm terminates.
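The state indices used throughout this run appear consistent with numbering the nodes of the observation tree in order of creation, starting from the root $q_{0}$ . The sketch below replays the input words of the output queries up to step 10 and recovers the indices mentioned in the text. Only inputs are tracked, since the outputs of the system are not listed in this appendix; the function name and the string encoding of words are assumptions.

```python
def number_states(queries):
    """Assign indices to tree nodes (identified by access sequence)
    in the order in which the queries create them."""
    index = {"": 0}
    for word in queries:
        for k in range(1, len(word) + 1):
            # Only prefixes not seen before receive a fresh index.
            index.setdefault(word[:k], len(index))
    return index

# Output queries of steps 2 to 10, in the order given in the text.
queries = ["cac", "ac", "cac", "cc", "bbb", "bb",      # rebuilding
           "cb", "caa", "cab",                        # extension
           "aac", "ccac", "caaac", "cacc",            # prioritized separation
           "ab"]                                      # extension for q4
index = number_states(queries)
```

For instance, the access sequences $c$ , $ca$ , $a$ and $ab$ receive the indices 1, 2, 4 and 20, matching the basis states $q_{1}$ , $q_{2}$ , $q_{4}$ and $q_{20}$ .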
