
MulTi-Wise Sampling: Trading Uniform T-Wise Feature Interaction Coverage for Smaller Samples

Tobias Pett
Karlsruhe Institute of Technology
Karlsruhe, Germany
[email protected]

Sebastian Krieter
Paderborn University
Germany
[email protected]

Thomas Thüm
Paderborn University
Germany
thomas.thü[email protected]

Ina Schaefer
Karlsruhe Institute of Technology
Karlsruhe, Germany
[email protected]

Abstract

Ensuring the functional safety of highly configurable systems often requires testing representative subsets of all possible configurations to reduce testing effort and save resources. The ratio of covered t-wise feature interactions (i.e., t-wise feature interaction coverage) is a common criterion for determining whether a subset of configurations is representative and capable of finding faults. Existing t-wise sampling algorithms uniformly cover t-wise feature interactions for all features, resulting in lengthy execution times and large sample sizes, particularly when large t-wise feature interactions are considered (i.e., high values of t). In this paper, we introduce MulTi-Wise Sampling, a novel approach to t-wise feature interaction sampling that questions the necessity of uniform coverage across all t-wise feature interactions. Our approach distinguishes between subsets of critical and non-critical features and considers higher t-values for subsets of critical features when generating a t-wise feature interaction sample. We evaluate our approach using subject systems from real-world applications, including BusyBox, Soletta, Fiasco, and uCLibc-ng. Our results show that sacrificing uniform t-wise feature interaction coverage between all features reduces the time needed to generate a sample and the resulting sample size. Hence, MulTi-Wise Sampling offers an alternative to existing approaches if knowledge about feature criticality is available.

Keywords t-wise coverage, software-product lines, spl testing, sampling

Section 1 Introduction

Nowadays, configurable systems are highly complex, evolve frequently, and appear in safety-critical areas such as passenger transportation, leading to strict requirements for the functional safety of those systems. Automotive systems are prime examples of highly configurable systems for which functional safety must be assured throughout their life cycle. To assure the functional safety of highly configurable systems, system testing is important [1, 2]. However, system testing often mandates a trade-off between efficient test execution (i.e., the time it takes to execute all test cases) and test coverage (i.e., how many system configurations were covered by the testing procedure) [3, 4, 5, 6, 7]. This trade-off is even more severe for configurable systems since test cases must be executed on multiple system configurations. A system configuration is a selection of configuration options (i.e., features) of the configurable system [8]. Thorough testing would execute all test cases on all possible system configurations to achieve the highest possible system coverage. However, executing all test cases on all possible system configurations is not feasible in practice because of the combinatorial explosion problem [9, 10]. For instance, the analysis of JHipster [11] has shown that executing all test cases for a system with 48 features, 15 cross-tree constraints, and 26,256 valid configurations requires 182 days. As a comparison, configurable systems from real-world applications such as BusyBox (https://www.busybox.net/) typically consist of more than 631 features and 1,312 cross-tree constraints, allowing more than 13,402 valid configuration options. Therefore, the trade-off between testing time and system coverage must also consider the number of configurations for testing.

Sample-based testing counteracts the challenge posed by the combinatorial explosion problem by generating a small but representative subset (i.e., a sample) of all possible configurations for testing [10, 12, 13]. A promising criterion to find a representative subset of configurations is to cover all possible combinations of feature tuples of size t (i.e., achieving t-wise feature interaction coverage) uniformly for all features [14, 15, 13]. Modern sampling algorithms generate samples that achieve t-wise feature interaction coverage in a short time [13]. However, many algorithms only scale to small values of t ($t\leq 3$) or generate samples that are still too large for testing configurable systems in the available time [16]. For example, the sampling algorithm YASA [16] calculates a sample of size 196 that achieves three-wise feature interaction coverage for BusyBox in about 61 minutes. The YASA sampling algorithm dramatically reduces the number of configurations for testing. However, the resulting sample is still not small enough to make sample-based testing for frequently evolving configurable systems feasible.

Recent sampling approaches attempt to adapt t-wise feature interaction sampling to meet the demands of frequently evolving systems [17, 18, 19, 20, 21, 22, 23, 24]. Many approaches soften the requirement to achieve full t-wise feature interaction coverage by applying random sampling [17, 18]. Other approaches try to utilize the evolution of configurable systems to achieve t-wise feature interaction coverage incrementally over time and, therefore, lessen the test effort for each system version [20, 21]. Yet other approaches utilize the criticality of features to prioritize configurations for testing but do not guarantee t-wise feature interaction coverage of the tested set of configurations [23, 24, 22]. All of these approaches provide benefits but also have weaknesses. For instance, random and incremental sampling approaches do not guarantee that critical features are covered with enough t-wise feature interaction coverage for every system version. Prioritization approaches often do not ensure a certain degree of t-wise feature interaction coverage.

In this paper, we contribute to the ongoing research of adapting t-wise interaction sampling to the requirements of frequently evolving systems by introducing MulTi-Wise Sampling. Our approach categorizes features into various subsets based on their criticality and covers each subset with individual strengths of t-wise feature interaction coverage. The categorization of features into subsets is independent of how the criticality of a feature is measured, which means that our approach supports various metrics to determine the criticality of features (e.g., risk assessments [24, 23, 25] and change impact analysis [23, 26]). We combine the concepts of feature prioritization and systematic t-wise feature interaction sampling into one algorithm to mitigate the weaknesses of both approaches and strengthen their benefits. Compared to existing approaches, MulTi-Wise Sampling reduces the number of system configurations for testing while still covering critical feature interactions.

We evaluate our approach by applying MulTi-Wise Sampling to four subject systems from real-world applications: BusyBox (https://www.busybox.net/), Fiasco (https://github.com/kernkonzept/fiasco), Soletta (https://github.com/solettaproject/soletta), and uCLibc-ng (https://github.com/wbx-github/uclibc-ng/). We compare the resulting sample sizes and the time to generate a sample against a state-of-the-art t-wise sampling algorithm. Our results indicate that we can reduce the number of configurations depending on the number of critical features and the degree of t-wise feature interaction coverage with which they are covered. If many features are critical and need to be covered with high t-wise feature interaction coverage, we do not see much reduction in the sample sizes.

In summary, we make the following contributions to improving sample-based testing:

• We propose MulTi-Wise Sampling, a novel approach to systematically cover subsets of features with different strengths of t-wise feature interaction coverage.

• We provide an open-source implementation of a sampling algorithm that utilizes our concept (https://doi.org/10.5281/zenodo.11654696).

• We evaluate our concept on four real-world configurable systems (https://doi.org/10.5281/zenodo.11082621).

Section 2 Foundations

In this section, we describe the foundations to understand the context of this paper and the concepts presented later.

Figure 1: Feature diagram of a simplified car system consisting of 11 features. The feature boxes in the diagram show the feature’s name and its literal representation in parentheses.

Section 2.1 Feature Modelling

Basics of Feature Modelling

A configurable system consists of products that share common core properties but differ in variable configuration options [8]. Customers select and deselect configuration options to customize their final product from the configurable system. Variability models are typically used to express the variability of a configurable system by capturing the configuration options and their dependencies. Following [21], we define a variability model $\mathcal{M}=(\mathcal{F},\mathcal{D})$ as a tuple consisting of the set of all configuration options $\mathcal{F}=\{f_{0},f_{1},\dots,f_{n}\}$ (i.e., features) and the set of all dependencies between them $\mathcal{D}=\{d_{0},d_{1},\dots,d_{m}\}$. Feature diagrams visualize the features and dependencies of variability models as a hierarchical tree structure.

Figure 1 shows the feature diagram of a simplified automotive system, which consists of the root feature Car and its ten descendant features. The features Carbody and Gearbox are mandatory child features of Car, which means that they must be selected if their parent is selected. The feature Radio is an optional child feature of Car, meaning it can either be selected or deselected if its parent is selected. The features Ports, Navigation, and Bluetooth are optional child features of Radio. The Ports feature has two child features, USB and CD, in an OR-group, meaning that if Ports is selected, at least one of the child features must be selected. The features Manual and Automatic are child features of Gearbox contained in an alternative group, which means that exactly one of the features must be selected if their parent is selected. Typically, a feature diagram also contains cross-tree constraints to model non-hierarchical dependencies between features of the variability model. Cross-tree constraints are visualized as logical formulas below the feature diagram. In the case of our running example, no cross-tree constraints exist.

Another way of representing a variability model is to use a logical formula in conjunctive normal form (CNF). Using this representation, each clause in the CNF formula represents a dependency between features. For instance, the clause $(\text{Car}\implies\text{Carbody})$, which is equivalent to $(\neg\text{Car}\lor\text{Carbody})$, represents the mandatory parent-child dependency between the features Car and Carbody. Writing the variability model in CNF notation requires a more space-efficient representation of features than referring to their feature names. Typically, a shorthand notation using literals (i.e., integer values) is used for this purpose [16, 27]. We define the set of all literals for a feature model $\mathcal{M}$ as $\mathcal{L}(\mathcal{M})=\{f_{0},f_{1},\dots,f_{n}\mid n=|\mathcal{F}|-1\}$, where the number of existing literals equals the total number of features in the feature model. In our running example, we assign feature names to literals by counting the features from the top left of the feature model to the bottom right so that literal $f_{0}=0$ represents the Car feature and $f_{10}=10$ represents the feature CD. Figure 1 indicates the literal assignment for our running example by showing the literal representing a feature in parentheses beside the feature name.

Configurations of a Configurable System

The variability model of a configurable system represents all product configurations $\mathcal{C}=\{C_{1},C_{2},\dots,C_{n}\}$ that can be derived from the system by selecting and deselecting features [8, 28, 6]. We define a complete configuration $C=(\mathcal{F}_{sel},\mathcal{F}_{des})$ from the configuration space $\mathcal{C}$ as a pair containing the set of selected features ($\mathcal{F}_{sel}\subseteq\mathcal{F}$) and the set of deselected features ($\mathcal{F}_{des}\subseteq\mathcal{F}$). We require that the sets of selected and deselected features are disjoint ($\mathcal{F}_{sel}\cap\mathcal{F}_{des}=\emptyset$). We also require that the union of selected and deselected features results in the set of all features in the feature model ($\mathcal{F}=\mathcal{F}_{sel}\cup\mathcal{F}_{des}$). We use a shorthand notation that uses literals instead of feature names to express configurations and marks deselected features with the negation operator ($\neg$). For instance, a minimal configuration for our running example where the features Car, Carbody, Gearbox, and Manual are selected is expressed in shorthand notation as $C_{example}=\{0,1,3,7,\neg 2,\neg 4,\neg 5,\neg 6,\neg 8,\neg 9,\neg 10\}$. A configuration is valid as long as its feature selection fulfills all dependencies defined by the set of dependencies $\mathcal{D}$ of the feature model. For instance, our example configuration is valid because the requirement of the alternative group under Gearbox is fulfilled by $C_{example}$. Selecting the feature Automatic (8) in addition to the feature Manual (7) leads to an invalid configuration because the feature selection contradicts the alternative group below the feature Gearbox.
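The validity check described above can be illustrated with a small Python sketch. It is not the authors' implementation: it hand-encodes the dependencies of the running example as CNF-style clauses over (literal, value) pairs and checks configurations by brute force. The assignment of the literals 4, 5, and 6 to Ports, Navigation, and Bluetooth, the requirement that the root feature Car is always selected, and all function names are assumptions of this sketch.

```python
from itertools import product

# Literal numbering of the running example. The numbers for Ports,
# Navigation, and Bluetooth (4 to 6) are an assumption of this sketch.
(CAR, CARBODY, RADIO, GEARBOX, PORTS, NAVIGATION, BLUETOOTH,
 MANUAL, AUTOMATIC, USB, CD) = range(11)

def implies(a, b):
    """Clause for a => b, i.e. (not a) or b."""
    return [(a, False), (b, True)]

# Each clause is a list of (literal, required value) pairs; a clause holds
# if at least one pair matches the configuration.
CLAUSES = [
    [(CAR, True)],                                    # root always selected (assumption)
    implies(CAR, CARBODY), implies(CARBODY, CAR),     # mandatory child
    implies(CAR, GEARBOX), implies(GEARBOX, CAR),     # mandatory child
    implies(RADIO, CAR),                              # optional child
    implies(PORTS, RADIO), implies(NAVIGATION, RADIO), implies(BLUETOOTH, RADIO),
    implies(USB, PORTS), implies(CD, PORTS),
    [(PORTS, False), (USB, True), (CD, True)],        # OR-group: at least one child
    implies(MANUAL, GEARBOX), implies(AUTOMATIC, GEARBOX),
    [(GEARBOX, False), (MANUAL, True), (AUTOMATIC, True)],  # alternative: at least one
    [(MANUAL, False), (AUTOMATIC, False)],            # alternative: at most one
]

def is_valid(selected):
    """Check a complete configuration given as its set of selected literals."""
    return all(any((lit in selected) == value for lit, value in clause)
               for clause in CLAUSES)

def all_valid_configurations():
    """Enumerate the configuration space by brute force (2^11 candidates)."""
    for bits in product([False, True], repeat=11):
        selected = frozenset(i for i, bit in enumerate(bits) if bit)
        if is_valid(selected):
            yield selected

c_example = frozenset({CAR, CARBODY, GEARBOX, MANUAL})
print(is_valid(c_example))                  # C_example from the text
print(is_valid(c_example | {AUTOMATIC}))    # Manual and Automatic together
print(sum(1 for _ in all_valid_configurations()))
```

Under these assumptions, the sketch accepts $C_{example}$, rejects the configuration that selects both gearbox features, and enumerates 34 valid configurations. Brute-force enumeration only works for toy models; real feature models require a SAT solver.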

Section 2.2 Configuration Testing

Product-based testing uses valid configurations to assure the functional safety of configurable systems. However, in practice, testing all valid configurations of the configurable system is often not feasible because of the combinatorial explosion problem [11]. Various approaches exist to select a representative subset (i.e., a sample) of configurations for testing [29, 13]. T-wise feature interaction sampling is one of those approaches; it considers the coverage of all valid combinations of t-wise feature tuples ($tTuple$) as a quality criterion for samples [30, 31, 16]. A t-wise feature tuple is a tuple ($tTuple=(f_{1},f_{2},\dots,f_{t})$) of size t that contains features from a feature model. For instance, $(7,8)$ is a pair-wise feature tuple for the features Manual and Automatic from our running example.

Building all combinations of selected and deselected features in a t-wise feature tuple generates all possible t-wise feature interactions for that tuple. We define a t-wise feature interaction $I=(f_{1},\dots,f_{t})$ as a tuple of size t that contains selected and deselected features from the feature model $\mathcal{M}$. For instance, $\{(7,8),(\neg 7,8),(7,\neg 8),(\neg 7,\neg 8)\}$ are all feature interactions for the pair-wise ($t=2$) feature tuple $(7,8)$. A feature interaction of size t is valid when it can appear in at least one configuration of the configuration space $\mathcal{C}$ of a feature model. The pair-wise feature interactions $(\neg 7,8)$ and $(7,\neg 8)$ are valid for our running example, while the feature interactions $(7,8)$ and $(\neg 7,\neg 8)$ are not. We define the set of all valid t-wise feature interactions $\{I_{1},\dots,I_{n}\}$ for a set of features as $\mathcal{I}(t,\mathcal{M},\mathcal{F})=\{(f_{1},\dots,f_{t})\mid\exists\,C\in\mathcal{C}:(f_{1},\dots,f_{t})\subseteq C\}$.
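To make the notion of valid interactions concrete, the following Python sketch enumerates them for the running example by projecting every valid configuration onto each feature tuple. It is a brute-force illustration under assumptions, not the procedure used by any sampling tool: the dependencies are hand-written, the literals 4 to 6 are assumed to denote Ports, Navigation, and Bluetooth, the root feature is assumed to be always selected, and all names are hypothetical. An interaction is represented as a tuple of (literal, selected) pairs.

```python
from itertools import combinations, product

N = 11  # literals 0..10 of the running example

def is_valid(sel):
    """Hand-written dependencies of the running example; sel = selected literals."""
    def s(f):
        return f in sel
    return (s(0) and s(1) and s(3)                          # Car, Carbody, Gearbox
            and s(7) != s(8)                                # Manual xor Automatic
            and all(s(2) or not s(f) for f in (4, 5, 6))    # children of Radio
            and all(s(4) or not s(f) for f in (9, 10))      # children of Ports
            and (not s(4) or s(9) or s(10)))                # OR-group below Ports

# The configuration space: every valid complete configuration.
CONFIGS = [frozenset(i for i, b in enumerate(bits) if b)
           for bits in product([False, True], repeat=N)]
CONFIGS = [c for c in CONFIGS if is_valid(c)]

def valid_interactions(t, features):
    """Interactions of size t over `features` that occur in a valid configuration."""
    if t == 0:
        return set()  # no interactions are requested for t = 0
    return {tuple((f, f in config) for f in combo)
            for combo in combinations(sorted(features), t)
            for config in CONFIGS}

print(sorted(valid_interactions(2, {7, 8})))
```

For the tuple $(7,8)$, only the two interactions in which exactly one of Manual and Automatic is selected remain, which matches the example in the text.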

A t-wise feature interaction is covered by a sample when the interaction tuple is contained in at least one configuration of the sample. We define the set of t-wise feature interactions covered by a sample $S$ as $\mathcal{I}(t,\mathcal{M},\mathcal{F},S)=\{(f_{1},\dots,f_{t})\mid\text{there exists}~C\in S~\text{such that}~(f_{1},\dots,f_{t})\subseteq C\}$. A sample achieves full (100%) t-wise feature interaction coverage when all valid t-wise feature interactions are covered by at least one configuration of the sample. Accordingly, we define the ratio of t-wise feature interaction coverage that a sample achieves by dividing the number of valid feature interactions covered by the sample by the number of all valid feature interactions of the feature model, i.e., $|\mathcal{I}(t,\mathcal{M},\mathcal{F},S)|\,/\,|\mathcal{I}(t,\mathcal{M},\mathcal{F})|$.
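The coverage ratio can be computed directly from these definitions. The following Python sketch is a minimal illustration with hypothetical function names; it represents an interaction as a tuple of (literal, selected) pairs and a complete configuration as the set of its selected literals. The six pair-wise interactions used as input are the valid interactions among Carbody (1), Manual (7), and Automatic (8) in the running example.

```python
def covers(config, interaction):
    """A complete configuration covers an interaction if every
    (literal, selected) pair of the interaction agrees with it."""
    return all((lit in config) == selected for lit, selected in interaction)

def coverage(valid_interactions, sample):
    """Ratio of valid interactions covered by at least one configuration."""
    if not valid_interactions:
        return 1.0  # nothing to cover (a convention of this sketch)
    covered = [i for i in valid_interactions
               if any(covers(c, i) for c in sample)]
    return len(covered) / len(valid_interactions)

# Valid pair-wise interactions among the literals 1, 7, and 8.
pairs = {
    ((1, True), (7, True)), ((1, True), (7, False)),
    ((1, True), (8, True)), ((1, True), (8, False)),
    ((7, True), (8, False)), ((7, False), (8, True)),
}
c_manual = frozenset({0, 1, 3, 7})       # Car, Carbody, Gearbox, Manual
c_automatic = frozenset({0, 1, 3, 8})    # Car, Carbody, Gearbox, Automatic

print(coverage(pairs, [c_manual]))               # 0.5
print(coverage(pairs, [c_manual, c_automatic]))  # 1.0
```

A single configuration covers half of these interactions, and adding the configuration with the other gearbox feature reaches full coverage for this feature subset.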

Section 3 MulTi-Wise Sampling

Section 3.1 Problem Statement

T-wise feature interaction coverage is a prominent metric to rate the effectiveness of samples for testing product lines [13]. Testing a set of configurations that achieves full (100%) t-wise feature interaction coverage for high values of t (e.g., $t=3$, $t=4$, $t=5$) promises a high chance of discovering a fault in a product line. Modern sampling algorithms such as YASA can calculate samples that achieve t-wise feature interaction coverage for various values of t (e.g., $t=1$, $t=2$, $t=3$, $t=4$) [16]. However, generating samples that achieve full t-wise feature interaction coverage for higher values of t (i.e., $t>2$) becomes more time-consuming and results in more configurations as t increases because exponentially more feature interactions must be considered when generating a sample. For instance, using the YASA sampling algorithm to generate a pair-wise ($t=2$) sample for our running example takes only a few milliseconds, and the resulting sample contains seven configurations. In contrast, calculating a sample that achieves three-wise feature interaction coverage ($t=3$) takes three seconds, and the resulting sample contains 18 configurations. While calculating a sample in three seconds and testing 18 configurations seems manageable, for larger systems, the number of configurations increases exponentially with the number of optional features in the feature model [11]. In industry branches where testing each configuration is costly (e.g., safety-critical cyber-physical systems), testing an enormous number of configurations is not feasible. Therefore, the growing sampling time and the growing number of configurations to test limit the applicability of samples that achieve full (100%) t-wise feature interaction coverage for higher values of t (i.e., $t>2$).
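The growth in the number of interactions can be estimated with a simple count: choosing $t$ out of $n$ features and then selecting or deselecting each of them yields $\binom{n}{t}\cdot 2^{t}$ candidate interactions. The following Python sketch computes this upper bound; the function name is hypothetical, and the bound ignores that feature-model constraints invalidate some candidates.

```python
from math import comb

def interaction_candidates(n_features, t):
    """Upper bound on t-wise interactions: choose t features,
    then select or deselect each of them."""
    return comb(n_features, t) * 2 ** t

# Running example with 11 features.
for t in (1, 2, 3, 4):
    print(t, interaction_candidates(11, t))
```

For the 11 features of the running example, the bound grows from 22 candidates for $t=1$ to 220 for $t=2$, 1,320 for $t=3$, and 5,280 for $t=4$.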

Section 3.2 Solution Idea

In practice, certain groups of features exist, for which achieving higher t-wise feature interaction coverage is potentially more valuable than for other groups of features. For instance, covering the feature interactions between a group of features that share highly interconnected implementation artefacts with higher values of t is probably more valuable than doing so for features that do not share any implementation artefacts. Other examples of more valuable feature groups include safety-critical features of a system and recently changed features. In our running example, the features Carbody, Manual, and Automatic strongly interact with each other. Therefore, they belong to a critical feature group for which the testing requirement defines pair-wise coverage. The features Car, Radio, and Gearbox are important to premium customers and therefore belong to a feature group requiring one-wise feature interaction coverage. For the remaining features, no special t-wise feature interaction coverage is required.

We aim to use the differences between testing requirements for feature groups to enable a dynamic tradeoff between large samples that achieve full (i.e., 100%) t-wise feature interaction coverage for all features equally for the same t value and smaller samples that achieve full t-wise feature interaction coverage for feature groups with specially assigned t values. To do so, we leverage the observation that covering fewer t-wise feature interactions results in fewer configurations for testing. Compared to sampling algorithms that try to achieve full t-wise feature interaction coverage for all features equally, we reduce the number of t-wise feature interactions using two approaches. Firstly, we split the whole feature set into multiple distinct feature groups and consider only feature interactions between features within a group when calculating a sample. For instance, all valid pair-wise feature interactions between the features Carbody, Manual, and Automatic and all valid one-wise feature interactions between the features Car, Radio, and Gearbox will be actively considered when calculating a sample with MulTi-Wise Sampling. However, t-wise feature interactions across feature groups, such as interactions between the features Car, Manual, and Bluetooth, will not be considered during the sample calculation.

The second approach to reduce the t-wise feature interactions to be covered is that each feature group gets its own t value assigned. Therefore, we can specify small groups of features that will be covered with high t-wise feature interaction coverage and large groups of features that will be covered with low t-wise feature interaction coverage. For instance, in our running example, we specify pair-wise coverage for the small group of features Carbody, Manual, and Automatic, one-wise coverage for another small group of features Car, Radio, and Gearbox, and no coverage criteria for the remaining features. Doing so largely reduces the feature interactions to be covered, compared to specifying a pair-wise feature interaction coverage criterion for all features equally.
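The effect of both approaches on the number of interactions can be estimated for the running example. The following Python sketch compares the upper bound $\binom{n}{t}\cdot 2^{t}$ on candidate interactions for uniform pair-wise coverage with the sum of the bounds per feature group. The function name and the group labels are hypothetical, and the counts ignore feature-model constraints, so they are bounds rather than exact numbers of valid interactions.

```python
from math import comb

def interaction_candidates(n_features, t):
    """Upper bound on interactions of strength t among n features (0 for t = 0)."""
    return 0 if t == 0 else comb(n_features, t) * 2 ** t

# (group size, t-value) for the feature groups of the running example.
groups = {
    "Carbody, Manual, Automatic": (3, 2),
    "Car, Radio, Gearbox": (3, 1),
    "remaining features": (5, 0),
}

grouped = sum(interaction_candidates(n, t) for n, t in groups.values())
uniform = interaction_candidates(11, 2)
print(grouped, uniform)
```

With these groups, at most 18 candidate interactions have to be considered, compared to at most 220 for uniform pair-wise coverage over all 11 features.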

Section 3.3 Feature Groups

MulTi-Wise Sampling uses groups of features that get a certain t value assigned as a basic concept to generate samples. We formally define such a t-wise feature interaction group $TG=\{f_{1},f_{2},\dots,f_{n}\}\subseteq\mathcal{F}$ as a set of features from the feature model $\mathcal{M}$. We define the set of all t-wise feature interaction groups as $\mathcal{TG}=\{TG_{1},\dots,TG_{n},TGD\}$. $TGD$ represents a default t-wise feature interaction group that contains all features from the feature set $\mathcal{F}$ that are not contained in any other t-wise feature interaction group. A feature $f\in\mathcal{F}$ can simultaneously be part of multiple t-wise feature interaction groups. The union of all t-wise feature interaction groups, $\bigcup\mathcal{TG}=\mathcal{F}$, is equal to the set of all features defined by the feature model. We assign a t-wise feature interaction coverage value $t$ to each $TG$. Multiple t-wise feature interaction groups may have the same t-wise feature interaction value.

Table 1: Assignment of features from the running example to t-wise feature interaction groups.

$TG$	t-value	Features
$TG_{1}$	1	$\{\text{Car},\text{Radio},\text{Gearbox}\}$
$TG_{2}$	2	$\{\text{Carbody},\text{Manual},\text{Automatic}\}$
$TGD$	0	$\{\text{USB},\text{CD},\text{Ports},\text{Navigation},\text{Bluetooth}\}$

Table 1 shows an example of t-wise feature interaction groups for our running example. We define three t-wise feature interaction groups, of which one is the default group. Each group gets assigned a t-wise feature interaction value ( $t$ ), shown in column two of Table 1. Column three shows which features from the feature model are assigned to each group. For instance, in Table 1, we see that $TG_{1}$ contains the features Car, Radio, Gearbox and that this group has a t-value of $t=1$ assigned. The features Carbody, Manual, and Automatic are in $TG_{2}$ with a t-value of $t=2$ . The default t-wise interaction group $TGD$ contains the features USB, CD, Ports, Navigation, Bluetooth, which are not assigned to any other t-wise feature interaction group in our example. We assign a t-value of zero ( $t=0$ ) to the default t-wise feature interaction group, meaning that considering interactions between those features for the resulting sample is optional.
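The group assignment of Table 1 maps naturally onto a small data structure. The following Python sketch is one possible representation, not the format used by the authors' implementation; the class `FeatureGroup` and the helper `with_default_group` are hypothetical names. The helper derives the default group $TGD$ from the features that are not assigned to any other group, so that the union of all groups equals the feature set.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureGroup:
    name: str
    t: int
    features: frozenset

def with_default_group(all_features, groups, default_t=0):
    """Append the default group TGD holding every feature not assigned elsewhere."""
    assigned = frozenset().union(*(g.features for g in groups))
    unknown = assigned - frozenset(all_features)
    if unknown:
        raise ValueError(f"features not in the model: {sorted(unknown)}")
    default = FeatureGroup("TGD", default_t, frozenset(all_features) - assigned)
    return [*groups, default]

FEATURES = ["Car", "Carbody", "Radio", "Gearbox", "Ports", "Navigation",
            "Bluetooth", "Manual", "Automatic", "USB", "CD"]

groups = with_default_group(FEATURES, [
    FeatureGroup("TG1", 1, frozenset({"Car", "Radio", "Gearbox"})),
    FeatureGroup("TG2", 2, frozenset({"Carbody", "Manual", "Automatic"})),
])
for group in groups:
    print(group.name, group.t, sorted(group.features))
```

Because groups are plain sets, a feature may appear in several groups, which the definition above permits.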

Section 3.4 Generating MulTi-Wise Samples

Algorithm 1 shows pseudocode for generating a sample with MulTi-Wise Sampling. The algorithm takes a feature model ($\mathcal{M}$) and a set of t-wise feature interaction groups ($\mathcal{TG}$) as input to generate a sample ($S$). The resulting sample achieves t-wise feature interaction coverage for all given t-wise feature interaction groups for the respective t-value of each group.

Input: $\mathcal{M}=(\mathcal{F},\mathcal{D})$, $\mathcal{TG}=\{TG_{1},\dots,TG_{n},TGD\}$

Data: $S=\emptyset$, $t=0$

Result: $S$

1 foreach $TG\in\mathcal{TG}$ do

2     $t\leftarrow t(TG)$;

3     $S^{\prime}\leftarrow\text{coveringStrategy}(\mathcal{M},t,TG,S)$;

4     $S\leftarrow S\cup S^{\prime}$;

5 end foreach

Algorithm 1: MulTi-Wise Sampling Algorithm

Our algorithm iterates over the set of t-wise feature interaction groups provided as input (see line 1). As shown in line 2, the algorithm extracts the t-value of the respective t-wise interaction group. After that, the algorithm generates an intermediate sample $S^{\prime}$ that achieves the respective t-wise feature interaction coverage for the features in the current feature interaction group (see line 3). The final step of our algorithm (see line 4) merges the intermediate sample $S^{\prime}$ with the global result sample $S$ by adding all configurations from $S^{\prime}$ to $S$ that do not already exist in $S$ .

We use the covering strategy of the existing YASA sampling algorithm [16] to generate the intermediate sample $S^{\prime}$ of Algorithm 1. YASA is an efficient t-wise sampling technique that provides various options when calculating samples that achieve t-wise feature interaction coverage [16]. For instance, the YASA algorithm can generate a sample that achieves t-wise feature interaction coverage for only a subset of features from the feature model. We use exactly this functionality of YASA to calculate an intermediate sample for each t-wise feature interaction group.

Input: $\mathcal{M}=(\mathcal{F},\mathcal{D})$, $t\geq 0$, $TG=\{f_{1},\dots,f_{n}\}$, $S=\{C_{1},\dots,C_{m}\}$

Data: $\mathcal{I}=\emptyset$

Result: $S$

1 $\mathcal{I}\leftarrow\mathcal{I}(t,\mathcal{M},TG)$;

2 foreach $I\in\mathcal{I}$ do

3     if $\not\exists C\in S:I\subseteq C$ then

4         foreach $C\in S$ do

5             $C^{\prime}\leftarrow C\cup I$;

6             if $\text{valid}(C^{\prime},\mathcal{M})$ then

7                 $S\leftarrow(S\setminus\{C\})\cup\{C^{\prime}\}$;

8                 continue with the next $I$;

9             end if

10         end foreach

11         $S\leftarrow S\cup\{I\}$;

12     end if

13 end foreach

14 $S\leftarrow\text{completeConfigurations}(S,\mathcal{M})$;

15 return $S$;

Algorithm 2: Basic Sampling Algorithm [16]

We present the basic covering strategy of the YASA sampling algorithm in Algorithm 2. The algorithm takes the feature model $\mathcal{M}$, a t-wise coverage value $t$, a set of features $TG$, and the current sample $S$ as input and generates an updated sample $S$ as output. The algorithm starts by generating the set of all valid feature interaction tuples $\mathcal{I}(t,\mathcal{M},TG)$ of size $t$ for the feature set $TG$ (see line 1).

Table 2: Valid t-wise feature interactions for the t-wise feature interaction groups from Table 1.

$TG$	$\mathcal{I}(t,\mathcal{M},TG)$
$TG_{1}$	$\{(0),(3),(2),(\neg 2)\}$
$TG_{2}$	$\{(1,7),(1,\neg 7),(1,8),(1,\neg 8),(7,\neg 8),(\neg 7,8)\}$
$TGD$	$\emptyset$

Table 2 shows the valid interaction tuples for the t-wise feature interaction groups presented in Table 1. We present the features of our running example in their literal notation to keep the representation concise. Generating feature interaction tuples for $TG_{1}$ results in four feature tuples of size one. The features Car (i.e., 0) and Gearbox (i.e., 3) are mandatory in our running example. Therefore, the feature interaction tuples $\neg 0$ and $\neg 3$ are excluded from Table 2. Two one-wise feature interaction tuples (i.e., $2$ and $\neg 2$) are valid for the feature Radio because it is an optional feature in our running example. Algorithm 2 generates six valid pair-wise feature interaction tuples for $TG_{2}$, resulting from the specified t-value $t=2$ and the constraints of the feature model. The set of generated feature interaction tuples for $TGD$ is empty because our example specifies a t-value of $t=0$ for this group, which means that no specific t-wise feature interactions between the features in this group must be considered when generating a sample.

After generating the t-wise feature interaction tuples, the algorithm iterates over all tuples (see line 2) and checks for each interaction tuple $I_{n}$ whether this tuple is already covered by any configuration of the sample $S$. The algorithm proceeds with the next interaction tuple $I_{n+1}$ if a configuration contains the interaction tuple $I_{n}$. Otherwise, it iterates over the set of existing configurations in $S$ and tries adding $I_{n}$ to the currently selected configuration $C$ so that the resulting configuration $C^{\prime}$ is still valid with regard to the feature model (see lines 3 to 5). If $C^{\prime}$ is valid, it replaces the original configuration $C$ in the sample, and the algorithm continues with the next interaction tuple (see lines 6 to 9). If no existing configuration can accommodate $I_{n}$, the interaction tuple is added to the sample $S$ as a new configuration. Finally, the algorithm completes each configuration in $S$ by selecting or deselecting all undecided features.
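The interplay of Algorithm 1 and Algorithm 2 can be sketched in Python for the running example. The sketch is a simplified illustration and not the YASA implementation: validity is checked by brute force against all valid configurations instead of a SAT solver, undecided features are completed with the first matching valid configuration, the dependencies are hand-written, and the literals 4 to 6 are assumed to denote Ports, Navigation, and Bluetooth. All function names are hypothetical. Configurations are dictionaries that map literals to their selection state.

```python
from itertools import combinations, product

N = 11  # literals 0..10 of the running example

def is_valid(sel):
    """Hand-written dependencies of the running example; sel = selected literals."""
    def s(f):
        return f in sel
    return (s(0) and s(1) and s(3) and s(7) != s(8)       # core features, gearbox alternative
            and all(s(2) or not s(f) for f in (4, 5, 6))   # children of Radio
            and all(s(4) or not s(f) for f in (9, 10))     # children of Ports
            and (not s(4) or s(9) or s(10)))               # OR-group below Ports

# All valid complete configurations as dicts: literal -> selected.
CONFIGS = [dict(enumerate(bits)) for bits in product([False, True], repeat=N)
           if is_valid({i for i, b in enumerate(bits) if b})]

def extensions(partial):
    """Valid complete configurations that agree with a partial configuration."""
    return [c for c in CONFIGS if all(c[f] == v for f, v in partial.items())]

def valid_interactions(t, group):
    """Valid interactions of size t within `group`, as tuples of (literal, selected)."""
    if t == 0:
        return []
    return sorted({tuple((f, c[f]) for f in combo)
                   for combo in combinations(sorted(group), t) for c in CONFIGS})

def covering_strategy(t, group, sample):
    """Greedy covering in the spirit of Algorithm 2 (brute-force validity check)."""
    sample = [dict(c) for c in sample]
    for interaction in valid_interactions(t, group):
        wanted = dict(interaction)
        if any(all(c.get(f) == v for f, v in wanted.items()) for c in sample):
            continue                                   # already covered
        for i, config in enumerate(sample):
            agrees = all(config.get(f, v) == v for f, v in wanted.items())
            if agrees and extensions({**config, **wanted}):
                sample[i] = {**config, **wanted}       # extend an existing configuration
                break
        else:
            sample.append(wanted)                      # open a new configuration
    return [extensions(c)[0] for c in sample]          # complete undecided features

def multi_wise_sample(groups):
    """Algorithm 1: cover each group with its own t-value and merge the results."""
    sample = []
    for t, group in groups:
        intermediate = covering_strategy(t, group, sample)
        sample += [c for c in intermediate if c not in sample]
    return sample

GROUPS = [(1, {0, 2, 3}), (2, {1, 7, 8}), (0, {4, 5, 6, 9, 10})]  # Table 1
for config in multi_wise_sample(GROUPS):
    print(sorted(f for f, selected in config.items() if selected))
```

For the groups of Table 1, this sketch produces three configurations that together cover every interaction listed in Table 2. The concrete configurations depend on the order in which interactions are processed and on how undecided features are completed, so they may differ from the output of an actual sampling tool.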

Table 3: Resulting sample from using MulTi-Wise Sampling.

	$C$	Feature Literals
After $TG_{1}$	$C_{1}$	$\{0,1,\neg 2,3,\neg 4,\neg 5,\neg 6,7,\neg 8,\neg 9,\neg 10\}$
	$C_{2}$	$\{0,1,2,3,\neg 4,\neg 5,\neg 6,\neg 7,8,\neg 9,\neg 10\}$
After $TG_{2}$	$C_{1}$	$\{0,1,\neg 2,3,\neg 4,\neg 5,\neg 6,7,\neg 8,\neg 9,\neg 10\}$
	$C_{2}$	$\{0,1,2,3,\neg 4,\neg 5,\neg 6,\neg 7,8,\neg 9,\neg 10\}$
After $TGD$	$C_{1}$	$\{0,1,\neg 2,3,\neg 4,\neg 5,\neg 6,7,\neg 8,\neg 9,\neg 10\}$
	$C_{2}$	$\{0,1,2,3,\neg 4,\neg 5,\neg 6,\neg 7,8,\neg 9,\neg 10\}$

In Table 3, we show the stepwise process of generating a sample with MulTi-Wise Sampling. As we show in Algorithm 1, MulTi-Wise Sampling is an iterative process that generates an intermediate sample for each t-wise feature interaction group and reuses it in the subsequent iterations. In the case of our running example, MulTi-Wise Sampling starts with calculating an intermediate sample for $TG_{1}$. Covering all feature tuples for $TG_{1}$ (see the first row of Table 2) requires two configurations (visualized with green colour in Table 3). After the sample for $TG_{1}$ is generated, we generate an intermediate sample that covers the feature interactions identified for $TG_{2}$ (see the second row of Table 3). We use the intermediate sample generated for $TG_{1}$ as input. The previously generated configurations $C_{1}$ and $C_{2}$ already cover three of the six feature interaction tuples for $TG_{2}$ (visualized with green colour). Our procedure generates a third configuration to cover the remaining feature interaction tuples of $TG_{2}$ to finalize the intermediate sample for this iteration. The default t-wise feature interaction group $TGD$ of our running example does not introduce any t-wise feature interaction tuple that needs to be covered by MulTi-Wise Sampling. Therefore, our procedure generates no more configurations and returns a final sample containing the configurations $C_{1}$, $C_{2}$, and $C_{3}$ visualized in the last row of Table 3.

Section 4 Evaluation

In our evaluation, we investigate whether MulTi-Wise Sampling enables a tradeoff between large samples that achieve full t-wise feature interaction coverage for all features equally and small samples that achieve t-wise feature interaction coverage for specified groups of features. Our investigation focuses on the robustness and performance of our approach to show that MulTi-Wise Sampling is feasible in practice. We analyze these aspects of our approach by answering the following research questions.

(Robustness) RQ1: How does the size of feature groups influence sampling metrics of MulTi-Wise Sampling?

In RQ1, we investigate how changing the size of feature groups with assigned t-values influences sampling metrics such as sample size, sampling time, and achieved percentage of t-wise feature interaction coverage. In particular, we are interested in how gradual changes in the relative size of two feature interaction groups with different t-values influence our sampling metrics. Therefore, we specify multiple experiment setups in which we distribute all features of a subject system between two t-wise feature interaction groups. In the extreme cases of those experiment setups, all features (i.e., 100%) are assigned to only one t-wise feature interaction group. We define the other experiment setups by gradually moving a percentage of features from one t-wise feature interaction group to the other. We execute MulTi-Wise Sampling for all experiment setups and measure sample size, sampling time, and achieved t-wise feature interaction coverage. We assume that the sample size, the sampling time, and the achieved t-wise feature interaction coverage will increase when more features are assigned to a t-wise feature interaction group with a higher t-value.

(Performance) RQ2: How does MulTi-Wise Sampling compare to state-of-the-art t-wise sampling algorithms?

With RQ2, we investigate the performance of MulTi-Wise Sampling compared to state-of-the-art sampling algorithms. In particular, we are interested in how changing the size of t-wise feature interaction groups changes the sample size, the sampling time, and the achieved t-wise feature interaction coverage compared to a baseline algorithm. We choose the YASA sampling algorithm as the baseline for our comparison because it showed the best performance values in previous evaluations [16] compared to other sampling algorithms proposed in the literature. YASA generates samples that achieve t-wise feature interaction coverage for all features equally, which is the same as assigning all features to only one feature interaction group in our MulTi-Wise Sampling setting (i.e., the extreme cases from RQ1). We execute YASA for the two t-values assigned to the t-wise feature interaction groups from RQ1 and measure the sample size, the sampling time, and the achieved t-wise feature interaction coverage. We then compare the measured results with those measured in the experiments for RQ1. We assume that MulTi-Wise Sampling achieves comparable results to YASA for the extreme setups of our experiments. Therefore, we also assume that the findings for the experiment setups in between the extreme cases will be comparable to those of RQ1.

Section 4.1 Experiment Setup

Subject Systems

We perform our evaluation on the subject systems BusyBox, Fiasco, Soletta, and uCLibc-ng (BusyBox: https://github.com/TUBS-ISF/busybox-case_study, Fiasco: https://github.com/TUBS-ISF/fiasco-case-study, Soletta: https://github.com/TUBS-ISF/soletta-case-study, uCLibc-ng: https://github.com/TUBS-ISF/uclibc-case-study). Pett et al. [21] already used extensive feature model histories of those subjects to evaluate their Continuous T-Wise Sampling approach. In our experiment, we do not need to analyze an extensive feature model history but only the most recent feature model of each subject system.

Table 4: Overview Subject Systems

Subject System	Features	Constraints
BusyBox	631	1312
Fiasco	253	1795
Soletta	457	2319
uCLibc-ng	235	1905

Table 4 shows the number of features and the number of constraints for the most recent feature models in our subject systems’ available feature model history. Those values reveal that our subject systems range from a large feature model with 631 features and 1312 constraints (BusyBox) to medium-sized feature models with 235 to 457 features and 1795 to 2319 constraints (Fiasco, Soletta, and uCLibc-ng).

Experiment Setups

We evaluate MulTi-Wise Sampling with a static set of two feature interaction groups with the t-values two and three. In doing so, our experiments include one group with a lower t-value (i.e., t=2) and one group with a higher t-value (i.e., t=3).

Table 5: Experiment Setups.

Experiments	$TG$	t-value	percentage of features
Exp1	Pair-Wise YASA Sampling ( $t=2$ )
Exp2	$TG_{t2}$	2	100
	$TG_{t3}$	3	0
Exp3	$TG_{t2}$	2	75
	$TG_{t3}$	3	25
Exp4	$TG_{t2}$	2	50
	$TG_{t3}$	3	50
Exp5	$TG_{t2}$	2	25
	$TG_{t3}$	3	75
Exp6	$TG_{t2}$	2	0
	$TG_{t3}$	3	100
Exp7	Three-Wise YASA Sampling ( $t=3$ )

Table 5 shows seven experiment setups, of which two are baseline setups (visualized by grey background) and five are setups for MulTi-Wise Sampling. As baseline setups, we use two setups of the YASA sampling algorithm: one to achieve full pair-wise feature interaction coverage (i.e., Exp1) and one to achieve full three-wise feature interaction coverage (i.e., Exp7). The experiment setups of MulTi-Wise Sampling define two t-wise feature interaction groups, one with a t-value of two (i.e., $TG_{t2}$) and one with a t-value of three (i.e., $TG_{t3}$). The varying factor between the setups is the number of features assigned to each t-wise feature interaction group. Column four of Table 5 shows the feature distribution per t-wise feature interaction group in percent. The first experiment setup of MulTi-Wise Sampling (i.e., Exp2) assigns 100% of the features to feature group $TG_{t2}$ and 0% of the features to feature group $TG_{t3}$. The following experiment setups (i.e., Exp3 to Exp6) each reduce the share of features assigned to $TG_{t2}$ by 25 percentage points while increasing the share assigned to $TG_{t3}$ by the same amount, until Exp6 assigns 0% of features to $TG_{t2}$ and 100% to $TG_{t3}$.

Evaluation Hardware

We implement MulTi-Wise Sampling as a Java command line tool. The tooling integrates functionality from the FeatureIDE library (https://featureide.github.io/). All of our tooling is available as part of our replication package (https://doi.org/10.5281/zenodo.11654696). We execute the tooling on a virtual server running Ubuntu 20.04 as an operating system and with an OpenJDK version 1.1.8.0_292 as the running version of Java. The server has a processor with eight cores running at 2400MHz and is equipped with 16GB of physical memory, of which we use 12GB as virtual memory to execute our experiments. For the review version of this paper, we cannot provide the source code of our tooling because it may contain hints about the authors’ identities, which contradicts the double-blind review regulations. We will include the source as part of the replication package in the final version of this paper.

Section 4.2 Experiment Execution

In Table 5, we specify two baseline experiment setups (i.e., Exp1 and Exp7). Each experiment setup represents an experiment in which the YASA sampling algorithm is executed to compute a sample that achieves either pair-wise coverage (i.e., Exp1) or three-wise coverage (i.e., Exp7). For our evaluation, we use the implementation of the YASA sampling algorithm that is included in the FeatureIDE library version 3.10. We execute each experiment ten times on all subject systems specified in Table 4 to mitigate random influences on our results. During the experiment executions, we measure the time it takes to compute the sample (i.e., sampling time) by setting timestamps before and after executing the YASA sampling algorithm. We measure the sample size and the achieved pair-wise and three-wise coverage of the resulting samples by using utility functionality that is included in the FeatureIDE library version 3.10.
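The two measurements described above, sampling time via timestamps and t-wise coverage of a resulting sample, can be sketched as follows. This is an illustration only and does not use the FeatureIDE utilities: the helper names `timed` and `t_wise_coverage` are hypothetical, and the set of valid tuples is derived by enumerating all valid configurations, which is an assumption that only holds for very small models.

```python
import time
from itertools import combinations


def timed(sampler, *args):
    """Run `sampler(*args)` and return (sample, elapsed seconds).

    Timestamps are taken directly before and after the sampler call.
    """
    start = time.perf_counter()
    sample = sampler(*args)
    elapsed = time.perf_counter() - start
    return sample, elapsed


def _tuples(configs, t):
    """All t-wise tuples of (feature index, selected) literals in `configs`."""
    n = len(configs[0]) if configs else 0
    return {
        tuple((f, cfg[f]) for f in combo)
        for cfg in configs
        for combo in combinations(range(n), t)
    }


def t_wise_coverage(sample, all_valid_configs, t):
    """Ratio of valid t-wise tuples that appear in at least one sampled configuration."""
    valid = _tuples(all_valid_configs, t)
    if not valid:
        return 1.0
    covered = _tuples(sample, t) & valid
    return len(covered) / len(valid)
```

For an unconstrained model with three Boolean features, a sample consisting of the all-deselected and the all-selected configuration covers every single literal but only half of the twelve valid pair-wise tuples.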

In Table 5, we specify five experiment setups (i.e., Exp2 to Exp6), each representing an experiment in which the MulTi-Wise Sampling algorithm is executed. We execute each experiment setup ten times on each subject system specified in Table 4. In each execution, we assign the specified number of features (see Table 5) to the respective feature groups by randomly choosing which feature gets assigned to which group. By doing so, we reduce the experiment bias of assigning the same features to the same t-wise feature interaction groups, while still ensuring that the relative number of features in each group stays the same throughout each experiment execution. For each experiment execution, we measure the same metrics (i.e., sampling time, sample size, pair-wise coverage, and three-wise coverage) as for executing the YASA sampling algorithm, using the same measuring methods.
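The random assignment of features to the two groups can be sketched as below. The function name `split_features`, the group keys, and the rounding of the group size to the nearest integer are assumptions made for this illustration; the text above does not state how fractional group sizes are handled.

```python
import random


def split_features(features, pct_t3, rng):
    """Randomly assign `pct_t3` percent of the features to TG_t3, the rest to TG_t2.

    The group sizes are fixed by the percentage; only the choice of which
    feature lands in which group is random.
    """
    shuffled = list(features)
    rng.shuffle(shuffled)
    # Assumption: fractional group sizes are rounded to the nearest integer.
    n_t3 = round(len(shuffled) * pct_t3 / 100)
    return {"TG_t2": set(shuffled[n_t3:]), "TG_t3": set(shuffled[:n_t3])}


# Example: the 25% setup (Exp3) for a model with 631 features, seeded for repeatability.
groups = split_features(range(631), 25, random.Random(0))
```

Passing a differently seeded `random.Random` instance per execution changes the group membership while keeping the relative group sizes constant across repetitions.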

Section 4.3 Results

We use a series of boxplots to visualize the results of measuring Coverage t2 (see Figure 2), Coverage t3 (see Figure 3), Sample Time (see Figure 4), and Sample Size (see Figure 5). Each boxplot accumulates the results measured for all subject systems. We show the accumulated results to keep this paper concise and provide a detailed overview of our results in our results package online (https://doi.org/10.5281/zenodo.11082621) for reference. The y-axis of each boxplot represents the respective statistic (i.e., Coverage t2, Coverage t3, Calculation Time, and Sample Size). For instance, in Figure 2, the y-axis represents the pair-wise coverage in percent. The y-axes for Coverage t2 and Coverage t3 use a linear scale, while the y-axes for Calculation Time and Sample Size use a logarithmic scale. We use a logarithmic scale for visualizing Calculation Time and Sample Size because the results of our experiment setups differ largely for both of these metrics. Each boxplot contains seven boxes, representing the experiment setups shown in Table 5. The x-axis of each boxplot shows the identifiers of those setups. Each boxplot starts with Exp1 on the far left, which represents the results for the sampling algorithm YASA configured to achieve full pair-wise ($t=2$) coverage. Thereafter, the results for Exp2, Exp3, Exp4, Exp5, and Exp6 follow. The last entry on the x-axis always represents results measured for the YASA sampling algorithm configured to achieve full three-wise ($t=3$) coverage (i.e., Exp7).

Pair-wise feature interaction coverage

In Figure 2, we present the results of measuring the pair-wise feature interaction coverage ratio. The measured median value for our experiments varies between 96% and 100%. We identify the lowest median coverage (i.e., 96%) for Exp4 and the highest median coverage ratio of 100% for Exp1, Exp2, Exp6, and Exp7. The experiment setup Exp3 achieves a higher median value of pair-wise coverage (about 98%) than Exp4. The results for Exp5 show a higher median value than Exp3.

Three-wise feature interaction coverage

In Figure 3, we present the results of measuring the three-wise feature interaction coverage ratio. Over all experiment setups, the median values range between a maximum of 100% (achieved by Exp6 and Exp7) and a minimum of 90% (achieved by Exp3 and Exp4). The experiment setups Exp1 and Exp2 achieve the second highest three-wise coverage ratio with a median value of 96%. Exp5 achieves a median coverage ratio of 94%.

Sampling time

In Figure 4, we show the time it takes to compute a sample in our experiment setups (i.e., sampling time). We measure the shortest median sampling time (about 5 seconds) for Exp1 and Exp2. The median sampling time grows exponentially from Exp2 (about 5 seconds) to Exp6. The exponential growth appears as a linear growth in Figure 4 because of the logarithmic scale of the y-axis. For Exp7, we measure the same median sampling time (about 2000 seconds) as for Exp6.

Sample Size

In Figure 5, we show the results of measuring the sample size of the samples resulting from executing our experiment setups on all subject systems. We measure the smallest sample sizes for the experiment setups Exp1 and Exp2, with a median value of 120. From Exp2 to Exp7, the sample size grows exponentially from a median value of 120 to a median value of 800. In Figure 5, the exponential growth in sample size appears linear because of the logarithmic scale of the y-axis.

Section 4.4 Discussion

RQ1: How does the size of feature groups influence sampling metrics of MulTi-Wise Sampling?

Our results show exponential growth for sample time (see Figure 4) and sample size (see Figure 5) when more features of the feature model are assigned to t-wise feature interaction groups with higher t values. Those results align with the theoretical assumption and general observation that achieving t-wise feature interaction coverage for higher values of t requires more configurations and longer computation times because more t-wise feature interaction tuples must be covered. Therefore, our results meet our expectations.
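As a rough illustration of why higher t-values require covering more tuples, the number of candidate tuples can be bounded as follows. This is a back-of-the-envelope count under the assumption that every feature is Boolean and that a tuple fixes each of its $t$ features as selected or deselected; core and dead features as well as constraints reduce the number of valid tuples, so the values are upper bounds rather than the tuple counts of our experiments. The symbols $n$ and $N_t$ are introduced only for this illustration.

```latex
% n = number of features, t = interaction strength
N_t(n) = \binom{n}{t} \cdot 2^{t}

% Example for n = 631 (the number of features of BusyBox in Table 4):
N_2(631) = \binom{631}{2} \cdot 4 = 198\,765 \cdot 4 = 795\,060
N_3(631) = \binom{631}{3} \cdot 8 = 41\,674\,395 \cdot 8 = 333\,395\,160

% Ratio between three-wise and pair-wise candidates:
\frac{N_3(n)}{N_2(n)} = \frac{2\,(n-2)}{3} \approx 419 \quad \text{for } n = 631
```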

Our results of measuring the achieved pair-wise feature interaction coverage reveal that assigning 100% of features to a pair-wise feature interaction group (i.e., Exp2) and assigning 100% of features to a three-wise feature interaction group (i.e., Exp6) both achieve 100% pair-wise feature interaction coverage. The results confirm our expectation that a sample that covers 100% of three-wise feature interactions always covers 100% of pair-wise feature interactions. We observe in our results that shifting 25% of features from the pair-wise feature interaction group to the three-wise feature interaction group (i.e., Exp3) reduces the achieved pair-wise feature interaction coverage. We explain this observation by how MulTi-Wise Sampling covers t-wise feature interactions. Considering the two t-wise feature interaction groups of Exp3 as an example, MulTi-Wise Sampling generates all valid pair-wise feature interaction tuples for the features contained in $TG_{t2}$ (i.e., consisting of 75% of features) and all three-wise feature interaction tuples for the features contained in $TG_{t3}$ (i.e., consisting of 25% of features), and covers them in the resulting sample. However, MulTi-Wise Sampling does not actively generate and cover valid feature interaction tuples between both groups (i.e., feature tuples for 25% of features). Hence, we observe a reduced pair-wise feature interaction coverage when comparing Exp2 and Exp3. Observing the achieved pair-wise feature interaction coverage for Exp4 and Exp5 further supports our explanation. In Exp4, we shift another 25% of features from the pair-wise interaction group $TG_{t2}$ to the three-wise feature interaction group $TG_{t3}$, leading to 50% of features that are not actively considered by MulTi-Wise Sampling. Therefore, we expect a reduced pair-wise feature interaction coverage compared to Exp3, which is visible in our results (see Figure 2). In Exp5, we reduce the number of features contained in $TG_{t2}$ by another 25% and increase the number of features contained in $TG_{t3}$ by the same amount, leading to a smaller set (i.e., 25%) of not actively considered features. According to our expectation, the achieved pair-wise feature interaction coverage of the samples for Exp5 should increase compared to those of Exp4, which is visible in our results.
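The shape of this effect can be approximated by counting feature pairs that span both groups. The sketch below is a simplified proxy and not part of our evaluation tooling: it counts feature pairs instead of valid literal tuples, ignores constraints, and assumes group sizes are rounded to the nearest integer; the function name `cross_group_pair_fraction` is hypothetical.

```python
from math import comb


def cross_group_pair_fraction(n_features, pct_t3):
    """Fraction of all feature pairs with one feature in each of the two groups.

    These are the pairs that are not actively targeted when each group is
    covered separately.
    """
    # Assumption: fractional group sizes are rounded to the nearest integer.
    n_t3 = round(n_features * pct_t3 / 100)
    n_t2 = n_features - n_t3
    total_pairs = comb(n_features, 2)
    return (n_t2 * n_t3) / total_pairs if total_pairs else 0.0


# Fractions for a model with 631 features and the splits used in Exp2 to Exp6.
fractions = {pct: cross_group_pair_fraction(631, pct) for pct in (0, 25, 50, 75, 100)}
```

Under these assumptions, the fraction is zero for the 0% and 100% splits, largest for the 50% split, and equal for the 25% and 75% splits. The measured coverage stays far above what this proxy alone would suggest, presumably because the generated configurations also cover many cross-group tuples incidentally.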

Our results of measuring the ratio of achieved three-wise feature interaction coverage (see Figure 3) reveal that MulTi-Wise Sampling computes samples that achieve 100% three-wise coverage when 100% of all features are assigned to a three-wise feature interaction group (i.e., Exp6). We expected those results because MulTi-Wise Sampling internally uses the YASA sampling algorithm, which reliably computes samples that achieve full t-wise feature interaction coverage for given t-values. The results for our experiment setups Exp2, Exp3, Exp4, and Exp5 show the same pattern that we have observed in the results of measuring pair-wise feature interaction coverage. To explain this observation, we use the same reasoning as for measuring pair-wise feature interaction coverage.

We answer RQ1 based on our discussion as follows: Changing the percentage of features assigned to feature groups with different values of t influences the achieved coverage, the sampling time, and the resulting sample size. Assigning more features to t-wise feature interaction groups with higher values of t increases the sample size and the sampling time exponentially. MulTi-Wise Sampling assures that all valid t-wise feature interaction tuples of the different t-wise interaction groups are covered in the resulting sample because it uses the YASA sampling algorithm to calculate the intermediate samples for the t-wise feature interaction groups. However, splitting the total number of features between two feature interaction groups reduces the number of feature interaction tuples that are actively considered by MulTi-Wise Sampling, reducing the overall achieved t-wise feature interaction coverage.

RQ2: How does MulTi-Wise Sampling compare to state-of-the-art t-wise sampling algorithms?

Our results of measuring pair-wise and three-wise feature interaction coverage show that Exp1 and Exp2 achieve the same pair-wise and three-wise coverage for our subject systems. We observe the same for the experiment setups Exp6 and Exp7. Therefore, we assume that our application of the YASA sampling algorithm for MulTi-Wise Sampling is equivalent to the original algorithm with respect to computing samples that achieve t-wise feature interaction coverage. Measuring the size of the resulting samples reveals that Exp1 (i.e., YASA) and Exp2 (i.e., MulTi-Wise Sampling) need the same number of configurations to achieve pair-wise coverage. We observe the same for achieving three-wise feature interaction coverage when comparing the sample size for Exp6 and Exp7. Those results meet our expectations and show that MulTi-Wise Sampling achieves comparable results to the YASA sampling algorithm with respect to sample size. The results of measuring sampling time show that Exp1 (i.e., YASA) and Exp2 (i.e., MulTi-Wise Sampling) need the same time to compute samples that achieve pair-wise feature interaction coverage. We see the same when comparing the sampling time of YASA (i.e., Exp7) and MulTi-Wise Sampling (i.e., Exp6) for computing a sample that achieves three-wise feature interaction coverage. We expected the sampling time for MulTi-Wise Sampling to be higher than the sampling time of YASA because MulTi-Wise Sampling performs pre-processing operations (e.g., assignment of features to feature interaction groups) on all features of the feature model before calling the YASA sampling algorithm to compute samples. We argue that the effect of this pre-processing is not visible for our subject systems because they are small enough that the pre-processing does not noticeably influence the sampling time. For larger feature models, we expect the sampling time of MulTi-Wise Sampling to be slightly larger than that of the YASA sampling algorithm.

We answer RQ2 based on our discussion as follows: Compared to the state-of-the-art sampling algorithm YASA, MulTi-Wise Sampling achieves equal results with respect to t-wise feature interaction coverage and sample size. Our results even reveal that MulTi-Wise Sampling and YASA have comparable sampling times. However, for large subject systems, we expect a longer sampling time of MulTi-Wise Sampling because of its pre-processing operations. We conclude that MulTi-Wise Sampling may be used as an alternative to YASA when all features of the feature model are in the same t-wise feature interaction group. However, for larger subject systems, an overhead in sampling time is to be expected.

Section 4.5 Threats to Validity

Internal Validity

A threat to internal validity is based on our choice of preprocessed models for our evaluation. The feature models of our subject systems were extracted from kConfig files. There are various procedures to transform a kConfig variability model into a feature model, such as using the Tseitin transformation [32]. Recently, Kuiter et al. [32] discovered that the results of analyzing a feature model of a subject system depend on the transformation of the kConfig file into a feature model file. We use preprocessed feature models in our experiments, which were used to show the validity of different feature model analysis procedures [7, 33, 21]. Therefore, we argue that our results align with existing research results, even if the feature models of our subject systems are influenced by the transformation from kConfig models. Another threat to internal validity is that our implementation uses the FeatureIDE library (https://featureide.github.io/) to perform basic feature model analysis tasks. For instance, we use the implementation of the YASA sampling algorithm as the base for our MulTi-Wise Sampling. FeatureIDE is widely used in the product-line community, and many existing research tools use the FeatureIDE library as the basis for their experiments [21, 27, 34]. Therefore, we argue that the product-line community accepts the correctness of the FeatureIDE library.

External Validity

A threat to external validity is that we assign features randomly to one of the t-wise feature interaction groups of our experiment setups for our evaluation. This random assignment may influence the measured sampling time and sample size. We execute every experiment setup of our evaluation ten times to mitigate the resulting bias. Another threat to external validity is that we consider only four subject systems in our evaluation. We cannot assure that performing our evaluation on other subject systems will reveal the same results. However, the product-line community widely uses the subject systems in our evaluation to evaluate new concepts [7, 21, 16]. Hence, our results align with existing work, even if the choice of subject systems introduces a bias.

Section 5 Related Work

Existing literature presents many approaches to efficiently generate samples that achieve high t-wise feature interaction coverage [35, 36, 37, 29, 13] including sampling algorithms that generate samples that always achieve full t-wise feature interaction coverage [38, 39, 30, 40, 41, 42, 43, 16, 27] and algorithms that do not guarantee full t-wise feature interaction coverage [44, 45, 46, 47, 48, 49]. In the following, we discuss the relation of MulTi-Wise Sampling to sampling algorithms from both categories.

Full T-Wise Feature Interaction Coverage

T-wise feature interaction sampling approaches aim to cover each valid t-wise feature interaction tuple of the configurable system in at least one configuration of the resulting sample while keeping the number of configurations as small as possible [13]. Cohen et al. [39] show that finding a minimal set of configurations that covers all valid t-wise feature interaction tuples of a configurable system is an instance of the set-covering problem. Chvatal’s algorithm [38] is a procedure to solve the general set-covering problem using a greedy heuristic, but building and verifying all possible solutions, as required by Chvatal’s algorithm [38], is not feasible in practice. Johansen et al. [41] show that Boolean satisfiability solvers (SAT-solvers) make generating samples with Chvatal’s algorithm possible for real-world configurable systems by providing their sampling algorithm ICPL [42]. ICPL introduces removing core and dead features, removing invalid t-wise feature interaction tuples, and parallelization to generate samples faster than Chvatal’s algorithm. Over the past decades, many authors proposed various tweaks to adapt Chvatal’s algorithm to reduce the runtime or sample size [29, 13]. Currently, the YASA sampling algorithm proposed by Krieter et al. [16, 27] is the best-performing algorithm with respect to sample size and sampling time. In addition, YASA provides functionalities such as generating samples that achieve t-wise feature interaction coverage for only a subset of features.
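The greedy heuristic for set covering mentioned above can be sketched in a few lines. The sketch shows the unit-cost variant on explicit sets and is not the implementation of ICPL or YASA; in the sampling setting, the universe would correspond to the valid t-wise tuples and each subset to the tuples covered by one candidate configuration. The function name `greedy_set_cover` is hypothetical.

```python
def greedy_set_cover(universe, subsets):
    """Greedy set-cover heuristic with unit costs.

    Repeatedly picks the subset that covers the most still-uncovered
    elements and returns the indices of the chosen subsets in order.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(subsets)),
                   key=lambda i: len(subsets[i] & uncovered))
        if not subsets[best] & uncovered:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen


# Small hypothetical instance with five elements and four candidate subsets.
cover = greedy_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}])
```

The heuristic does not guarantee a minimum cover; it trades optimality for a selection rule that is cheap to evaluate once the candidate subsets are known.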

Close to Full T-Wise Feature Interaction Coverage

Recently, many sampling algorithms were proposed that trade full t-wise feature interaction coverage against reduced algorithm runtime and comparable sample sizes [49]. Oh et al. [44] present a procedure to generate samples that achieve partial but not full t-wise feature interaction coverage. Their procedure uses a #SAT-solver to compute the number of valid configurations of the configurable system, which makes uniform random sampling in a tractable time possible. The evaluation of Oh et al. [44] shows that their uniform random sampling approach does not achieve full t-wise feature interaction coverage. Recent work [45, 46, 47, 49] identifies that t-wise feature interaction tuples are not uniformly distributed throughout the configuration space because of constraints in the feature model. Therefore, uniform random sampling approaches need guidance to find uncovered t-wise feature interaction tuples with fewer configurations. Baranov et al. [45, 47] propose an adaptive weighted sampling algorithm, Baital, which achieves high t-wise feature interaction coverage values. In contrast to typical uniform random sampling algorithms, Baital incrementally adapts the weights of selectable literals based on previously selected configurations, thus counteracting the non-uniform distribution of t-wise feature interaction tuples. Lou et al. [46] present LS-Sampling, a local search-based sampling approach that achieves high t-wise feature interaction coverage values. LS-Sampling iteratively selects configurations from the configuration space using a local search framework and updates the internal probability of selecting certain feature tuples based on the previously chosen configuration so that the t-wise feature interaction coverage of the resulting sample can be maximized faster.

Advancement Beyond State-Of-the-Art

None of the existing sampling algorithms considers covering subsets of features with different t-wise feature interaction coverage values to reduce the resulting sample size. MulTi-Wise Sampling fills this gap in the existing research, providing the possibility to sample subsets of features from a configurable system with different t-values. We are also the first to show that the sample size can be reduced by covering only a small subset of features with high t-values while trading off full t-wise feature interaction coverage.

Section 6 Conclusion and Future Work

In this paper, we question the necessity of achieving full t-wise feature interaction coverage for all features equally and present MulTi-Wise Sampling as a novel sampling approach that enables a trade-off between equally achieving full t-wise feature interaction coverage for all features and achieving full t-wise feature interaction coverage for specified feature groups. In practice, sample-based testing for highly configurable systems strives to save testing resources by reducing the number of configurations for testing. However, achieving full t-wise feature interaction coverage for higher values of t still requires many configurations because of the enormous number of t-wise feature interaction tuples that must be covered. As a solution idea, we propose to reduce the number of feature interactions to be covered by defining groups of features for which high t-wise feature interaction coverage will be achieved and groups of features for which t-wise feature interaction coverage is negligible. We present MulTi-Wise Sampling as a sampling approach that considers those feature groups during sample generation to enable a trade-off between a large sample that achieves full t-wise feature interaction coverage for all features and smaller samples that do so only for certain groups of features. We evaluate our approach on four subject systems from real-world applications (i.e., BusyBox, Fiasco, Soletta, and uCLibc-ng). Our results show that MulTi-Wise Sampling reduces the calculation time and sample size when only a small number of features are assigned to critical feature groups with high t-values. As a trade-off for reduced sample size and sampling time, MulTi-Wise Sampling does not achieve full coverage for all features equally. The performance of MulTi-Wise Sampling is comparable to the state-of-the-art sampling algorithm YASA. Hence, we argue that MulTi-Wise Sampling is an alternative to established sampling approaches if the criticality of features is known.

As future work, we aim to evaluate MulTi-Wise Sampling on more feature models typically used by the community (https://github.com/SoftVarE-Group/feature-model-benchmark) to show the feasibility of our approach on a broader set of feature models. We expect high synergies between solution-space sampling [50] and MulTi-Wise Sampling because critical feature groups can be directly identified from attributes of implementation artefacts. Therefore, we aim to connect both research areas in the future.

References

Ammann and Offutt [2016] Paul Ammann and Jeff Offutt. Introduction to software testing. Cambridge University Press, 2016.
McGregor [2010] John McGregor. Testing a Software Product Line. In Testing Techniques in Software Engineering, pages 104–140. 2010.
Duvall et al. [2007] Paul M Duvall, Steve Matyas, and Andrew Glover. Continuous integration: improving software quality and reducing risk. Pearson Education, 2007.
Fowler and Foemmel [2006] Martin Fowler and Matthew Foemmel. Continuous integration, 2006.
Meyer [2014] Mathias Meyer. Continuous integration and its tools. IEEE Software, 31(3):14–16, 2014.
Pohl et al. [2005] Klaus Pohl, Günter Böckle, and Frank J. van der Linden. Software Product Line Engineering: Foundations, Principles and Techniques. 2005.
Pett et al. [2019] Tobias Pett, Thomas Thüm, Tobias Runge, Sebastian Krieter, Malte Lochau, and Ina Schaefer. Product Sampling for Product Lines: The Scalability Challenge. In Proceedings of the 23rd International Systems and Software Product Line Conference - Volume A, SPLC ’19, pages 78–83, New York, NY, USA, September 2019. Association for Computing Machinery. ISBN 9781450371384. doi:10.1145/3336294.3336322. URL https://dl.acm.org/doi/10.1145/3336294.3336322.
Apel et al. [2013] Sven Apel, Don Batory, Christian Kästner, and Gunter Saake. Feature-Oriented Software Product Lines. 2013.
Engström and Runeson [2011] Emelie Engström and Per Runeson. Software Product Line Testing - A Systematic Mapping Study. 53:2–13, January 2011. ISSN 0950-5849. doi:http://dx.doi.org/10.1016/j.infsof.2010.05.011.
Lee et al. [2012] Jihyun Lee, Sungwon Kang, and Danhyung Lee. A Survey on Software Product Line Testing. pages 31–40, 2012.
Halin et al. [2019] Axel Halin, Alexandre Nuttinck, Mathieu Acher, Xavier Devroey, Gilles Perrouin, and Benoit Baudry. Test them all, is it worth it? Assessing configuration sampling on the JHipster Web development stack. Empirical Software Engineering, 24(2):674–717, April 2019. ISSN 1573-7616. doi:10.1007/s10664-018-9635-4. URL https://doi.org/10.1007/s10664-018-9635-4.
Medeiros et al. [2015] Flávio Medeiros, Christian Kästner, Márcio Ribeiro, Sarah Nadi, and Rohit Gheyi. The Love/Hate Relationship with the C Preprocessor: An Interview Study. In 29th European Conference on Object-Oriented Programming (ECOOP 2015), pages 495–518. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2015.
Varshosaz et al. [2018] Mahsa Varshosaz, Mustafa Al-Hajjaji, Thomas Thüm, Tobias Runge, Mohammad Reza Mousavi, and Ina Schaefer. A Classification of Product Sampling for Software Product Lines. pages 1–13, 2018.
Cohen et al. [2008] Myra B. Cohen, Matthew B. Dwyer, and Jiangfan Shi. Constructing Interaction Test Suites for Highly-Configurable Systems in the Presence of Constraints: A Greedy Approach. 34(5):633–650, 2008.
Marijan et al. [2013] Dusica Marijan, Arnaud Gotlieb, Sagar Sen, and Aymeric Hervieu. Practical Pairwise Testing for Software Product Lines. pages 227–235, 2013.
Krieter et al. [2020] Sebastian Krieter, Thomas Thüm, Sandro Schulze, Gunter Saake, and Thomas Leich. YASA: Yet Another Sampling Algorithm. 2020.
Oh et al. [2017] Jeho Oh, Don Batory, Margaret Myers, and Norbert Siegmund. Finding Near-Optimal Configurations in Product Lines by Random Sampling. pages 61–71, August 2017. doi:10.1145/3106237.3106273.
Munoz et al. [2019] Daniel-Jesus Munoz, Jeho Oh, Mónica Pinto, Lidia Fuentes, and Don Batory. Uniform Random Sampling Product Configurations of Feature Models That Have Numerical Features. pages 289–301, 2019.
Luo et al. [2023] Chuan Luo, Jianping Song, Qiyuan Zhao, Yibei Li, Shaowei Cai, and Chunming Hu. Generating pairwise covering arrays for highly configurable software systems. In Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume A, volume A of SPLC ’23, pages 261–267, New York, NY, USA, August 2023. Association for Computing Machinery. ISBN 9798400700910. doi:10.1145/3579027.3608998. URL https://doi.org/10.1145/3579027.3608998.
Bombarda et al. [2023] Andrea Bombarda, Silvia Bonfanti, and Angelo Gargantini. On the reuse of existing configurations for testing evolving feature models. In Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume B, SPLC ’23, pages 67–76, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400700927. doi:10.1145/3579028.3609017. URL https://doi.org/10.1145/3579028.3609017.
Pett et al. [2023] Tobias Pett, Tobias Heß, Sebastian Krieter, Thomas Thüm, and Ina Schaefer. Continuous T-Wise Coverage. In Proceedings of the 27th ACM International Systems and Software Product Line Conference - Volume A, volume A of SPLC ’23, pages 87–98, New York, NY, USA, August 2023. Association for Computing Machinery. ISBN 9798400700910. doi:10.1145/3579027.3608980. URL https://doi.org/10.1145/3579027.3608980.
Al-Hajjaji et al. [2014] Mustafa Al-Hajjaji, Thomas Thüm, Jens Meinicke, Malte Lochau, and Gunter Saake. Similarity-Based Prioritization in Software Product-Line Testing. pages 197–206, September 2014. ISBN 978-1-4503-2740-4. doi:10.1145/2648511.2648532.
Pett et al. [2020] Tobias Pett, Domenik Eichhorn, and Ina Schaefer. Risk-based compatibility analysis in automotive systems engineering. In Proceedings of the 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings, MODELS ’20, pages 1–10, New York, NY, USA, October 2020. Association for Computing Machinery. ISBN 9781450381352. doi:10.1145/3417990.3421263. URL https://dl.acm.org/doi/10.1145/3417990.3421263.
Lachmann et al. [2017] Remo Lachmann, Simon Beddig, Sascha Lity, Sandro Schulze, and Ina Schaefer. Risk-based integration testing of software product lines. In Proceedings of the 11th International Workshop on Variability Modelling of Software-Intensive Systems, VaMoS ’17, pages 52–59, New York, NY, USA, February 2017. Association for Computing Machinery. ISBN 9781450348119. doi:10.1145/3023956.3023958. URL https://dl.acm.org/doi/10.1145/3023956.3023958.
Amland [2000] Ståle Amland. Risk-based testing: Risk analysis fundamentals and metrics for software testing including a financial application case study. Journal of Systems and Software, 53(3):287–295, September 2000. ISSN 0164-1212. doi:10.1016/S0164-1212(00)00019-4. URL https://www.sciencedirect.com/science/article/pii/S0164121200000194.
Michalik and Weyns [2011] Bartosz Michalik and Danny Weyns. Towards a Solution for Change Impact Analysis of Software Product Line Products. In 2011 Ninth Working IEEE/IFIP Conference on Software Architecture, pages 290–293, June 2011. doi:10.1109/WICSA.2011.45. URL https://ieeexplore.ieee.org/document/5959729.
Krieter [2020] Sebastian Krieter. Large-scale T-wise interaction sampling using YASA. In Proceedings of the 24th ACM Conference on Systems and Software Product Line: Volume A - Volume A, SPLC ’20, pages 1–4, New York, NY, USA, October 2020. Association for Computing Machinery. ISBN 9781450375696. doi:10.1145/3382025.3414989. URL https://dl.acm.org/doi/10.1145/3382025.3414989.
Clements and Northrop [2001] Paul Clements and Linda Northrop. Software Product Lines: Practices and Patterns. 2001.
Medeiros et al. [2016] Flávio Medeiros, Christian Kästner, Márcio Ribeiro, Rohit Gheyi, and Sven Apel. A comparison of 10 sampling algorithms for configurable systems. In Proceedings of the 38th International Conference on Software Engineering, ICSE ’16, pages 643–654, New York, NY, USA, May 2016. Association for Computing Machinery. ISBN 9781450339001. doi:10.1145/2884781.2884793. URL https://dl.acm.org/doi/10.1145/2884781.2884793.
Cohen et al. [2007] Myra B. Cohen, Matthew B. Dwyer, and Jiangfan Shi. Interaction testing of highly-configurable systems in the presence of constraints. In Proceedings of the 2007 international symposium on Software testing and analysis, ISSTA ’07, pages 129–139, New York, NY, USA, July 2007. Association for Computing Machinery. ISBN 9781595937346. doi:10.1145/1273463.1273482. URL https://dl.acm.org/doi/10.1145/1273463.1273482.
Johansen et al. [2012a] Martin Fagereng Johansen, Øystein Haugen, Franck Fleurey, Anne Grete Eldegard, and Torbjørn Syversen. Generating Better Partial Covering Arrays by Modeling Weights on Sub-Product Lines. pages 269–284. 2012a.
Kuiter et al. [2023] Elias Kuiter, Sebastian Krieter, Chico Sundermann, Thomas Thüm, and Gunter Saake. Tseitin or not tseitin? the impact of cnf transformations on feature-model analyses. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, ASE ’22, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9781450394758. doi:10.1145/3551349.3556938. URL https://doi.org/10.1145/3551349.3556938.
Pett et al. [2021] Tobias Pett, Sebastian Krieter, Tobias Runge, Thomas Thüm, Malte Lochau, and Ina Schaefer. Stability of Product-Line Sampling in Continuous Integration. In Proceedings of the 15th International Working Conference on Variability Modelling of Software-Intensive Systems, VaMoS ’21, pages 1–9, New York, NY, USA, February 2021. Association for Computing Machinery. ISBN 9781450388245. doi:10.1145/3442391.3442410. URL https://dl.acm.org/doi/10.1145/3442391.3442410.
Hentze et al. [2022a] Marc Hentze, Chico Sundermann, Thomas Thüm, and Ina Schaefer. Quantifying the variability mismatch between problem and solution space. In Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems, MODELS ’22, pages 322–333, New York, NY, USA, October 2022a. Association for Computing Machinery. ISBN 9781450394666. doi:10.1145/3550355.3552411. URL https://dl.acm.org/doi/10.1145/3550355.3552411.
Kuhn et al. [2013] D Richard Kuhn, Raghu N Kacker, and Yu Lei. Introduction to combinatorial testing. CRC press, 2013.
do Carmo Machado et al. [2014] Ivan do Carmo Machado, John D. McGregor, Yguaratã Cerqueira Cavalcanti, and Eduardo Santana de Almeida. On strategies for testing software product lines: A systematic literature review. Information and Software Technology, 56(10):1183–1199, 2014. ISSN 0950-5849. doi:10.1016/j.infsof.2014.04.002. URL https://www.sciencedirect.com/science/article/pii/S0950584914000834.
Lopez-Herrejon et al. [2015] Roberto E. Lopez-Herrejon, Stefan Fischer, Rudolf Ramler, and Alexander Egyed. A first systematic mapping study on combinatorial interaction testing for software product lines. In 2015 IEEE Eighth International Conference on Software Testing, Verification and Validation Workshops (ICSTW), pages 1–10, April 2015. doi:10.1109/ICSTW.2015.7107435. URL https://ieeexplore.ieee.org/document/7107435.
Chvatal [1979] Vasek Chvatal. A Greedy Heuristic for the Set-Covering Problem. Mathematics of operations research, 4(3):233–235, August 1979. ISSN 0364-765X. doi:10.1287/moor.4.3.233. URL https://pubsonline.informs.org/doi/abs/10.1287/moor.4.3.233.
Cohen et al. [2006] Myra B. Cohen, Matthew B. Dwyer, and Jiangfan Shi. Coverage and adequacy in software product line testing. In Proceedings of the ISSTA 2006 workshop on Role of software architecture for testing and analysis, ROSATEA ’06, pages 53–63, New York, NY, USA, July 2006. Association for Computing Machinery. ISBN 9781595934598. doi:10.1145/1147249.1147257. URL https://dl.acm.org/doi/10.1145/1147249.1147257.
Perrouin et al. [2010] Gilles Perrouin, Sagar Sen, Jacques Klein, Benoit Baudry, and Yves le Traon. Automated and Scalable T-wise Test Case Generation Strategies for Software Product Lines. In Verification and Validation 2010 Third International Conference on Software Testing, pages 459–468, April 2010. doi:10.1109/ICST.2010.43. URL https://ieeexplore.ieee.org/document/5477055. ISSN: 2159-4848.
Johansen et al. [2011] Martin Fagereng Johansen, Øystein Haugen, and Franck Fleurey. Properties of Realistic Feature Models Make Combinatorial Testing of Product Lines Feasible. In Jon Whittle, Tony Clark, and Thomas Kühne, editors, Model Driven Engineering Languages and Systems, pages 638–652, Berlin, Heidelberg, 2011. Springer. ISBN 9783642244858. doi:10.1007/978-3-642-24485-8_47.
Johansen et al. [2012b] Martin Fagereng Johansen, Øystein Haugen, and Franck Fleurey. An algorithm for generating t-wise covering arrays from large feature models. In Proceedings of the 16th International Software Product Line Conference - Volume 1, SPLC ’12, pages 46–55, New York, NY, USA, September 2012b. Association for Computing Machinery. ISBN 9781450310949. doi:10.1145/2362536.2362547. URL https://dl.acm.org/doi/10.1145/2362536.2362547.
Al-Hajjaji et al. [2016] Mustafa Al-Hajjaji, Sebastian Krieter, Thomas Thüm, Malte Lochau, and Gunter Saake. IncLing: Efficient Product-line Testing Using Incremental Pairwise Sampling. In Proceedings of the 2016 ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, GPCE 2016, pages 144–155, New York, NY, USA, October 2016. ISBN 9781450344463. doi:10.1145/2993236.2993253. URL https://dl.acm.org/doi/10.1145/2993236.2993253.
Oh et al. [2019] Jeho Oh, Paul Gazzillo, and Don Batory. t-wise Coverage by Uniform Sampling. In Proceedings of the 23rd International Systems and Software Product Line Conference - Volume A, SPLC ’19, pages 84–87, New York, NY, USA, September 2019. Association for Computing Machinery. ISBN 9781450371384. doi:10.1145/3336294.3342359. URL https://dl.acm.org/doi/10.1145/3336294.3342359.
Baranov et al. [2020] Eduard Baranov, Axel Legay, and Kuldeep S. Meel. Baital: An Adaptive Weighted Sampling Approach for Improved t-Wise Coverage, pages 1114–1126. Association for Computing Machinery, New York, NY, USA, 2020. ISBN 9781450370431. URL https://doi.org/10.1145/3368089.3409744.
Luo et al. [2021] Chuan Luo, Binqi Sun, Bo Qiao, Junjie Chen, Hongyu Zhang, Jinkun Lin, Qingwei Lin, and Dongmei Zhang. LS-sampling: an effective local search based sampling approach for achieving high t-wise coverage. In Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021, pages 1081–1092, New York, NY, USA, August 2021. Association for Computing Machinery. ISBN 9781450385626. doi:10.1145/3468264.3468622. URL https://dl.acm.org/doi/10.1145/3468264.3468622.
Baranov and Legay [2022] Eduard Baranov and Axel Legay. Baital: an adaptive weighted sampling platform for configurable systems. In Proceedings of the 26th ACM International Systems and Software Product Line Conference - Volume B, volume B of SPLC ’22, pages 46–49, New York, NY, USA, September 2022. Association for Computing Machinery. ISBN 9781450392068. doi:10.1145/3503229.3547030. URL https://dl.acm.org/doi/10.1145/3503229.3547030.
Heradio et al. [2022] Ruben Heradio, David Fernandez-Amoros, José A. Galindo, David Benavides, and Don Batory. Uniform and scalable sampling of highly configurable systems. Empirical Software Engineering, 27(2):44, January 2022. ISSN 1573-7616. doi:10.1007/s10664-021-10102-5. URL https://doi.org/10.1007/s10664-021-10102-5.
Heß et al. [2024] Tobias Heß, Tim Jannik Schmidt, Lukas Ostheimer, Sebastian Krieter, and Thomas Thüm. UnWise: High T-Wise Coverage from Uniform Sampling. In Proceedings of the 18th International Working Conference on Variability Modelling of Software-Intensive Systems, VaMoS ’24, pages 37–45, New York, NY, USA, February 2024. Association for Computing Machinery. ISBN 9798400708770. doi:10.1145/3634713.3634716. URL https://dl.acm.org/doi/10.1145/3634713.3634716.
Hentze et al. [2022b] Marc Hentze, Tobias Pett, Chico Sundermann, Sebastian Krieter, Thomas Thüm, and Ina Schaefer. Generic Solution-Space Sampling for Multi-domain Product Lines. In Proceedings of the 21st ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, GPCE 2022, pages 135–147, New York, NY, USA, December 2022b. Association for Computing Machinery. ISBN 9781450399203. doi:10.1145/3564719.3568695. URL https://dl.acm.org/doi/10.1145/3564719.3568695.