-
$k$-Universality of Regular Languages
Authors:
Duncan Adamson,
Pamela Fleischmann,
Annika Huch,
Tore Koß,
Florin Manea,
Dirk Nowotka
Abstract:
A subsequence of a word $w$ is a word $u$ such that $u = w[i_1] w[i_2] \dots w[i_{k}]$, for some set of indices $1 \leq i_1 < i_2 < \dots < i_k \leq \lvert w\rvert$. A word $w$ is $k$-subsequence universal over an alphabet $Σ$ if every word in $Σ^k$ appears in $w$ as a subsequence. In this paper, we study the intersection between the set of $k$-subsequence universal words over some alphabet $Σ$ an…
▽ More
A subsequence of a word $w$ is a word $u$ such that $u = w[i_1] w[i_2] \dots w[i_{k}]$, for some set of indices $1 \leq i_1 < i_2 < \dots < i_k \leq \lvert w\rvert$. A word $w$ is $k$-subsequence universal over an alphabet $Σ$ if every word in $Σ^k$ appears in $w$ as a subsequence. In this paper, we study the intersection between the set of $k$-subsequence universal words over some alphabet $Σ$ and regular languages over $Σ$. We call a regular language $L$ \emph{$k$-$\exists$-subsequence universal} if there exists a $k$-subsequence universal word in $L$, and \emph{$k$-$\forall$-subsequence universal} if every word of $L$ is $k$-subsequence universal. We give algorithms solving the problems of deciding if a given regular language, represented by a finite automaton recognising it, is \emph{$k$-$\exists$-subsequence universal} and, respectively, if it is \emph{$k$-$\forall$-subsequence universal}, for a given $k$. The algorithms are FPT w.r.t.~the size of the input alphabet, and their run-time does not depend on $k$; they run in polynomial time in the number $n$ of states of the input automaton when the size of the input alphabet is $O(\log n)$. Moreover, we show that the problem of deciding if a given regular language is \emph{$k$-$\exists$-subsequence universal} is NP-complete, when the language is over a large alphabet. Further, we provide algorithms for counting the number of $k$-subsequence universal words (paths) accepted by a given deterministic (respectively, nondeterministic) finite automaton, and ranking an input word (path) within the set of $k$-subsequence universal words accepted by a given finite automaton.
△ Less
Submitted 17 November, 2023;
originally announced November 2023.
-
Matching Patterns with Variables Under Simon's Congruence
Authors:
Pamela Fleischmann,
Sungmin Kim,
Tore Koß,
Florin Manea,
Dirk Nowotka,
Stefan Siemer,
Max Wiedenhöft
Abstract:
We introduce and investigate a series of matching problems for patterns with variables under Simon's congruence. Our results provide a thorough picture of these problems' computational complexity.
We introduce and investigate a series of matching problems for patterns with variables under Simon's congruence. Our results provide a thorough picture of these problems' computational complexity.
△ Less
Submitted 16 August, 2023;
originally announced August 2023.
-
$α$-$β$-Factorization and the Binary Case of Simon's Congruence
Authors:
Pamela Fleischmann,
Jonas Höfer,
Annika Huch,
Dirk Nowotka
Abstract:
In 1991 Hébrard introduced a factorization of words that turned out to be a powerful tool for the investigation of a word's scattered factors (also known as (scattered) subwords or subsequences). Based on this, first Karandikar and Schnoebelen introduced the notion of $k$-richness and later on Barker et al. the notion of $k$-universality. In 2022 Fleischmann et al. presented a generalization of th…
▽ More
In 1991 Hébrard introduced a factorization of words that turned out to be a powerful tool for the investigation of a word's scattered factors (also known as (scattered) subwords or subsequences). Based on this, first Karandikar and Schnoebelen introduced the notion of $k$-richness and later on Barker et al. the notion of $k$-universality. In 2022 Fleischmann et al. presented a generalization of the arch factorization by intersecting the arch factorization of a word and its reverse. While the authors merely used this factorization for the investigation of shortest absent scattered factors, in this work we investigate this new $α$-$β$-factorization as such. We characterize the famous Simon congruence of $k$-universal words in terms of $1$-universal words. Moreover, we apply these results to binary words. In this special case, we obtain a full characterization of the classes and calculate the index of the congruence. Lastly, we start investigating the ternary case, present a full list of possibilities for $αβα$-factors, and characterize their congruence.
△ Less
Submitted 11 September, 2023; v1 submitted 25 June, 2023;
originally announced June 2023.
-
Decision Problems on Copying and Shuffling
Authors:
Vesa Halava,
Tero Harju,
Dirk Nowotka,
Esa Sahla
Abstract:
We study decision problems of the form: given a regular or linear context-free language $L$, is there a word of a given fixed form in $L$, where given fixed forms are based on word operations copy, marked copy, shuffle and their combinations.
We study decision problems of the form: given a regular or linear context-free language $L$, is there a word of a given fixed form in $L$, where given fixed forms are based on word operations copy, marked copy, shuffle and their combinations.
△ Less
Submitted 26 September, 2023; v1 submitted 13 February, 2023;
originally announced February 2023.
-
A Generic Information Extraction System for String Constraints
Authors:
Joel D. Day,
Adrian Kröger,
Mitja Kulczynski,
Florin Manea,
Dirk Nowotka,
Danny Bøgsted Poulsen
Abstract:
String constraint solving, and the underlying theory of word equations, are highly interesting research topics both for practitioners and theoreticians working in the wide area of satisfiability modulo theories. As string constraint solving algorithms, a.k.a. string solvers, gained a more prominent role in the formal analysis of string-heavy programs, especially in connection to symbolic code exec…
▽ More
String constraint solving, and the underlying theory of word equations, are highly interesting research topics both for practitioners and theoreticians working in the wide area of satisfiability modulo theories. As string constraint solving algorithms, a.k.a. string solvers, gained a more prominent role in the formal analysis of string-heavy programs, especially in connection to symbolic code execution and security protocol verification, we can witness an ever-growing number of benchmarks collecting string solving instances from real-world applications as well as an ever-growing need for more efficient and reliable solvers, especially for the aforementioned real-world instances. Thus, it seems that the string solving area (and the developers, theoreticians, and end-users active in it) could greatly benefit from a better understanding and processing of the existing string solving benchmarks. In this context, we propose SMTQUERY: an SMT-LIB benchmark analysis tool for string constraints. SMTQUERY is implemented in Python 3, and offers a collection of analysis and information extraction tools for a comprehensive data base of string benchmarks (presented in SMT-LIB format), based on an SQL-centred language called QLANG.
△ Less
Submitted 18 August, 2022;
originally announced August 2022.
-
On the Self Shuffle Language
Authors:
Pamela Fleischmann,
Tero Harju,
Lukas Haschke,
Jonas Höfer,
Dirk Nowotka
Abstract:
The shuffle product \(u\shuffle v\) of two words \(u\) and \(v\) is the set of all words which can be obtained by interleaving \(u\) and \(v\). Motivated by the paper \emph{The Shuffle Product: New Research Directions} by Restivo (2015) we investigate a special case of the shuffle product. In this work we consider the shuffle of a word with itself called the \emph{self shuffle} or \emph{shuffle sq…
▽ More
The shuffle product \(u\shuffle v\) of two words \(u\) and \(v\) is the set of all words which can be obtained by interleaving \(u\) and \(v\). Motivated by the paper \emph{The Shuffle Product: New Research Directions} by Restivo (2015) we investigate a special case of the shuffle product. In this work we consider the shuffle of a word with itself called the \emph{self shuffle} or \emph{shuffle square}, showing first that the self shuffle language and the shuffle of the language are in general different sets. We prove that the language of all words arising as a self shuffle of some word is context sensitive but not context free. Furthermore, we show that the self shuffle \(w \shuffle w\) uniquely determines \(w\).
△ Less
Submitted 2 March, 2022; v1 submitted 16 February, 2022;
originally announced February 2022.
-
m-Nearly k-Universal Words -- Investigating Simon Congruence
Authors:
Pamela Fleischmann,
Lukas Haschke,
Annika Huch,
Annika Mayrock,
Dirk Nowotka
Abstract:
Determining the index of the Simon congruence is a long outstanding open problem. Two words $u$ and $v$ are called Simon congruent if they have the same set of scattered factors, which are parts of the word in the correct order but not necessarily consecutive, e.g., $\mathtt{oath}$ is a scattered factor of $\mathtt{logarithm}$. Following the idea of scattered factor $k$-universality, we investigat…
▽ More
Determining the index of the Simon congruence is a long outstanding open problem. Two words $u$ and $v$ are called Simon congruent if they have the same set of scattered factors, which are parts of the word in the correct order but not necessarily consecutive, e.g., $\mathtt{oath}$ is a scattered factor of $\mathtt{logarithm}$. Following the idea of scattered factor $k$-universality, we investigate $m$-nearly $k$-universality, i.e., words where $m$ scattered factors of length $k$ are absent, w.r.t. Simon congruence. We present a full characterisation as well as the index of the congruence for $m=1$. For $m\neq 1$, we show some results if in addition $w$ is $(k-1)$-universal as well as some further insights for different $m$.
△ Less
Submitted 16 February, 2022;
originally announced February 2022.
-
Analysis of Source Code Using UPPAAL
Authors:
Mitja Kulczynski,
Axel Legay,
Dirk Nowotka,
Danny Bøgsted Poulsen
Abstract:
In recent years there has been a considerable effort in optimising formal methods for application to code. This has been driven by tools such as CPAChecker, DIVINE, and CBMC. At the same time tools such as Uppaal have been massively expanding the realm of more traditional model checking technologies to include strategy synthesis algorithms - an aspect becoming more and more needed as software beco…
▽ More
In recent years there has been a considerable effort in optimising formal methods for application to code. This has been driven by tools such as CPAChecker, DIVINE, and CBMC. At the same time tools such as Uppaal have been massively expanding the realm of more traditional model checking technologies to include strategy synthesis algorithms - an aspect becoming more and more needed as software becomes increasingly parallel. Instead of reimplementing the advances made by Uppaal in this area, we suggest in this paper to develop a bridge between the source code and the engine of Uppaal. Our approach uses the widespread intermediate language LLVM and makes recent advances of the Uppaal ecosystem readily available to analysis of source code.
△ Less
Submitted 6 August, 2021;
originally announced August 2021.
-
Responsible and Regulatory Conform Machine Learning for Medicine: A Survey of Challenges and Solutions
Authors:
Eike Petersen,
Yannik Potdevin,
Esfandiar Mohammadi,
Stephan Zidowitz,
Sabrina Breyer,
Dirk Nowotka,
Sandra Henn,
Ludwig Pechmann,
Martin Leucker,
Philipp Rostalski,
Christian Herzog
Abstract:
Machine learning is expected to fuel significant improvements in medical care. To ensure that fundamental principles such as beneficence, respect for human autonomy, prevention of harm, justice, privacy, and transparency are respected, medical machine learning systems must be developed responsibly. Many high-level declarations of ethical principles have been put forth for this purpose, but there i…
▽ More
Machine learning is expected to fuel significant improvements in medical care. To ensure that fundamental principles such as beneficence, respect for human autonomy, prevention of harm, justice, privacy, and transparency are respected, medical machine learning systems must be developed responsibly. Many high-level declarations of ethical principles have been put forth for this purpose, but there is a severe lack of technical guidelines explicating the practical consequences for medical machine learning. Similarly, there is currently considerable uncertainty regarding the exact regulatory requirements placed upon medical machine learning systems. This survey provides an overview of the technical and procedural challenges involved in creating medical machine learning systems responsibly and in conformity with existing regulations, as well as possible solutions to address these challenges. First, a brief review of existing regulations affecting medical machine learning is provided, showing that properties such as safety, robustness, reliability, privacy, security, transparency, explainability, and nondiscrimination are all demanded already by existing law and regulations - albeit, in many cases, to an uncertain degree. Next, the key technical obstacles to achieving these desirable properties are discussed, as well as important techniques to overcome these obstacles in the medical context. We notice that distribution shift, spurious correlations, model underspecification, uncertainty quantification, and data scarcity represent severe challenges in the medical context. Promising solution approaches include the use of large and representative datasets and federated learning as a means to that end, the careful exploitation of domain knowledge, the use of inherently transparent models, comprehensive out-of-distribution model testing and verification, as well as algorithmic impact assessments.
△ Less
Submitted 9 June, 2022; v1 submitted 20 July, 2021;
originally announced July 2021.
-
The Show Must Go On -- Examination During a Pandemic
Authors:
Pamela Fleischmann,
Mitja Kulczynski,
Dirk Nowotka
Abstract:
When unexpected incidents occur, new innovative and flexible solutions are required. If this event is something such radical and dramatic like the COVID-19 pandemic, these solutions must aim to guarantee as much normality as possible while protecting lives. After a moment of shock our university decided that the students have to be able to pursue their studies for guaranteeing a degree in the expe…
▽ More
When unexpected incidents occur, new innovative and flexible solutions are required. If this event is something such radical and dramatic like the COVID-19 pandemic, these solutions must aim to guarantee as much normality as possible while protecting lives. After a moment of shock our university decided that the students have to be able to pursue their studies for guaranteeing a degree in the expected time since most of them faced immediate financial problems due to the loss of their student jobs. This implied, for us as teachers, that we had to reorganise not only the teaching methods from nearly one day to the next, but we also had to come up with an adjusted way of examinations which had to take place in person with pen and paper under strict hygiene rules. On the other hand the correction should avoid personal contacts. We developed a framework which allowed us to correct the digitalised exams safely at home while providing the high standards given by the general data protection regulation of our country. Moreover, the time spent in the offices could be reduced to a minimum thanks to automatically generated exam sheets, automatically re-digitalised and sorted worked-on exams.
△ Less
Submitted 25 June, 2021;
originally announced July 2021.
-
String Theories involving Regular Membership Predicates: From Practice to Theory and Back
Authors:
Murphy Berzish,
Joel D. Day,
Vijay Ganesh,
Mitja Kulczynski,
Florin Manea,
Federico Mora,
Dirk Nowotka
Abstract:
Widespread use of string solvers in formal analysis of string-heavy programs has led to a growing demand for more efficient and reliable techniques which can be applied in this context, especially for real-world cases. Designing an algorithm for the (generally undecidable) satisfiability problem for systems of string constraints requires a thorough understanding of the structure of constraints pre…
▽ More
Widespread use of string solvers in formal analysis of string-heavy programs has led to a growing demand for more efficient and reliable techniques which can be applied in this context, especially for real-world cases. Designing an algorithm for the (generally undecidable) satisfiability problem for systems of string constraints requires a thorough understanding of the structure of constraints present in the targeted cases. In this paper, we investigate benchmarks presented in the literature containing regular expression membership predicates, extract different first order logic theories, and prove their decidability, resp. undecidability. Notably, the most common theories in real-world benchmarks are PSPACE-complete and directly lead to the implementation of a more efficient algorithm to solving string constraints.
△ Less
Submitted 15 May, 2021;
originally announced May 2021.
-
Scattered Factor Universality -- The Power of the Remainder
Authors:
Pamela Fleischmann,
Sebastian Bernhard Germann,
Dirk Nowotka
Abstract:
Scattered factor (circular) universality was firstly introduced by Barker et al. in 2020. A word $w$ is called $k$-universal for some natural number $k$, if every word of length $k$ of $w$'s alphabet occurs as a scattered factor in $w$; it is called circular $k$-universal if a conjugate of $w$ is $k$-universal. Here, a word $u=u_1\cdots u_n$ is called a scattered factor of $w$ if $u$ is obtained f…
▽ More
Scattered factor (circular) universality was firstly introduced by Barker et al. in 2020. A word $w$ is called $k$-universal for some natural number $k$, if every word of length $k$ of $w$'s alphabet occurs as a scattered factor in $w$; it is called circular $k$-universal if a conjugate of $w$ is $k$-universal. Here, a word $u=u_1\cdots u_n$ is called a scattered factor of $w$ if $u$ is obtained from $w$ by deleting parts of $w$, i.e. there exists (possibly empty) words $v_1,\dots,v_{n+1}$ with $w=v_1u_1v_2\cdots v_nu_nv_{n+1}$. In this work, we prove two problems, left open in the aforementioned paper, namely a generalisation of one of their main theorems to arbitrary alphabets and a slight modification of another theorem such that we characterise the circular universality by the universality. On the way, we present deep insights into the behaviour of the remainder of the so called arch factorisation by Hebrard when repetitions of words are considered.
△ Less
Submitted 19 April, 2021;
originally announced April 2021.
-
An SMT Solver for Regular Expressions and Linear Arithmetic over String Length
Authors:
Murphy Berzish,
Mitja Kulczynski,
Federico Mora,
Florin Manea,
Joel D. Day,
Dirk Nowotka,
Vijay Ganesh
Abstract:
We present a novel length-aware solving algorithm for the quantifier-free first-order theory over regex membership predicate and linear arithmetic over string length. We implement and evaluate this algorithm and related heuristics in the Z3 theorem prover. A crucial insight that underpins our algorithm is that real-world instances contain a wealth of information about upper and lower bounds on len…
▽ More
We present a novel length-aware solving algorithm for the quantifier-free first-order theory over regex membership predicate and linear arithmetic over string length. We implement and evaluate this algorithm and related heuristics in the Z3 theorem prover. A crucial insight that underpins our algorithm is that real-world instances contain a wealth of information about upper and lower bounds on lengths of strings under constraints, and such information can be used very effectively to simplify operations on automata representing regular expressions. Additionally, we present a number of novel general heuristics, such as the prefix/suffix method, that can be used in conjunction with a variety of regex solving algorithms, making them more efficient. We showcase the power of our algorithm and heuristics via an extensive empirical evaluation over a large and diverse benchmark of 57256 regex-heavy instances, almost 75% of which are derived from industrial applications or contributed by other solver developers. Our solver outperforms five other state-of-the-art string solvers, namely, CVC4, OSTRICH, Z3seq, Z3str3, and Z3-Trau, over this benchmark, in particular achieving a 2.4x speedup over CVC4, 4.4x speedup over Z3seq, 6.4x speedup over Z3-Trau, 9.1x speedup over Z3str3, and 13x speedup over OSTRICH.
△ Less
Submitted 7 May, 2021; v1 submitted 14 October, 2020;
originally announced October 2020.
-
Blocksequences of k-local Words
Authors:
Pamela Fleischmann,
Lukas Haschke,
Florin Manea,
Dirk Nowotka,
Cedric Tsatia Tsida,
Judith Wiedenbeck
Abstract:
The locality of words is a relatively young structural complexity measure, introduced by Day et al. in 2017 in order to define classes of patterns with variables which can be matched in polynomial time. The main tool used to compute the locality of a word is called marking sequence: an ordering of the distinct letters occurring in the respective order. Once a marking sequence is defined, the lette…
▽ More
The locality of words is a relatively young structural complexity measure, introduced by Day et al. in 2017 in order to define classes of patterns with variables which can be matched in polynomial time. The main tool used to compute the locality of a word is called marking sequence: an ordering of the distinct letters occurring in the respective order. Once a marking sequence is defined, the letters of the word are marked in steps: in the ith marking step, all occurrences of the ith letter of the marking sequence are marked. As such, after each marking step, the word can be seen as a sequence of blocks of marked letters separated by blocks of non-marked letters. By kee** track of the evolution of the marked blocks of the word through the marking defined by a marking sequence, one defines the blocksequence of the respective marking sequence. We first show that the words sharing the same blocksequence are only loosely connected, so we consider the stronger notion of extended blocksequence, which stores additional information on the form of each single marked block. In this context, we present a series of combinatorial results for words sharing the extended blocksequence.
△ Less
Submitted 17 August, 2020; v1 submitted 8 August, 2020;
originally announced August 2020.
-
Automatic Verification of LLVM Code
Authors:
Axel Legay,
Dirk Nowotka,
Danny Bøgsted Poulsen
Abstract:
In this work we present our work in develo** a software verification tool for llvm-code - Lodin - that incorporates both explicit-state model checking, statistical model checking and symbolic state model checking algorithms.
In this work we present our work in develo** a software verification tool for llvm-code - Lodin - that incorporates both explicit-state model checking, statistical model checking and symbolic state model checking algorithms.
△ Less
Submitted 4 June, 2020;
originally announced June 2020.
-
Weighted Prefix Normal Words: Mind the Gap
Authors:
Yannik Eikmeier,
Pamela Fleischmann,
Mitja Kulczynski,
Dirk Nowotka
Abstract:
A prefix normal word is a binary word whose prefixes contain at least as many 1s as any of its factors of the same length. Introduced by Fici and Lipták in 2011 the notion of prefix normality is so far only defined for words over the binary alphabet. In this work we investigate a generalisation for finite words over arbitrary finite alphabets, namely weighted prefix normality. We prove that weight…
▽ More
A prefix normal word is a binary word whose prefixes contain at least as many 1s as any of its factors of the same length. Introduced by Fici and Lipták in 2011 the notion of prefix normality is so far only defined for words over the binary alphabet. In this work we investigate a generalisation for finite words over arbitrary finite alphabets, namely weighted prefix normality. We prove that weighted prefix normality is more expressive than binary prefix normality. Furthermore, we investigate the existence of a weighted prefix normal form since weighted prefix normality comes with several new peculiarities that did not already occur in the binary case. We characterise these issues and finally present a standard technique to obtain a generalised prefix normal form for all words overarbitrary, finite alphabets.
△ Less
Submitted 19 April, 2021; v1 submitted 19 May, 2020;
originally announced May 2020.
-
Estimating End-to-End Latencies in Automotive Cyber-physical Systems
Authors:
Max J. Friese,
Dirk Nowotka
Abstract:
Controller networks in today's automotive systems consist of more than 100 ECUs connected by various bus protocols. Seamless operation of the entire system requires a well-orchestrated interaction of these ECUs. Consequently, to ensure safety and comfort, a performance analysis is an inherent part of the engineering process. Conducting such an analysis manually is expensive, slow, and error pr…
▽ More
Controller networks in today's automotive systems consist of more than 100 ECUs connected by various bus protocols. Seamless operation of the entire system requires a well-orchestrated interaction of these ECUs. Consequently, to ensure safety and comfort, a performance analysis is an inherent part of the engineering process. Conducting such an analysis manually is expensive, slow, and error prone. Tool support is therefore crucial, and a number of approaches have been presented. However, most work is limited to either network latencies or software latencies which results in an analysis gap at the transition between different layers of the communication stack. The work presented here introduces an approach to close this gap. Furthermore, we discuss the integration of different methods to obtain an end-to-end latency analysis.
△ Less
Submitted 28 April, 2020;
originally announced April 2020.
-
Scattered Factor-Universality of Words
Authors:
Laura Barker,
Pamela Fleischmann,
Katharina Harwardt,
Florin Manea,
Dirk Nowotka
Abstract:
A word $u=u_1\dots u_n$ is a scattered factor of a word $w$ if $u$ can be obtained from $w$ by deleting some of its letters: there exist the (potentially empty) words $v_0,v_1,..,v_n$ such that $w = v_0u_1v_1...u_nv_n$. The set of all scattered factors up to length $k$ of a word is called its full $k$-spectrum. Firstly, we show an algorithm deciding whether the $k$-spectra for given $k$ of two wor…
▽ More
A word $u=u_1\dots u_n$ is a scattered factor of a word $w$ if $u$ can be obtained from $w$ by deleting some of its letters: there exist the (potentially empty) words $v_0,v_1,..,v_n$ such that $w = v_0u_1v_1...u_nv_n$. The set of all scattered factors up to length $k$ of a word is called its full $k$-spectrum. Firstly, we show an algorithm deciding whether the $k$-spectra for given $k$ of two words are equal or not, running in optimal time. Secondly, we consider a notion of scattered-factors universality: the word $w$, with $\letters(w)=Σ$, is called $k$-universal if its $k$-spectrum includes all words of length $k$ over the alphabet $Σ$; we extend this notion to $k$-circular universality. After a series of preliminary combinatorial results, we present an algorithm computing, for a given $k'$-universal word $w$ the minimal $i$ such that $w^i$ is $k$-universal for some $k>k'$. Several other connected problems~are~also~considered.
△ Less
Submitted 10 March, 2020;
originally announced March 2020.
-
Reconstructing Words from Right-Bounded-Block Words
Authors:
Pamela Fleischmann,
Marie Lejeune,
Florin Manea,
Dirk Nowotka,
Michel Rigo
Abstract:
A reconstruction problem of words from scattered factors asks for the minimal information, like multisets of scattered factors of a given length or the number of occurrences of scattered factors from a given set, necessary to uniquely determine a word. We show that a word $w \in \{a, b\}^{*}$ can be reconstructed from the number of occurrences of at most $\min(|w|_a, |w|_b)+ 1$ scattered factors o…
▽ More
A reconstruction problem of words from scattered factors asks for the minimal information, like multisets of scattered factors of a given length or the number of occurrences of scattered factors from a given set, necessary to uniquely determine a word. We show that a word $w \in \{a, b\}^{*}$ can be reconstructed from the number of occurrences of at most $\min(|w|_a, |w|_b)+ 1$ scattered factors of the form $a^{i} b$. Moreover, we generalize the result to alphabets of the form $\{1,\ldots,q\}$ by showing that at most $ \sum^{q-1}_{i=1} |w|_i (q-i+1)$ scattered factors suffices to reconstruct $w$. Both results improve on the upper bounds known so far. Complexity time bounds on reconstruction algorithms are also considered here.
△ Less
Submitted 16 March, 2020; v1 submitted 30 January, 2020;
originally announced January 2020.
-
An Empirical Investigation of Randomized Defenses against Adversarial Attacks
Authors:
Yannik Potdevin,
Dirk Nowotka,
Vijay Ganesh
Abstract:
In recent years, Deep Neural Networks (DNNs) have had a dramatic impact on a variety of problems that were long considered very difficult, e. g., image classification and automatic language translation to name just a few. The accuracy of modern DNNs in classification tasks is remarkable indeed. At the same time, attackers have devised powerful methods to construct specially-crafted malicious input…
▽ More
In recent years, Deep Neural Networks (DNNs) have had a dramatic impact on a variety of problems that were long considered very difficult, e. g., image classification and automatic language translation to name just a few. The accuracy of modern DNNs in classification tasks is remarkable indeed. At the same time, attackers have devised powerful methods to construct specially-crafted malicious inputs (often referred to as adversarial examples) that can trick DNNs into mis-classifying them. What is worse is that despite the many defense mechanisms proposed to protect DNNs against adversarial attacks, attackers are often able to circumvent these defenses, rendering them useless. This state of affairs is extremely worrying, especially since machine learning systems get adopted at scale.
In this paper, we propose a scientific evaluation methodology aimed at assessing the quality, efficacy, robustness and efficiency of randomized defenses to protect DNNs against adversarial examples. Using this methodology, we evaluate a variety of defense mechanisms. In addition, we also propose a defense mechanism we call Randomly Perturbed Ensemble Neural Networks (RPENNs). We provide a thorough and comprehensive evaluation of the considered defense mechanisms against a white-box attacker model, six different adversarial attack methods and using the ILSVRC2012 validation data set.
△ Less
Submitted 12 September, 2019;
originally announced September 2019.
-
On Solving Word Equations Using SAT
Authors:
Joel D. Day,
Thorsten Ehlers,
Mitja Kulczynski,
Florin Manea,
Dirk Nowotka,
Danny Bøgsted Poulsen
Abstract:
We present Woorpje, a string solver for bounded word equations (i.e., equations where the length of each variable is upper bounded by a given integer). Our algorithm works by reformulating the satisfiability of bounded word equations as a reachability problem for nondeterministic finite automata, and then carefully encoding this as a propositional satisfiability problem, which we then solve using…
▽ More
We present Woorpje, a string solver for bounded word equations (i.e., equations where the length of each variable is upper bounded by a given integer). Our algorithm works by reformulating the satisfiability of bounded word equations as a reachability problem for nondeterministic finite automata, and then carefully encoding this as a propositional satisfiability problem, which we then solve using the well-known Glucose SAT-solver. This approach has the advantage of allowing for the natural inclusion of additional linear length constraints. Our solver obtains reliable and competitive results and, remarkably, discovered several cases where state-of-the-art solvers exhibit a faulty behaviour.
△ Less
Submitted 27 June, 2019;
originally announced June 2019.
-
On Modelling the Avoidability of Patterns as CSP
Authors:
Thorsten Ehlers,
Florin Manea,
Dirk Nowotka,
Kamellia Reshadi
Abstract:
Solving avoidability problems in the area of string combinatorics often requires, in an initial step, the construction, via a computer program, of a very long word that does not contain any word that matches a given pattern. It is well known that this is a computationally hard task. Despite being rather straightforward that, ultimately, all such tasks can be formalized as constraints satisfaction…
▽ More
Solving avoidability problems in the area of string combinatorics often requires, in an initial step, the construction, via a computer program, of a very long word that does not contain any word that matches a given pattern. It is well known that this is a computationally hard task. Despite being rather straightforward that, ultimately, all such tasks can be formalized as constraints satisfaction problems, no unified approach to solving them was proposed so far, and very diverse ad-hoc methods were used. We aim to fill this gap: we show how several relevant avoidability problems can be modelled, and consequently solved, in an uniform way as constraint satisfaction problems, using the framework of MiniZinc. The main advantage of this approach is that one is now required only to formulate the avoidability problem in the MiniZinc language, and then the actual search for a solution does not have to be implemented ad-hoc, being instead carried out by a standard CSP-solver.
△ Less
Submitted 3 June, 2019;
originally announced June 2019.
-
On Collapsing Prefix Normal Words
Authors:
Pamela Fleischmann,
Mitja Kulczynski,
Dirk Nowotka
Abstract:
Prefix normal words are binary words in which each prefix has at least the same number of $\so$s as any factor of the same length. Firstly introduced by Fici and Lipták in 2011, the problem of determining the index of the prefix equivalence relation is still open. In this paper, we investigate two aspects of the problem, namely prefix normal palindromes and so-called collapsing words (extending th…
▽ More
Prefix normal words are binary words in which each prefix has at least the same number of $\so$s as any factor of the same length. Firstly introduced by Fici and Lipták in 2011, the problem of determining the index of the prefix equivalence relation is still open. In this paper, we investigate two aspects of the problem, namely prefix normal palindromes and so-called collapsing words (extending the notion of critical words). We prove characterizations for both the palindromes and the collapsing words and show their connection. Based on this, we show that still open problems regarding prefix normal words can be split into certain subproblems.
△ Less
Submitted 19 May, 2020; v1 submitted 28 May, 2019;
originally announced May 2019.
-
k-Spectra of weakly-c-Balanced Words
Authors:
Joel D. Day,
Pamela Fleischmann,
Florin Manea,
Dirk Nowotka
Abstract:
A word $u$ is a scattered factor of $w$ if $u$ can be obtained from $w$ by deleting some of its letters. That is, there exist the (potentially empty) words $u_1,u_2,..., u_n$, and $v_0,v_1,..,v_n$ such that $u = u_1u_2...u_n$ and $w = v_0u_1v_1u_2v_2...u_nv_n$. We consider the set of length-$k$ scattered factors of a given word w, called here $k$-spectrum and denoted $\ScatFact_k(w)$. We prove a s…
▽ More
A word $u$ is a scattered factor of $w$ if $u$ can be obtained from $w$ by deleting some of its letters. That is, there exist the (potentially empty) words $u_1,u_2,..., u_n$, and $v_0,v_1,..,v_n$ such that $u = u_1u_2...u_n$ and $w = v_0u_1v_1u_2v_2...u_nv_n$. We consider the set of length-$k$ scattered factors of a given word w, called here $k$-spectrum and denoted $\ScatFact_k(w)$. We prove a series of properties of the sets $\ScatFact_k(w)$ for binary strictly balanced and, respectively, $c$-balanced words $w$, i.e., words over a two-letter alphabet where the number of occurrences of each letter is the same, or, respectively, one letter has $c$-more occurrences than the other. In particular, we consider the question which cardinalities $n= |\ScatFact_k(w)|$ are obtainable, for a positive integer $k$, when $w$ is either a strictly balanced binary word of length $2k$, or a $c$-balanced binary word of length $2k-c$. We also consider the problem of reconstructing words from their $k$-spectra.
△ Less
Submitted 24 May, 2019; v1 submitted 19 April, 2019;
originally announced April 2019.
-
An optimal bound on the solution sets of one-variable word equations and its consequences
Authors:
Dirk Nowotka,
Aleksi Saarela
Abstract:
We solve two long-standing open problems on word equations. Firstly, we prove that a one-variable word equation with constants has either at most three or an infinite number of solutions. The existence of such a bound had been conjectured, and the bound three is optimal. Secondly, we consider independent systems of three-variable word equations without constants. If such a system has a nonperiodic…
▽ More
We solve two long-standing open problems on word equations. Firstly, we prove that a one-variable word equation with constants has either at most three or an infinite number of solutions. The existence of such a bound had been conjectured, and the bound three is optimal. Secondly, we consider independent systems of three-variable word equations without constants. If such a system has a nonperiodic solution, then this system of equations is at most of size 17. Although probably not optimal, this is the first finite bound found. However, the conjecture of that bound being actually two still remains open.
△ Less
Submitted 24 May, 2018;
originally announced May 2018.
-
Estimating Latencies of Task Sequences in Multi-Core Automotive ECUs
Authors:
Max J. Friese,
Thorsten Ehlers,
Dirk Nowotka
Abstract:
The computation of a cyber-physical system's reaction to a stimulus typically involves the execution of several tasks. The delay between stimulus and reaction thus depends on the interaction of these tasks and is subject to timing constraints. Such constraints exist for a number of reasons and range from possible impacts on customer experiences to safety requirements. We present a technique to det…
▽ More
The computation of a cyber-physical system's reaction to a stimulus typically involves the execution of several tasks. The delay between stimulus and reaction thus depends on the interaction of these tasks and is subject to timing constraints. Such constraints exist for a number of reasons and range from possible impacts on customer experiences to safety requirements. We present a technique to determine end-to-end latencies of such task sequences. The technique is demonstrated on the example of electronic control units (ECUs) in automotive embedded real-time systems. Our approach is able to deal with multi-core architectures and supports four different activation patterns, including interrupts. It is the first formal analysis approach making use of load assumptions in order to exclude infeasible data propagation paths without the knowledge of worst-case execution times or worst-case response times. We employ a constraint programming solver to compute bounds on end-to-end latencies.
△ Less
Submitted 4 April, 2018;
originally announced April 2018.
-
The Satisfiability of Extended Word Equations: The Boundary Between Decidability and Undecidability
Authors:
Joel Day,
Vijay Ganesh,
Paul He,
Florin Manea,
Dirk Nowotka
Abstract:
The study of word equations (or the existential theory of equations over free monoids) is a central topic in mathematics and theoretical computer science. The problem of deciding whether a given word equation has a solution was shown to be decidable by Makanin in the late 1970s, and since then considerable work has been done on this topic. In recent years, this decidability question has gained cri…
▽ More
The study of word equations (or the existential theory of equations over free monoids) is a central topic in mathematics and theoretical computer science. The problem of deciding whether a given word equation has a solution was shown to be decidable by Makanin in the late 1970s, and since then considerable work has been done on this topic. In recent years, this decidability question has gained critical importance in the context of string SMT solvers for security analysis. Further, many extensions (e.g., quantifier-free word equations with linear arithmetic over the length function) and fragments (e.g., restrictions on the number of variables) of this theory are important from a theoretical point of view, as well as for program analysis applications. Motivated by these considerations, we prove several new results and thus shed light on the boundary between decidability and undecidability for many fragments and extensions of the first order theory of word equations.
△ Less
Submitted 1 February, 2018;
originally announced February 2018.
-
Rollercoasters and Caterpillars
Authors:
Therese Biedl,
Ahmad Biniaz,
Robert Cummings,
Anna Lubiw,
Florin Manea,
Dirk Nowotka,
Jeffrey Shallit
Abstract:
A rollercoaster is a sequence of real numbers for which every maximal contiguous subsequence, that is increasing or decreasing, has length at least three. By translating this sequence to a set of points in the plane, a rollercoaster can be defined as a polygonal path for which every maximal sub-path, with positive- or negative-slope edges, has at least three points. Given a sequence of distinct re…
▽ More
A rollercoaster is a sequence of real numbers for which every maximal contiguous subsequence, that is increasing or decreasing, has length at least three. By translating this sequence to a set of points in the plane, a rollercoaster can be defined as a polygonal path for which every maximal sub-path, with positive- or negative-slope edges, has at least three points. Given a sequence of distinct real numbers, the rollercoaster problem asks for a maximum-length subsequence that is a rollercoaster. It was conjectured that every sequence of $n$ distinct real numbers contains a rollercoaster of length at least $\lceil n/2\rceil$ for $n>7$, while the best known lower bound is $Ω(n/\log n)$. In this paper we prove this conjecture. Our proof is constructive and implies a linear-time algorithm for computing a rollercoaster of this length. Extending the $O(n\log n)$-time algorithm for computing a longest increasing subsequence, we show how to compute a maximum-length rollercoaster within the same time bound. A maximum-length rollercoaster in a permutation of $\{1,\dots,n\}$ can be computed in $O(n \log \log n)$ time.
The search for rollercoasters was motivated by orthogeodesic point-set embedding of caterpillars. A caterpillar is a tree such that deleting the leaves gives a path, called the spine. A top-view caterpillar is one of degree 4 such that the two leaves adjacent to each vertex lie on opposite sides of the spine. As an application of our result on rollercoasters, we are able to find a planar drawing of every $n$-node top-view caterpillar on every set of $\frac{25}{3}n$ points in the plane, such that each edge is an orthogonal path with one bend. This improves the previous best known upper bound on the number of required points, which is $O(n \log n)$. We also show that such a drawing can be obtained in linear time, provided that the points are given in sorted order.
△ Less
Submitted 25 January, 2018;
originally announced January 2018.
-
Lagrange's Theorem for Binary Squares
Authors:
Parthasarathy Madhusudan,
Dirk Nowotka,
Aayush Rajasekaran,
Jeffrey Shallit
Abstract:
We show how to prove theorems in additive number theory using a decision procedure based on finite automata. Among other things, we obtain the following analogue of Lagrange's theorem: every natural number > 686 is the sum of at most 4 natural numbers whose canonical base-2 representation is a binary square, that is, a string of the form xx for some block of bits x. Here the number 4 is optimal. W…
▽ More
We show how to prove theorems in additive number theory using a decision procedure based on finite automata. Among other things, we obtain the following analogue of Lagrange's theorem: every natural number > 686 is the sum of at most 4 natural numbers whose canonical base-2 representation is a binary square, that is, a string of the form xx for some block of bits x. Here the number 4 is optimal. While we cannot embed this theorem itself in a decidable theory, we show that stronger lemmas that imply that the theorem can be embedded in decidable theories, and show how automated methods can be used to search for these stronger lemmas.
△ Less
Submitted 22 June, 2018; v1 submitted 11 October, 2017;
originally announced October 2017.
-
The Hardness of Solving Simple Word Equations
Authors:
Joel D. Day,
Florin Manea,
Dirk Nowotka
Abstract:
We investigate the class of regular-ordered word equations. In such equations, each variable occurs at most once in each side and the order of the variables occurring in both sides is the preserved (the variables can be, however, separated by potentially distinct constant factors). Surprisingly, we obtain that solving such simple equations, even when the sides contain exactly the same variables, i…
▽ More
We investigate the class of regular-ordered word equations. In such equations, each variable occurs at most once in each side and the order of the variables occurring in both sides is the preserved (the variables can be, however, separated by potentially distinct constant factors). Surprisingly, we obtain that solving such simple equations, even when the sides contain exactly the same variables, is NP-hard. By considerations regarding the combinatorial structure of the minimal solutions of the more general quadratic equations we obtain that the satisfiability problem for regular-ordered equations is in NP. Finally, we also show that a related class of simple word equations, that generalises one-variable equations, is in P.
△ Less
Submitted 28 February, 2017; v1 submitted 25 February, 2017;
originally announced February 2017.
-
Detecting One-variable Patterns
Authors:
Dmitry Kosolobov,
Florin Manea,
Dirk Nowotka
Abstract:
Given a pattern $p = s_1x_1s_2x_2\cdots s_{r-1}x_{r-1}s_r$ such that $x_1,x_2,\ldots,x_{r-1}\in\{x,\overset{{}_{\leftarrow}}{x}\}$, where $x$ is a variable and $\overset{{}_{\leftarrow}}{x}$ its reversal, and $s_1,s_2,\ldots,s_r$ are strings that contain no variables, we describe an algorithm that constructs in $O(rn)$ time a compact representation of all $P$ instances of $p$ in an input string of…
▽ More
Given a pattern $p = s_1x_1s_2x_2\cdots s_{r-1}x_{r-1}s_r$ such that $x_1,x_2,\ldots,x_{r-1}\in\{x,\overset{{}_{\leftarrow}}{x}\}$, where $x$ is a variable and $\overset{{}_{\leftarrow}}{x}$ its reversal, and $s_1,s_2,\ldots,s_r$ are strings that contain no variables, we describe an algorithm that constructs in $O(rn)$ time a compact representation of all $P$ instances of $p$ in an input string of length $n$ over a polynomially bounded integer alphabet, so that one can report those instances in $O(P)$ time.
△ Less
Submitted 18 July, 2017; v1 submitted 31 March, 2016;
originally announced April 2016.
-
A note on Thue games
Authors:
Robert Mercaş,
Dirk Nowotka
Abstract:
In this work we improve on a result from~\cite{GryKosZma15}. In particular, we investigate the situation where a word is constructed jointly by two players who alternately append letters to the end of an existing word. One of the players (Ann) tries to avoid (non-trivial) repetitions, while the other one (Ben) tries to enforce them. We show a construction that is closer to the lower bound showed i…
▽ More
In this work we improve on a result from~\cite{GryKosZma15}. In particular, we investigate the situation where a word is constructed jointly by two players who alternately append letters to the end of an existing word. One of the players (Ann) tries to avoid (non-trivial) repetitions, while the other one (Ben) tries to enforce them. We show a construction that is closer to the lower bound showed in~\cite{GryKozMic13} using entropy compression, and building on the probabilistic arguments based on a version of the Lovász Local Lemma from~\cite{Peg11}. We provide an explicit strategy for Ann to avoid (non-trivial) repetitions over a $7$-letter alphabet.
△ Less
Submitted 11 January, 2016;
originally announced January 2016.
-
Pattern Avoidability with Involution
Authors:
Bastian Bischoff,
Dirk Nowotka
Abstract:
An infinte word w avoids a pattern p with the involution t if there is no substitution for the variables in p and no involution t such that the resulting word is a factor of w. We investigate the avoidance of patterns with respect to the size of the alphabet. For example, it is shown that the pattern a t(a) a can be avoided over three letters but not two letters, whereas it is well…
▽ More
An infinte word w avoids a pattern p with the involution t if there is no substitution for the variables in p and no involution t such that the resulting word is a factor of w. We investigate the avoidance of patterns with respect to the size of the alphabet. For example, it is shown that the pattern a t(a) a can be avoided over three letters but not two letters, whereas it is well known that a a a is avoidable over two letters.
△ Less
Submitted 17 August, 2011;
originally announced August 2011.
-
Maximal Intersection Queries in Randomized Input Models
Authors:
Benjamin Hoffmann,
Mikhail Lifshits,
Yury Lifshits,
Dirk Nowotka
Abstract:
Consider a family of sets and a single set, called the query set. How can one quickly find a member of the family which has a maximal intersection with the query set? Time constraints on the query and on a possible preprocessing of the set family make this problem challenging. Such maximal intersection queries arise in a wide range of applications, including web search, recommendation systems, and…
▽ More
Consider a family of sets and a single set, called the query set. How can one quickly find a member of the family which has a maximal intersection with the query set? Time constraints on the query and on a possible preprocessing of the set family make this problem challenging. Such maximal intersection queries arise in a wide range of applications, including web search, recommendation systems, and distributing on-line advertisements. In general, maximal intersection queries are computationally expensive. We investigate two well-motivated distributions over all families of sets and propose an algorithm for each of them. We show that with very high probability an almost optimal solution is found in time which is logarithmic in the size of the family. Moreover, we point out a threshold phenomenon on the probabilities of intersecting sets in each of our two input models which leads to the efficient algorithms mentioned above.
△ Less
Submitted 1 April, 2010;
originally announced April 2010.
-
On a Non-Context-Free Extension of PDL
Authors:
Stefan Göller,
Dirk Nowotka
Abstract:
Over the last 25 years, a lot of work has been done on seeking for decidable non-regular extensions of Propositional Dynamic Logic (PDL). Only recently, an expressive extension of PDL, allowing visibly pushdown automata (VPAs) as a formalism to describe programs, was introduced and proven to have a satisfiability problem complete for deterministic double exponential time. Lately, the VPA formali…
▽ More
Over the last 25 years, a lot of work has been done on seeking for decidable non-regular extensions of Propositional Dynamic Logic (PDL). Only recently, an expressive extension of PDL, allowing visibly pushdown automata (VPAs) as a formalism to describe programs, was introduced and proven to have a satisfiability problem complete for deterministic double exponential time. Lately, the VPA formalism was extended to so called k-phase multi-stack visibly pushdown automata (k-MVPAs). Similarly to VPAs, it has been shown that the language of k-MVPAs have desirable effective closure properties and that the emptiness problem is decidable. On the occasion of introducing k-MVPAs, it has been asked whether the extension of PDL with k-MVPAs still leads to a decidable logic. This question is answered negatively here. We prove that already for the extension of PDL with 2-phase MVPAs with two stacks satisfiability becomes Σ_1^1-complete.
△ Less
Submitted 18 July, 2007; v1 submitted 4 July, 2007;
originally announced July 2007.
-
Periodicity and Unbordered Words: A Proof of the Extended Duval Conjecture
Authors:
Tero Harju,
Dirk Nowotka
Abstract:
The relationship between the length of a word and the maximum length of its unbordered factors is investigated in this paper. Consider a finite word w of length n. We call a word bordered, if it has a proper prefix which is also a suffix of that word. Let f(w) denote the maximum length of all unbordered factors of w, and let p(w) denote the period of w. Clearly, f(w) < p(w)+1.
We establish tha…
▽ More
The relationship between the length of a word and the maximum length of its unbordered factors is investigated in this paper. Consider a finite word w of length n. We call a word bordered, if it has a proper prefix which is also a suffix of that word. Let f(w) denote the maximum length of all unbordered factors of w, and let p(w) denote the period of w. Clearly, f(w) < p(w)+1.
We establish that f(w) = p(w), if w has an unbordered prefix of length f(w) and n > 2f(w)-2. This bound is tight and solves the stronger version of a 21 years old conjecture by Duval. It follows from this result that, in general, n > 3f(w)-3 implies f(w) = p(w) which gives an improved bound for the question asked by Ehrenfeucht and Silberger in 1979.
△ Less
Submitted 25 May, 2003; v1 submitted 23 May, 2003;
originally announced May 2003.