Regular Expressions with Backreferences on Multiple Context-Free Languages, and the Closed-Star Condition

Abstract: Backreference is a well-known practical extension of regular expressions, and most modern programming languages, such as Java, Python, JavaScript and more, support regular expressions with backreferences (rewb) in their standard libraries for string processing. A difficulty of backreference is non-regularity: unlike some other extensions, backreference strictly enhances the expressive power of regular expressions and thus rewbs can describe non-regular (in fact, even non-context-free) languages. In this paper, we investigate the expressive power of rewbs by comparing rewbs to multiple context-free languages (MCFL) and parallel multiple context-free languages (PMCFL). First, we prove that the language class of rewbs is a proper subclass of unary-PMCFLs. The class of unary-PMCFLs coincides with that of EDT0L languages, and our result strictly improves the known upper bound of rewbs. However, we also show that the language class of rewbs is not contained in that of MCFLs even when restricted to rewbs with only one capturing group and no captured references. Therefore, in general, the parallelism seems essential for rewbs. Backed by these results, we define a novel syntactic condition on rewbs that we call closed-star and observe that it provides an upper bound on the number of times a rewb references the same captured string. The closed-star condition allows dispensing with the parallelism: that is, we prove that the language class of closed-star rewbs falls inside the class of unary-MCFLs, which is equivalent to that of EDT0L systems of finite index. Furthermore, as additional evidence for the robustness of the condition, we show that the language class of closed-star rewbs also falls inside the class of nonerasing stack languages (NESL).

Submitted 27 June, 2024; originally announced June 2024.

Comments: 26 pages
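
The non-context-free expressive power mentioned in the abstract above can be illustrated with a minimal sketch. It assumes Python's standard `re` module as the rewb engine; the pattern and the helper name `is_square` are illustrative choices and are not taken from the paper. The pattern captures a string w over {a, b} and then uses the backreference `\1` to require the same string again, so it accepts exactly the copy language {ww}, a textbook example of a language that is not context-free.

```python
import re

# Hypothetical illustration (not from the paper): one capturing group
# followed by one backreference to it. The group captures some w in {a,b}*,
# and \1 must then match that same w again.
SQUARE = re.compile(r"([ab]*)\1")


def is_square(s: str) -> bool:
    """Return True if s has the form ww for some w over {a, b}."""
    # fullmatch forces the pattern to consume the entire input string.
    return SQUARE.fullmatch(s) is not None


if __name__ == "__main__":
    for word in ["", "aa", "abab", "aba", "abba"]:
        print(repr(word), is_square(word))
```

A backtracking engine tries each candidate length for the captured group in turn, which is why this check is simple to write but gives no regular or context-free description of the accepted language.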

arXiv:2311.09324 [pdf]

Thermal Magnetoelectrics in all Inorganic Quasi-Two-Dimensional Halide Perovskites

Authors: Tong Zhu, Xuezeng Lu, Takuya Aoyama, Koji Fujita, Yusuke Nambu, Takashi Saito, Hiroshi Takatsu, Tatsushi Kawasaki, Takumi Terauchi, Shunsuke Kurosawa, Akihiro Yamaji, Hao-Bo Li, Cedric Tassel, Kenya Ohgushi, James M. Rondinelli, Hiroshi Kageyama

Abstract: From lithium-ion batteries to high-temperature superconductors, oxide materials have been widely used in electronic devices. However, demands of future technologies require materials beyond oxides, as anion chemistries distinct from oxygen can expand the palette of mechanisms and phenomena, to achieve superior functionalities. Examples include nitride-based wide bandgap semiconductors and halide perovskite solar cells, with MAPbBr3 being a representation revolutionizing photovoltaics research. Here, we demonstrate magnetoelectric behaviour in quasi-two-dimensional halides (K,Rb)3Mn2Cl7 through simultaneous thermal control of electric and magnetic polarizations by exploiting a polar-to-antipolar displacive transition. Additionally, our calculations indicate a possible polarization switching path including a strong magnetoelectric coupling, indicating halides can be excellent platforms to design future multiferroic and ferroelectric devices. We expect our findings to broaden the exploration of multiferroics to non-oxide materials and open access to novel mechanisms, beyond conventional electric/magnetic control, for coupling ferroic orders.

Submitted 15 November, 2023; originally announced November 2023.

Comments: 20 pages, 4 figures

arXiv:2307.15463 [pdf, other]

Answer Refinement Modification: Refinement Type System for Algebraic Effects and Handlers

Authors: Fuga Kawamata, Hiroshi Unno, Taro Sekiyama, Tachio Terauchi

Abstract: Algebraic effects and handlers are a mechanism to structure programs with computational effects in a modular way. They are recently gaining popularity and being adopted in practical languages, such as OCaml. Meanwhile, there has been substantial progress in program verification via refinement type systems. However, thus far there has not been a satisfactory refinement type system for algebraic effects and handlers. In this paper, we fill the void by proposing a novel refinement type system for algebraic effects and handlers. The expressivity and usefulness of algebraic effects and handlers come from their ability to manipulate delimited continuations, but delimited continuations also complicate programs' control flow and make their verification harder. To address the complexity, we introduce a novel concept that we call answer refinement modification (ARM for short), which allows the refinement type system to precisely track what effects occur and in what order when a program is executed, and reflect such information as modifications to the refinements in the types of delimited continuations. We formalize our type system that supports ARM (as well as answer type modification, or ATM) and prove its soundness. Additionally, as a proof of concept, we have implemented a corresponding type checking and inference algorithm for a subset of OCaml 5, and evaluated it on a number of benchmark programs. The evaluation demonstrates that ARM is conceptually simple and practically useful. Finally, a natural alternative to directly reasoning about a program with delimited continuations is to apply a continuation passing style (CPS) transformation that transforms the program to a pure program. We investigate this alternative, and show that the approach is indeed possible by proposing a novel CPS transformation for algebraic effects and handlers that enjoys bidirectional (refinement-)type-preservation.

Submitted 17 November, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: 81 pages

arXiv:2307.08531 [pdf, other]

On the Expressive Power of Regular Expressions with Backreferences

Authors: Taisei Nogami, Tachio Terauchi

Abstract: A rewb is a regular expression extended with a feature called backreference. It is broadly known that backreference is a practical extension of regular expressions, and is supported by most modern regular expression engines, such as those in the standard libraries of Java, Python, and more. Meanwhile, indexed languages are the languages generated by indexed grammars, a formal grammar class proposed by A.V.Aho. We show that these two models' expressive powers are related in the following way: every language described by a rewb is an indexed language. As the smallest formal grammar class previously known to contain rewbs is the class of context sensitive languages, our result strictly improves the known upper-bound. Moreover, we prove the following two claims: there exists a rewb whose language does not belong to the class of stack languages, which is a proper subclass of indexed languages, and the language described by a rewb without a captured reference is in the class of nonerasing stack languages, which is a proper subclass of stack languages. Finally, we show that the hierarchy investigated in a prior study, which separates the expressive power of rewbs by the notion of nested levels, is within the class of nonerasing stack languages.

Submitted 8 August, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

Comments: 20 pages, the full version of the paper to appear in MFCS 2023

arXiv:2106.02628 [pdf, other]

Constraint-based Relational Verification

Authors: Hiroshi Unno, Tachio Terauchi, Eric Koskinen

Abstract: In recent years there have been numerous works that aim to automate relational verification. Meanwhile, although Constrained Horn Clauses (CHCs) empower a wide range of verification techniques and tools, they lack the ability to express hyperproperties beyond $k$-safety such as generalized non-interference and co-termination. This paper describes a novel and fully automated constraint-based approach to relational verification. We first introduce a new class of predicate Constraint Satisfaction Problems called pfwCSP where constraints are represented as clauses modulo first-order theories over predicate variables of three kinds: ordinary, well-founded, or functional. This generalization over CHCs permits arbitrary (i.e., possibly non-Horn) clauses, well-foundedness constraints, functionality constraints, and is capable of expressing these relational verification problems. Our approach enables us to express and automatically verify problem instances that require non-trivial (i.e., non-sequential and non-lock-step) self-composition by automatically inferring appropriate schedulers (or alignment) that dictate when and which program copies move. To solve problems in this new language, we present a constraint solving method for pfwCSP based on stratified CounterExample-Guided Inductive Synthesis (CEGIS) of ordinary, well-founded, and functional predicates. We have implemented the proposed framework and obtained promising results on diverse relational verification problems that are beyond the scope of the previous verification frameworks.

Submitted 4 June, 2021; originally announced June 2021.

arXiv:2010.12450 [pdf, ps, other]

Repairing DoS Vulnerability of Real-World Regexes

Authors: Nariyoshi Chida, Tachio Terauchi

Abstract: There has been much work on synthesizing and repairing regular expressions (regexes for short) from examples. These programming-by-example (PBE) methods help the users write regexes by letting them reflect their intention by examples. However, the existing methods may generate regexes whose matching may take super-linear time and are vulnerable to regex denial of service (ReDoS) attacks. This paper presents the first PBE repair method that is guaranteed to generate only invulnerable regexes. Importantly, our method can handle real-world regexes containing lookarounds and backreferences. Due to the extensions, the existing formal definitions of ReDoS vulnerabilities that only consider pure regexes are insufficient. Therefore, we first give a novel formal semantics and complexity of backtracking matching algorithms for real-world regexes, and with them, give the first formal definition of ReDoS vulnerability for real-world regexes. Next, we present a novel condition called real-world strong 1-unambiguity that is sufficient for guaranteeing the invulnerability of real-world regexes, and formalize the corresponding PBE repair problem. Finally, we present an algorithm that solves the repair problem. The algorithm builds on and extends the previous PBE methods to handle the real-world extensions and with constraints to enforce the real-world strong 1-unambiguity condition.

Submitted 20 August, 2022; v1 submitted 23 October, 2020; originally announced October 2020.

Journal ref: IEEE Symposium on Security & Privacy, May 2022

arXiv:2007.03656 [pdf, other]

Program Verification via Predicate Constraint Satisfiability Modulo Theories

Authors: Hiroshi Unno, Yuki Satake, Tachio Terauchi, Eric Koskinen

Abstract: This paper presents a verification framework based on a new class of predicate Constraint Satisfaction Problems called pCSP where constraints are represented as clauses modulo first-order theories over function variables and predicate variables that may represent well-founded predicates. The verification framework generalizes an existing one based on Constrained Horn Clauses (CHCs) to arbitrary clauses, function variables, and well-foundedness constraints. While it is known that the satisfiability of CHCs and the validity of queries for Constrained Logic Programs (CLP) are inter-reducible, we show that, thanks to the added expressiveness, pCSP is expressive enough to express muCLP queries. muCLP itself is a new extension of CLP that we propose in this paper. It extends CLP with arbitrarily nested inductive and co-inductive predicates and is equi-expressive as first-order fixpoint logic. We show that muCLP can naturally encode a wide variety of verification problems including but not limited to termination/non-termination verification and even full modal mu-calculus model checking of programs written in various languages. To establish our verification framework, we present (1) a sound and complete reduction algorithm from muCLP to pCSP and (2) a constraint solving method for pCSP based on stratified CounterExample-Guided Inductive Synthesis (CEGIS) of (co-)inductive invariants, ranking functions, and Skolem functions witnessing existential quantifiers. Stratified CEGIS combines CEGIS with stratified families of templates to achieve relative completeness and faster and stable convergence of CEGIS by avoiding the overfitting problem. We have implemented the proposed framework and obtained promising results on diverse verification problems that are beyond the scope of the previous verification frameworks based on CHCs.

Submitted 7 July, 2020; originally announced July 2020.

arXiv:1610.05603 [pdf, ps, other]

Compositional Synthesis of Leakage Resilient Programs

Authors: Arthur Blot, Masaki Yamamoto, Tachio Terauchi

Abstract: A promising approach to defend against side channel attacks is to build programs that are leakage resilient, in a formal sense. One such formal notion of leakage resilience is the n-threshold-probing model proposed in the seminal work by Ishai et al. In a recent work, Eldib and Wang have proposed a method for automatically synthesizing programs that are leakage resilient according to this model, for the case n=1. In this paper, we show that the n-threshold-probing model of leakage resilience enjoys a certain compositionality property that can be exploited for synthesis. We use the property to design a synthesis method that efficiently synthesizes leakage-resilient programs in a compositional manner, for the general case of n > 1. We have implemented a prototype of the synthesis algorithm, and we demonstrate its effectiveness by synthesizing leakage-resilient versions of benchmarks taken from the literature.

Submitted 28 October, 2016; v1 submitted 18 October, 2016; originally announced October 2016.

arXiv:1207.0871 [pdf, ps, other]

doi 10.4204/EPTCS.85.6

Quantitative Information Flow as Safety and Liveness Hyperproperties

Authors: Hirotoshi Yasuoka, Tachio Terauchi

Abstract: We employ Clarkson and Schneider's "hyperproperties" to classify various verification problems of quantitative information flow. The results of this paper unify and extend the previous results on the hardness of checking and inferring quantitative information flow. In particular, we identify a subclass of liveness hyperproperties, which we call "k-observable hyperproperties", that can be checked relative to a reachability oracle via self composition.

Submitted 3 July, 2012; originally announced July 2012.

Comments: In Proceedings QAPL 2012, arXiv:1207.0559

Journal ref: EPTCS 85, 2012, pp. 77-91

arXiv:1112.4237 [pdf, ps, other]

On Bounding Problems of Quantitative Information Flow

Authors: Hirotoshi Yasuoka, Tachio Terauchi

Abstract: Researchers have proposed formal definitions of quantitative information flow based on information theoretic notions such as the Shannon entropy, the min entropy, the guessing entropy, belief, and channel capacity. This paper investigates the hardness of precisely checking the quantitative information flow of a program according to such definitions. More precisely, we study the "bounding problem" of quantitative information flow, defined as follows: Given a program M and a positive real number q, decide if the quantitative information flow of M is less than or equal to q. We prove that the bounding problem is not a k-safety property for any k (even when q is fixed, for the Shannon-entropy-based definition with the uniform distribution), and therefore is not amenable to the self-composition technique that has been successfully applied to checking non-interference. We also prove complexity theoretic hardness results for the case when the program is restricted to loop-free boolean programs. Specifically, we show that the problem is PP-hard for all definitions, showing a gap with non-interference which is coNP-complete for the same class of programs. The paper also compares the results with the recently proved results on the comparison problems of quantitative information flow.

Submitted 18 December, 2011; originally announced December 2011.

Comments: To appear in Journal of Computer Security, IOS Press. arXiv admin note: substantial text overlap with arXiv:1004.0062
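
The "bounding problem" stated in the abstract above can be made concrete with a small brute-force sketch. It rests on simplifying assumptions that are not taken from the paper: the program is a deterministic Python function of a single secret input, the secret is uniformly distributed over a finite set, and there is no public input. Under those assumptions the Shannon-entropy-based flow equals the entropy of the output distribution. The names `shannon_flow` and `is_bounded` are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_flow(program, secrets):
    """Shannon-entropy-based flow, in bits, of a deterministic program.

    Assumes a uniformly distributed secret and no public input, in which
    case the mutual information between secret and output is H(output).
    """
    secrets = list(secrets)
    total = len(secrets)
    # Count how many secrets are mapped to each observable output.
    counts = Counter(program(h) for h in secrets)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def is_bounded(program, secrets, q):
    """Bounding problem: is the flow of `program` at most q bits?"""
    return shannon_flow(program, secrets) <= q


if __name__ == "__main__":
    secrets = range(8)  # a 3-bit secret
    print(shannon_flow(lambda h: h % 2, secrets))   # leaks the parity bit
    print(shannon_flow(lambda h: h, secrets))       # leaks the whole secret
    print(is_bounded(lambda h: h % 2, secrets, 1.5))
```

The sketch enumerates every secret, so its cost grows exponentially with the number of secret bits. It is only meant to show what is being decided, not how to decide it efficiently.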

arXiv:1004.0062 [pdf, ps, other]

Quantitative Information Flow - Verification Hardness and Possibilities

Authors: Hirotoshi Yasuoka, Tachio Terauchi

Abstract: Researchers have proposed formal definitions of quantitative information flow based on information theoretic notions such as the Shannon entropy, the min entropy, the guessing entropy, and channel capacity. This paper investigates the hardness and possibilities of precisely checking and inferring quantitative information flow according to such definitions. We prove that, even for just comparing two programs on which has the larger flow, none of the definitions is a k-safety property for any k, and therefore is not amenable to the self-composition technique that has been successfully applied to precisely checking non-interference. We also show a complexity theoretic gap with non-interference by proving that, for loop-free boolean programs whose non-interference is coNP-complete, the comparison problem is #P-hard for all of the definitions. For positive results, we show that universally quantifying the distribution in the comparison problem, that is, comparing two programs according to the entropy based definitions on which has the larger flow for all distributions, is a 2-safety problem in general and is coNP-complete when restricted for loop-free boolean programs. We prove this by showing that the problem is equivalent to a simple relation naturally expressing the fact that one program is more secure than the other. We prove that the relation also refines the channel-capacity based definition, and that it can be precisely checked via the self-composition as well as the "interleaved" self-composition technique.

Submitted 1 April, 2010; originally announced April 2010.

Comments: To appear in Computer Security Foundations 2010

Showing 1–11 of 11 results for author: Terauchi, T