-
Low-Latency Sliding Window Algorithms for Formal Languages
Authors:
Moses Ganardi,
Louis Jachiet,
Markus Lohrey,
Thomas Schwentick
Abstract:
Low-latency sliding window algorithms for regular and context-free languages are studied, where latency refers to the worst-case time spent for a single window update or query. For every regular language $L$ it is shown that there exists a constant-latency solution that supports adding and removing symbols independently on both ends of the window (the so-called two-way variable-size model). We prove that this result extends to all visibly pushdown languages. For deterministic 1-counter languages we present an $\mathcal{O}(\log n)$-latency sliding window algorithm for the two-way variable-size model, where $n$ refers to the window size. We complement these results with a conditional lower bound: there exists a fixed real-time deterministic context-free language $L$ such that, assuming the OMV (online matrix vector multiplication) conjecture, there is no sliding window algorithm for $L$ with latency $n^{1/2-ε}$ for any $ε>0$, even in the most restricted sliding window model (one-way fixed-size model). The above-mentioned results all refer to the unit-cost RAM model with logarithmic word size. For regular languages we also present a refined picture using word sizes $\mathcal{O}(1)$, $\mathcal{O}(\log\log n)$, and $\mathcal{O}(\log n)$.
Submitted 29 September, 2022;
originally announced September 2022.
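To make the sliding window setting above concrete, here is a minimal sketch of a much simpler, well-known technique: the two-stack queue over DFA transition functions. It is not the construction from the paper. It only covers the one-way variable-size model (append on the right, remove on the left) and gives amortized rather than worst-case constant time per operation for a fixed DFA. The class name `SlidingWindowDFA` and its methods are hypothetical names chosen for illustration.

```python
class SlidingWindowDFA:
    """Membership of the current window in a regular language (illustrative sketch).

    delta maps each symbol to a tuple t where t[q] is the state reached
    from state q after reading that symbol.
    """

    def __init__(self, delta, start, accepting):
        self.delta = delta
        self.start = start
        self.accepting = set(accepting)
        num_states = len(next(iter(delta.values())))
        self.identity = tuple(range(num_states))
        # front: list of (fn, agg); the top of the stack is the oldest symbol,
        # and agg is the effect of reading that symbol and everything below it.
        self.front = []
        # back: transition functions of recently appended symbols, newest last.
        self.back = []
        self.back_agg = self.identity

    @staticmethod
    def compose(f, g):
        """Return the function 'apply f first, then g'."""
        return tuple(g[q] for q in f)

    def push_right(self, symbol):
        fn = self.delta[symbol]
        self.back.append(fn)
        self.back_agg = self.compose(self.back_agg, fn)

    def pop_left(self):
        if not self.front:
            # Move all symbols from back to front, newest first, so that the
            # oldest symbol ends up on top with the aggregate of the whole run.
            while self.back:
                fn = self.back.pop()
                below = self.front[-1][1] if self.front else self.identity
                self.front.append((fn, self.compose(fn, below)))
            self.back_agg = self.identity
        if not self.front:
            raise IndexError("pop_left on an empty window")
        self.front.pop()

    def accepts(self):
        front_agg = self.front[-1][1] if self.front else self.identity
        total = self.compose(front_agg, self.back_agg)
        return total[self.start] in self.accepting
```

Each symbol is moved from `back` to `front` at most once, which is where the amortized bound comes from; a single `pop_left` can still take time linear in the window size, which is exactly the kind of latency spike the paper's worst-case results avoid.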
-
Which arithmetic operations can be performed in constant time in the RAM model with addition?
Authors:
Étienne Grandjean,
Louis Jachiet
Abstract:
In the literature of algorithms, the specific computation model is often not explicit as it is assumed that the model of computation is the RAM (Random Access Machine) model. However, the RAM model itself is ill-founded in the literature, with disparate definitions and no unified results.
The ambition of this paper is to found the RAM model from scratch by exhibiting a RAM model that enjoys interesting algorithmic properties and the robustness of its complexity classes, notably LIN, the class of linear-time computable problems, or the now well-known CONST-DELAY-lin class of enumeration problems computable with constant delay after linear-time preprocessing.
The computation model that we define is a RAM whose contents and addresses of registers are $O(N)$, where $N$ is the size (number of registers) of the input, and where the time cost of each instruction is 1 (unit cost criterion). The key to the foundation of our RAM model will be to prove that even if addition is the only primitive operation, such a RAM can still compute all the basic arithmetic operations in constant time after a linear-time preprocessing. Moreover, while the RAM handles only $O(N)$ integers in each register, we will show that our RAM can handle $O(N^d)$ integers, for any fixed $d$, by storing them on $O(d)$ registers, and we will have surprising algorithms that compute many operations acting on these "polynomial" integers -- addition, subtraction, multiplication, division, exponential, integer logarithm, integer square root (or $c$-th root, for any integer $c$), bitwise logical operations, and, more generally, any operation computable in linear time on a cellular automaton -- in constant time after a linear-time preprocessing.
Submitted 21 February, 2023; v1 submitted 28 June, 2022;
originally announced June 2022.
-
Tailored vertex ordering for faster triangle listing in large graphs
Authors:
Fabrice Lécuyer,
Louis Jachiet,
Clémence Magnien,
Lionel Tabourier
Abstract:
Listing triangles is a fundamental graph problem with many applications, and large graphs require fast algorithms. Vertex ordering allows the orientation of edges from lower to higher vertex indices, and state-of-the-art triangle listing algorithms use this to accelerate their execution and to bound their time complexity. Yet, only basic orderings have been tested. In this paper, we show that studying the precise cost of algorithms instead of their bounded complexity leads to faster solutions. We introduce cost functions that link ordering properties with the running time of a given algorithm. We prove that their minimization is NP-hard and propose heuristics to obtain new orderings with different trade-offs between cost reduction and ordering time. Using datasets with up to two billion edges, we show that our heuristics accelerate the listing of triangles by an average of 38% when the ordering is already given as an input, and 16% when the ordering time is included.
Submitted 2 November, 2022; v1 submitted 9 March, 2022;
originally announced March 2022.
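As an illustration of the ordering-based approach the abstract above refers to, the following is a minimal sketch of triangle listing with one of the basic orderings (sort vertices by degree, breaking ties by vertex id). It is not the tailored ordering or the heuristics proposed in the paper; the function name `list_triangles` is a hypothetical choice, and vertices are assumed to be mutually comparable (for example, integers).

```python
from collections import defaultdict


def list_triangles(edges):
    """List each triangle of an undirected graph exactly once (illustrative sketch)."""
    # Build undirected adjacency sets, ignoring self-loops.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Vertex ordering: lower degree first, ties broken by vertex id.
    ordered = sorted(adj, key=lambda v: (len(adj[v]), v))
    rank = {v: i for i, v in enumerate(ordered)}

    # Orient every edge from the lower-ranked to the higher-ranked endpoint.
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    triangles = []
    for u in adj:
        for v in out[u]:
            # A common out-neighbor w of u and v closes the triangle u, v, w.
            for w in out[u] & out[v]:
                triangles.append(tuple(sorted((u, v, w))))
    return sorted(triangles)
```

Because every edge points from lower to higher rank, a triangle is reported only from its lowest-ranked vertex, so no deduplication is needed. Changing the sort key in `ordered` is the single place where a different ordering would be plugged in.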
-
Efficient Enumeration Algorithms for Annotated Grammars
Authors:
Antoine Amarilli,
Louis Jachiet,
Martín Muñoz,
Cristian Riveros
Abstract:
We introduce annotated grammars, an extension of context-free grammars which allows annotations on terminals. Our model extends the standard notion of regular spanners, and is more expressive than the extraction grammars recently introduced by Peterfreund. We study the enumeration problem for annotated grammars: fixing a grammar, and given a string as input, enumerate all annotations of the string that form a word derivable from the grammar. Our first result is an algorithm for unambiguous annotated grammars, which preprocesses the input string in cubic time and enumerates all annotations with output-linear delay. This improves over Peterfreund's result, which needs quintic time preprocessing to achieve this delay bound. We then study how we can reduce the preprocessing time while keeping the same delay bound, by making additional assumptions on the grammar. Specifically, we present a class of grammars which only have one derivation shape for all outputs, for which we can enumerate with quadratic time preprocessing. We also give classes that generalize regular spanners for which linear time preprocessing suffices.
Submitted 17 May, 2022; v1 submitted 3 January, 2022;
originally announced January 2022.
-
Dynamic Membership for Regular Languages
Authors:
Antoine Amarilli,
Louis Jachiet,
Charles Paperman
Abstract:
We study the dynamic membership problem for regular languages: fix a language L, read a word w, build in time O(|w|) a data structure indicating if w is in L, and maintain this structure efficiently under letter substitutions on w. We consider this problem on the unit cost RAM model with logarithmic word length, where the problem always has a solution in O(log |w| / log log |w|) per operation.
We show that the problem is in O(log log |w|) for languages in an algebraically-defined, decidable class QSG, and that it is in O(1) for another such class QLZG. We show that languages not in QSG admit a reduction from the prefix problem for a cyclic group, so that they require Ω(log |w| / log log |w|) operations in the worst case; and that QSG languages not in QLZG admit a reduction from the prefix problem for the multiplicative monoid $U_1 = \{0, 1\}$, which we conjecture cannot be maintained in O(1). This yields a conditional trichotomy. We also investigate intermediate cases between O(1) and O(log log |w|).
Our results are shown via the dynamic word problem for monoids and semigroups, for which we also give a classification. We thus solve open problems of the paper of Skovbjerg Frandsen, Miltersen, and Skyum [30] on the dynamic word problem, and additionally cover regular languages.
Submitted 4 June, 2021; v1 submitted 15 February, 2021;
originally announced February 2021.
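For orientation on the dynamic membership problem described above, here is a minimal sketch of a simple baseline: a segment tree whose nodes store DFA transition functions, supporting a letter substitution in O(log |w|) time for a fixed DFA. This is a simpler and slightly slower structure than the O(log |w| / log log |w|) solution the abstract mentions, and it is not one of the paper's algorithms. The class name `DynamicMembership` and its methods are hypothetical names for illustration.

```python
class DynamicMembership:
    """Maintain whether a word is in a regular language under substitutions (sketch).

    delta maps each symbol to a tuple t where t[q] is the state reached
    from state q after reading that symbol.
    """

    def __init__(self, delta, start, accepting, word):
        self.delta = delta
        self.start = start
        self.accepting = set(accepting)
        num_states = len(next(iter(delta.values())))
        identity = tuple(range(num_states))
        # Round the number of leaves up to a power of two; unused leaves hold
        # the identity function, which does not change the overall result.
        self.size = 1
        while self.size < max(1, len(word)):
            self.size *= 2
        self.tree = [identity] * (2 * self.size)
        for i, symbol in enumerate(word):
            self.tree[self.size + i] = delta[symbol]
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = self.compose(self.tree[2 * i], self.tree[2 * i + 1])

    @staticmethod
    def compose(f, g):
        """Return the function 'apply f first, then g'."""
        return tuple(g[q] for q in f)

    def substitute(self, position, symbol):
        """Replace the letter at the given 0-based position and update ancestors."""
        i = self.size + position
        self.tree[i] = self.delta[symbol]
        i //= 2
        while i >= 1:
            self.tree[i] = self.compose(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

    def accepts(self):
        # The root stores the effect of reading the whole word.
        return self.tree[1][self.start] in self.accepting
```

The root always holds the transition function of the entire word, so a membership query is a single lookup, while each substitution recomputes one root-to-leaf path.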
-
Ranked enumeration of MSO logic on words
Authors:
Pierre Bourhis,
Alejandro Grez,
Louis Jachiet,
Cristian Riveros
Abstract:
In recent years, enumeration algorithms with bounded delay have attracted a lot of attention for several data management tasks. Given a query and the data, the task is to preprocess the data and then enumerate all the answers to the query one by one and without repetitions. This enumeration scheme is typically useful when the solutions are treated on the fly or when we want to stop the enumeration once the pertinent solutions have been found. However, with the current schemes, there is no restriction on the order in which the solutions are given, and this order usually depends on the techniques used and not on the relevance for the user.
In this paper we study the enumeration of monadic second-order logic (MSO) over words when the solutions are ranked. We present a framework based on MSO cost functions that allows expressing MSO formulae on words with a cost associated with each solution. We then demonstrate the generality of our framework, which subsumes, for instance, document spanners and regular complex event processing queries and adds ranking to them. The main technical result of the paper is an algorithm for efficiently enumerating all the solutions of formulae in increasing order of cost, namely, with a linear preprocessing phase and logarithmic delay between solutions. The novelty of this algorithm lies in its use of functional data structures, in particular, in extending functional Brodal queues to suit the ranked enumeration of MSO on words.
Submitted 15 October, 2020;
originally announced October 2020.
-
Balancing expressiveness and inexpressiveness in view design
Authors:
Michael Benedikt,
Pierre Bourhis,
Louis Jachiet,
Efthymia Tsamoura
Abstract:
We study the design of data publishing mechanisms that allow a collection of autonomous distributed datasources to collaborate to support queries. A common mechanism for data publishing is via views: functions that expose derived data to users, usually specified as declarative queries. Our autonomy assumption is that the views must be on individual sources, but with the intention of supporting integrated queries. In deciding what data to expose to users, two considerations must be balanced. The views must be sufficiently expressive to support queries that users want to ask -- the utility of the publishing mechanism. But there may also be some expressiveness restriction. Here we consider two restrictions, a minimal information requirement, saying that the views should reveal as little as possible while supporting the utility query, and a non-disclosure requirement, formalizing the need to prevent external users from computing information that data owners do not want revealed.
We investigate the problem of designing views that satisfy both an expressiveness and an inexpressiveness requirement, for views in a restricted declarative language (conjunctive queries), and for arbitrary views.
Submitted 27 June, 2020; v1 submitted 1 June, 2020;
originally announced June 2020.
-
Reasoning about disclosure in data integration in the presence of source constraints
Authors:
Michael Benedikt,
Pierre Bourhis,
Louis Jachiet,
Michaël Thomazo
Abstract:
Data integration systems allow users to access data sitting in multiple sources by means of queries over a global schema, related to the sources via mappings. Data sources often contain sensitive information, and thus an analysis is needed to verify that a schema satisfies a privacy policy, given as a set of queries whose answers should not be accessible to users. Such an analysis should take into account not only knowledge that an attacker may have about the mappings, but also what they may know about the semantics of the sources. In this paper, we show that source constraints can have a dramatic impact on disclosure analysis. We study the problem of determining whether a given data integration system discloses a source query to an attacker in the presence of constraints, providing both lower and upper bounds on source-aware disclosure analysis.
Submitted 14 December, 2020; v1 submitted 3 June, 2019;
originally announced June 2019.
-
A Circuit-Based Approach to Efficient Enumeration
Authors:
Antoine Amarilli,
Pierre Bourhis,
Louis Jachiet,
Stefan Mengel
Abstract:
We study the problem of enumerating the satisfying valuations of a circuit while bounding the delay, i.e., the time needed to compute each successive valuation. We focus on the class of structured d-DNNF circuits originally introduced in knowledge compilation, a sub-area of artificial intelligence. We propose an algorithm for these circuits that enumerates valuations with linear preprocessing and delay linear in the Hamming weight of each valuation. Moreover, valuations of constant Hamming weight can be enumerated with linear preprocessing and constant delay.
Our results yield a framework for efficient enumeration that applies to all problems whose solutions can be compiled to structured d-DNNFs. In particular, we use it to recapture classical results in database theory, for factorized database representations and for MSO evaluation. This gives an independent proof of constant-delay enumeration for MSO formulae with first-order free variables on bounded-treewidth structures.
Submitted 5 May, 2017; v1 submitted 18 February, 2017;
originally announced February 2017.