-
Recursed is not Recursive: A Jarring Result
Authors:
Erik Demaine,
Justin Kopinsky,
Jayson Lynch
Abstract:
Recursed is a 2D puzzle platform video game featuring treasure chests that, when jumped into, instantiate a room that can later be exited (similar to function calls), optionally generating a jar that returns back to that room (similar to continuations). We prove that Recursed is RE-complete and thus undecidable (not recursive) by a reduction from the Post Correspondence Problem. Our reduction is "practical": the reduction from PCP results in fully playable levels that abide by all constraints governing levels (including the 15x20 room size) designed for the main game. Our reduction is also "efficient": a Turing machine can be simulated by a Recursed level whose size is linear in the encoding size of the Turing machine and whose solution length is polynomial in the running time of the Turing machine.
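For readers unfamiliar with the Post Correspondence Problem named in the abstract above, here is a minimal sketch of a bounded brute-force search over a PCP instance. It is not the paper's reduction. The function name `pcp_search`, the tile representation, and the example instance are illustrative assumptions. Because a solution can only be confirmed by finding one, the search needs an explicit length cap to be guaranteed to stop, which gives some intuition for why the problem is recognizable but not decidable.

```python
from itertools import product

def pcp_search(tiles, max_len):
    """Return the first index sequence (up to max_len tiles) whose top and
    bottom concatenations match, or None if none exists within the cap.

    `tiles` is a list of (top, bottom) string pairs (hypothetical encoding).
    """
    for length in range(1, max_len + 1):
        for seq in product(range(len(tiles)), repeat=length):
            top = "".join(tiles[i][0] for i in seq)
            bottom = "".join(tiles[i][1] for i in seq)
            if top == bottom:
                return list(seq)
    return None

# A small textbook-style instance (assumed for illustration).
tiles = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
solution = pcp_search(tiles, max_len=4)
```

Returning `None` here only means that no match exists within `max_len` tiles; it says nothing about longer sequences.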
Submitted 7 May, 2020; v1 submitted 12 February, 2020;
originally announced February 2020.
-
Infinite All-Layers Simple Foldability
Authors:
Hugo A. Akitaya,
Cordelia Avery,
Joseph Bergeron,
Erik D. Demaine,
Justin Kopinsky,
Jason Ku
Abstract:
We study the problem of deciding whether a crease pattern can be folded by simple folds (folding along one line at a time) under the infinite all-layers model introduced by [Akitaya et al., 2017], in which each simple fold is defined by an infinite line and must fold all layers of paper that intersect this line. This model is motivated by folding in manufacturing such as sheet-metal bending. We improve on [Arkin et al., 2004] by giving a deterministic $O(n)$-time algorithm to decide simple foldability of 1D crease patterns in the all-layers model. Then we extend this 1D result to 2D, showing that simple foldability in this model can be decided in linear time for unassigned axis-aligned orthogonal crease patterns on axis-aligned 2D orthogonal paper. On the other hand, we show that simple foldability is strongly NP-complete if a subset of the creases have a mountain-valley assignment, even for an axis-aligned rectangle of paper.
Submitted 24 January, 2019;
originally announced January 2019.
-
Relaxed Schedulers Can Efficiently Parallelize Iterative Algorithms
Authors:
Dan Alistarh,
Trevor Brown,
Justin Kopinsky,
Giorgi Nadiradze
Abstract:
There has been significant progress in understanding the parallelism inherent to iterative sequential algorithms: for many classic algorithms, the depth of the dependence structure is now well understood, and scheduling techniques have been developed to exploit this shallow dependence structure for efficient parallel implementations. A related, applied research strand has studied methods by which certain iterative task-based algorithms can be efficiently parallelized via relaxed concurrent priority schedulers. These allow for high concurrency when inserting and removing tasks, at the cost of executing superfluous work due to the relaxed semantics of the scheduler.
In this work, we take a step towards unifying these two research directions, by showing that there exists a family of relaxed priority schedulers that can efficiently and deterministically execute classic iterative algorithms such as greedy maximal independent set (MIS) and matching. Our primary result shows that, given a randomized scheduler with an expected relaxation factor of $k$ in terms of the maximum allowed priority inversions on a task, and any graph on $n$ vertices, the scheduler is able to execute greedy MIS with only an additive factor of poly($k$) expected additional iterations compared to an exact (but not scalable) scheduler. This counter-intuitive result demonstrates that the overhead of relaxation when computing MIS is not dependent on the input size or structure of the input graph. Experimental results show that this overhead can be clearly offset by the gain in performance due to the highly scalable scheduler. In sum, we present an efficient method to deterministically parallelize iterative sequential algorithms, with provable runtime guarantees in terms of the number of executed tasks to completion.
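The setting in the abstract above can be sketched as follows, under stated assumptions. The relaxed scheduler is modeled as a uniform random choice among the `k` highest-priority pending vertices. A vertex is processed only once all of its higher-priority neighbors have been decided. Otherwise the iteration is counted as wasted and the vertex stays pending. This model and the names `greedy_mis` and `relaxed_greedy_mis` are illustrative assumptions, not the paper's exact scheduler.

```python
import random

def greedy_mis(adj, order):
    """Exact sequential greedy MIS: scan vertices in priority order."""
    mis = set()
    for v in order:
        if not (adj[v] & mis):
            mis.add(v)
    return mis

def relaxed_greedy_mis(adj, order, k, rng):
    """Greedy MIS driven by a k-relaxed scheduler (assumed model).

    Returns the computed set and the total number of iterations,
    including wasted ones.
    """
    rank = {v: i for i, v in enumerate(order)}
    pending = list(order)  # stays sorted by priority
    decided, mis = set(), set()
    iterations = 0
    while pending:
        iterations += 1
        idx = rng.randrange(min(k, len(pending)))
        v = pending[idx]
        # Ready only if every higher-priority neighbor is already decided.
        if all(u in decided for u in adj[v] if rank[u] < rank[v]):
            pending.pop(idx)
            decided.add(v)
            if not (adj[v] & mis):
                mis.add(v)
        # Otherwise the iteration is wasted and v remains pending.
    return mis, iterations

# Path graph 0-1-2-3-4 with priorities equal to vertex ids.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
order = [0, 1, 2, 3, 4]
exact = greedy_mis(adj, order)
relaxed, iterations = relaxed_greedy_mis(adj, order, k=3, rng=random.Random(0))
```

In this sketch the output does not depend on the scheduler's random choices, because the dependency check makes every vertex see the final decisions of its higher-priority neighbors. Only the iteration count varies, and the gap between `iterations` and the number of vertices is the overhead of relaxation.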
Submitted 13 August, 2018;
originally announced August 2018.
-
Who witnesses The Witness? Finding witnesses in The Witness is hard and sometimes impossible
Authors:
Zachary Abel,
Jeffrey Bosboom,
Michael Coulombe,
Erik D. Demaine,
Linus Hamilton,
Adam Hesterberg,
Justin Kopinsky,
Jayson Lynch,
Mikhail Rudoy,
Clemens Thielen
Abstract:
We analyze the computational complexity of the many types of pencil-and-paper-style puzzles featured in the 2016 puzzle video game The Witness. In all puzzles, the goal is to draw a simple path in a rectangular grid graph from a start vertex to a destination vertex. The different puzzle types place different constraints on the path: preventing some edges from being visited (broken edges); forcing some edges or vertices to be visited (hexagons); forcing some cells to have certain numbers of incident path edges (triangles); or forcing the regions formed by the path to be partially monochromatic (squares), have exactly two special cells (stars), or be singly covered by given shapes (polyominoes) and/or negatively counting shapes (antipolyominoes). We show that any one of these clue types (except the first) is enough to make path finding NP-complete ("witnesses exist but are hard to find"), even for rectangular boards. Furthermore, we show that a final clue type (antibody), which necessarily "cancels" the effect of another clue in the same region, makes path finding $Σ_2$-complete ("witnesses do not exist"), even with a single antibody (combined with many anti/polyominoes), and the problem gets no harder with many antibodies. On the positive side, we give a polynomial-time algorithm for monomino clues, by reducing to hexagon clues on the boundary of the puzzle, even in the presence of broken edges, and solving "subset Hamiltonian path" for terminals on the boundary of an embedded planar graph in polynomial time.
Submitted 8 February, 2019; v1 submitted 26 April, 2018;
originally announced April 2018.
-
Distributionally Linearizable Data Structures
Authors:
Dan Alistarh,
Trevor Brown,
Justin Kopinsky,
Jerry Z. Li,
Giorgi Nadiradze
Abstract:
Relaxed concurrent data structures have become increasingly popular, due to their scalability in graph processing and machine learning applications. Despite considerable interest, there exist families of natural, high performing randomized relaxed concurrent data structures, such as the popular MultiQueue pattern for implementing relaxed priority queue data structures, for which no guarantees are known in the concurrent setting. Our main contribution is in showing for the first time that, under a set of analytic assumptions, a family of relaxed concurrent data structures, including variants of MultiQueues, but also a new approximate counting algorithm we call the MultiCounter, provides strong probabilistic guarantees on the degree of relaxation with respect to the sequential specification, in arbitrary concurrent executions. We formalize these guarantees via a new correctness condition called distributional linearizability, tailored to concurrent implementations with randomized relaxations. Our result is based on a new analysis of an asynchronous variant of the classic power-of-two-choices load balancing algorithm, in which placement choices can be based on inconsistent, outdated information (this result may be of independent interest). We validate our results empirically, showing that the MultiCounter algorithm can implement scalable relaxed timestamps, which in turn can improve the performance of the classic TL2 transactional algorithm by up to 3 times, for some settings of parameters.
Submitted 25 March, 2022; v1 submitted 3 April, 2018;
originally announced April 2018.
-
Path Puzzles: Discrete Tomography with a Path Constraint is Hard
Authors:
Jeffrey Bosboom,
Erik D. Demaine,
Martin L. Demaine,
Adam Hesterberg,
Roderick Kimball,
Justin Kopinsky
Abstract:
We prove that path puzzles with complete row and column information--or equivalently, 2D orthogonal discrete tomography with Hamiltonicity constraint--are strongly NP-complete, ASP-complete, and #P-complete. Along the way, we newly establish ASP-completeness and #P-completeness for 3-Dimensional Matching and Numerical 3-Dimensional Matching.
Submitted 9 February, 2019; v1 submitted 3 March, 2018;
originally announced March 2018.
-
The Power of Choice in Priority Scheduling
Authors:
Dan Alistarh,
Justin Kopinsky,
Jerry Li,
Giorgi Nadiradze
Abstract:
Consider the following random process: we are given $n$ queues, into which elements of increasing labels are inserted uniformly at random. To remove an element, we pick two queues at random, and remove the element of lower label (higher priority) among the two. The cost of a removal is the rank of the label removed, among labels still present in any of the queues, that is, the distance from the optimal choice at each step. Variants of this strategy are prevalent in state-of-the-art concurrent priority queue implementations. Nonetheless, it is not known whether such implementations provide any rank guarantees, even in a sequential model.
We answer this question, showing that this strategy provides surprisingly strong guarantees: Although the single-choice process, where we always insert and remove from a single randomly chosen queue, has degrading cost, going to infinity as we increase the number of steps, in the two choice process, the expected rank of a removed element is $O( n )$ while the expected worst-case cost is $O( n \log n )$. These bounds are tight, and hold irrespective of the number of steps for which we run the process.
The argument is based on a new technical connection between "heavily loaded" balls-into-bins processes and priority scheduling.
Our analytic results inspire a new concurrent priority queue implementation, which improves upon the state of the art in terms of practical performance.
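The random process described in the abstract above can be simulated in a few lines. This is a minimal sequential sketch, with several assumptions the abstract does not fix: the two queues are drawn independently with replacement, insertions and removals alternate after an initial prefill, and a removal is skipped if both sampled queues are empty. The name `two_choice_costs` and its parameters are hypothetical.

```python
import random
from collections import deque

def two_choice_costs(n, steps, rng, prefill):
    """Simulate the two-choice removal process and return per-removal ranks.

    Labels are inserted in increasing order, so each FIFO queue is sorted
    and its head is its smallest label.
    """
    queues = [deque() for _ in range(n)]
    next_label = 0

    def insert():
        nonlocal next_label
        queues[rng.randrange(n)].append(next_label)
        next_label += 1

    for _ in range(prefill):
        insert()

    costs = []
    for _ in range(steps):
        insert()
        i, j = rng.randrange(n), rng.randrange(n)
        candidates = [q for q in (queues[i], queues[j]) if q]
        if not candidates:
            continue  # assumed handling: skip when both samples are empty
        best = min(candidates, key=lambda q: q[0])
        label = best.popleft()
        # Rank among labels still present: 1 means the optimal choice.
        rank = 1 + sum(1 for q in queues for x in q if x < label)
        costs.append(rank)
    return costs

costs = two_choice_costs(n=8, steps=200, rng=random.Random(1), prefill=64)
average_rank = sum(costs) / len(costs)
```

Tracking `average_rank` and `max(costs)` while increasing `steps` is one way to observe empirically the behavior the abstract describes, namely costs that do not grow with the length of the run.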
Submitted 13 June, 2017;
originally announced June 2017.
-
Inherent Limitations of Hybrid Transactional Memory
Authors:
Dan Alistarh,
Justin Kopinsky,
Petr Kuznetsov,
Srivatsan Ravi,
Nir Shavit
Abstract:
Several Hybrid Transactional Memory (HyTM) schemes have recently been proposed to complement the fast, but best-effort, nature of Hardware Transactional Memory (HTM) with a slow, reliable software backup. However, the fundamental limitations of building a HyTM with nontrivial concurrency between hardware and software transactions are still not well understood.
In this paper, we propose a general model for HyTM implementations, which captures the ability of hardware transactions to buffer memory accesses, and allows us to formally quantify and analyze the amount of overhead (instrumentation) of a HyTM scheme. We prove the following: (1) it is impossible to build a strictly serializable HyTM implementation that has both uninstrumented reads and writes, even for weak progress guarantees, and (2) under reasonable assumptions, in any opaque progressive HyTM, a hardware transaction must incur instrumentation costs linear in the size of its data set. We further provide two upper bound implementations whose instrumentation costs are optimal with respect to their progress guarantees. In sum, this paper captures for the first time an inherent trade-off between the degree of concurrency a HyTM provides between hardware and software transactions, and the amount of instrumentation overhead the implementation must incur.
Submitted 17 February, 2015; v1 submitted 22 May, 2014;
originally announced May 2014.
-
The LevelArray: A Fast, Practical Long-Lived Renaming Algorithm
Authors:
Dan Alistarh,
Justin Kopinsky,
Alexander Matveev,
Nir Shavit
Abstract:
The long-lived renaming problem appears in shared-memory systems where a set of threads need to register and deregister frequently from the computation, while concurrent operations scan the set of currently registered threads. Instances of this problem show up in concurrent implementations of transactional memory, flat combining, thread barriers, and memory reclamation schemes for lock-free data structures. In this paper, we analyze a randomized solution for long-lived renaming. The algorithmic technique we consider, called the LevelArray, has previously been used for hashing and one-shot (single-use) renaming. Our main contribution is to prove that, in long-lived executions, where processes may register and deregister polynomially many times, the technique guarantees constant steps on average and O(log log n) steps with high probability for registering, unit cost for deregistering, and O(n) steps for collect queries, where n is an upper bound on the number of processes that may be active at any point in time. We also show that the algorithm has the surprising property that it is self-healing: under reasonable assumptions on the schedule, operations running while the data structure is in a degraded state implicitly help the data structure re-balance itself. This subtle mechanism obviates the need for expensive periodic rebuilding procedures. Our benchmarks validate this approach, showing that, for typical use parameters, the average number of steps a process takes to register is less than two and the worst-case number of steps is bounded by six, even in executions with billions of operations. We contrast this with other randomized implementations, whose worst-case behavior we show to be unreliable, and with deterministic implementations, whose cost is linear in n.
Submitted 21 May, 2014;
originally announced May 2014.