11institutetext:

Parsimonious Optimal Dynamic Partial Order Reduction for Interleaving-Based Concurrency

Anonymous et al
Abstract

Stateless model checking is a fully automatic verification technique for concurrent programs that checks for safety violations by exploring all possible thread schedulings. It becomes effective when coupled with Dynamic Partial Order Reduction (DPOR), which introduces an equivalence on schedulings and reduces the amount of needed exploration. DPOR algorithms that are optimal are particularly effective in that they guarantee to explore exactly one execution from each equivalence class. Unfortunately, existing sequence-based optimal algorithms may in the worst case consume memory that is exponential in the size of the analyzed program. In this paper, we present Parsimonious-OPtimal DPOR (POP), an optimal DPOR algorithm for analyzing multi-threaded programs under sequential consistency, whose space consumption is polynomial in the worst case. POP combines several novel algorithmic techniques, including (i) a parsimonious race reversal strategy, which avoids multiple reversals of the same race, (ii) an eager race reversal strategy to avoid storing initial fragments of to-be-explored executions, and (iii) a space-efficient scheme for preventing redundant exploration, which replaces the use of sleep sets. Our implementation in Nidhugg shows that these techniques can significantly speed up the analysis of concurrent programs, and do so with low memory consumption. Comparison to a related optimal DPOR algorithm for a different representation of concurrent executions as graphs shows that POP has comparable worst-case performance for smaller benchmarks and outperforms the other one for larger programs.

1 Introduction

Testing and verification of multi-threaded programs is challenging, since it requires reasoning about all the ways in which operations executed by different threads can interfere. A successful technique for finding concurrency bugs in multithreaded programs and for verifying their absence is stateless model checking (SMC) [Godefroid:popl97]. Given a terminating program and fixed input data, SMC systematically explores the set of all thread schedulings that are possible during program runs. A dedicated runtime scheduler drives the SMC exploration by making decisions on scheduling whenever such choices may affect the interaction between threads. Given enough time, the exploration covers all possible executions and detects any unexpected program results, program crashes, or assertion violations. The technique is entirely automatic, has no false positives, does not consume excessive memory, and can reproduce the concurrency bugs it detects. SMC has been implemented in many tools (e.g., VeriSoft [Godefroid:verisoft-journal], Chess [MQBBNN:chess], Concuerror [Concuerror:ICST13], Nidhugg [tacas15:tso], rInspect [DBLP:conf/pldi/ZhangKW15], CDSChecker [NoDe:toplas16], RCMC [KLSV:popl18], and GenMC [GenMC@CAV-21]), and successfully applied to realistic programs (e.g., [GoHaJa:heartbeat] and [KoSa:sttt19]).

To reduce the number of explored executions, SMC tools typically employ dynamic partial order reduction (DPOR) [FG:dpor, abdulla2014optimal, KLSV:popl18]. DPOR defines an equivalence relation on executions, typically Mazurkiewicz trace equivalence [Mazurkiewicz:traces], which preserves many important correctness properties, such as reachability of local states and assertion violations, and explores at least one execution in each equivalence class. Thus, to analyze a program, it suffices to explore one execution from each equivalence class. DPOR was originally developed for models of concurrency [FG:dpor], where executions are expressed as sequences of interactions between threads/processes and shared objects. Subsequently, sequence-based DPOR has been adapted and refined to a number of programming models, including actor programs [TKLLMA:forte12], abstract computational models [ProbeSets@CONCUR-08], event driven programs [Event-DrivenSMC@OOPSLA-15, Maiya:tacas16, atva23], and MPI programs [DPOR:MPI]; it has been extended with features for efficiently handling spinloops and blocking constructs [Godot@FMCAD-22], and been adapted for weak concurrency memory models, such as TSO and PSO [DBLP:conf/pldi/ZhangKW15, tacas15:tso]. DPOR has also been adapted for weak memory models by representing executions as graphs, where nodes represent read and write operations, and edges represent reads-from and coherence relations; this allows the algorithm to be parametric on a specific memory model, at the cost of calling a memory-model oracle [KLSV:popl18, GenMC@PLDI-19]

An important improvement has been the introduction of optimal DPOR algorithms, which are efficient in that they guarantee to explore exactly one execution from each equivalence class. The first optimal DPOR algorithm was designed for the sequence-based representation [abdulla2014optimal]. Subsequently, optimal DPOR algorithms for even weaker equivalences than Mazurkiewicz trace equivalence have been developed [observers, rfsc@OOPSLA-19, OptimalDPOR@JSS-23]. In some DPOR algorithms [abdulla2014optimal, observers, OptimalDPOR@JSS-23], optimality comes at the price of added memory consumption which in the worst case can be exponential in the size of the program. Even though most benchmarks in the literature show a modest memory overhead as the price for optimality, it would be desirable to have an optimal DPOR algorithm whose memory consumption is guaranteed to be polynomial in the size of the program [optimal-dpor-jacm]. Such an algorithm, called TruSt [TruSt@POPL-22], was recently presented, but for a graph-based setting [GenMC@PLDI-19]. It would be desirable to develop a polynomial-space optimal DPOR algorithm also for sequence-based settings. One reason is that a majority of past work on DPOR is sequence-based; hence such an algorithm could be adapted to various programming models and features, some of which were recalled above. Another reason is that sequence-based models represent computations adhering to sequential consistency (SC) and TSO more naturally than graph-based models. For SC, representing executions as sequences of events makes executions consistent by construction and alleviates the need to resort to a potentially expensive SC oracle.

In this paper, we present the Parsimonious-OPtimal DPOR (POP) algorithm for analyzing multi-threaded programs under SC (Section 4). POP is designed for programs in which threads interact by atomic reads, writes, and RMWs to shared variables, and combines several novel algorithmic techniques.

  • A parsimonious race reversal technique (Section 4.1), which considers a race if and only if its reversal will generate a previously unexplored execution; in contrast, most existing DPOR algorithms reverse races indiscriminately, only to thereafter discard redundant reversals (e.g., by sleep sets or similar mechanisms).

  • An eager race reversal strategy (LABEL:sec:pop-algo), which immediately starts exploration of the new execution resulting from a race reversal; this prevents accumulation of a potentially exponential number of execution fragments generated by race reversals.

  • In order to avoid exploring several executions in the same equivalence class, a naïve realization of POP would employ an adaptation of sleep sets [Godefroid:thesis]. However, these can in the worst case become exponentially large. Therefore, POP employs a parsimonious characterization of sleep sets (Section 4.3): instead of representing the elements of the sleep set, POP uses a characterization of them, which allows to detect and prevent redundant exploration, and uses at most polynomial space. This sleep set characterization is uniquely computed from its generating race, implying that explorations of different executions share no state, thereby making POP suitable for parallelization.

We prove (in the appendices of the longer version) that the POP algorithm is correct (explores at least one execution in each equivalence class), optimal (explores exactly one execution in each equivalence class), does not suffer from blocked explorations, and requires only polynomial size memory.

We have implemented POP DPOR in an extension of the Nidhugg tool [tacas15:tso]. Using a wide variety of benchmarks (Section 6), we show that POP’s implementation indeed has its claimed properties, it always outperforms Optimal DPOR, and offers performance which is on par with TruSt, the state-of-the-art graph-based DPOR algorithm. Moreover, by being sequence-based, it scales much better than TruSt on programs with long executions.

2 Main Concepts

Initially: x = y = z = 0
x = 1;y = 1p𝑝pitalic_pa𝑎aitalic_a = xq𝑞qitalic_qc𝑐citalic_c = xr𝑟ritalic_rd𝑑ditalic_d = z;e𝑒eitalic_e = y;if e𝑒eitalic_e == 0f𝑓fitalic_f = xs𝑠sitalic_s

E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPTs::𝑠absents\!\!:\!italic_s : e𝑒eitalic_e = ys::𝑠absents\!\!:\!italic_s : d𝑑ditalic_d = zr::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = xq::𝑞absentq\!\!:\!italic_q : a𝑎aitalic_a = xp::𝑝absentp\!\!:\!italic_p : y = 1E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPTr::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = xp::𝑝absentp\!\!:\!italic_p : y = 1q::𝑞absentq\!\!:\!italic_q : a𝑎aitalic_a = xs::𝑠absents\!\!:\!italic_s : f𝑓fitalic_f = xs::𝑠absents\!\!:\!italic_s : e𝑒eitalic_e = ys::𝑠absents\!\!:\!italic_s : d𝑑ditalic_d = zp::𝑝absentp\!\!:\!italic_p : x = 1E3subscript𝐸3E_{3}italic_E start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPTs::𝑠absents\!\!:\!italic_s : f𝑓fitalic_f= xq::𝑞absentq\!\!:\!italic_q : a𝑎aitalic_a = xp::𝑝absentp\!\!:\!italic_p : y = 1p::𝑝absentp\!\!:\!italic_p : x = 1s::𝑠absents\!\!:\!italic_s : e𝑒eitalic_e = ys::𝑠absents\!\!:\!italic_s : d𝑑ditalic_d = zr::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = xq::𝑞absentq\!\!:\!italic_q : a𝑎aitalic_a = xs::𝑠absents\!\!:\!italic_s : f𝑓fitalic_f = xs::𝑠absents\!\!:\!italic_s : e𝑒eitalic_e = ys::𝑠absents\!\!:\!italic_s : d𝑑ditalic_d = z
Figure 1: A program and a part of its execution tree with three executions that POP will explore plus two schedules that will be the start of subsequent exploration. In E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, the blue arrows are program order, and red arcs show races that will be reversed. The three fresh reads on x are shown in teal, and schedules are shown in purple.

In this section, we informally present the core principles of our approach, in particular the three novel algorithmic techniques of parsimonious race reversal, eager race reversal, and parsimonious static representation of sleep sets, along with how they relate to previous sequence-based DPOR algorithms, on a simple example, shown in Fig. 1. In this code, four threads (p,q,r,s𝑝𝑞𝑟𝑠p,q,r,sitalic_p , italic_q , italic_r , italic_s) access three shared variables (x, y, z), using six thread-local registers (a,b,c,d,e,f𝑎𝑏𝑐𝑑𝑒𝑓a,b,c,d,e,fitalic_a , italic_b , italic_c , italic_d , italic_e , italic_f).111Throughout this paper, we assume that threads are spawned by a main thread, and that all shared variables get initialized to 00, also by the main thread. DPOR algorithms typically first explore an arbitrary execution, which is then inspected to detect races. Assume that the first execution is E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT (cf. the tree in Fig. 1). To detect races in an execution E𝐸Eitalic_E, one must first compute its happens-before order, denoted 𝚑𝚋Esubscript𝚑𝚋𝐸\xrightarrow{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}% \mathtt{hb}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}}_{E}start_ARROW overtypewriter_hb → end_ARROW start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT, which is the smallest transitive relation that orders two events that are (i) in the same thread, or (ii) access a common shared variable and at least one of them is a write. A race consists of two events in different threads that are adjacent in the 𝚑𝚋Esubscript𝚑𝚋𝐸\xrightarrow{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}% \mathtt{hb}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}}_{E}start_ARROW overtypewriter_hb → end_ARROW start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT order. The execution E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT contains three races (red arcs left of E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT in Fig. 1). Let us first consider the race on y, in which the first event is p::𝑝absentp\!\!:\!italic_p : y = 1 and the second is s::𝑠absents\!\!:\!italic_s : e𝑒eitalic_e = y. For each race, a DPOR algorithm constructs an initial fragment of an alternative execution, which reverses the race and branches off from the explored execution just before the race. In early DPOR algorithms, including the “classic” DPOR algorithm by Flanagan and Godefroid [FG:dpor] and the Source DPOR algorithm of Abdulla et al. [abdulla2014optimal], this initial fragment consists of just one event which can initiate an execution which reverses the race. For this race, they would store s::𝑠absents\!\!:\!italic_s : d𝑑ditalic_d = z as an alternative to p::𝑝absentp\!\!:\!italic_p : y = 1, since s::𝑠absents\!\!:\!italic_s : d𝑑ditalic_d = z must be performed before the racing access to y (i.e., s::𝑠absents\!\!:\!italic_s : e𝑒eitalic_e = y). Storing only the first event of an alternative execution does not consume much space, but has the problem that the execution after that event (s::𝑠absents\!\!:\!italic_s : d𝑑ditalic_d = z in our case) is not controlled, and may deviate from the path towards the racing event (s::𝑠absents\!\!:\!italic_s : e𝑒eitalic_e = y in our case), potentially leading to redundant exploration. Some DPOR algorithms, including Optimal DPOR and POP, address this problem by constructing the initial fragment as a sequence of events that lead to the second event, s::𝑠absents\!\!:\!italic_s : e𝑒eitalic_e = y in this case. POP constructs a minimal such sequence (called a schedule) consisting of the events that happen before (in the 𝚑𝚋E1subscript𝚑𝚋subscript𝐸1\xrightarrow{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}% \mathtt{hb}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}}_{E_{1}}start_ARROW overtypewriter_hb → end_ARROW start_POSTSUBSCRIPT italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT order) the second event of the race while omitting the first event of the race. The schedule in this case is s: d = z;s: e = ys: d = zs: e = y\mbox{$s\!\!:\!$ {$d$\,=\,{z}}};\mbox{$s\!\!:\!$ {$e$\,=\,{y}}}italic_s : italic_d typewriter_= typewriter_z ; italic_s : italic_e typewriter_= typewriter_y, which is inserted as an alternative continuation after p::𝑝absentp\!\!:\!italic_p : x = 1, marked purple in Fig. 1. This continuation can be extended to a maximal execution E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT.

Let us next consider the races on x in E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT; their first event is p::𝑝absentp\!\!:\!italic_p : x = 1. For the race whose second event is r::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = x, POP constructs the schedule r::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = x (there are no additional events happening before r::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = x), which is inserted as an initial fragment of an alternative execution, which can later be extended to E3subscript𝐸3E_{3}italic_E start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT (cf. Fig. 1). For the race whose second event is q::𝑞absentq\!\!:\!italic_q : a𝑎aitalic_a = x, POP constructs the schedule q::𝑞absentq\!\!:\!italic_q : a𝑎aitalic_a = x, which is also inserted as an initial fragment of an alternative execution.

Parsimonious Race Reversals:

To illustrate POP’s mechanism for reversing each race only once, let us next consider the races in execution E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. There is one race on y, between s::𝑠absents\!\!:\!italic_s : e𝑒eitalic_e = y and p::𝑝absentp\!\!:\!italic_p : y = 1. However, this race should not be reversed since it would give back the event order in E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. POP achieves this by forbidding to consider races whose first event (in this case s::𝑠absents\!\!:\!italic_s : e𝑒eitalic_e = y) is in some schedule: reversing a race whose first event is in a schedule yields a fragment that was already explored in some previous execution. The execution E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT also exhibits three races on x, all including p::𝑝absentp\!\!:\!italic_p : x = 1. The race with s::𝑠absents\!\!:\!italic_s : f𝑓fitalic_f = x is new, but the two other races have already occurred in E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. These two races should therefore not be considered, since the schedules they would generate have already been generated from the corresponding races in E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. POP achieves this by forbidding to consider races whose second event is not fresh. Intuitively, a fresh event is an event which did not occur with the same 𝚑𝚋𝚑𝚋\xrightarrow{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}% \mathtt{hb}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}}start_ARROW overtypewriter_hb → end_ARROW predecessors in a previously explored execution (and whose reversal will therefore generate a schedule that did not appear before). More precisely, a second event of a race is fresh if it happens-after (in the 𝚑𝚋𝚑𝚋\xrightarrow{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}% \mathtt{hb}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}}start_ARROW overtypewriter_hb → end_ARROW order) the last event of each schedule that appears between the two racing events. Returning to the three races on x in E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, the only second event which is fresh is s::𝑠absents\!\!:\!italic_s : f𝑓fitalic_f = x. Therefore, only this race is considered; it generates the schedule s: d = z;s: a = y;s: f = xs: d = zs: a = ys: f = x\mbox{$s\!\!:\!$ {$d$\,=\,{z}}};\mbox{$s\!\!:\!$ {$a$\,=\,{y}}};\mbox{$s\!\!:% \!$ {$f$\,=\,{x}}}italic_s : italic_d typewriter_= typewriter_z ; italic_s : italic_a typewriter_= typewriter_y ; italic_s : italic_f typewriter_= typewriter_x, which is inserted as the start of an alternative exploration (top right in Fig. 1).

Eager Race Reversal Strategy:

POP as described so far can in principle be implemented so that the schedules constructed as alternative continuations of an event are all collected before they are explored. However, such a strategy can in the worst case consume memory that is exponential in the program size. The reason is that for some programs, the number of schedules that branch off at a particular point in an execution may become exponential in the size of the program; this was first observed by Abdulla et al. [optimal-dpor-jacm, Sect. 9]; an illustrating shared-variable program is given in [TruSt@POPL-22, Sect. 2.3]. POP avoids this problem by exploring schedules eagerly: immediately after the creation of a schedule, exploration switches to continuations of that schedule. This policy can be realized by an algorithm that calls a recursive function to initiate exploration of a new schedule. Just for illustration, we show a simple such algorithm in LABEL:sec:simple-dpor, which however still has worst-case exponential memory consumption, due to a problem that is solved by our parsimonious characterization of sleep sets below. We establish, in Lemma 1, that the recursion depth of such an algorithm is at most n(n1)/2𝑛𝑛12n(n-1)/2italic_n ( italic_n - 1 ) / 2, where n𝑛nitalic_n is the length longest execution of the analyzed program.

Parsimonious Characterization of Sleep Sets:

Even though the parsimonious race reversal strategy guarantees that the initial fragments of alternative executions are inequivalent, one must prevent that their continuations become equivalent. To illustrate this problem, let us return to the reversal of races on x in Fig. 1 (in both E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT). The top leftmost event p::𝑝absentp\!\!:\!italic_p : x = 1 races with three fresh reads on x: two in E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and one in E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, shown in teal color. From these races, three schedules are constructed as initial fragments of alternative explorations: r::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = x and q::𝑞absentq\!\!:\!italic_q : a𝑎aitalic_a = x from E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and s: d = z;s: e = y;s: f = xs: d = zs: e = ys: f = x\mbox{$s\!\!:\!$ {$d$\,=\,{z}}};\mbox{$s\!\!:\!$ {$e$\,=\,{y}}};\mbox{$s\!\!:% \!$ {$f$\,=\,{x}}}italic_s : italic_d typewriter_= typewriter_z ; italic_s : italic_e typewriter_= typewriter_y ; italic_s : italic_f typewriter_= typewriter_x from E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. At this point, we need to be careful: these schedules are not conflicting, since they all read from their common shared variable x. There is then a danger that the first schedule will be continued using the second, and the second will be continued using the first; we would then explore two equivalent executions, consisting of these two schedules in either order. The standard DPOR technique for avoiding such redundant exploration is sleep sets [Godefroid:thesis]. In its standard form, a sleep set is a set of events that should not be performed before some conflicting event. Since POP uses schedules as beginnings of alternative explorations, the appropriate adaptation would be to let a sleep set be a set of schedules that should not be performed unless some conflicting event is performed before that. In Fig. 1, this would mean that after exploring the continuations of r::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = x, this schedule is added to the sleep set when starting to explore the continuations of q::𝑞absentq\!\!:\!italic_q : a𝑎aitalic_a = x, which is thereafter also added to the sleep set when starting exploration from s: d = z;s: e = y;s: f = xs: d = zs: e = ys: f = x\mbox{$s\!\!:\!$ {$d$\,=\,{z}}};\mbox{$s\!\!:\!$ {$e$\,=\,{y}}};\mbox{$s\!\!:% \!$ {$f$\,=\,{x}}}italic_s : italic_d typewriter_= typewriter_z ; italic_s : italic_e typewriter_= typewriter_y ; italic_s : italic_f typewriter_= typewriter_x from E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. This mechanism is simple to combine with eager exploration of schedules; in fact, we show a pseudocode version of it in LABEL:sec:simple-dpor. Unfortunately, there are programs where the number of schedules that conflict with a write event is exponential, whence the memory consumption may become exponential in the size of the program, because of large sleep sets. POP avoids this problem by a parsimonious characterization of sleep sets, which consumes memory that is linear in the size of the program. We describe it below.

Consider the schedules that are generated from races with p::𝑝absentp\!\!:\!italic_p : x = 1 in E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT (the racing read events are colored teal). Now order these schedules in post-order w.r.t. the position of the racing events in the exploration tree; in this case r::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = x is first, q::𝑞absentq\!\!:\!italic_q : a𝑎aitalic_a = x is second, and s: d = z;s: e = y;s: f = xs: d = zs: e = ys: f = x\mbox{$s\!\!:\!$ {$d$\,=\,{z}}};\mbox{$s\!\!:\!$ {$e$\,=\,{y}}};\mbox{$s\!\!:% \!$ {$f$\,=\,{x}}}italic_s : italic_d typewriter_= typewriter_z ; italic_s : italic_e typewriter_= typewriter_y ; italic_s : italic_f typewriter_= typewriter_x is third. POP’s parsimonious sleep set characterization then enforces the policy that an exploration starting with one of these schedules must not explore a schedule that succeeds it in this post-order. These succeeding schedules are thus implicitly in a sleep set, which is parsimoniously characterized using linear space, even though the number of succeeding schedules may be exponential. Let us explain how this is done for the first schedule r::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = x. The schedules that must be avoided in explorations starting with r::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = x are of two kinds:

  1. 1.

    Schedules that precede it in the execution where r::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = x is fresh (in this case q::𝑞absentq\!\!:\!italic_q : a𝑎aitalic_a = x); POP puts all such schedules into a sleep set which will accompany the exploration starting with r::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = x.

  2. 2.

    Schedules that arise from fresh read events in executions that branch off from E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT to its right (in this case s: d = z;s: e = y;s: f = xs: d = zs: e = ys: f = x\mbox{$s\!\!:\!$ {$d$\,=\,{z}}};\mbox{$s\!\!:\!$ {$e$\,=\,{y}}};\mbox{$s\!\!:% \!$ {$f$\,=\,{x}}}italic_s : italic_d typewriter_= typewriter_z ; italic_s : italic_e typewriter_= typewriter_y ; italic_s : italic_f typewriter_= typewriter_x). Because such a read must be fresh, its schedule must be in conflict with the execution from which it branches off (E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT in this case). To detect such reads, POP creates a conflict detector for the sequence w𝑤witalic_w, where w𝑤witalic_w is the sequence of events between the racing events. Such a conflict detector is a mechanism that for each event in an execution checks whether it conflicts with w𝑤witalic_w or happens-after an event that conflicts with w𝑤witalic_w. It is represented as a pair of form H,C𝐻𝐶\langle H,C\rangle⟨ italic_H , italic_C ⟩, where H𝐻Hitalic_H is a sequence of events and C𝐶Citalic_C is a set of threads and accesses, initialized to w,𝑤\langle w,\emptyset\rangle⟨ italic_w , ∅ ⟩, which will accompany the exploration starting with r::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = x. During that exploration, H𝐻Hitalic_H will be the subsequence of w𝑤witalic_w that has not been exhibited in the exploration and against which conflicts must still be detected, and C𝐶Citalic_C is the set of threads and accesses that have already been observed to be in conflict with w𝑤witalic_w.

Let us illustrate the working of the sleep set and conflict detector on the exploration of E3subscript𝐸3E_{3}italic_E start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT in Fig. 1. When the exploration from r::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = x starts, the sleep set is initialized with {q: a = x}q: a = x\left\{{\mbox{$q\!\!:\!$ {$a$\,=\,{x}}}}\right\}{ italic_q : italic_a typewriter_= typewriter_x }, and the conflict detector with p: y = 1;q: a = x,p: y = 1q: a = x\langle\mbox{$p\!\!:\!$ {{y}\,=\,1}};\mbox{$q\!\!:\!$ {$a$\,=\,{x}}},\ \emptyset\rangle⟨ italic_p : typewriter_y typewriter_= typewriter_1 ; italic_q : italic_a typewriter_= typewriter_x , ∅ ⟩. For selecting the next event, the sleep set prevents performing q::𝑞absentq\!\!:\!italic_q : a𝑎aitalic_a = x before any write on x. The next event s::𝑠absents\!\!:\!italic_s : d𝑑ditalic_d = z in E3subscript𝐸3E_{3}italic_E start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT is not in conflict with either the sequence in the sleep set or in the conflict detector; in this case s::𝑠absents\!\!:\!italic_s : d𝑑ditalic_d = z is removed from these sequences if present, which in this case has no effect on them. The next event s::𝑠absents\!\!:\!italic_s : e𝑒eitalic_e = y is not in conflict with the sleep set, but is in conflict with the conflict detector. Therefore, the conflict detector is updated to p: y = 1;q: a = x,{s,𝚁,y}p: y = 1q: a = x𝑠𝚁y\langle\mbox{$p\!\!:\!$ {{y}\,=\,1}};\mbox{$q\!\!:\!$ {$a$\,=\,{x}}},\ \left\{% {s,\left\langle\mathtt{R},\texttt{y}\right\rangle}\right\}\rangle⟨ italic_p : typewriter_y typewriter_= typewriter_1 ; italic_q : italic_a typewriter_= typewriter_x , { italic_s , ⟨ typewriter_R , y ⟩ } ⟩, which from now on prevents a read on x by thread s𝑠sitalic_s, or happening after a write on y, implying that the event s::𝑠absents\!\!:\!italic_s : f𝑓fitalic_f = x cannot be performed next. Therefore the only non-blocked event is now p::𝑝absentp\!\!:\!italic_p : x = 1; since it is a write on x, the sleep set is emptied and the conflict detector is removed, allowing continuation of the exploration.

The sleep set and conflict detector generated from a particular race can be represented using linear space. At each prefix, the exploration may need to maintain sleep sets and conflict detectors of several preceding races, so that their total size is at most quadratic. By the previous observation that POP keeps at most a quadratic number of executions alive at any point in time, it follows that the space requirement for POP is polynomial. Details are in Theorem 5.2.

3 Programs, Executions, and Equivalence

We consider programs consisting of a finite set of threads that share a finite set of (shared) variables. Each thread has a finite set of local registers and runs a deterministic code, built in a standard way from expressions (over local registers) and atomic commands, using standard control flow constructs (sequential composition, selection, and bounded loop constructs). Atomic commands either write the value of an expression to a shared variable, or assign the value of a shared variable to a register, or can atomically both read and modify a shared variable. Conditional control flow constructs can branch on the value of an expression. From here on, we use t𝑡titalic_t to range over threads, and x,y,z𝑥𝑦𝑧x,y,zitalic_x , italic_y , italic_z to range over shared variables. The local state of a thread is defined as usual by its program counter and the contents of its registers. The global state of a program consists of the local state of each thread together with the valuation of the shared variables. The program has a unique initial state, in which shared variables have predefined initial values. We assume that memory is sequentially consistent.

The execution of a program statement is an event, which affects or is affected by the global state of the program. An event is represented by a tuple t,i,𝚃,x𝑡𝑖𝚃𝑥\left\langle t,i,{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}% {0,0,0.8}{\mathtt{T}}},x\right\rangle⟨ italic_t , italic_i , typewriter_T , italic_x ⟩, where t𝑡titalic_t is the thread performing the event, i𝑖iitalic_i is a positive integer, denoting that the event results from the i𝑖iitalic_i-th execution step in thread t𝑡titalic_t. 𝚃𝚃{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0.8}{\mathtt% {T}}}typewriter_T is the type of the event (either 𝚁𝚁\mathtt{R}typewriter_R for read or 𝚆𝚆\mathtt{W}typewriter_W for write and read-modify-write), and x𝑥xitalic_x is the accessed variable. If e𝑒eitalic_e is the event t,i,𝚃,x𝑡𝑖𝚃𝑥\left\langle t,i,{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}% {0,0,0.8}{\mathtt{T}}},x\right\rangle⟨ italic_t , italic_i , typewriter_T , italic_x ⟩, we write e.𝑡ℎformulae-sequence𝑒𝑡ℎe.\mathit{th}italic_e . italic_th for t𝑡titalic_t, e.𝚃formulae-sequence𝑒𝚃e.{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0.8}{% \mathtt{T}}}italic_e . typewriter_T for 𝚃𝚃{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0.8}{\mathtt% {T}}}typewriter_T, and e.𝑣𝑎𝑟formulae-sequence𝑒𝑣𝑎𝑟e.\mathit{var}italic_e . italic_var for x𝑥xitalic_x. An access is a pair 𝚃,x𝚃𝑥\left\langle{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}{% 0,0,0.8}{\mathtt{T}}},x\right\rangle⟨ typewriter_T , italic_x ⟩ consisting of a type and a variable. We write e.𝑎𝑐𝑐formulae-sequence𝑒𝑎𝑐𝑐e.\mathit{acc}italic_e . italic_acc for e.𝚃,e.𝑣𝑎𝑟delimited-⟨⟩formulae-sequence𝑒𝚃𝑒𝑣𝑎𝑟\left\langle e.{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}{% 0,0,0.8}{\mathtt{T}}},e.\mathit{var}\right\rangle⟨ italic_e . typewriter_T , italic_e . italic_var ⟩. We say that two accesses 𝚃,x𝚃𝑥\left\langle{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}{% 0,0,0.8}{\mathtt{T}}},x\right\rangle⟨ typewriter_T , italic_x ⟩ and 𝚃,xsuperscript𝚃superscript𝑥\left\langle{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}{% 0,0,0.8}{\mathtt{T}}}^{\prime},x^{\prime}\right\rangle⟨ typewriter_T start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟩ are dependent, denoted 𝚃,x𝚃,x𝚃𝑥superscript𝚃superscript𝑥\left\langle{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}{% 0,0,0.8}{\mathtt{T}}},x\right\rangle\bowtie\left\langle{\color[rgb]{0,0,0.8}% \definecolor[named]{pgfstrokecolor}{rgb}{0,0,0.8}{\mathtt{T}}}^{\prime},x^{% \prime}\right\rangle⟨ typewriter_T , italic_x ⟩ ⋈ ⟨ typewriter_T start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⟩, if x=x𝑥superscript𝑥x=x^{\prime}italic_x = italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and at least one of 𝚃𝚃{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0.8}{\mathtt% {T}}}typewriter_T and 𝚃superscript𝚃{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0.8}{\mathtt% {T}}}^{\prime}typewriter_T start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is 𝚆𝚆\mathtt{W}typewriter_W. We say that two events e𝑒eitalic_e and esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT are dependent, denoted ee𝑒𝑒e\bowtie eitalic_e ⋈ italic_e, if e.𝑡ℎ=e.𝑡ℎformulae-sequence𝑒𝑡ℎsuperscript𝑒𝑡ℎe.\mathit{th}=e^{\prime}.\mathit{th}italic_e . italic_th = italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT . italic_th or e.𝑎𝑐𝑐e.𝑎𝑐𝑐formulae-sequence𝑒𝑎𝑐𝑐superscript𝑒𝑎𝑐𝑐e.\mathit{acc}\bowtie e^{\prime}.\mathit{acc}italic_e . italic_acc ⋈ italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT . italic_acc. As is customary in DPOR algorithms, we can let an event represent the combined effect of a sequence of statements, if at most one of them accesses a shared variable.

An execution sequence (or just execution) E𝐸Eitalic_E is a finite sequence of events, starting from the initial state of the program. We let 𝚎𝚗𝚊𝚋𝚕𝚎𝚍(E)𝚎𝚗𝚊𝚋𝚕𝚎𝚍𝐸\mathtt{enabled}\left(E\right)typewriter_enabled ( italic_E ) denote the set of events that can be performed in the state to which E𝐸Eitalic_E leads. An execution E𝐸Eitalic_E is maximal if 𝚎𝚗𝚊𝚋𝚕𝚎𝚍(E)=𝚎𝚗𝚊𝚋𝚕𝚎𝚍𝐸\mathtt{enabled}\left(E\right)=\emptysettypewriter_enabled ( italic_E ) = ∅. We let 𝚍𝚘𝚖(E)𝚍𝚘𝚖𝐸{\operatorname{\mathtt{dom}}}\left(E\right)typewriter_dom ( italic_E ) denote the set of events in E𝐸Eitalic_E; we also write eE𝑒𝐸e\in Eitalic_e ∈ italic_E to denote e𝚍𝚘𝚖(E)𝑒𝚍𝚘𝚖𝐸e\in{\operatorname{\mathtt{dom}}}\left(E\right)italic_e ∈ typewriter_dom ( italic_E ). We use u𝑢uitalic_u and w𝑤witalic_w, possibly with superscripts, to range over sequences of events (not necessarily starting from the initial state), \langle\rangle⟨ ⟩ to denote the empty sequence, and edelimited-⟨⟩𝑒\langle e\rangle⟨ italic_e ⟩ to denote the sequence with only the event e𝑒eitalic_e. We let ww𝑤superscript𝑤w\cdot w^{\prime}italic_w ⋅ italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT denote the concatenation of sequences w𝑤witalic_w and wsuperscript𝑤w^{\prime}italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, and let we𝑤𝑒w\!\setminus\!eitalic_w ∖ italic_e denote the sequence w𝑤witalic_w with the first occurrence of e𝑒eitalic_e (if any) removed. For a sequence u=e1e2em𝑢subscript𝑒1subscript𝑒2subscript𝑒𝑚u=e_{1}\cdot e_{2}\cdot\ldots\cdot e_{m}italic_u = italic_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⋅ … ⋅ italic_e start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, we let wu𝑤𝑢w\!\setminus\!uitalic_w ∖ italic_u denote (((we1)e2))em𝑤subscript𝑒1subscript𝑒2subscript𝑒𝑚(\cdots((w\!\setminus\!e_{1})\!\setminus\!e_{2})\!\setminus\!\cdots)\!% \setminus\!e_{m}( ⋯ ( ( italic_w ∖ italic_e start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) ∖ italic_e start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) ∖ ⋯ ) ∖ italic_e start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT.

The basis for a DPOR algorithm is an equivalence relation on the set of execution sequences. The definition of this equivalence is based on a happens-before relation on the events of each execution sequence, which captures the data and control dependencies that must be respected by any equivalent execution.

Definition 1 (Happens-before)

Given an execution sequence E𝐸Eitalic_E, we define the happens-before relation on E𝐸Eitalic_E, denoted 𝚑𝚋Esubscript𝚑𝚋𝐸\xrightarrow{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}% \mathtt{hb}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}}_{E}start_ARROW overtypewriter_hb → end_ARROW start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT, as the smallest irreflexive partial order on 𝚍𝚘𝚖(E)𝚍𝚘𝚖𝐸{\operatorname{\mathtt{dom}}}\left(E\right)typewriter_dom ( italic_E ) such that e𝚑𝚋Eesubscript𝚑𝚋𝐸𝑒superscript𝑒e\xrightarrow{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1% }\mathtt{hb}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}}_{E}e^{\prime}italic_e start_ARROW overtypewriter_hb → end_ARROW start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT if e𝑒eitalic_e occurs before esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT in E𝐸Eitalic_E, and ee𝑒superscript𝑒e\bowtie e^{\prime}italic_e ⋈ italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT.

The 𝚑𝚋𝚑𝚋\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}\mathtt{hb}% \color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}typewriter_hb-trace (or trace for short) of E𝐸Eitalic_E is the directed graph (𝚍𝚘𝚖(E),𝚑𝚋E)𝚍𝚘𝚖𝐸subscript𝚑𝚋𝐸({\operatorname{\mathtt{dom}}}\left(E\right),\ \xrightarrow{\color[rgb]{0,0,1}% \definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}\mathtt{hb}\color[rgb]{0,0,0}% \definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}\pgfsys@color@gray@stroke{0}% \pgfsys@color@gray@fill{0}}_{E})( typewriter_dom ( italic_E ) , start_ARROW overtypewriter_hb → end_ARROW start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT ).

Definition 2 (Equivalence)

Two execution sequences E𝐸Eitalic_E and Esuperscript𝐸E^{\prime}italic_E start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT are equivalent, denoted EEsimilar-to-or-equals𝐸superscript𝐸E\simeq E^{\prime}italic_E ≃ italic_E start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, if they have the same 𝚑𝚋𝚑𝚋\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}\mathtt{hb}% \color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}typewriter_hb-trace. We let [E]subscriptdelimited-[]𝐸similar-to-or-equals[E]_{\simeq}[ italic_E ] start_POSTSUBSCRIPT ≃ end_POSTSUBSCRIPT denote the equivalence class of E𝐸Eitalic_E.

The equivalence relation similar-to-or-equals\simeq partitions the set of execution sequences into equivalence classes, paving the way for an optimal DPOR algorithm which explores precisely one execution in each equivalence class.

4 Designing the DPOR Algorithm

In this section, we explain the design of POP, which is optimal in the sense that it explores precisely one execution in each equivalence class defined by Definition 2. We first need some auxiliary definitions

Definition 3 (Compatible sequences and happens-before prefix)

For two execution sequences Ew𝐸𝑤E\cdot witalic_E ⋅ italic_w and Ew𝐸superscript𝑤E\cdot w^{\prime}italic_E ⋅ italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT,

  • the sequences w𝑤witalic_w and wsuperscript𝑤w^{\prime}italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT are called compatible, denoted wwsimilar-to𝑤superscript𝑤w\sim w^{\prime}italic_w ∼ italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, iff there are sequences w′′superscript𝑤′′w^{\prime\prime}italic_w start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT and w′′′superscript𝑤′′′w^{\prime\prime\prime}italic_w start_POSTSUPERSCRIPT ′ ′ ′ end_POSTSUPERSCRIPT s.t. Eww′′Eww′′′similar-to-or-equals𝐸𝑤superscript𝑤′′𝐸superscript𝑤superscript𝑤′′′E\cdot w\cdot w^{\prime\prime}\simeq E\cdot w^{\prime}\cdot w^{\prime\prime\prime}italic_E ⋅ italic_w ⋅ italic_w start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ≃ italic_E ⋅ italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⋅ italic_w start_POSTSUPERSCRIPT ′ ′ ′ end_POSTSUPERSCRIPT,

  • the sequence w𝑤witalic_w is called a happens-before prefix of wsuperscript𝑤w^{\prime}italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, denoted wwsquare-image-of-or-equals𝑤superscript𝑤w\sqsubseteq w^{\prime}italic_w ⊑ italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, iff there is a sequence w′′superscript𝑤′′w^{\prime\prime}italic_w start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT s.t. Eww′′Ewsimilar-to-or-equals𝐸𝑤superscript𝑤′′𝐸superscript𝑤E\cdot w\cdot w^{\prime\prime}\simeq E\cdot w^{\prime}italic_E ⋅ italic_w ⋅ italic_w start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ≃ italic_E ⋅ italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT.

We illustrate the definition on the program in Fig. 1. Let w𝑤witalic_w be the sequence of events in s: d = z;s: e = ys: d = zs: e = y\mbox{$s\!\!:\!$ {$d$\,=\,{z}}};\mbox{$s\!\!:\!$ {$e$\,=\,{y}}}italic_s : italic_d typewriter_= typewriter_z ; italic_s : italic_e typewriter_= typewriter_y, and wsuperscript𝑤w^{\prime}italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT be the sequence q: a = x;r: c = x;s: d = z;s: e = yq: a = xr: c = xs: d = zs: e = y\mbox{$q\!\!:\!$ {$a$\,=\,{x}}};\mbox{$r\!\!:\!$ {$c$\,=\,{x}}};\mbox{$s\!\!:% \!$ {$d$\,=\,{z}}};\mbox{$s\!\!:\!$ {$e$\,=\,{y}}}italic_q : italic_a typewriter_= typewriter_x ; italic_r : italic_c typewriter_= typewriter_x ; italic_s : italic_d typewriter_= typewriter_z ; italic_s : italic_e typewriter_= typewriter_y. Then wwsquare-image-of-or-equals𝑤superscript𝑤w\sqsubseteq w^{\prime}italic_w ⊑ italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT since Eww′′Ewsimilar-to-or-equals𝐸𝑤superscript𝑤′′𝐸superscript𝑤E\cdot w\cdot w^{\prime\prime}\simeq E\cdot w^{\prime}italic_E ⋅ italic_w ⋅ italic_w start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ≃ italic_E ⋅ italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT where w′′superscript𝑤′′w^{\prime\prime}italic_w start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT is the sequence q: a = x;r: c = xq: a = xr: c = x\mbox{$q\!\!:\!$ {$a$\,=\,{x}}};\mbox{$r\!\!:\!$ {$c$\,=\,{x}}}italic_q : italic_a typewriter_= typewriter_x ; italic_r : italic_c typewriter_= typewriter_x and E𝐸Eitalic_E is some preceding execution. Let next u𝑢uitalic_u be the sequence s: d = z;s: e = y;s: f = xs: d = zs: e = ys: f = x\mbox{$s\!\!:\!$ {$d$\,=\,{z}}};\mbox{$s\!\!:\!$ {$e$\,=\,{y}}};\mbox{$s\!\!:% \!$ {$f$\,=\,{x}}}italic_s : italic_d typewriter_= typewriter_z ; italic_s : italic_e typewriter_= typewriter_y ; italic_s : italic_f typewriter_= typewriter_x, and usuperscript𝑢u^{\prime}italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT be the sequence p: y = 1;q: a = xp: y = 1q: a = x\mbox{$p\!\!:\!$ {{y}\,=\,1}};\mbox{$q\!\!:\!$ {$a$\,=\,{x}}}italic_p : typewriter_y typewriter_= typewriter_1 ; italic_q : italic_a typewriter_= typewriter_x, then u≁unot-similar-to𝑢superscript𝑢u\not\sim u^{\prime}italic_u ≁ italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, since u𝑢uitalic_u and usuperscript𝑢u^{\prime}italic_u start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT’s first accesses to y are in conflict.

Definition 4 (Schedule)

A sequence of events σ𝜎\sigmaitalic_σ is called a schedule if all its events happen-before its last one, i.e., e𝚑𝚋e𝚑𝚋superscript𝑒𝑒e^{\prime}\xrightarrow{{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{% rgb}{0,0,1}{\mathtt{{hb}}}}}eitalic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_ARROW overtypewriter_hb → end_ARROW italic_e where e𝑒eitalic_e is its last event, and esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is any other event in σ𝜎\sigmaitalic_σ. The last event e𝑒eitalic_e of a schedule σ𝜎\sigmaitalic_σ is called the head of σ𝜎\sigmaitalic_σ, sometimes denoted hd(σ)𝑑𝜎{hd}\left(\sigma\right)italic_h italic_d ( italic_σ ). We sometimes call σ𝜎\sigmaitalic_σ a read schedule if hd(σ)𝑑𝜎{hd}\left(\sigma\right)italic_h italic_d ( italic_σ ) is a read event. For an execution sequence Ew𝐸𝑤E\cdot witalic_E ⋅ italic_w and event ew𝑒𝑤e\in witalic_e ∈ italic_w, define the schedule ewsuperscript𝑤𝑒absent{e}\downarrow^{w}italic_e ↓ start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT to be the subsequence wsuperscript𝑤w^{\prime}italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT of w𝑤witalic_w such that (i) ew𝑒superscript𝑤e\in w^{\prime}italic_e ∈ italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, and (ii) for each ewsuperscript𝑒𝑤e^{\prime}\in witalic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ italic_w it holds that ewsuperscript𝑒superscript𝑤e^{\prime}\in w^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ∈ italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT iff e𝚑𝚋Ewesubscript𝚑𝚋𝐸𝑤superscript𝑒𝑒e^{\prime}\xrightarrow{{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{% rgb}{0,0,1}{\mathtt{{hb}}}}}_{E\cdot w}eitalic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT start_ARROW overtypewriter_hb → end_ARROW start_POSTSUBSCRIPT italic_E ⋅ italic_w end_POSTSUBSCRIPT italic_e.

4.1 Parsimonious Race Reversals

A central mechanism of many DPOR algorithms is to detect and reverse races. Intuitively, a race is a conflict between two consecutive accesses to a shared variable, where at least one access writes to the variable (i.e., it is a write or a read-modify-write).

Definition 5 (Race)

Let E𝐸Eitalic_E be an execution sequence. Two events e𝑒eitalic_e and esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT in E𝐸Eitalic_E are racing in E𝐸Eitalic_E if (i) e𝑒eitalic_eand esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT are performed by different threads, (ii) e𝚑𝚋Eesubscript𝚑𝚋𝐸𝑒superscript𝑒e\xrightarrow{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1% }\mathtt{hb}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}}_{E}e^{\prime}italic_e start_ARROW overtypewriter_hb → end_ARROW start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT. (iii) there is no event e′′superscript𝑒′′e^{\prime\prime}italic_e start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT with e𝚑𝚋Ee′′𝚑𝚋Eesubscript𝚑𝚋𝐸𝑒superscript𝑒′′subscript𝚑𝚋𝐸superscript𝑒e\xrightarrow{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1% }\mathtt{hb}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}}_{E}e^{\prime\prime}% \xrightarrow{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}% \mathtt{hb}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}}_{E}e^{\prime}italic_e start_ARROW overtypewriter_hb → end_ARROW start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT start_ARROW overtypewriter_hb → end_ARROW start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT.

Intuitively, a race arises when two different threads perform dependent accesses to a shared variable, which are adjacent in the 𝚑𝚋Esubscript𝚑𝚋𝐸\xrightarrow{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,1}% \mathtt{hb}\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}% \pgfsys@color@gray@stroke{0}\pgfsys@color@gray@fill{0}}_{E}start_ARROW overtypewriter_hb → end_ARROW start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT order. If e𝑒eitalic_e and esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT are racing in E𝐸Eitalic_E, then to reverse the race, E𝐸Eitalic_E is decomposed as E=E1eE2𝐸subscript𝐸1𝑒subscript𝐸2E=E_{1}\cdot e\cdot E_{2}italic_E = italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_e ⋅ italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT with esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT in E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, thereafter the schedule σ=eE2𝜎superscript𝑒superscriptsubscript𝐸2absent\sigma={e^{\prime}}\downarrow^{E_{2}}italic_σ = italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ↓ start_POSTSUPERSCRIPT italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT is formed as the initial fragment of an alternative execution, which extends E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT.

The key idea of parsimonious race reversal is to reverse a race only if such a reversal generates an execution that has not been explored before. To be able to do so, POP remembers whenever an event in a new execution is in a schedule, and whether it is a schedule head. This can be done, e.g., by marking events in schedules, and specifically marking the schedule head. From now on, we consider such markings to be included in the events of executions. They play an important role in selecting races.

Definition 6 (Fresh event)

For an execution Ewew𝐸𝑤𝑒superscript𝑤E\cdot w\cdot e\cdot w^{\prime}italic_E ⋅ italic_w ⋅ italic_e ⋅ italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, the event e𝑒eitalic_e is called fresh in wew𝑤𝑒superscript𝑤w\cdot e\cdot w^{\prime}italic_w ⋅ italic_e ⋅ italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT after E𝐸Eitalic_E if (i) if e𝑒eitalic_e is in a schedule, then it is the head of that schedule, and (ii) for each head ehsubscript𝑒e_{h}italic_e start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT of a schedule in w𝑤witalic_w it is the case that eh𝚑𝚋Eweesubscript𝚑𝚋𝐸𝑤𝑒subscript𝑒𝑒e_{h}\xrightarrow{{\color[rgb]{0,0,1}\definecolor[named]{pgfstrokecolor}{rgb}{% 0,0,1}{\mathtt{{hb}}}}}_{E\cdot w\cdot e}eitalic_e start_POSTSUBSCRIPT italic_h end_POSTSUBSCRIPT start_ARROW overtypewriter_hb → end_ARROW start_POSTSUBSCRIPT italic_E ⋅ italic_w ⋅ italic_e end_POSTSUBSCRIPT italic_e.

Definition 7 (Parsimonious race)

Let E𝐸Eitalic_E be an execution sequence. Two events e𝑒eitalic_e and esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT in E𝐸Eitalic_E are in a parsimonious race, denoted eEe𝑒subscriptless-than-or-similar-to𝐸superscript𝑒{e}\operatorname{\mathtt{\lesssim}}_{E}{e^{\prime}}italic_e ≲ start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT if (i) e𝑒eitalic_eand esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT are racing in E𝐸Eitalic_E, (ii) e𝑒eitalic_eis not in a schedule in E𝐸Eitalic_E, and (iii) esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPTis fresh in we𝑤superscript𝑒w\cdot e^{\prime}italic_w ⋅ italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT after E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, where E=E1ewew𝐸subscript𝐸1𝑒𝑤superscript𝑒superscript𝑤E=E_{1}\cdot e\cdot w\cdot e^{\prime}\cdot w^{\prime}italic_E = italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_e ⋅ italic_w ⋅ italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⋅ italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT

Conditions (ii) and (iii) are the additional conditions for a race to be parsimonious. They filter out races, whose reversals would lead to previously explored executions. Let us provide the intuition behind these conditions. (ii) If e𝑒eitalic_e is in a schedule, then that schedule, call it σ𝜎\sigmaitalic_σ, was generated by a race in an earlier explored execution Esuperscript𝐸E^{\prime}italic_E start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT. Hence σ𝜎\sigmaitalic_σ was contained in Esuperscript𝐸E^{\prime}italic_E start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT. Moreover esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT would race with the head of σ𝜎\sigmaitalic_σ also in Esuperscript𝐸E^{\prime}italic_E start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT; if esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT appeared after σ𝜎\sigmaitalic_σ the resulting new schedule had been generated already in Esuperscript𝐸E^{\prime}italic_E start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT; if esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT appeared before σ𝜎\sigmaitalic_σ, then we would only undo the previous race reversal. This is illustrated in Fig. 1 by the race on y, between s::𝑠absents\!\!:\!italic_s : e𝑒eitalic_e = y and p::𝑝absentp\!\!:\!italic_p : y = 1 in E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT. (iii) If esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is not fresh, then esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT appeared with the same happens-before predecessors in an earlier explored execution Esuperscript𝐸E^{\prime}italic_E start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, where it was in a race that would generate the same schedule as in E𝐸Eitalic_E. This is illustrated in Fig. 1 by the race on x, between p::𝑝absentp\!\!:\!italic_p : x = 1 and r::𝑟absentr\!\!:\!italic_r : c𝑐citalic_c = x in E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, which was already considered in E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT.

4.2 Eager Race Reversal Strategy

A DPOR algorithm with parsimonious race reversal could be implemented so that the schedules constructed from races are all collected before they are explored. However, for some programs, the number of alternative schedules from a program point can be exponential. In order not to consume exponential memory, POP explores schedules eagerly: immediately after the creation of a schedule, exploration switches to continuations of that schedule. This policy can be realized by an algorithm that calls a recursive function to initiate exploration of a new schedule. To illustrate this policy, we show a simple realization of it in LABEL:alg:dpor-bj-sleepsets-recursive (Simple), in LABEL:sec:simple-dpor. We establish (in Lemma 1) that the recursion depth of such an algorithm is at most n(n1)/2𝑛𝑛12n(n-1)/2italic_n ( italic_n - 1 ) / 2, where n𝑛nitalic_n is the length longest execution of the analyzed program.

4.3 Parsimonious Sleep Set Characterization

As described in Section 2, a naïve use of sleep sets by POP may in the worst case consume exponential memory. Instead, POP uses a different mechanism for avoiding redundant exploration, in which each explored execution E𝐸Eitalic_E is equipped with

  • a sleep set, denoted 𝑠𝑙𝑒𝑒𝑝[E]𝑠𝑙𝑒𝑒𝑝delimited-[]𝐸\mathit{sleep}[E]italic_sleep [ italic_E ], which is a set of read schedules that must not be performed during subsequent exploration from E𝐸Eitalic_E; more precisely, the algorithm must not explore an execution E.w′′′formulae-sequence𝐸superscript𝑤′′′E.w^{\prime\prime\prime}italic_E . italic_w start_POSTSUPERSCRIPT ′ ′ ′ end_POSTSUPERSCRIPT such that sw′′′square-image-of-or-equals𝑠superscript𝑤′′′s\sqsubseteq w^{\prime\prime\prime}italic_s ⊑ italic_w start_POSTSUPERSCRIPT ′ ′ ′ end_POSTSUPERSCRIPT for some s𝑠𝑙𝑒𝑒𝑝[E]𝑠𝑠𝑙𝑒𝑒𝑝delimited-[]𝐸s\in\mathit{sleep}[E]italic_s ∈ italic_sleep [ italic_E ], and

  • a conflict map, denoted 𝑐𝑓𝑙[E]𝑐𝑓𝑙delimited-[]𝐸\mathit{cfl}[E]italic_cfl [ italic_E ], which checks that the subsequent exploration from E𝐸Eitalic_E does not perform a read event, which is in a parsimonious race whose generated schedule conflicts with a previous execution, as described in Section 2. The description in Section 2 has made the simplification to check for races that conflict with a previous execution without checking that such a race also satisfies the freshness condition (condition (iii) in Definition 7). In the following paragraphs, we will first explain the technique in this simplified setting, and thereafter explain how it is extended to fully consider the freshness condition. Note that the simplified technique is sufficient when there are no additional schedules betwen the racing events in the checked-for races.

Let w𝑤witalic_w be a sequence of events. A conflict detector for w𝑤witalic_w is a mechanism that for each event in an execution checks whether it conflicts with w𝑤witalic_w or happens-after an event that conflicts with w𝑤witalic_w. It is represented as a pair of form H,C𝐻𝐶\langle H,C\rangle⟨ italic_H , italic_C ⟩, where H𝐻Hitalic_H is a sequence of events and C𝐶Citalic_C is a set of threads and accesses. At start of exploration, H,C𝐻𝐶\langle H,C\rangle⟨ italic_H , italic_C ⟩ is initialized as w,𝑤\langle w,\emptyset\rangle⟨ italic_w , ∅ ⟩. During exploration, C𝐶Citalic_C contains the set of threads and accesses that have been performed in conflict with w𝑤witalic_w, as well as threads and accesses that have happened after events that have been performed in conflict with w𝑤witalic_w, whereas H𝐻Hitalic_H is the subsequence of w𝑤witalic_w which could still be performed by events that are not in conflict with w𝑤witalic_w. For an event e𝑒eitalic_e, let e.𝑡ℎ𝑎𝑐𝑐formulae-sequence𝑒𝑡ℎ𝑎𝑐𝑐e.\mathit{thacc}italic_e . italic_thacc denote the set {e.𝑡ℎ,e.𝑎𝑐𝑐}formulae-sequence𝑒𝑡ℎ𝑒𝑎𝑐𝑐\left\{{e.\mathit{th},e.\mathit{acc}}\right\}{ italic_e . italic_th , italic_e . italic_acc }. For a set C𝐶Citalic_C of threads and accesses, and an event e𝑒eitalic_e, let eC𝑒𝐶e\bowtie Citalic_e ⋈ italic_C denote e.𝑡ℎC𝑎𝑐𝑐C:e.𝑎𝑐𝑐𝑎𝑐𝑐e.\mathit{th}\in C\lor\exists\mathit{acc}\in C:e.\mathit{acc}\bowtie\mathit{acc}italic_e . italic_th ∈ italic_C ∨ ∃ italic_acc ∈ italic_C : italic_e . italic_acc ⋈ italic_acc. For a pair H,C𝐻𝐶\langle H,C\rangle⟨ italic_H , italic_C ⟩ consisting of a sequence H𝐻Hitalic_H of events and set C𝐶Citalic_C of threads and accesses, and an event e𝑒eitalic_e, let eH,C𝑒𝐻𝐶e\bowtie\langle H,C\rangleitalic_e ⋈ ⟨ italic_H , italic_C ⟩ denote H≁eeCnot-similar-to𝐻𝑒𝑒𝐶H\not\sim e\lor e\bowtie Citalic_H ≁ italic_e ∨ italic_e ⋈ italic_C, i.e., that the thread of e𝑒eitalic_e or the access of e𝑒eitalic_e is in C𝐶Citalic_C or that e𝑒eitalic_e is not compatible with H𝐻Hitalic_H. For a sequence σ𝜎\sigmaitalic_σ, let σH,C𝜎𝐻𝐶\sigma\bowtie\langle H,C\rangleitalic_σ ⋈ ⟨ italic_H , italic_C ⟩ denote (H≁σ)eσ:eC:not-similar-to𝐻𝜎𝑒𝜎𝑒𝐶(H\not\sim\sigma)\lor\exists e\in\sigma:e\bowtie C( italic_H ≁ italic_σ ) ∨ ∃ italic_e ∈ italic_σ : italic_e ⋈ italic_C.

On each explored event e𝑒eitalic_e, a conflict detector H,C𝐻𝐶\langle H,C\rangle⟨ italic_H , italic_C ⟩ is updated to H,(Ce.𝑡ℎ𝑎𝑐𝑐)\langle H,(C\cup e.\mathit{thacc})\rangle⟨ italic_H , ( italic_C ∪ italic_e . italic_thacc ) ⟩ if eH,C𝑒𝐻𝐶e\bowtie\langle H,C\rangleitalic_e ⋈ ⟨ italic_H , italic_C ⟩, otherwise to He,C𝐻𝑒𝐶\langle H\!\setminus\!e\ ,\ C\rangle⟨ italic_H ∖ italic_e , italic_C ⟩ It follows that the event e𝑒eitalic_e is conflicting with the original sequence w𝑤witalic_w if eH,C𝑒𝐻𝐶e\bowtie\langle H,C\rangleitalic_e ⋈ ⟨ italic_H , italic_C ⟩.

We can now describe how the sleep set and conflict map are manipulated. When reversing a parsimonious race eEe𝑒subscriptless-than-or-similar-to𝐸superscript𝑒{e}\operatorname{\mathtt{\lesssim}}_{E}{e^{\prime}}italic_e ≲ start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT where esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT is a read on x𝑥xitalic_x, the execution E𝐸Eitalic_E is decomposed as E=E1ewew𝐸subscript𝐸1𝑒𝑤superscript𝑒superscript𝑤E=E_{1}\cdot e\cdot w\cdot e^{\prime}\cdot w^{\prime}italic_E = italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_e ⋅ italic_w ⋅ italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ⋅ italic_w start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT and the schedule σ=ewe𝜎superscript𝑒superscript𝑤superscript𝑒absent\sigma={e^{\prime}}\downarrow^{w\cdot e^{\prime}}italic_σ = italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ↓ start_POSTSUPERSCRIPT italic_w ⋅ italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT is formed as the initial fragment of an alternative execution, which extends E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. When that execution is explored, starting from E1subscript𝐸1E_{1}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, the sleep set 𝑠𝑙𝑒𝑒𝑝[E1]𝑠𝑙𝑒𝑒𝑝delimited-[]subscript𝐸1\mathit{sleep}[E_{1}]italic_sleep [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ] is extended with the schedules of form e′′wsuperscript𝑤superscript𝑒′′absent{e^{\prime\prime}}\downarrow^{w}italic_e start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ↓ start_POSTSUPERSCRIPT italic_w end_POSTSUPERSCRIPT such that eE1we′′𝑒subscriptless-than-or-similar-tosubscript𝐸1𝑤superscript𝑒′′{e}\operatorname{\mathtt{\lesssim}}_{E_{1}\cdot w}{e^{\prime\prime}}italic_e ≲ start_POSTSUBSCRIPT italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_w end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT, i.e., for fresh reads on x𝑥xitalic_x that appear in w𝑤witalic_w. The conflict map 𝑐𝑓𝑙[E1]𝑐𝑓𝑙delimited-[]subscript𝐸1\mathit{cfl}[E_{1}]italic_cfl [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ] is extended by adding a new conflict detector for w𝑤witalic_w to 𝑐𝑓𝑙[E1](x)𝑐𝑓𝑙delimited-[]subscript𝐸1𝑥\mathit{cfl}[E_{1}](x)italic_cfl [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ] ( italic_x ), initialized as w,𝑤\langle w,\emptyset\rangle⟨ italic_w , ∅ ⟩.

The sleep set and conflict map are updated during exploration, as follows: on each explored event e𝑒eitalic_e, we obtain 𝑠𝑙𝑒𝑒𝑝[Ee]𝑠𝑙𝑒𝑒𝑝delimited-[]𝐸𝑒\mathit{sleep}[E\cdot e]italic_sleep [ italic_E ⋅ italic_e ] from 𝑠𝑙𝑒𝑒𝑝[E]𝑠𝑙𝑒𝑒𝑝delimited-[]𝐸\mathit{sleep}[E]italic_sleep [ italic_E ] and 𝑐𝑓𝑙[Ee](x)𝑐𝑓𝑙delimited-[]𝐸𝑒𝑥\mathit{cfl}[E\cdot e](x)italic_cfl [ italic_E ⋅ italic_e ] ( italic_x ) from 𝑐𝑓𝑙[E](x)𝑐𝑓𝑙delimited-[]𝐸𝑥\mathit{cfl}[E](x)italic_cfl [ italic_E ] ( italic_x ) for each variable x𝑥xitalic_x as follows:

𝑠𝑙𝑒𝑒𝑝[Ee]={ses𝑠𝑙𝑒𝑒𝑝[E]se}𝑐𝑓𝑙[Ee](x)=if e.𝑎𝑐𝑐=𝚆,x then else[{H,(C{e.𝑡ℎ,e.𝑎𝑐𝑐})H,C𝑐𝑓𝑙[E](x)eH,C}{He,CH,C𝑐𝑓𝑙[E](x)e⋈̸H,C}]\begin{array}[]{lcl}\mathit{sleep}[E\cdot e]&=&\left\{{s\!\setminus\!e}\mid\,{% s\in\mathit{sleep}[E]\land s\sim e}\right\}\\ \mathit{cfl}[E\cdot e](x)&=&\textbf{if }e.\mathit{acc}=\left\langle\mathtt{W},% x\right\rangle\textbf{ then }\emptyset\\ &&\textbf{else}\\ &&\left[\begin{array}[]{ll}&\left\{{\langle H,(C\cup\left\{{e.\mathit{th},e.% \mathit{acc}}\right\})\rangle}\mid\,{\langle H,C\rangle\in\mathit{cfl}[E](x)% \land e\bowtie\langle H,C\rangle}\right\}\\ \cup&\left\{{\langle H\!\setminus\!e\ ,\ C\rangle}\mid\,{\langle H,C\rangle\in% \mathit{cfl}[E](x)\land e\not\bowtie\langle H,C\rangle}\right\}\end{array}% \right]\end{array}start_ARRAY start_ROW start_CELL italic_sleep [ italic_E ⋅ italic_e ] end_CELL start_CELL = end_CELL start_CELL { italic_s ∖ italic_e ∣ italic_s ∈ italic_sleep [ italic_E ] ∧ italic_s ∼ italic_e } end_CELL end_ROW start_ROW start_CELL italic_cfl [ italic_E ⋅ italic_e ] ( italic_x ) end_CELL start_CELL = end_CELL start_CELL if italic_e . italic_acc = ⟨ typewriter_W , italic_x ⟩ then ∅ end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL end_CELL start_CELL else end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL end_CELL start_CELL [ start_ARRAY start_ROW start_CELL end_CELL start_CELL { ⟨ italic_H , ( italic_C ∪ { italic_e . italic_th , italic_e . italic_acc } ) ⟩ ∣ ⟨ italic_H , italic_C ⟩ ∈ italic_cfl [ italic_E ] ( italic_x ) ∧ italic_e ⋈ ⟨ italic_H , italic_C ⟩ } end_CELL end_ROW start_ROW start_CELL ∪ end_CELL start_CELL { ⟨ italic_H ∖ italic_e , italic_C ⟩ ∣ ⟨ italic_H , italic_C ⟩ ∈ italic_cfl [ italic_E ] ( italic_x ) ∧ italic_e ⋈̸ ⟨ italic_H , italic_C ⟩ } end_CELL end_ROW end_ARRAY ] end_CELL end_ROW end_ARRAY

that is, the sleep set 𝑠𝑙𝑒𝑒𝑝[E]𝑠𝑙𝑒𝑒𝑝delimited-[]𝐸\mathit{sleep}[E]italic_sleep [ italic_E ] is updated by removing e𝑒eitalic_e from those sequences that are compatible with e𝑒eitalic_e and completely discarding the other sequences; for each variable x𝑥xitalic_x, if e𝑒eitalic_e is a write on x𝑥xitalic_x then 𝑐𝑓𝑙[Ee](x)=𝑐𝑓𝑙delimited-[]𝐸𝑒𝑥\mathit{cfl}[E\cdot e](x)=\emptysetitalic_cfl [ italic_E ⋅ italic_e ] ( italic_x ) = ∅, otherwise the conflict map 𝑐𝑓𝑙[E](x)𝑐𝑓𝑙delimited-[]𝐸𝑥\mathit{cfl}[E](x)italic_cfl [ italic_E ] ( italic_x ) is updated by adding the thread and access of e𝑒eitalic_e to the set C𝐶Citalic_C for each pair H,C𝐻𝐶\langle H,C\rangle⟨ italic_H , italic_C ⟩ that depends with e𝑒eitalic_e, and removing e𝑒eitalic_e from H𝐻Hitalic_H in the others.

It follows that an extension of E𝐸Eitalic_E by an event e𝑒eitalic_e which reads from x𝑥xitalic_x should be blocked if either e𝑠𝑙𝑒𝑒𝑝(E)delimited-⟨⟩𝑒𝑠𝑙𝑒𝑒𝑝𝐸\langle e\rangle\in\mathit{sleep}(E)⟨ italic_e ⟩ ∈ italic_sleep ( italic_E ) or eH,C𝑒𝐻𝐶e\bowtie\langle H,C\rangleitalic_e ⋈ ⟨ italic_H , italic_C ⟩ for some H,C𝑐𝑓𝑙[E](x)𝐻𝐶𝑐𝑓𝑙delimited-[]𝐸𝑥\langle H,C\rangle\in\mathit{cfl}[E](x)⟨ italic_H , italic_C ⟩ ∈ italic_cfl [ italic_E ] ( italic_x ).

Let us now leave the simplified setting and make a full description of the conflict map, for the case that the sequence w𝑤witalic_w between the racing events contains schedules. We then have to work a bit harder to check whether encountered read events are also fresh. In this case w𝑤witalic_w can be written as w0σ1w1σ2σmwmsubscript𝑤0subscript𝜎1subscript𝑤1subscript𝜎2subscript𝜎𝑚subscript𝑤𝑚w_{0}\cdot\sigma_{1}\cdot w_{1}\cdot\sigma_{2}\cdot\cdots\sigma_{m}\cdot w_{m}italic_w start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ⋅ italic_σ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ⋅ ⋯ italic_σ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ⋅ italic_w start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, where σ1,,σmsubscript𝜎1subscript𝜎𝑚\sigma_{1},\ldots,\sigma_{m}italic_σ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_σ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT are the schedules in w𝑤witalic_w. Let uksubscript𝑢𝑘u_{k}italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT denote w0σkwksubscript𝑤0subscript𝜎𝑘subscript𝑤𝑘w_{0}\cdot\cdots\sigma_{k}\cdot w_{k}italic_w start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ⋅ ⋯ italic_σ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ⋅ italic_w start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT for k=0,,m𝑘0𝑚k=0,\ldots,mitalic_k = 0 , … , italic_m. It must be checked that the new exploration E1w′′′subscript𝐸1superscript𝑤′′′E_{1}\cdot w^{\prime\prime\prime}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_w start_POSTSUPERSCRIPT ′ ′ ′ end_POSTSUPERSCRIPT starting from E1σsubscript𝐸1𝜎E_{1}\cdot\sigmaitalic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ does not exhibit an event e′′′superscript𝑒′′′e^{\prime\prime\prime}italic_e start_POSTSUPERSCRIPT ′ ′ ′ end_POSTSUPERSCRIPT which reads x𝑥xitalic_x such that the schedule e′′′w′′′superscriptsuperscript𝑤′′′superscript𝑒′′′absent{e^{\prime\prime\prime}}\downarrow^{w^{\prime\prime\prime}}italic_e start_POSTSUPERSCRIPT ′ ′ ′ end_POSTSUPERSCRIPT ↓ start_POSTSUPERSCRIPT italic_w start_POSTSUPERSCRIPT ′ ′ ′ end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT is in conflict with uisubscript𝑢𝑖u_{i}italic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT for some i𝑖iitalic_i and furthermore e′′′superscript𝑒′′′e^{\prime\prime\prime}italic_e start_POSTSUPERSCRIPT ′ ′ ′ end_POSTSUPERSCRIPT happens-after hd(σi)𝑑subscript𝜎𝑖{hd}\left(\sigma_{i}\right)italic_h italic_d ( italic_σ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) for all ji𝑗𝑖j\leq iitalic_j ≤ italic_i. For this case, 𝑐𝑓𝑙[E](x)𝑐𝑓𝑙delimited-[]𝐸𝑥\mathit{cfl}[E](x)italic_cfl [ italic_E ] ( italic_x ) is for each variable x𝑥xitalic_x extended from being just a conflict detector to being a pair 𝐶𝑠𝑒𝑞,𝐷𝑠𝑒𝑞𝐶𝑠𝑒𝑞𝐷𝑠𝑒𝑞\langle\mathit{Cseq},\mathit{Dseq}\rangle⟨ italic_Cseq , italic_Dseq ⟩, where 𝐶𝑠𝑒𝑞𝐶𝑠𝑒𝑞\mathit{Cseq}italic_Cseq is a tuple of conflict detectors H0,C0,,Hm,Cmsubscript𝐻0subscript𝐶0subscript𝐻𝑚subscript𝐶𝑚\langle\langle H_{0},C_{0}\rangle,\ldots,\langle H_{m},C_{m}\rangle\rangle⟨ ⟨ italic_H start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ⟩ , … , ⟨ italic_H start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ⟩ ⟩ and 𝐷𝑠𝑒𝑞𝐷𝑠𝑒𝑞\mathit{Dseq}italic_Dseq is a tuple D1,,Dmsubscript𝐷1subscript𝐷𝑚\langle D_{1},\ldots,D_{m}\rangle⟨ italic_D start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , italic_D start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ⟩ of sets of threads and accesses. Here each conflict detector Hk,Cksubscript𝐻𝑘subscript𝐶𝑘\langle H_{k},C_{k}\rangle⟨ italic_H start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ⟩ checks for conflicts with uksubscript𝑢𝑘u_{k}italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT and each set Dksubscript𝐷𝑘D_{k}italic_D start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT collects threads and accesses that happen after hd(σk)𝑑subscript𝜎𝑘{hd}\left(\sigma_{k}\right)italic_h italic_d ( italic_σ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ), initialized with hd(σk).𝑡ℎ𝑎𝑐𝑐formulae-sequence𝑑subscript𝜎𝑘𝑡ℎ𝑎𝑐𝑐{hd}\left(\sigma_{k}\right).\mathit{thacc}italic_h italic_d ( italic_σ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) . italic_thacc.

For a pair 𝐶𝑠𝑒𝑞,𝐷𝑠𝑒𝑞𝐶𝑠𝑒𝑞𝐷𝑠𝑒𝑞\langle\mathit{Cseq},\mathit{Dseq}\rangle⟨ italic_Cseq , italic_Dseq ⟩ consisting of a sequence 𝐶𝑠𝑒𝑞𝐶𝑠𝑒𝑞\mathit{Cseq}italic_Cseq of conflict detectors and a sequence 𝐷𝑠𝑒𝑞𝐷𝑠𝑒𝑞\mathit{Dseq}italic_Dseq of sets of threads and accesses, let e𝐶𝑠𝑒𝑞,𝐷𝑠𝑒𝑞𝑒𝐶𝑠𝑒𝑞𝐷𝑠𝑒𝑞e\bowtie\langle\mathit{Cseq},\mathit{Dseq}\rangleitalic_e ⋈ ⟨ italic_Cseq , italic_Dseq ⟩ denote that there is an i𝑖iitalic_i such that eHi,Ci𝑒subscript𝐻𝑖subscript𝐶𝑖e\bowtie\langle H_{i},C_{i}\rangleitalic_e ⋈ ⟨ italic_H start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ⟩ and eDj𝑒subscript𝐷𝑗e\bowtie{D_{j}}italic_e ⋈ italic_D start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT for all 1ji1𝑗𝑖1\leq j\leq i1 ≤ italic_j ≤ italic_i. For a sequence σ𝜎\sigmaitalic_σ, let σ𝐶𝑠𝑒𝑞,𝐷𝑠𝑒𝑞𝜎𝐶𝑠𝑒𝑞𝐷𝑠𝑒𝑞\sigma\bowtie\langle\mathit{Cseq},\mathit{Dseq}\rangleitalic_σ ⋈ ⟨ italic_Cseq , italic_Dseq ⟩ denote that there is an i such that σHi,Ci𝜎subscript𝐻𝑖subscript𝐶𝑖\sigma\bowtie\langle H_{i},C_{i}\rangleitalic_σ ⋈ ⟨ italic_H start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ⟩ and j:1ji.eσ:eDj\forall j:1\leq j\leq i.\exists e\in\sigma:e\bowtie{D_{j}}∀ italic_j : 1 ≤ italic_j ≤ italic_i . ∃ italic_e ∈ italic_σ : italic_e ⋈ italic_D start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT.

During exploration 𝑐𝑓𝑙[Ee]𝑐𝑓𝑙delimited-[]𝐸𝑒\mathit{cfl}[E\cdot e]italic_cfl [ italic_E ⋅ italic_e ] is obtained from 𝑐𝑓𝑙[E]𝑐𝑓𝑙delimited-[]𝐸\mathit{cfl}[E]italic_cfl [ italic_E ] as described by the function 𝚄𝚙𝚍𝚊𝚝𝚎𝚄𝚙𝚍𝚊𝚝𝚎{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Update}}typewriter_Update in Algorithm 1, in the following way. For each variable x𝑥xitalic_x, if e𝑒eitalic_e is a write on x𝑥xitalic_x then 𝑐𝑓𝑙[Ee](x)=𝑐𝑓𝑙delimited-[]𝐸𝑒𝑥\mathit{cfl}[E\cdot e](x)=\emptysetitalic_cfl [ italic_E ⋅ italic_e ] ( italic_x ) = ∅, otherwise the conflict map 𝑐𝑓𝑙[E](x)𝑐𝑓𝑙delimited-[]𝐸𝑥\mathit{cfl}[E](x)italic_cfl [ italic_E ] ( italic_x ) is updated as follows: For each pair 𝐶𝑠𝑒𝑞,𝐷𝑠𝑒𝑞𝐶𝑠𝑒𝑞𝐷𝑠𝑒𝑞\langle\mathit{Cseq},\mathit{Dseq}\rangle⟨ italic_Cseq , italic_Dseq ⟩ in 𝑐𝑓𝑙[E](x)𝑐𝑓𝑙delimited-[]𝐸𝑥\mathit{cfl}[E](x)italic_cfl [ italic_E ] ( italic_x ), then for each pair Hi,Cisubscript𝐻𝑖subscript𝐶𝑖\langle H_{i},C_{i}\rangle⟨ italic_H start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ⟩ in 𝐶𝑠𝑒𝑞𝐶𝑠𝑒𝑞\mathit{Cseq}italic_Cseq, if Hi,Cisubscript𝐻𝑖subscript𝐶𝑖\langle H_{i},C_{i}\rangle⟨ italic_H start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ⟩ depends with e𝑒eitalic_e then e.𝑡ℎ𝑎𝑐𝑐formulae-sequence𝑒𝑡ℎ𝑎𝑐𝑐e.\mathit{thacc}italic_e . italic_thacc is added to Cisubscript𝐶𝑖C_{i}italic_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT, otherwise e𝑒eitalic_e is removed from Hisubscript𝐻𝑖H_{i}italic_H start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. Furthermore each set Disubscript𝐷𝑖D_{i}italic_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT in 𝐷𝑠𝑒𝑞𝐷𝑠𝑒𝑞\mathit{Dseq}italic_Dseq is extended with e.𝑡ℎ𝑎𝑐𝑐formulae-sequence𝑒𝑡ℎ𝑎𝑐𝑐e.\mathit{thacc}italic_e . italic_thacc if eDi𝑒subscript𝐷𝑖e\bowtie D_{i}italic_e ⋈ italic_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT.

It follows that an extension of E𝐸Eitalic_E by an event e𝑒eitalic_e which reads from x𝑥xitalic_x should be blocked if either e𝑠𝑙𝑒𝑒𝑝(E)delimited-⟨⟩𝑒𝑠𝑙𝑒𝑒𝑝𝐸\langle e\rangle\in\mathit{sleep}(E)⟨ italic_e ⟩ ∈ italic_sleep ( italic_E ) or e𝐶𝑠𝑒𝑞,𝐷𝑠𝑒𝑞𝑒𝐶𝑠𝑒𝑞𝐷𝑠𝑒𝑞e\bowtie\langle\mathit{Cseq},\mathit{Dseq}\rangleitalic_e ⋈ ⟨ italic_Cseq , italic_Dseq ⟩ for some 𝐶𝑠𝑒𝑞,𝐷𝑠𝑒𝑞𝑐𝑓𝑙[E](x)𝐶𝑠𝑒𝑞𝐷𝑠𝑒𝑞𝑐𝑓𝑙delimited-[]𝐸𝑥\langle\mathit{Cseq},\mathit{Dseq}\rangle\in\mathit{cfl}[E](x)⟨ italic_Cseq , italic_Dseq ⟩ ∈ italic_cfl [ italic_E ] ( italic_x ).

4.4 The Parsimonious-OPtimal DPOR (POP) Algorithm

The POP algorithm, shown as Algorithm 1, takes an input program, and explores its executions by repeated calls to the procedure 𝙴𝚡𝚙𝚕𝚘𝚛𝚎𝙴𝚡𝚙𝚕𝚘𝚛𝚎{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Explore}}typewriter_Explore. For each prefix Esuperscript𝐸E^{\prime}italic_E start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT of an execution that is under exploration, the algorithm maintains a sleep set 𝑠𝑙𝑒𝑒𝑝[E]𝑠𝑙𝑒𝑒𝑝delimited-[]superscript𝐸\mathit{sleep}[E^{\prime}]italic_sleep [ italic_E start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ] and a conflict map 𝑐𝑓𝑙[E]𝑐𝑓𝑙delimited-[]superscript𝐸\mathit{cfl}[E^{\prime}]italic_cfl [ italic_E start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ], as described in Section 4.3, in order to prevent redundant exploration of read schedules. The algorithm first picks an enabled event e𝑒eitalic_e, initializes the sleep sets of \langle\rangle⟨ ⟩ and edelimited-⟨⟩𝑒\langle e\rangle⟨ italic_e ⟩, as well as the ranges of the conflict maps for \langle\rangle⟨ ⟩ and edelimited-⟨⟩𝑒\langle e\rangle⟨ italic_e ⟩, to the empty set, whereafter it calls 𝙴𝚡𝚙𝚕𝚘𝚛𝚎(e)𝙴𝚡𝚙𝚕𝚘𝚛𝚎delimited-⟨⟩𝑒{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Explore}}(\langle e\rangle)typewriter_Explore ( ⟨ italic_e ⟩ ).

1 Pick e𝚎𝚗𝚊𝚋𝚕𝚎𝚍()𝑒𝚎𝚗𝚊𝚋𝚕𝚎𝚍e\in\mathtt{enabled}\left(\langle\rangle\right)italic_e ∈ typewriter_enabled ( ⟨ ⟩ )
2 𝑠𝑙𝑒𝑒𝑝[]=𝑠𝑙𝑒𝑒𝑝[e]=𝑠𝑙𝑒𝑒𝑝delimited-[]𝑠𝑙𝑒𝑒𝑝delimited-[]delimited-⟨⟩𝑒\mathit{sleep}[\langle\rangle]=\mathit{sleep}[\langle e\rangle]=\emptysetitalic_sleep [ ⟨ ⟩ ] = italic_sleep [ ⟨ italic_e ⟩ ] = ∅
3 𝑐𝑓𝑙[](x)=𝑐𝑓𝑙[e](x)=𝑐𝑓𝑙delimited-[]𝑥𝑐𝑓𝑙delimited-[]delimited-⟨⟩𝑒𝑥\mathit{cfl}[\langle\rangle](x)=\mathit{cfl}[\langle e\rangle](x)=\emptysetitalic_cfl [ ⟨ ⟩ ] ( italic_x ) = italic_cfl [ ⟨ italic_e ⟩ ] ( italic_x ) = ∅ for each shared variable x𝑥xitalic_x
4 𝙴𝚡𝚙𝚕𝚘𝚛𝚎(e)𝙴𝚡𝚙𝚕𝚘𝚛𝚎delimited-⟨⟩𝑒{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Explore}}(\langle e\rangle)typewriter_Explore ( ⟨ italic_e ⟩ )
5 𝙴𝚡𝚙𝚕𝚘𝚛𝚎(E)𝙴𝚡𝚙𝚕𝚘𝚛𝚎𝐸{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Explore}}(E)typewriter_Explore ( italic_E )
6      foreach eE𝚕𝚊𝚜𝚝(E)𝑒subscriptless-than-or-similar-to𝐸𝚕𝚊𝚜𝚝𝐸{e}\operatorname{\mathtt{\lesssim}}_{E}{{\mathtt{last}}\left(E\right)}italic_e ≲ start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT typewriter_last ( italic_E ) do
7           let E=E1eE2𝐸subscript𝐸1𝑒subscript𝐸2E=E_{1}\cdot e\cdot E_{2}italic_E = italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_e ⋅ italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT
8           let e=𝚕𝚊𝚜𝚝(E)superscript𝑒𝚕𝚊𝚜𝚝𝐸e^{\prime}={\mathtt{last}}\left(E\right)italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT = typewriter_last ( italic_E )
9           let σ=eE2𝜎superscript𝑒superscriptsubscript𝐸2absent\sigma={e^{\prime}}\downarrow^{E_{2}}italic_σ = italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ↓ start_POSTSUPERSCRIPT italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT
10           if s𝑠𝑙𝑒𝑒𝑝[E1]:sσ:not-exists𝑠𝑠𝑙𝑒𝑒𝑝delimited-[]subscript𝐸1square-image-of-or-equals𝑠𝜎\nexists s\in\mathit{sleep}[E_{1}]:s\sqsubseteq\sigma∄ italic_s ∈ italic_sleep [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ] : italic_s ⊑ italic_σ and
11            (e.𝚃=𝚆)formulae-sequencesuperscript𝑒𝚃𝚆(e^{\prime}.{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}{% 0,0,0.8}{\mathtt{T}}}=\mathtt{W})( italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT . typewriter_T = typewriter_W ) or 𝐶𝑠𝑒𝑞,𝐷𝑠𝑒𝑞𝑐𝑓𝑙[E](e.𝑣𝑎𝑟):σ𝐶𝑠𝑒𝑞,𝐷𝑠𝑒𝑞\nexists\langle\mathit{Cseq},\mathit{Dseq}\rangle\in\mathit{cfl}[E](e^{\prime}% .\mathit{var}):\sigma\bowtie\langle\mathit{Cseq},\mathit{Dseq}\rangle∄ ⟨ italic_Cseq , italic_Dseq ⟩ ∈ italic_cfl [ italic_E ] ( italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT . italic_var ) : italic_σ ⋈ ⟨ italic_Cseq , italic_Dseq ⟩ then
12                𝑠𝑙𝑒𝑒𝑝[E1σ]={sσs𝑠𝑙𝑒𝑒𝑝[E1]sσ}𝑠𝑙𝑒𝑒𝑝delimited-[]subscript𝐸1𝜎conditional-set𝑠𝜎𝑠𝑠𝑙𝑒𝑒𝑝delimited-[]subscript𝐸1𝑠similar-to𝜎\mathit{sleep}[E_{1}\cdot\sigma]=\left\{{s\!\setminus\!\sigma}\mid\,{s\in% \mathit{sleep}[E_{1}]\land s\sim\sigma}\right\}italic_sleep [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ ] = { italic_s ∖ italic_σ ∣ italic_s ∈ italic_sleep [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ] ∧ italic_s ∼ italic_σ }
13                𝑐𝑓𝑙[E1σ]=𝚄𝚙𝚍𝚊𝚝𝚎(𝑐𝑓𝑙[E1],σ)𝑐𝑓𝑙delimited-[]subscript𝐸1𝜎𝚄𝚙𝚍𝚊𝚝𝚎𝑐𝑓𝑙delimited-[]subscript𝐸1𝜎\mathit{cfl}[E_{1}\cdot\sigma]={\color[rgb]{0.82,0.1,0.26}\definecolor[named]{% pgfstrokecolor}{rgb}{0.82,0.1,0.26}\mathtt{Update}}(\mathit{cfl}[E_{1}],\sigma)italic_cfl [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ ] = typewriter_Update ( italic_cfl [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ] , italic_σ )
14                if (e.𝚃=𝚁)formulae-sequencesuperscript𝑒𝚃𝚁(e^{\prime}.{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}{% 0,0,0.8}{\mathtt{T}}}=\mathtt{R})( italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT . typewriter_T = typewriter_R ) then
15                     𝑠𝑙𝑒𝑒𝑝[E1σ]={e′′E2eEe′′e′′e}\mathit{sleep}[E_{1}\cdot\sigma]\ \cup\!=\ \left\{{{e^{\prime\prime}}% \downarrow^{E_{2}}}\mid\,{{e}\operatorname{\mathtt{\lesssim}}_{E}{e^{\prime% \prime}}\land e^{\prime\prime}\neq e^{\prime}}\right\}italic_sleep [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ ] ∪ = { italic_e start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ↓ start_POSTSUPERSCRIPT italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ∣ italic_e ≲ start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ∧ italic_e start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT ≠ italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT }
16                     let E2=w0σ1w1σmwmesubscript𝐸2subscript𝑤0subscript𝜎1subscript𝑤1subscript𝜎𝑚subscript𝑤𝑚superscript𝑒E_{2}=w_{0}\cdot\sigma_{1}\cdot w_{1}\cdot\cdots\sigma_{m}\cdot w_{m}\cdot e^{\prime}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = italic_w start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ⋅ italic_σ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_w start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ ⋯ italic_σ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ⋅ italic_w start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ⋅ italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT
17                     let 𝐶𝑠𝑒𝑞=u0,,,um,𝐶𝑠𝑒𝑞subscript𝑢0subscript𝑢𝑚\mathit{Cseq}=\langle\langle u_{0},\emptyset\rangle,\ldots,\langle u_{m},% \emptyset\rangle\rangleitalic_Cseq = ⟨ ⟨ italic_u start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , ∅ ⟩ , … , ⟨ italic_u start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , ∅ ⟩ ⟩ where ui=(w0σ1wi)σsubscript𝑢𝑖subscript𝑤0subscript𝜎1subscript𝑤𝑖𝜎u_{i}=(w_{0}\cdot\sigma_{1}\cdot\cdots\cdot w_{i})\!\setminus\!\sigmaitalic_u start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = ( italic_w start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ⋅ italic_σ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ ⋯ ⋅ italic_w start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ) ∖ italic_σ
18                     let 𝐷𝑠𝑒𝑞=hd(σ1).𝑡ℎ𝑎𝑐𝑐,,hd(σm).𝑡ℎ𝑎𝑐𝑐\mathit{Dseq}=\langle{hd}\left(\sigma_{1}\right).\mathit{thacc},\ldots,{hd}% \left(\sigma_{m}\right).\mathit{thacc}\rangleitalic_Dseq = ⟨ italic_h italic_d ( italic_σ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ) . italic_thacc , … , italic_h italic_d ( italic_σ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) . italic_thacc ⟩
19                     𝑐𝑓𝑙[E1σ](x)=𝐶𝑠𝑒𝑞,𝐷𝑠𝑒𝑞limit-from𝑐𝑓𝑙delimited-[]subscript𝐸1𝜎𝑥𝐶𝑠𝑒𝑞𝐷𝑠𝑒𝑞\mathit{cfl}[E_{1}\cdot\sigma](x)\ \cup\!=\langle\mathit{Cseq},\mathit{Dseq}\rangleitalic_cfl [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ ] ( italic_x ) ∪ = ⟨ italic_Cseq , italic_Dseq ⟩
20               𝙴𝚡𝚙𝚕𝚘𝚛𝚎(E1σ)𝙴𝚡𝚙𝚕𝚘𝚛𝚎subscript𝐸1𝜎{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Explore}}(E_{1}\cdot\sigma)typewriter_Explore ( italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ )
21     if e𝚎𝚗𝚊𝚋𝚕𝚎𝚍(E)𝑒𝚎𝚗𝚊𝚋𝚕𝚎𝚍𝐸\exists e\in\mathtt{enabled}\left(E\right)∃ italic_e ∈ typewriter_enabled ( italic_E ) s.t. e𝑠𝑙𝑒𝑒𝑝[E]delimited-⟨⟩𝑒𝑠𝑙𝑒𝑒𝑝delimited-[]𝐸\langle e\rangle\notin\mathit{sleep}[E]⟨ italic_e ⟩ ∉ italic_sleep [ italic_E ] and
22       ((e.𝚃=𝚆)formulae-sequence𝑒𝚃𝚆(e.{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0.8}{% \mathtt{T}}}=\mathtt{W})( italic_e . typewriter_T = typewriter_W ) or 𝐶𝑠𝑒𝑞,𝐷𝑠𝑒𝑞𝑐𝑓𝑙[E](e.𝑣𝑎𝑟):e𝐶𝑠𝑒𝑞,𝐷𝑠𝑒𝑞\nexists\langle\mathit{Cseq},\mathit{Dseq}\rangle\in\mathit{cfl}[E](e.\mathit{% var}):e\bowtie\langle\mathit{Cseq},\mathit{Dseq}\rangle∄ ⟨ italic_Cseq , italic_Dseq ⟩ ∈ italic_cfl [ italic_E ] ( italic_e . italic_var ) : italic_e ⋈ ⟨ italic_Cseq , italic_Dseq ⟩) then
23           𝑠𝑙𝑒𝑒𝑝[Ee]={ses𝑠𝑙𝑒𝑒𝑝[E]se}𝑠𝑙𝑒𝑒𝑝delimited-[]𝐸𝑒conditional-set𝑠𝑒𝑠𝑠𝑙𝑒𝑒𝑝delimited-[]𝐸𝑠similar-to𝑒\mathit{sleep}[E\cdot e]=\left\{{s\!\setminus\!e}\mid\,{s\in\mathit{sleep}[E]% \land s\sim e}\right\}italic_sleep [ italic_E ⋅ italic_e ] = { italic_s ∖ italic_e ∣ italic_s ∈ italic_sleep [ italic_E ] ∧ italic_s ∼ italic_e }
24           𝑐𝑓𝑙[Ee]=𝚄𝚙𝚍𝚊𝚝𝚎(𝑐𝑓𝑙[E],e)𝑐𝑓𝑙delimited-[]𝐸𝑒𝚄𝚙𝚍𝚊𝚝𝚎𝑐𝑓𝑙delimited-[]𝐸delimited-⟨⟩𝑒\mathit{cfl}[E\cdot e]={\color[rgb]{0.82,0.1,0.26}\definecolor[named]{% pgfstrokecolor}{rgb}{0.82,0.1,0.26}\mathtt{Update}}(\mathit{cfl}[E],\langle e\rangle)italic_cfl [ italic_E ⋅ italic_e ] = typewriter_Update ( italic_cfl [ italic_E ] , ⟨ italic_e ⟩ )
25           𝙴𝚡𝚙𝚕𝚘𝚛𝚎(Ee)𝙴𝚡𝚙𝚕𝚘𝚛𝚎𝐸𝑒{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Explore}}(E\cdot e)typewriter_Explore ( italic_E ⋅ italic_e )
26          
27𝚄𝚙𝚍𝚊𝚝𝚎(𝑐𝑓𝑙,σ)𝚄𝚙𝚍𝚊𝚝𝚎𝑐𝑓𝑙𝜎{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Update}}(\mathit{cfl},\sigma)typewriter_Update ( italic_cfl , italic_σ )
28      foreach e in σ𝑒 in 𝜎e\mbox{ \bf in }\sigmaitalic_e bold_in italic_σ (from beginning to end) do
29           if (e.𝚃=𝚆)formulae-sequence𝑒𝚃𝚆(e.{\color[rgb]{0,0,0.8}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0.8}{% \mathtt{T}}}=\mathtt{W})( italic_e . typewriter_T = typewriter_W ) then 𝑐𝑓𝑙(e.𝑣𝑎𝑟)\mathit{cfl}(e.\mathit{var})italic_cfl ( italic_e . italic_var ) is set to \emptyset
30           foreach variable 𝚢𝚢\mathtt{y}typewriter_y do
31                foreach pair 𝐶𝑠𝑒𝑞,𝐷𝑠𝑒𝑞𝐶𝑠𝑒𝑞𝐷𝑠𝑒𝑞\langle\mathit{Cseq},\mathit{Dseq}\rangle⟨ italic_Cseq , italic_Dseq ⟩ in 𝑐𝑓𝑙(𝚢)𝑐𝑓𝑙𝚢\mathit{cfl}(\mathtt{y})italic_cfl ( typewriter_y ) do
32                     foreach Hi,Cisubscript𝐻𝑖subscript𝐶𝑖\langle H_{i},C_{i}\rangle⟨ italic_H start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ⟩ in 𝐶𝑠𝑒𝑞𝐶𝑠𝑒𝑞\mathit{Cseq}italic_Cseq do
33                          if eHi,Ci𝑒subscript𝐻𝑖subscript𝐶𝑖e\bowtie\langle H_{i},C_{i}\rangleitalic_e ⋈ ⟨ italic_H start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ⟩ then
34                               replace Hi,Cisubscript𝐻𝑖subscript𝐶𝑖\langle H_{i},C_{i}\rangle⟨ italic_H start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ⟩ by Hi,Cie.𝑡ℎ𝑎𝑐𝑐delimited-⟨⟩formulae-sequencesubscript𝐻𝑖subscript𝐶𝑖𝑒𝑡ℎ𝑎𝑐𝑐\langle H_{i},C_{i}\cup e.\mathit{thacc}\rangle⟨ italic_H start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∪ italic_e . italic_thacc ⟩ in 𝑐𝑓𝑙(𝚢)𝑐𝑓𝑙𝚢\mathit{cfl}(\mathtt{y})italic_cfl ( typewriter_y )
35                         else
36                               replace Hi,Cisubscript𝐻𝑖subscript𝐶𝑖\langle H_{i},C_{i}\rangle⟨ italic_H start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ⟩ by Hie,Cisubscript𝐻𝑖𝑒subscript𝐶𝑖\langle H_{i}\!\setminus\!e,C_{i}\rangle⟨ italic_H start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∖ italic_e , italic_C start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ⟩ in 𝑐𝑓𝑙(𝚢)𝑐𝑓𝑙𝚢\mathit{cfl}(\mathtt{y})italic_cfl ( typewriter_y )
37                    foreach Disubscript𝐷𝑖D_{i}italic_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT in 𝐷𝑠𝑒𝑞𝐷𝑠𝑒𝑞\mathit{Dseq}italic_Dseq do
38                          if eDi𝑒subscript𝐷𝑖e\bowtie D_{i}italic_e ⋈ italic_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT then Di=e.𝑡ℎ𝑎𝑐𝑐formulae-sequencelimit-fromsubscript𝐷𝑖𝑒𝑡ℎ𝑎𝑐𝑐D_{i}\ \cup=e.\mathit{thacc}italic_D start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∪ = italic_e . italic_thacc
39                         
40     return 𝑐𝑓𝑙𝑐𝑓𝑙\mathit{cfl}italic_cfl
Algorithm 1 POP (Recursive)

Each call to 𝙴𝚡𝚙𝚕𝚘𝚛𝚎(E)𝙴𝚡𝚙𝚕𝚘𝚛𝚎𝐸{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Explore}}(E)typewriter_Explore ( italic_E ) consists of a race reversal phase (Algorithms 1 to 1) and an exploration phase (Algorithms 1 to 1). In the race reversal phase, it considers all parsimonious races between an event e𝑒eitalic_e in E𝐸Eitalic_E and the last event esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT of E𝐸Eitalic_E (Algorithm 1). For each such race, of form eEe𝑒subscriptless-than-or-similar-to𝐸superscript𝑒{e}\operatorname{\mathtt{\lesssim}}_{E}{e^{\prime}}italic_e ≲ start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT, it decomposes E𝐸Eitalic_E as E1eE2subscript𝐸1𝑒subscript𝐸2E_{1}\cdot e\cdot E_{2}italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_e ⋅ italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT (Algorithm 1), and forms the schedule σ𝜎\sigmaitalic_σ that reverses the race as eE2superscriptsubscript𝐸2superscript𝑒absent{e^{\prime}}\downarrow^{E_{2}}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ↓ start_POSTSUPERSCRIPT italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUPERSCRIPT (Algorithm 1). It then intends to call 𝙴𝚡𝚙𝚕𝚘𝚛𝚎(E1σ)𝙴𝚡𝚙𝚕𝚘𝚛𝚎subscript𝐸1𝜎{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Explore}}(E_{1}\cdot\sigma)typewriter_Explore ( italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ ) in order to recursively switch the exploration to the newly reversed race, but before that it needs to check that exploring E1σsubscript𝐸1𝜎E_{1}\cdot\sigmaitalic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ will not be redundant, and also initialize 𝑠𝑙𝑒𝑒𝑝[E1σ]𝑠𝑙𝑒𝑒𝑝delimited-[]subscript𝐸1𝜎\mathit{sleep}[E_{1}\cdot\sigma]italic_sleep [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ ] and 𝑐𝑓𝑙[E1σ]𝑐𝑓𝑙delimited-[]subscript𝐸1𝜎\mathit{cfl}[E_{1}\cdot\sigma]italic_cfl [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ ]. To check that exploring E1σsubscript𝐸1𝜎E_{1}\cdot\sigmaitalic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ will not be redundant, it checks that the schedule σ𝜎\sigmaitalic_σ does not include a schedule in 𝑠𝑙𝑒𝑒𝑝[E1]𝑠𝑙𝑒𝑒𝑝delimited-[]subscript𝐸1\mathit{sleep}[E_{1}]italic_sleep [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ], and that the head of σ𝜎\sigmaitalic_σ cannot form a schedule that is characterized by 𝑐𝑓𝑙[E1]𝑐𝑓𝑙delimited-[]subscript𝐸1\mathit{cfl}[E_{1}]italic_cfl [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ] (Algorithm 1). If this check fails, this race will not lead to a call of 𝙴𝚡𝚙𝚕𝚘𝚛𝚎(E1σ)𝙴𝚡𝚙𝚕𝚘𝚛𝚎subscript𝐸1𝜎{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Explore}}(E_{1}\cdot\sigma)typewriter_Explore ( italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ ). If, on the other hand, the check succeeds, 𝑠𝑙𝑒𝑒𝑝[E1σ]𝑠𝑙𝑒𝑒𝑝delimited-[]subscript𝐸1𝜎\mathit{sleep}[E_{1}\cdot\sigma]italic_sleep [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ ] is computed by first updating it from 𝑠𝑙𝑒𝑒𝑝[E1]𝑠𝑙𝑒𝑒𝑝delimited-[]subscript𝐸1\mathit{sleep}[E_{1}]italic_sleep [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ] (Algorithm 1), and computing 𝑐𝑓𝑙[E1σ]𝑐𝑓𝑙delimited-[]subscript𝐸1𝜎\mathit{cfl}[E_{1}\cdot\sigma]italic_cfl [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ ] from 𝑐𝑓𝑙[E1]𝑐𝑓𝑙delimited-[]subscript𝐸1\mathit{cfl}[E_{1}]italic_cfl [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ] using the function 𝚄𝚙𝚍𝚊𝚝𝚎𝚄𝚙𝚍𝚊𝚝𝚎{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Update}}typewriter_Update (Algorithm 1). If the last event of E𝐸Eitalic_E is a read event, then 𝑠𝑙𝑒𝑒𝑝[E1σ]𝑠𝑙𝑒𝑒𝑝delimited-[]subscript𝐸1𝜎\mathit{sleep}[E_{1}\cdot\sigma]italic_sleep [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ ] and 𝑐𝑓𝑙[E1σ]𝑐𝑓𝑙delimited-[]subscript𝐸1𝜎\mathit{cfl}[E_{1}\cdot\sigma]italic_cfl [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ ] must be further extended: 𝑠𝑙𝑒𝑒𝑝[E1σ]𝑠𝑙𝑒𝑒𝑝delimited-[]subscript𝐸1𝜎\mathit{sleep}[E_{1}\cdot\sigma]italic_sleep [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ ] is extended with the schedules of parsimonious write-read races eEe′′𝑒subscriptless-than-or-similar-to𝐸superscript𝑒′′{e}\operatorname{\mathtt{\lesssim}}_{E}{e^{\prime\prime}}italic_e ≲ start_POSTSUBSCRIPT italic_E end_POSTSUBSCRIPT italic_e start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT with e′′superscript𝑒′′e^{\prime\prime}italic_e start_POSTSUPERSCRIPT ′ ′ end_POSTSUPERSCRIPT occurring before esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT in E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT (Algorithm 1), and 𝑐𝑓𝑙[E1σ]𝑐𝑓𝑙delimited-[]subscript𝐸1𝜎\mathit{cfl}[E_{1}\cdot\sigma]italic_cfl [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ ] is extended with the machinery for checking whether the execution will exhibit a read in a parsimonious race on the variable of esuperscript𝑒e^{\prime}italic_e start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT that conflicts with E2subscript𝐸2E_{2}italic_E start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT with σ𝜎\sigmaitalic_σ removed (Algorithm 1). After these preparations, 𝙴𝚡𝚙𝚕𝚘𝚛𝚎(E1σ)𝙴𝚡𝚙𝚕𝚘𝚛𝚎subscript𝐸1𝜎{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Explore}}(E_{1}\cdot\sigma)typewriter_Explore ( italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_σ ) is called recursively (Algorithm 1).

After the return of all recursive calls initiated in the race reversal phase, 𝙴𝚡𝚙𝚕𝚘𝚛𝚎𝙴𝚡𝚙𝚕𝚘𝚛𝚎{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Explore}}typewriter_Explore enters the exploration phase. There it picks an event e𝑒eitalic_e that is enabled for execution. Before starting the execution, it must be checked that it is not a read event which is a sequence in the sleep set nor that it completes a schedule that is characterized by 𝑐𝑓𝑙[E]𝑐𝑓𝑙delimited-[]𝐸\mathit{cfl}[E]italic_cfl [ italic_E ]. These checks (Algorithm 1) are done in the same way as before a recursive call to 𝙴𝚡𝚙𝚕𝚘𝚛𝚎𝙴𝚡𝚙𝚕𝚘𝚛𝚎{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Explore}}typewriter_Explore. If the checks succeed, the call is prepared by updating 𝑠𝑙𝑒𝑒𝑝[E1e]𝑠𝑙𝑒𝑒𝑝delimited-[]subscript𝐸1𝑒\mathit{sleep}[E_{1}\cdot e]italic_sleep [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_e ] (Algorithm 1) and 𝑐𝑓𝑙[E1e]𝑐𝑓𝑙delimited-[]subscript𝐸1𝑒\mathit{cfl}[E_{1}\cdot e]italic_cfl [ italic_E start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ⋅ italic_e ] (at Algorithm 1) in an analogous manner.

5 Correctness and Space Complexity

In this section, we state theorems of correctness, optimality, and space complexity of POP. We first consider correctness and optimality.

Theorem 5.1

For a terminating program P𝑃Pitalic_P, the POP algorithm has the properties that (i) for each maximal execution E𝐸Eitalic_E of P𝑃Pitalic_P, it explores some execution Esuperscript𝐸E^{\prime}italic_E start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT with EEsimilar-to-or-equalssuperscript𝐸𝐸E^{\prime}\simeq Eitalic_E start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ≃ italic_E, and (ii) it never explores two different but equivalent maximal executions, and (iii) it is never blocked (at LABEL:algbrl:explore-check) unless it has explored a maximal execution.

We thereafter consider space complexity.

Lemma 1

The number of nested recursive calls to 𝙴𝚡𝚙𝚕𝚘𝚛𝚎𝙴𝚡𝚙𝚕𝚘𝚛𝚎{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Explore}}typewriter_Explore at LABEL:algbrl:race-recursive-call is at most n(n1)/2𝑛𝑛12n(n-1)/2italic_n ( italic_n - 1 ) / 2, where n𝑛nitalic_n is the length of the longest execution of the program.

Note that in this lemma, we do not count the calls at LABEL:algbrl:explore-continue, since they are considered as normal exploration of some execution. Only the calls at LABEL:algbrl:race-recursive-call start the exploration of a new execution.

Theorem 5.2

LABEL:alg:bj-pop needs space which is polynomial in n𝑛nitalic_n, where n𝑛nitalic_n is the length of the longest execution of the analyzed program.

6 Implementation and Evaluation

Our implementation was done in a fork of the Nidhugg tool.222The code is publicly available on GitHub in simplified-dpor branch of a Nidhugg’s fork (https://github.com/sarbojit4/nidhugg). We also submitted our implementation as well as the benchmarking infrastructure we have used as an artifact (https://doi.org/10.5281/zenodo.11001033). Nidhugg is a state-of-the-art stateless model checker for C/C++ programs with Pthreads, which works at the level of LLVM Intermediate Representation, typically produced by the Clang compiler. Nidhugg comes with a selection of DPOR algorithms, one of which is Optimal DPOR [abdulla2014optimal] nowadays also enhanced with Partial Loop Purity elimination and support for await statements [Godot@FMCAD-22]. In our Nidhugg fork, we have added the POP algorithm as another selection. Its iterative implementation involved: (i) designing an efficient data structure to simulate recursive calls to 𝙴𝚡𝚙𝚕𝚘𝚛𝚎𝙴𝚡𝚙𝚕𝚘𝚛𝚎{\color[rgb]{0.82,0.1,0.26}\definecolor[named]{pgfstrokecolor}{rgb}{% 0.82,0.1,0.26}\mathtt{Explore}}typewriter_Explore, i.e., follow the next schedule to explore and backtrack to the previous execution when no further races to reverse; and (ii) develo** a procedure that filters out races that are not parsimonious races, and implementing it with the necessary data structures to store and maintain the parsimonious sleep sets.

Table 1: Time and memory performance of three optimal DPOR algorithms on ten benchmark programs which are parametric in the number of threads used.
Time (secs) Memory (MB)
Benchmark Executions TruSt Optimal POP TruSt Optimal POP
circular-buffer(7) 3432 0.62 0.45 0.43 85 84 84
circular-buffer(8) 12870 2.63 1.79 1.66 85 84 84
circular-buffer(9) 48620 11.04 7.21 6.67 85 84 84
fib-bench(4) 19605 1.08 1.93 1.82 85 84 84
fib-bench(5) 218243 14.59 24.66 24.10 85 84 84
fib-bench(6) 2364418 186.25 301.30 297.40 85 84 84
linuxrwlocks(6) 99442 3.61 13.71 12.88 90 91 91
linuxrwlocks(7) 829168 32.75 127.66 121.17 90 91 91
linuxrwlocks(8) 6984234 311.93 1176.13 1119.23 90 91 91
filesystem(22) 512 0.72 0.62 0.34 86 84 84
filesystem(24) 2048 2.84 2.97 1.32 86 187 84
filesystem(26) 8192 11.88 15.71 5.66 85 622 84
indexer(15) 4096 11.07 8.58 5.65 89 116 90
indexer(16) 32768 90.14 80.37 46.46 89 464 90
indexer(17) 262144 736.78 827.02 399.87 89 3030 90
lastzero(10) 3328 0.07 0.34 0.27 85 84 84
lastzero(15) 147456 3.19 24.46 15.09 85 276 84
lastzero(20) 6029312 152.13 1828.92 786.19 85 8883 84
exp-mem3(7) 10080 0.22 0.67 0.54 86 104 85
exp-mem3(8) 80640 1.96 6.15 4.61 86 506 85
exp-mem3(9) 725760 19.11 73.68 44.83 86 4489 85
dispatcher(4) 6854 1.15 1.75 1.47 90 90 90
dispatcher(5) 151032 34.66 55.07 42.76 89 407 90
dispatcher(6) 4057388 1245.13 2333.51 1424.57 89 9097 90
poke(10) 135944 88.54 96.30 63.45 90 791 90
poke(15) 728559 874.76 891.26 479.03 89 5527 90
poke(20) 2366924 4502.45 4356.59 2008.92 90 22383 90
length-param(2,1024) 4 0.14 0.05 0.06 85 84 84
length-param(2,8196) 4 7.95 0.16 0.14 95 101 89
length-param(2,65536) 4 1413.00 1.13 0.90 389 441 343

In the rest of this section, we evaluate the performance of POP’s implementation and compare it, in terms of time and memory, against the implementation of Optimal DPOR in Nidhugg commit 5805d77 and the graph-based Truly Stateless (TruSt) Optimal DPOR algorithm [TruSt@POPL-22] as implemented in GenMC v0.10.0 using options -sc --disable-instruction-caching. All tools employed LLVM 14.0.6. The evaluation was conducted on a desktop with a Ryzen 7950X CPU running Debian 12.4.

Table 1 contains the results of our evaluation. Its first nine benchmarks are from the DPOR literature, and are all parametric on the number of threads (shown in parentheses). The last benchmark, length-param, is synthetic and is additionally parametric on the length of its executions. Since these DPOR algorithms are optimal, they explore the same number of executions (2nd column) in all ten benchmarks. We will analyze the results in five groups (cf. Table 1).

The first group consists of three programs (circular-buffer from SCTBench [thomsonDB16], fib-bench from SV-Comp [svcomp:19], and the linuxrwlocks from SATCheck [SATCheck@OOPSLA-15]). Here, all algorithms consume memory that stays constant as the size of the program and the number of executions explored increase. We can therefore compare the raw performance of the implementation of these three DPOR algorithms. POP’s implementation is fastest on circular-buffer, while TruSt’s is fastest on the two other programs. However, notice that all three implementations scale similarly.

The second group consists of the two benchmarks (filesystem and indexer) from the “classic” DPOR paper of Flanagan and Godefroid [FG:dpor]. Here, Optimal DPOR shows an increase in memory consumption (measured in MB), while the other two algorithms use constant memory. POP is fastest here by approximately 2×2\times2 ×.

The third group, consisting of lastzero [abdulla2014optimal] and exp-mem3,333exp-mem3 is slight variant of the exp-mem program used in the TruSt paper. It uses atomic stores and loads instead of fetch-and-adds (FAAs), because the current implementation of Optimal DPOR (and POP) in Nidhugg employs an optimization which treats independent FAAs as non-conflicting [Godot@FMCAD-22] and explores only one trace on the exp-mem program independently of the benchmark’s parameter. two synthetic benchmarks also used in the TruSt paper [TruSt@POPL-22, Table 1], shows a similar picture in terms of memory consumption: Optimal DPOR’s increases more noticeably here, while the two other algorithms use memory that stays constant. Time-wise, TruSt is 2–5×\times× faster than POP, which in turn is 2×2\times2 × faster than Optimal.

The fourth group, consisting of two concurrent data structure programs (dispatcher and poke) from the Quasi-Optimal POR paper [Quasi-Optimal@CAV-18], shows Optimal’s memory explosion more profoundly, and provides further evidence of the good memory performance of the TruSt and POP algorithms. Time-wise, there is no clear winner here, with TruSt’s implementation being a bit faster on dispatcher, and with POP’s being faster and scaling slightly better than TruSt’s on poke.

Finally, let us examine the algorithms’ performance on length-param(T𝑇Titalic_T,N𝑁Nitalic_N), a synthetic but simple program in which a number of threads (just two here) issue N𝑁Nitalic_N stores and loads to thread-specific global variables, followed by a store and a load to a variable shared between threads. The total number of executions is just four here, but the executions grow in length. One can clearly see the superior time performance of sequence-based DPOR algorithms, such as Optimal and POP, compared to graph-based algorithms such as TruSt. We can also notice that POP performs slightly better than Optimal in terms of memory.

We mention in passing that we also conducted an experiment in which we used the implementation of POP to check, under the SC memory model, the code of Linux kernel’s Hierarchical Read-Copy Update (Tree RCU) implementation [KoSa:sttt19]. Both Optimal DPOR and POP performed similarly: on our desktop, the experiment finished in about five minutes, exploring a bit over 42 0004200042\,00042 000 executions, and with very low (90MB) memory requirements. Since GenMC cannot handle RCU’s code, we do not include these results in the table, but we offer it as further evidence about POP’s implementation being able to handle “real code” out there.

Wrap** up our evaluation, we can make the following two general claims:

  1. 1.

    Both POP and TruSt live up to their promise about performing SMC exploration which is optimal (w.r.t. the Mazurkiewicz equivalence) but also with polynomial (in fact, in practice, constant) space consumption.

  2. 2.

    The implementation of the POP algorithm consistently outperforms that of Optimal DPOR in Nidhugg. This is mostly due to increased simplicity.