-
Lost in Interpretation: Navigating Challenges in Validating Persistency Models Amid Vague Specs and Stubborn Machines, All with a Sense of Humour
Authors:
Vasileios Klimis,
Alastair F. Donaldson,
Viktor Vafeiadis,
John Wickerson,
Azalea Raad
Abstract:
Memory persistency models provide a foundation for persistent programming by specifying which (and when) writes to non-volatile memory (NVM) become persistent. Memory persistency models for the Intel-x86 and Arm architectures have been formalised, but not empirically validated against real machines. Traditional validation methods %such as %extensive litmus testing used for memory \emph{consistency…
▽ More
Memory persistency models provide a foundation for persistent programming by specifying which (and when) writes to non-volatile memory (NVM) become persistent. Memory persistency models for the Intel-x86 and Arm architectures have been formalised, but not empirically validated against real machines. Traditional validation methods %such as %extensive litmus testing used for memory \emph{consistency} models do not straightforwardly apply because a test program cannot directly observe when its data has become persistent: it cannot distinguish between reading data from a volatile cache and from NVM. We investigate addressing this challenge using a commercial off-the-shelf device that intercepts data on the memory bus and logs all writes in the order they reach the memory. Using this technique we conducted a litmus-testing campaign aimed at empirically validating the persistency guarantees of Intel-x86 and Arm machines. We observed writes propagating to memory out of order, and took steps to build confidence that these observations were not merely artefacts of our testing setup. However, despite gaining high confidence in the trustworthiness of our observation method, our conclusions remain largely negative. We found that the Intel-x86 architecture is not amenable to our approach, and on consulting Intel engineers discovered that there are currently no reliable methods of validating their persistency guarantees. For Arm, we found that even a machine recommended to us by a persistency expert at Arm did not match the formal Arm persistency model, due to a loophole in the specification.
△ Less
Submitted 28 May, 2024;
originally announced May 2024.
-
Specifying and Testing GPU Workgroup Progress Models
Authors:
Tyler Sorensen,
Lucas F. Salvador,
Harmit Raval,
Hugues Evrard,
John Wickerson,
Margaret Martonosi,
Alastair F. Donaldson
Abstract:
As GPU availability has increased and programming support has matured, a wider variety of applications are being ported to these platforms. Many parallel applications contain fine-grained synchronization idioms; as such, their correct execution depends on a degree of relative forward progress between threads (or thread groups). Unfortunately, many GPU programming specifications say almost nothing…
▽ More
As GPU availability has increased and programming support has matured, a wider variety of applications are being ported to these platforms. Many parallel applications contain fine-grained synchronization idioms; as such, their correct execution depends on a degree of relative forward progress between threads (or thread groups). Unfortunately, many GPU programming specifications say almost nothing about relative forward progress guarantees between workgroups. Although prior work has proposed a spectrum of plausible progress models for GPUs, cross-vendor specifications have yet to commit to any model.
This work is a collection of tools experimental data to aid specification designers when considering forward progress guarantees in programming frameworks. As a foundation, we formalize a small parallel programming language that captures the essence of fine-grained synchronization. We then provide a means of formally specifying a progress model, and develop a termination oracle that decides whether a given program is guaranteed to eventually terminate with respect to a given progress model. Next, we formalize a constraint for concurrent programs that require relative forward progress to terminate. Using this constraint, we synthesize a large set of 483 progress litmus tests. Combined with the termination oracle, this allows us to determine the expected status of each litmus test -- i.e. whether it is guaranteed eventual termination -- under various progress models. We present a large experimental campaign running the litmus tests across 8 GPUs from 5 different vendors. Our results highlight that GPUs have significantly different termination behaviors under our test suite. Most notably, we find that Apple and ARM GPUs do not support the linear occupancy-bound model, an intuitive progress model defined by prior work and hypothesized to describe the workgroup schedulers of existing GPUs.
△ Less
Submitted 13 September, 2021;
originally announced September 2021.
-
A report on the first virtual PLDI conference
Authors:
Alastair F. Donaldson
Abstract:
This is a report on the PLDI 2020 conference, for which I was General Chair, which was held virtually for the first time as a result of the COVID-19 pandemic. The report contains: my personal reflections on the positive and negative aspects of the event; a description of the format of the event and associated logistical details; and data (with some analysis) on attendees' views on the conference f…
▽ More
This is a report on the PLDI 2020 conference, for which I was General Chair, which was held virtually for the first time as a result of the COVID-19 pandemic. The report contains: my personal reflections on the positive and negative aspects of the event; a description of the format of the event and associated logistical details; and data (with some analysis) on attendees' views on the conference format, the extent to which attendees engaged with the conference, attendees' views on virtual vs. physical conferences (with a focus on PLDI specifically) and the diversity of conference registrants. I hope that the report will be a useful resource for organizers of upcoming virtual conferences, and generally interesting to the Programming Languages community and beyond.
△ Less
Submitted 7 July, 2020;
originally announced July 2020.
-
A Systematic Impact Study for Fuzzer-Found Compiler Bugs
Authors:
Michaƫl Marcozzi,
Qiyi Tang,
Alastair F. Donaldson,
Cristian Cadar
Abstract:
Despite much recent interest in compiler randomized testing (fuzzing), the practical impact of fuzzer-found compiler bugs on real-world applications has barely been assessed. We present the first quantitative and qualitative study of the tangible impact of miscompilation bugs in a mature compiler. We follow a rigorous methodology where the bug impact over the compiled application is evaluated base…
▽ More
Despite much recent interest in compiler randomized testing (fuzzing), the practical impact of fuzzer-found compiler bugs on real-world applications has barely been assessed. We present the first quantitative and qualitative study of the tangible impact of miscompilation bugs in a mature compiler. We follow a rigorous methodology where the bug impact over the compiled application is evaluated based on (1) whether the bug appears to trigger during compilation; (2) the extent to which generated assembly code changes syntactically due to triggering of the bug; and (3) how much such changes do cause regression test suite failures and could be used to manually trigger divergences during execution. The study is conducted with respect to the compilation of more than 10 million lines of C/C++ code from 309 Debian packages, using 12% of the historical and now fixed miscompilation bugs found by four state-of-the-art fuzzers in the Clang/LLVM compiler, as well as 18 bugs found by human users compiling real code or by formal verification. The results show that almost half of the fuzzer-found bugs propagate to the generated binaries for some packages, but rarely affect their syntax and cause two failures in total when running their test suites. User-reported and formal verification bugs do not exhibit a higher impact, with less frequently triggered bugs and one test failure. Our manual analysis of a selection of bugs, either fuzzer-found or not, suggests that none can easily trigger a runtime divergence on the packages considered in the analysis, and that in general they affect only corner cases.
△ Less
Submitted 5 September, 2019; v1 submitted 25 February, 2019;
originally announced February 2019.
-
Do Your Cores Play Nicely? A Portable Framework for Multi-core Interference Tuning and Analysis
Authors:
Dan Iorga,
Tyler Sorensen,
Alastair F. Donaldson
Abstract:
Multi-core architectures can be leveraged to allow independent processes to run in parallel. However, due to resources shared across cores, such as caches, distinct processes may interfere with one another, e.g. affecting execution time. Analysing the extent of this interference is difficult due to: (1) the diversity of modern architectures, which may contain different implementations of shared re…
▽ More
Multi-core architectures can be leveraged to allow independent processes to run in parallel. However, due to resources shared across cores, such as caches, distinct processes may interfere with one another, e.g. affecting execution time. Analysing the extent of this interference is difficult due to: (1) the diversity of modern architectures, which may contain different implementations of shared resources, and (2) the complex nature of modern processors, in which interference might arise due to subtle interactions. To address this, we propose a black-box auto-tuning approach that searches for processes that are effective at causing slowdowns for a program when executed in parallel. Such slowdowns provide lower bounds on worst-case execution time; an important metric in systems with real-time constraints.
Our approach considers a set of parameterised "enemy" processes and "victim" programs, each targeting a shared resource. The autotuner searches for enemy process parameters that are effective at causing slowdowns in the victim programs. The idea is that victim programs behave as a proxy for shared resource usage of arbitrary programs. We evaluate our approach on: 5 different chips; 3 resources (cache, memory bus, and main memory); and consider several search strategies and slowdown metrics. Using enemy processes tuned per chip, we evaluate the slowdowns on the autobench and coremark benchmark suites and show that our method is able to achieve slowdowns in 98% of benchmark/chip combinations and provide similar results to manually written enemy processes.
△ Less
Submitted 13 September, 2018;
originally announced September 2018.
-
Cooperative Kernels: GPU Multitasking for Blocking Algorithms (Extended Version)
Authors:
Tyler Sorensen,
Hugues Evrard,
Alastair F. Donaldson
Abstract:
There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g.\ OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today's GPUs in a manner that does not allow the GPU…
▽ More
There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g.\ OpenCL) do not mandate fair scheduling, and GPU schedulers are unfair in practice. Current approaches avoid this issue by exploiting scheduling quirks of today's GPUs in a manner that does not allow the GPU to be shared with other workloads (such as graphics rendering tasks). We propose cooperative kernels, an extension to the traditional GPU programming model geared towards writing blocking algorithms. Workgroups of a cooperative kernel are fairly scheduled, and multitasking is supported via a small set of language extensions through which the kernel and scheduler cooperate. We describe a prototype implementation of a cooperative kernel framework implemented in OpenCL 2.0 and evaluate our approach by porting a set of blocking GPU applications to cooperative kernels and examining their performance under multitasking. Our prototype exploits no vendor-specific hardware, driver or compiler support, thus our results provide a lower-bound on the efficiency with which cooperative kernels can be implemented in practice.
△ Less
Submitted 6 July, 2017;
originally announced July 2017.
-
Implementing and Evaluating Candidate-Based Invariant Generation
Authors:
Adam Betts,
Nathan Chong,
Pantazis Deligiannis,
Alastair F. Donaldson,
Jeroen Ketema
Abstract:
The discovery of inductive invariants lies at the heart of static program verification. Presently, many automatic solutions to inductive invariant generation are inflexible, only applicable to certain classes of programs, or unpredictable. An automatic technique that circumvents these deficiencies to some extent is candidate-based invariant generation. This paper describes our efforts to apply can…
▽ More
The discovery of inductive invariants lies at the heart of static program verification. Presently, many automatic solutions to inductive invariant generation are inflexible, only applicable to certain classes of programs, or unpredictable. An automatic technique that circumvents these deficiencies to some extent is candidate-based invariant generation. This paper describes our efforts to apply candidate-based invariant generation in GPUVerify, a static checker for programs that run on GPUs. We study a set of GPU programs that contain loops, drawn from a number of open source suites and vendor SDKs.
We describe the methodology we used to incrementally improve the invariant generation capabilities of GPUVerify to handle these benchmarks, through candidate-based invariant generation, using cheap static analysis to speculate potential program invariants. We also describe a set of experiments that we used to examine the effectiveness of our rules for candidate generation, assessing rules based on their generality (the extent to which they generate candidate invariants), hit rate (the extent to which the generated candidates hold), worth (the extent to which provable candidates actually help in allowing verification to succeed), and influence (the extent to which the success of one generation rule depends on candidates generated by another rule).
The candidates produced by GPUVerify help to verify 231 of the 253 programs. This increase in precision, however, makes GPUVerify sluggish: the more candidates that are generated, the more time is spent determining which are inductive invariants. To speed up this process, we have investigated four under-approximating program analyses that aim to reject false candidates quickly and a framework whereby these analyses can run in sequence or in parallel.
△ Less
Submitted 15 June, 2017; v1 submitted 4 December, 2016;
originally announced December 2016.
-
Integrating a large-scale testing campaign in the CK framework
Authors:
Andrei Lascu,
Alastair F. Donaldson
Abstract:
We consider the problem of conducting large experimental campaigns in programming languages research. Most research efforts require a certain level of bookkee** of results. This is manageable via quick, on-the-fly infrastructure implementations. However, it becomes a problem for large-scale testing initiatives, especially as the needs of the project evolve along the way. We look at how the Colle…
▽ More
We consider the problem of conducting large experimental campaigns in programming languages research. Most research efforts require a certain level of bookkee** of results. This is manageable via quick, on-the-fly infrastructure implementations. However, it becomes a problem for large-scale testing initiatives, especially as the needs of the project evolve along the way. We look at how the Collective Knowledge generalized testing framework can help with such a project and its overall applicability and ease of use. The project in question is an OpenCL compiler testing campaign. We investigate how to use the Collective Knowledge framework to lead the experimental campaign, by providing storage and representation of test cases and their results. We also provide an initial implementation, publicly available.
△ Less
Submitted 8 January, 2016; v1 submitted 9 November, 2015;
originally announced November 2015.
-
Overhauling SC Atomics in C11 and OpenCL
Authors:
Mark Batty,
Alastair F. Donaldson,
John Wickerson
Abstract:
Despite the conceptual simplicity of sequential consistency (SC), the semantics of SC atomic operations and fences in the C11 and OpenCL memory models is subtle, leading to convoluted prose descriptions that translate to complex axiomatic formalisations. We conduct an overhaul of SC atomics in C11, reducing the associated axioms in both number and complexity. A consequence of our simplification is…
▽ More
Despite the conceptual simplicity of sequential consistency (SC), the semantics of SC atomic operations and fences in the C11 and OpenCL memory models is subtle, leading to convoluted prose descriptions that translate to complex axiomatic formalisations. We conduct an overhaul of SC atomics in C11, reducing the associated axioms in both number and complexity. A consequence of our simplification is that the SC operations in an execution no longer need to be totally ordered. This relaxation enables, for the first time, efficient and exhaustive simulation of litmus tests that use SC atomics. We extend our improved C11 model to obtain the first rigorous memory model formalisation for OpenCL (which extends C11 with support for heterogeneous many-core programming). In the OpenCL setting, we refine the SC axioms still further to give a sensible semantics to SC operations that employ a 'memory scope' to restrict their visibility to specific threads. Our overhaul requires slight strengthenings of both the C11 and the OpenCL memory models, causing some behaviours to become disallowed. We argue that these strengthenings are natural, and that all of the formalised C11 and OpenCL compilation schemes of which we are aware (Power and x86 CPUs for C11, AMD GPUs for OpenCL) remain valid in our revised models. Using the Herd memory model simulator, we show that our overhaul leads to an exponential improvement in simulation time for C11 litmus tests compared with the original model, making exhaustive simulation competitive, time-wise, with the non-exhaustive CDSChecker tool.
△ Less
Submitted 16 November, 2016; v1 submitted 24 March, 2015;
originally announced March 2015.
-
Proceedings 7th Workshop on Programming Language Approaches to Concurrency and Communication-cEntric Software
Authors:
Alastair F. Donaldson,
Vasco T. Vasconcelos
Abstract:
This volume contains the post-proceedings of PLACES 2014, the seventh Workshop on Programming Language Approaches to Concurrency and Communication-cEntric Software, which was held in Grenoble, France, on April 12th 2014, and co-located with ETAPS, the European Joint Conferences on Theory and Practice of Software. The PLACES workshop series aims to offer a forum where researchers from different fi…
▽ More
This volume contains the post-proceedings of PLACES 2014, the seventh Workshop on Programming Language Approaches to Concurrency and Communication-cEntric Software, which was held in Grenoble, France, on April 12th 2014, and co-located with ETAPS, the European Joint Conferences on Theory and Practice of Software. The PLACES workshop series aims to offer a forum where researchers from different fields exchange new ideas on one of the central challenges for programming in the near future: the development of programming languages, methodologies and infrastructures where concurrency and distribution are the norm rather than a marginal concern. Previous editions of PLACES were held in Rome (2013), Tallin (2012), Saarbrueken (2011), Paphos (2010) and York (2009), all co-located with ETAPS, and the first PLACES was held in Oslo and co-located with DisCoTec (2008).
The Program Committee, after a careful and thorough reviewing process, selected nine papers out of 12 submissions for presentation at the workshop and inclusion in this post-proceedings. Each submission was evaluated by three referees (with one paper receiving a fourth review), and the accepted papers were selected during a week-long electronic discussion. One of the nine accepted papers was conditionally accepted subject to a process of shepherding by a PC member, which was successful and led to the paper's full acceptance.
In addition to the contributed papers, the workshop will feature an invited talk by Akash Lal, Microsoft Research India, entitled "Finding Concurrency Bugs Under Imprecise Harnesses".
△ Less
Submitted 12 June, 2014;
originally announced June 2014.
-
PENCIL: Towards a Platform-Neutral Compute Intermediate Language for DSLs
Authors:
Riyadh Baghdadi,
Albert Cohen,
Serge Guelton,
Sven Verdoolaege,
Jun Inoue,
Tobias Grosser,
Georgia Kouveli,
Alexey Kravets,
Anton Lokhmotov,
Cedric Nugteren,
Fraser Waters,
Alastair F. Donaldson
Abstract:
We motivate the design and implementation of a platform-neutral compute intermediate language (PENCIL) for productive and performance-portable accelerator programming.
We motivate the design and implementation of a platform-neutral compute intermediate language (PENCIL) for productive and performance-portable accelerator programming.
△ Less
Submitted 22 February, 2013;
originally announced February 2013.
-
Automatic Verification of Message-Based Device Drivers
Authors:
Sidney Amani,
Peter Chubb,
Alastair F. Donaldson,
Alexander Legg,
Leonid Ryzhyk,
Yan** Zhu
Abstract:
We develop a practical solution to the problem of automatic verification of the interface between device drivers and the OS. Our solution relies on a combination of improved driver architecture and verification tools. It supports drivers written in C and can be implemented in any existing OS, which sets it apart from previous proposals for verification-friendly drivers. Our Linux-based evaluati…
▽ More
We develop a practical solution to the problem of automatic verification of the interface between device drivers and the OS. Our solution relies on a combination of improved driver architecture and verification tools. It supports drivers written in C and can be implemented in any existing OS, which sets it apart from previous proposals for verification-friendly drivers. Our Linux-based evaluation shows that this methodology amplifies the power of existing verification tools in detecting driver bugs, making it possible to verify properties beyond the reach of traditional techniques.
△ Less
Submitted 26 November, 2012;
originally announced November 2012.