-
Parendi: Thousand-Way Parallel RTL Simulation
Authors:
Mahyar Emami,
Thomas Bourgeat,
James Larus
Abstract:
Hardware development relies on simulations, particularly cycle-accurate RTL (Register Transfer Level) simulations, which consume significant time. As single-processor performance grows only slowly, conventional, single-threaded RTL simulation is becoming less practical for increasingly complex chips and systems. A solution is parallel RTL simulation, where ideally, simulators could run on thousand…
▽ More
Hardware development relies on simulations, particularly cycle-accurate RTL (Register Transfer Level) simulations, which consume significant time. As single-processor performance grows only slowly, conventional, single-threaded RTL simulation is becoming less practical for increasingly complex chips and systems. A solution is parallel RTL simulation, where ideally, simulators could run on thousands of parallel cores. However, existing simulators can only exploit tens of cores.
This paper studies the challenges inherent in running parallel RTL simulation on a multi-thousand-core machine (the Graphcore IPU, a 1472-core machine). Simulation performance requires balancing three factors: synchronization, communication, and computation. We experimentally evaluate each metric and analyze how it affects parallel simulation speed, drawing on contrasts between the large-scale IPU and smaller but faster x86 systems.
Using this analysis, we build Parendi, an RTL simulator for the IPU. It distributes RTL simulation across 5888 cores on 4 IPU sockets. Parendi runs large RTL designs up to 4x faster than a powerful, state-of-the-art x86 multicore system.
△ Less
Submitted 7 March, 2024;
originally announced March 2024.
-
Citadel: Real-World Hardware-Software Contracts for Secure Enclaves Through Microarchitectural Isolation and Controlled Speculation
Authors:
Jules Drean,
Miguel Gomez-Garcia,
Fisher Jepsen,
Thomas Bourgeat,
Srinivas Devadas
Abstract:
Hardware isolation primitives such as secure enclaves aim to protect sensitive programs, but remain vulnerable to transient execution attacks. Complete microarchitectural isolation is not a satisfactory defense mechanism as it leaves out public shared memory, critical for usability and application performance. Conversely, hardware-software co-designs for secure speculation can counter these attack…
▽ More
Hardware isolation primitives such as secure enclaves aim to protect sensitive programs, but remain vulnerable to transient execution attacks. Complete microarchitectural isolation is not a satisfactory defense mechanism as it leaves out public shared memory, critical for usability and application performance. Conversely, hardware-software co-designs for secure speculation can counter these attacks but are not yet practical, since they make assumptions on the speculation modes, the exposed microarchitectural state, and the software, which are all hard to support for the entire software stack. This paper advocates for processors to incorporate microarchitectural isolation primitives and mechanisms for controlled speculation, enabling different execution modes. These modes can restrict what is exposed to an attacker, effectively balancing performance and program-analysis complexity. We introduce two mechanisms to securely share memory between an enclave and an untrusted OS in an out-of-order processor. We show that our two modes are complementary, achieving speculative non-interference with a reasonable performance impact, while requiring minimal code annotation and simple program analysis doable by hand. Our prototype, Citadel, is a multicore processor running on an FPGA, booting untrusted Linux, and supporting comprehensive enclave capabilities, such as shared memory, and remote attestation. To our knowledge, Citadel is the first end-to-end enclave platform to run secure applications, such as cryptographic libraries or small private inference workloads, on a speculative out-of-order multicore processor while protecting against a significant class of side-channel attacks.
△ Less
Submitted 8 May, 2024; v1 submitted 26 June, 2023;
originally announced June 2023.
-
Flexible Instruction-Set Semantics via Type Classes
Authors:
Thomas Bourgeat,
Ian Clester,
Andres Erbsen,
Samuel Gruetter,
Pratap Singh,
Andrew Wright,
Adam Chlipala
Abstract:
Instruction sets, from families like x86 and ARM, are at the center of many ambitious formal-methods projects. Many verification, synthesis, programming, and debugging tools rely on formal semantics of instruction sets, but different tools can use semantics in rather different ways. As a result, a central challenge for that community is how semantics should be written and what techniques should be…
▽ More
Instruction sets, from families like x86 and ARM, are at the center of many ambitious formal-methods projects. Many verification, synthesis, programming, and debugging tools rely on formal semantics of instruction sets, but different tools can use semantics in rather different ways. As a result, a central challenge for that community is how semantics should be written and what techniques should be used to connect them to new use cases. The best-known work applying single semantics across quite-different tools relies on domain-specific languages like Sail, where the language and its translation tools are specialized to the realm of instruction sets. We decided to explore a different approach, with semantics written in a carefully chosen subset of Haskell. This style does not depend on any new language translators, relying instead on parameterization of semantics over type-class instances. As a result, a semantics can be a first-class object within a logic, and application of a semantics for a new kind of tool can be a first-class operation in the logic, allowing sharing of theorems across applications. Our case study is for the open RISC-V instruction-set family, and we have used a single core semantics to support testing, interactive proof, and model checking of both software and hardware. We especially highlight an application of a first-class semantics within Coq that can be instantiated in different ways within one proof: simulation between variants where multiplication is implemented in hardware or in the machine code of a particular software trap handler.
△ Less
Submitted 16 November, 2022; v1 submitted 1 April, 2021;
originally announced April 2021.
-
MI6: Secure Enclaves in a Speculative Out-of-Order Processor
Authors:
Thomas Bourgeat,
Ilia Lebedev,
Andrew Wright,
Sizhuo Zhang,
Arvind,
Srinivas Devadas
Abstract:
Recent attacks have broken process isolation by exploiting microarchitectural side channels that allow indirect access to shared microarchitectural state. Enclaves strengthen the process abstraction to restore isolation guarantees.
We propose MI6, an aggressive, speculative out-of-order processor capable of providing secure enclaves under a threat model that includes an untrusted OS and an attac…
▽ More
Recent attacks have broken process isolation by exploiting microarchitectural side channels that allow indirect access to shared microarchitectural state. Enclaves strengthen the process abstraction to restore isolation guarantees.
We propose MI6, an aggressive, speculative out-of-order processor capable of providing secure enclaves under a threat model that includes an untrusted OS and an attacker capable of mounting any software attack currently considered practical, including control flow speculation attacks. MI6 is inspired by Sanctum [16] and extends its isolation guarantee to more realistic memory hierarchies. It also introduces a purge instruction, which is used only when a secure process is scheduled, and implements it for a complex processor microarchitecture. We model the performance impact of enclaves in MI6 through FPGA emulation on AWS F1 FPGAs by running SPEC CINT2006 benchmarks on top of an untrusted Linux OS. Security comes at the cost of approximately 16.4% average slowdown for protected programs.
△ Less
Submitted 29 August, 2019; v1 submitted 23 December, 2018;
originally announced December 2018.
-
New Algorithmic Approaches to Point Constellation Recognition
Authors:
Thomas Bourgeat,
Julien Bringer,
Herve Chabanne,
Robin Champenois,
Jeremie Clement,
Houda Ferradi,
Marc Heinrich,
Paul Melotti,
David Naccache,
Antoine Voizard
Abstract:
Point constellation recognition is a common problem with many pattern matching applications. Whilst useful in many contexts, this work is mainly motivated by fingerprint matching. Fingerprints are traditionally modelled as constellations of oriented points called minutiae. The fingerprint verifier's task consists in comparing two point constellations. The compared constellations may differ by rota…
▽ More
Point constellation recognition is a common problem with many pattern matching applications. Whilst useful in many contexts, this work is mainly motivated by fingerprint matching. Fingerprints are traditionally modelled as constellations of oriented points called minutiae. The fingerprint verifier's task consists in comparing two point constellations. The compared constellations may differ by rotation and translation or by much more involved transforms such as distortion or occlusion. This paper presents three new constellation matching algorithms. The first two methods generalize an algorithm by Bringer and Despiegel. Our third proposal creates a very interesting analogy between mechanical system simulation and the constellation recognition problem.
△ Less
Submitted 24 March, 2014;
originally announced May 2014.