Showing 1–10 of 10 results for author: Larus, J

  1. arXiv:2403.04714  [pdf, other]

    cs.DC cs.AR

    Parendi: Thousand-Way Parallel RTL Simulation

    Authors: Mahyar Emami, Thomas Bourgeat, James Larus

    Abstract: Hardware development relies on simulations, particularly cycle-accurate RTL (Register Transfer Level) simulations, which consume significant time. As single-processor performance grows only slowly, conventional, single-threaded RTL simulation is becoming less practical for increasingly complex chips and systems. A solution is parallel RTL simulation, where ideally, simulators could run on thousand…

    Submitted 7 March, 2024; originally announced March 2024.

  2. arXiv:2403.00433  [pdf, other]

    cs.DC

    Jiagu: Optimizing Serverless Computing Resource Utilization with Harmonized Efficiency and Practicability

    Authors: Qingyuan Liu, Yanning Yang, Dong Du, Yubin Xia, ** Zhang, Jia Feng, James Larus, Haibo Chen

    Abstract: Current serverless platforms struggle to optimize resource utilization due to their dynamic and fine-grained nature. Conventional techniques like overcommitment and autoscaling fall short, often sacrificing utilization for practicability or incurring performance trade-offs. Overcommitment requires predicting performance to prevent QoS violation, introducing a trade-off between prediction accuracy an…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 17 pages, 17 figures

  3. Manticore: Hardware-Accelerated RTL Simulation with Static Bulk-Synchronous Parallelism

    Authors: Mahyar Emami, Sahand Kashani, Keisuke Kamahori, Mohammad Sepehr Pourghannad, Ritik Raj, James R. Larus

    Abstract: The demise of Moore's Law and Dennard Scaling has revived interest in specialized computer architectures and accelerators. Verification and testing of this hardware depend heavily upon cycle-accurate simulation of register-transfer-level (RTL) designs. The fastest software RTL simulators can simulate designs at 1–1000 kHz, i.e., more than three orders of magnitude slower than hardware. Improved s…

    Submitted 20 October, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

  4. arXiv:2201.00060  [pdf, other]

    cs.SE cs.PL

    Statistical Program Slicing: a Hybrid Slicing Technique for Analyzing Deployed Software

    Authors: Bogdan Alexandru Stoica, Swarup K. Sahoo, James R. Larus, Vikram S. Adve

    Abstract: Dynamic program slicing can significantly reduce the code developers need to inspect by narrowing it down to only a subset of relevant program statements. However, despite an extensive body of research showing its usefulness, dynamic slicing is still short of production-level use due to the high cost of runtime instrumentation. As an alternative, we propose statistical program slicing, a novel…

    Submitted 31 December, 2021; originally announced January 2022.

  5. arXiv:2107.09333  [pdf, other]

    cs.AR cs.CL cs.PF

    StreamBlocks: A compiler for heterogeneous dataflow computing (technical report)

    Authors: Endri Bezati, Mahyar Emami, Jörn Janneck, James Larus

    Abstract: To increase performance and efficiency, systems use FPGAs as reconfigurable accelerators. A key challenge in designing these systems is partitioning computation between processors and an FPGA. An appropriate division of labor may be difficult to predict in advance and require experiments and measurements. When an investigation requires rewriting part of the system in a new language or with a new p…

    Submitted 20 July, 2021; originally announced July 2021.

    ACM Class: C.5; D.1.3; D.3.0; I.6.5; B.6.0; B.8.2; B.4.0

  6. arXiv:2005.12273  [pdf]

    cs.CR cs.CY

    Decentralized Privacy-Preserving Proximity Tracing

    Authors: Carmela Troncoso, Mathias Payer, Jean-Pierre Hubaux, Marcel Salathé, James Larus, Edouard Bugnion, Wouter Lueks, Theresa Stadler, Apostolos Pyrgelis, Daniele Antonioli, Ludovic Barman, Sylvain Chatel, Kenneth Paterson, Srdjan Čapkun, David Basin, Jan Beutel, Dennis Jackson, Marc Roeschlin, Patrick Leu, Bart Preneel, Nigel Smart, Aysajan Abidin, Seda Gürses, Michael Veale, Cas Cremers, et al. (9 additional authors not shown)

    Abstract: This document describes and analyzes a system for secure and privacy-preserving proximity tracing at large scale. This system, referred to as DP3T, provides a technological foundation to help slow the spread of SARS-CoV-2 by simplifying and accelerating the process of notifying people who might have been exposed to the virus so that they can take appropriate measures to break its transmission chai…

    Submitted 25 May, 2020; originally announced May 2020.

    Comments: 46 pages, 6 figures, first published 3 April 2020 on https://github.com/DP-3T/documents where companion documents and code can be found

  7. arXiv:1908.10574  [pdf, other]

    cs.DC

    Parallel and Scalable Precise Clustering for Homologous Protein Discovery

    Authors: Stuart Byma, Akash Dhasade, Adrian Altenhoff, Christophe Dessimoz, James R. Larus

    Abstract: This paper presents a new, parallel implementation of clustering and demonstrates its utility in greatly speeding up the process of identifying homologous proteins. Clustering is a technique to reduce the number of comparisons needed to find similar pairs in a set of $n$ elements such as protein sequences. Precise clustering ensures that each pair of similar elements appears together in at least on…

    Submitted 28 August, 2019; originally announced August 2019.

    Comments: 11 pages, 11 figures. Submitted for publication

  8. arXiv:1908.09291  [pdf, other]

    cs.DC

    Extending TensorFlow's Semantics with Pipelined Execution

    Authors: Sam Whitlock, James Larus, Edouard Bugnion

    Abstract: TensorFlow is a popular cloud computing framework that targets machine learning applications. It separates the specification of application logic (in a dataflow graph) from the execution of the logic. TensorFlow's native runtime executes the application with low overhead across a diverse set of hardware including CPUs, GPUs, and ASICs. Although the underlying dataflow engine supporting these featu…

    Submitted 25 August, 2019; originally announced August 2019.

    ACM Class: C.2.4; D.1.3; D.2.11

  9. Fine-Grain Checkpointing with In-Cache-Line Logging

    Authors: Nachshon Cohen, David T. Aksun, Hillel Avni, James R. Larus

    Abstract: Non-Volatile Memory offers the possibility of implementing high-performance, durable data structures. However, achieving performance comparable to well-designed data structures in non-persistent (transient) memory is difficult, primarily because of the cost of ensuring the order in which memory writes reach NVM. Often, this requires flushing data to NVM and waiting a full memory round-trip time.…

    Submitted 2 February, 2019; originally announced February 2019.

    Comments: In 2019 Architectural Support for Programming Languages and Operating Systems (ASPLOS 19), April 13, 2019, Providence, RI, USA

  10. arXiv:1709.02610  [pdf, other]

    cs.DC cs.DB cs.PL

    Efficient Logging in Non-Volatile Memory by Exploiting Coherency Protocols

    Authors: Nachshon Cohen, Michal Friedman, James R. Larus

    Abstract: Non-volatile memory (NVM) technologies such as PCM, ReRAM, and STT-RAM allow processors to directly write values to persistent storage at speeds that are significantly faster than previous durable media such as hard drives or SSDs. Many applications of NVM are constructed on a logging subsystem, which enables operations to appear to execute atomically and facilitates recovery from failures. Writes…

    Submitted 8 September, 2017; originally announced September 2017.