Showing 1–14 of 14 results for author: Wentzlaff, D

Searching in archive cs.
  1. arXiv:2312.10244

    cs.AR cs.DC

    MuchiSim: A Simulation Framework for Design Exploration of Multi-Chip Manycore Systems

    Authors: Marcelo Orenes-Vera, Esin Tureci, Margaret Martonosi, David Wentzlaff

    Abstract: The design space exploration of scaled-out manycores for communication-intensive applications (e.g., graph analytics and sparse linear algebra) is hampered due to either lack of scalability or accuracy of existing frameworks at simulating data-dependent execution patterns. This paper presents MuchiSim, a novel parallel simulator designed to address these challenges when exploring the design space…

    Submitted 22 April, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: In Proceedings of the 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

  2. arXiv:2312.03134

    cs.AR cs.DC cs.LG

    A Hardware Evaluation Framework for Large Language Model Inference

    Authors: Hengrui Zhang, August Ning, Rohan Prabhakar, David Wentzlaff

    Abstract: The past year has witnessed the increasing popularity of Large Language Models (LLMs). Their unprecedented scale and associated high hardware cost have impeded their broader adoption, calling for efficient hardware designs. With the large hardware needed to simply run LLM inference, evaluating different hardware designs becomes a new bottleneck. This work introduces LLMCompass, a hardware evalua…

    Submitted 5 December, 2023; originally announced December 2023.

  3. arXiv:2311.15810

    cs.AR cs.DC

    Tascade: Hardware Support for Atomic-free, Asynchronous and Efficient Reduction Trees

    Authors: Marcelo Orenes-Vera, Esin Tureci, David Wentzlaff, Margaret Martonosi

    Abstract: Graph search and sparse data-structure traversal workloads contain challenging irregular memory patterns on global data structures that need to be modified atomically. Distributed processing of these workloads has relied on server threads operating on their own data copies that are merged upon global synchronization. As parallelism increases within each server, the communication challenges that ar…

    Submitted 22 April, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  4. arXiv:2311.15443

    cs.AR cs.DC

    DCRA: A Distributed Chiplet-based Reconfigurable Architecture for Irregular Applications

    Authors: Marcelo Orenes-Vera, Esin Tureci, Margaret Martonosi, David Wentzlaff

    Abstract: In recent years, the growing demand to process large graphs and sparse datasets has led to increased research efforts to develop hardware- and software-based architectural solutions to accelerate them. While some of these approaches achieve scalable parallelization with up to thousands of cores, adaptation of these proposals by the industry remained slow. To help solve this dissonance, we identifi…

    Submitted 25 February, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

  5. arXiv:2309.09437

    cs.AR cs.SE

    Using LLMs to Facilitate Formal Verification of RTL

    Authors: Marcelo Orenes-Vera, Margaret Martonosi, David Wentzlaff

    Abstract: Formal property verification (FPV) has existed for decades and has been shown to be effective at finding intricate RTL bugs. However, formal properties, such as those written as SystemVerilog Assertions (SVA), are time-consuming and error-prone to write, even for experienced users. Prior work has attempted to lighten this burden by raising the abstraction level so that SVA is generated from high-l…

    Submitted 28 September, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  6. arXiv:2304.09389

    cs.DC cs.AR

    Massive Data-Centric Parallelism in the Chiplet Era

    Authors: Marcelo Orenes-Vera, Esin Tureci, David Wentzlaff, Margaret Martonosi

    Abstract: Recent works have introduced task-based parallelization schemes to accelerate graph search and sparse data-structure traversal, where some solutions scale up to thousands of processing units (PUs) on a single chip. However parallelizing these memory-intensive workloads across millions of cores requires a scalable communication scheme as well as designing a cost-efficient computing node that makes…

    Submitted 11 August, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

  7. arXiv:2301.02785

    cs.AR

    Duet: Creating Harmony between Processors and Embedded FPGAs

    Authors: Ang Li, August Ning, David Wentzlaff

    Abstract: The demise of Moore's Law has led to the rise of hardware acceleration. However, the focus on accelerating stable algorithms in their entirety neglects the abundant fine-grained acceleration opportunities available in broader domains and squanders host processors' compute power. This paper presents Duet, a scalable, manycore-FPGA architecture that promotes embedded FPGAs (eFPGA) to be equal peers…

    Submitted 7 January, 2023; originally announced January 2023.

    Comments: Accepted to HPCA 2023

  8. Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications

    Authors: Marcelo Orenes-Vera, Esin Tureci, David Wentzlaff, Margaret Martonosi

    Abstract: Applications with low data reuse and frequent irregular memory accesses, such as graph or sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core utilization. While prior work with prefetching, decoupling, or pipelining can mitigate memory latency and improve core utilization, memory bottlenecks persist due to limited off-chip bandwidth. Approaches doing process…

    Submitted 4 May, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

    Comments: In Proceedings of the 29th IEEE Symposium on High-Performance Computer Architecture (HPCA-29)

    Journal ref: 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

  9. Evolving Transferable Neural Pruning Functions

    Authors: Yuchen Liu, S. Y. Kung, David Wentzlaff

    Abstract: Structural design of neural networks is crucial for the success of deep learning. While most prior works in evolutionary learning aim at directly searching the structure of a network, few attempts have been made on another promising track, channel pruning, which recently has made major headway in designing efficient deep learning models. In fact, prior pruning methods adopt human-made pruning func…

    Submitted 3 August, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

    Comments: Published at GECCO 2022

    Journal ref: Proceedings of the Genetic and Evolutionary Computation Conference, 2022 (385–394)

  10. arXiv:2110.10864

    cs.CV

    Class-Discriminative CNN Compression

    Authors: Yuchen Liu, David Wentzlaff, S. Y. Kung

    Abstract: Compressing convolutional neural networks (CNNs) by pruning and distillation has received ever-increasing focus in the community. In particular, designing a class-discrimination based approach would be desired as it fits seamlessly with the CNNs training objective. In this paper, we propose class-discriminative compression (CDC), which injects class discrimination in both pruning and distillation…

    Submitted 20 October, 2021; originally announced October 2021.

  11. arXiv:2104.04003

    cs.AR

    AutoSVA: Democratizing Formal Verification of RTL Module Interactions

    Authors: Marcelo Orenes-Vera, Aninda Manocha, David Wentzlaff, Margaret Martonosi

    Abstract: Modern SoC design relies on the ability to separately verify IP blocks relative to their own specifications. Formal verification (FV) using SystemVerilog Assertions (SVA) is an effective method to exhaustively verify blocks at unit-level. Unfortunately, FV has a steep learning curve and requires engineering effort that discourages hardware designers from using it during RTL module development. We…

    Submitted 8 April, 2021; originally announced April 2021.

  12. arXiv:2004.14492

    cs.CV

    Rethinking Class-Discrimination Based CNN Channel Pruning

    Authors: Yuchen Liu, David Wentzlaff, S. Y. Kung

    Abstract: Channel pruning has received ever-increasing focus on network compression. In particular, class-discrimination based channel pruning has made major headway, as it fits seamlessly with the classification objective of CNNs and provides good explainability. Prior works singly propose and evaluate their discriminant functions, while further study on the effectiveness of the adopted metrics is absent.…

    Submitted 29 April, 2020; originally announced April 2020.

  13. arXiv:1811.08091

    cs.AR

    JuxtaPiton: Enabling Heterogeneous-ISA Research with RISC-V and SPARC FPGA Soft-cores

    Authors: Katie Lim, Jonathan Balkind, David Wentzlaff

    Abstract: Energy efficiency has become an increasingly important concern in computer architecture due to the end of Dennard scaling. Heterogeneity has been explored as a way to achieve better energy efficiency and heterogeneous microarchitecture chips have become common in the mobile setting. Recent research has explored using heterogeneous-ISA, heterogeneous microarchitecture, general-purpose cores to ac…

    Submitted 20 November, 2018; originally announced November 2018.

  14. arXiv:1712.07816

    cs.CR

    Acoustic Denial of Service Attacks on HDDs

    Authors: Mohammad Shahrad, Arsalan Mosenia, Liwei Song, Mung Chiang, David Wentzlaff, Prateek Mittal

    Abstract: Among storage components, hard disk drives (HDDs) have become the most commonly-used type of non-volatile storage due to their recent technological advances, including, enhanced energy efficacy and significantly-improved areal density. Such advances in HDDs have made them an inevitable part of numerous computing systems, including, personal computers, closed-circuit television (CCTV) systems, medi…

    Submitted 21 December, 2017; originally announced December 2017.

    Comments: 8 pages, 8 figures