Showing 1–9 of 9 results for author: Orenes-Vera, M

  1. arXiv:2312.10244  [pdf, other]

    cs.AR cs.DC

    MuchiSim: A Simulation Framework for Design Exploration of Multi-Chip Manycore Systems

    Authors: Marcelo Orenes-Vera, Esin Tureci, Margaret Martonosi, David Wentzlaff

    Abstract: The design space exploration of scaled-out manycores for communication-intensive applications (e.g., graph analytics and sparse linear algebra) is hampered due to either lack of scalability or accuracy of existing frameworks at simulating data-dependent execution patterns. This paper presents MuchiSim, a novel parallel simulator designed to address these challenges when exploring the design space…

    Submitted 22 April, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: In Proceedings of the 2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

  2. arXiv:2311.15810  [pdf, other]

    cs.AR cs.DC

    Tascade: Hardware Support for Atomic-free, Asynchronous and Efficient Reduction Trees

    Authors: Marcelo Orenes-Vera, Esin Tureci, David Wentzlaff, Margaret Martonosi

    Abstract: Graph search and sparse data-structure traversal workloads contain challenging irregular memory patterns on global data structures that need to be modified atomically. Distributed processing of these workloads has relied on server threads operating on their own data copies that are merged upon global synchronization. As parallelism increases within each server, the communication challenges that ar…

    Submitted 22 April, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  3. arXiv:2311.15443  [pdf, other]

    cs.AR cs.DC

    DCRA: A Distributed Chiplet-based Reconfigurable Architecture for Irregular Applications

    Authors: Marcelo Orenes-Vera, Esin Tureci, Margaret Martonosi, David Wentzlaff

    Abstract: In recent years, the growing demand to process large graphs and sparse datasets has led to increased research efforts to develop hardware- and software-based architectural solutions to accelerate them. While some of these approaches achieve scalable parallelization with up to thousands of cores, adaptation of these proposals by the industry remained slow. To help solve this dissonance, we identifi…

    Submitted 25 February, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

  4. arXiv:2309.09437  [pdf, other]

    cs.AR cs.SE

    Using LLMs to Facilitate Formal Verification of RTL

    Authors: Marcelo Orenes-Vera, Margaret Martonosi, David Wentzlaff

    Abstract: Formal property verification (FPV) has existed for decades and has been shown to be effective at finding intricate RTL bugs. However, formal properties, such as those written as SystemVerilog Assertions (SVA), are time-consuming and error-prone to write, even for experienced users. Prior work has attempted to lighten this burden by raising the abstraction level so that SVA is generated from high-l…

    Submitted 28 September, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  5. arXiv:2304.09389  [pdf, other]

    cs.DC cs.AR

    Massive Data-Centric Parallelism in the Chiplet Era

    Authors: Marcelo Orenes-Vera, Esin Tureci, David Wentzlaff, Margaret Martonosi

    Abstract: Recent works have introduced task-based parallelization schemes to accelerate graph search and sparse data-structure traversal, where some solutions scale up to thousands of processing units (PUs) on a single chip. However parallelizing these memory-intensive workloads across millions of cores requires a scalable communication scheme as well as designing a cost-efficient computing node that makes…

    Submitted 11 August, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

  6. Wafer-Scale Fast Fourier Transforms

    Authors: Marcelo Orenes-Vera, Ilya Sharapov, Robert Schreiber, Mathias Jacquelin, Philippe Vandermersch, Sharan Chetlur

    Abstract: We have implemented fast Fourier transforms for one, two, and three-dimensional arrays on the Cerebras CS-2, a system whose memory and processing elements reside on a single silicon wafer. The wafer-scale engine (WSE) encompasses a two-dimensional mesh of roughly 850,000 processing elements (PEs) with fast local memory and equally fast nearest-neighbor interconnections. Our wafer-scale FFT (wsFF…

    Submitted 29 September, 2022; originally announced September 2022.

    Journal ref: Proceedings of the 37th International Conference on Supercomputing 2023

  7. Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications

    Authors: Marcelo Orenes-Vera, Esin Tureci, David Wentzlaff, Margaret Martonosi

    Abstract: Applications with low data reuse and frequent irregular memory accesses, such as graph or sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core utilization. While prior work with prefetching, decoupling, or pipelining can mitigate memory latency and improve core utilization, memory bottlenecks persist due to limited off-chip bandwidth. Approaches doing process…

    Submitted 4 May, 2023; v1 submitted 26 July, 2022; originally announced July 2022.

    Comments: In Proceedings of the 29th IEEE Symposium on High-Performance Computer Architecture (HPCA-29)

    Journal ref: 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA)

  8. arXiv:2104.04003  [pdf, other]

    cs.AR

    AutoSVA: Democratizing Formal Verification of RTL Module Interactions

    Authors: Marcelo Orenes-Vera, Aninda Manocha, David Wentzlaff, Margaret Martonosi

    Abstract: Modern SoC design relies on the ability to separately verify IP blocks relative to their own specifications. Formal verification (FV) using SystemVerilog Assertions (SVA) is an effective method to exhaustively verify blocks at unit-level. Unfortunately, FV has a steep learning curve and requires engineering effort that discourages hardware designers from using it during RTL module development. We…

    Submitted 8 April, 2021; originally announced April 2021.

  9. arXiv:2004.07415  [pdf, other]

    cs.AR

    The MosaicSim Simulator (Full Technical Report)

    Authors: Opeoluwa Matthews, Aninda Manocha, Davide Giri, Marcelo Orenes-Vera, Esin Tureci, Tyler Sorensen, Tae Jun Ham, Juan L. Aragón, Luca P. Carloni, Margaret Martonosi

    Abstract: As Moore's Law has slowed and Dennard Scaling has ended, architects are increasingly turning to heterogeneous parallelism and domain-specific hardware-software co-designs. These trends present new challenges for simulation-based performance assessments that are central to early-stage architectural exploration. Simulators must be lightweight to support rich heterogeneous combinations of general pur…

    Submitted 15 April, 2020; originally announced April 2020.

    Comments: This is a full technical report on the MosaicSim simulator. This version is a variation of the original ISPASS publication with additions describing the accuracy of MosaicSim's memory hierarchy performance modeling and additional hardware features, e.g., branch predictors. This technical report will be maintained as the MosaicSim developers continue to augment the simulator with more features.