Showing 1–20 of 20 results for author: Emer, J

Searching in archive cs.
  1. arXiv:2407.01742

    cs.PL

    The Continuous Tensor Abstraction: Where Indices are Real

    Authors: Jaeyeon Won, Willow Ahrens, Joel S. Emer, Saman Amarasinghe

    Abstract: This paper introduces the continuous tensor abstraction, allowing indices to take real-number values (e.g., A[3.14]), and provides a continuous loop construct that iterates over the infinitely large set of real numbers. This paper expands the existing tensor abstraction to include continuous tensors that exhibit a piecewise-constant property, enabling the transformation of an infinite amount of co…

    Submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2406.10491

    cs.AR

    FuseMax: Leveraging Extended Einsums to Optimize Attention Accelerator Design

    Authors: Nandeeka Nayak, Xinrui Wu, Toluwanimi O. Odemuyiwa, Michael Pellauer, Joel S. Emer, Christopher W. Fletcher

    Abstract: Attention for transformers is a critical workload that has recently received significant "attention" as a target for custom acceleration. Yet, while prior work succeeds in reducing attention's memory-bandwidth requirements, it creates load imbalance between attention operators (resulting in severe compute under-utilization) and requires on-chip memory that scales with sequence length (which is exp…

    Submitted 25 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: 15 pages, 10 figures

  3. arXiv:2405.07266

    cs.ET cs.AR

    Architecture-Level Modeling of Photonic Deep Neural Network Accelerators

    Authors: Tanner Andrulis, Gohar Irfan Chaudhry, Vinith M. Suriyakumar, Joel S. Emer, Vivienne Sze

    Abstract: Photonics is a promising technology to accelerate Deep Neural Networks as it can use optical interconnects to reduce data movement energy and it enables low-energy, high-throughput optical-analog computations. To realize these benefits in a full system (accelerator + DRAM), designers must ensure that the benefits of using the electrical, optical, analog, and digital domains exceed the costs of c…

    Submitted 14 May, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: Published at ISPASS 2024

  4. arXiv:2405.07259

    cs.AR

    CiMLoop: A Flexible, Accurate, and Fast Compute-In-Memory Modeling Tool

    Authors: Tanner Andrulis, Joel S. Emer, Vivienne Sze

    Abstract: Compute-In-Memory (CiM) is a promising solution to accelerate Deep Neural Networks (DNNs) as it can avoid energy-intensive DNN weight movement and use memory arrays to perform low-energy, high-density computations. These benefits have inspired research across the CiM stack, but CiM research often focuses on only one level of the stack (i.e., devices, circuits, architecture, workload, or mapping) o…

    Submitted 29 May, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: Available at https://github.com/mit-emze/cimloop. Published in ISPASS 2024

  5. arXiv:2404.11591

    cs.DS

    The EDGE Language: Extended General Einsums for Graph Algorithms

    Authors: Toluwanimi O. Odemuyiwa, Joel S. Emer, John D. Owens

    Abstract: In this work, we propose a unified abstraction for graph algorithms: the Extended General Einsums language, or EDGE. The EDGE language expresses graph algorithms in the language of tensor algebra, providing a rigorous, succinct, and expressive mathematical framework. EDGE leverages two ideas: (1) the well-known foundations provided by the graph-matrix duality, where a graph is simply a 2D tensor,…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 79 pages, 14 figures

  6. arXiv:2404.06553

    cs.AR

    Modeling Analog-Digital-Converter Energy and Area for Compute-In-Memory Accelerator Design

    Authors: Tanner Andrulis, Ruicong Chen, Hae-Seung Lee, Joel S. Emer, Vivienne Sze

    Abstract: Analog Compute-in-Memory (CiM) accelerators use analog-digital converters (ADCs) to read the analog values that they compute. ADCs can consume significant energy and area, so architecture-level ADC decisions such as ADC resolution or number of ADCs can significantly impact overall CiM accelerator energy and area. Therefore, modeling how architecture-level decisions affect ADC energy and area is cr…

    Submitted 14 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  7. Tailors: Accelerating Sparse Tensor Algebra by Overbooking Buffer Capacity

    Authors: Zi Yu Xue, Yannan Nellie Wu, Joel S. Emer, Vivienne Sze

    Abstract: Sparse tensor algebra is a challenging class of workloads to accelerate due to low arithmetic intensity and varying sparsity patterns. Prior sparse tensor algebra accelerators have explored tiling sparse data to increase exploitable data reuse and improve throughput, but typically allocate tile size in a given buffer for the worst-case data occupancy. This severely limits the utilization of availa…

    Submitted 26 June, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: 17 pages, 13 figures, in MICRO 2023

    Journal ref: 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '23), 2023

  8. arXiv:2309.04119

    cs.CR cs.AR

    Penetrating Shields: A Systematic Analysis of Memory Corruption Mitigations in the Spectre Era

    Authors: Weon Taek Na, Joel S. Emer, Mengjia Yan

    Abstract: This paper provides the first systematic analysis of a synergistic threat model encompassing memory corruption vulnerabilities and microarchitectural side-channel vulnerabilities. We study speculative shield bypass attacks that leverage speculative execution attacks to leak secrets that are critical to the security of memory corruption mitigations (i.e., the shields), and then use the leaked secre…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 14 pages

    ACM Class: K.6.5; D.4.6; C.2.0

  9. HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity

    Authors: Yannan Nellie Wu, Po-An Tsai, Saurav Muralidharan, Angshuman Parashar, Vivienne Sze, Joel S. Emer

    Abstract: Due to complex interactions among various deep neural network (DNN) optimization techniques, modern DNNs can have weights and activations that are dense or sparse with diverse sparsity degrees. To offer a good trade-off between accuracy and hardware performance, an ideal DNN accelerator should have high flexibility to efficiently translate DNN sparsity into reductions in energy and/or latency with…

    Submitted 1 October, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to MICRO23

  10. RAELLA: Reforming the Arithmetic for Efficient, Low-Resolution, and Low-Loss Analog PIM: No Retraining Required!

    Authors: Tanner Andrulis, Joel S. Emer, Vivienne Sze

    Abstract: Processing-In-Memory (PIM) accelerators have the potential to efficiently run Deep Neural Network (DNN) inference by reducing costly data movement and by using resistive RAM (ReRAM) for efficient analog compute. Unfortunately, overall PIM accelerator efficiency is limited by energy-intensive analog-to-digital converters (ADCs). Furthermore, existing accelerators that reduce ADC cost do so by chang…

    Submitted 16 April, 2023; originally announced April 2023.

    Comments: 16 pages; 15 figures; Accepted at ISCA 2023 (the International Symposium on Computer Architecture)

    ACM Class: C.1.3

  11. TeAAL: A Declarative Framework for Modeling Sparse Tensor Accelerators

    Authors: Nandeeka Nayak, Toluwanimi O. Odemuyiwa, Shubham Ugare, Christopher W. Fletcher, Michael Pellauer, Joel S. Emer

    Abstract: Over the past few years, the explosion in sparse tensor algebra workloads has led to a corresponding rise in domain-specific accelerators to service them. Due to the irregularity present in sparse tensors, these accelerators employ a wide variety of novel solutions to achieve good performance. At the same time, prior work on design-flexible sparse accelerator modeling does not express this full ra…

    Submitted 11 June, 2024; v1 submitted 16 April, 2023; originally announced April 2023.

    Comments: 17 pages, 13 figures

  12. The Sparse Abstract Machine

    Authors: Olivia Hsu, Maxwell Strange, Ritvik Sharma, Jaeyeon Won, Kunle Olukotun, Joel Emer, Mark Horowitz, Fredrik Kjolstad

    Abstract: We propose the Sparse Abstract Machine (SAM), an abstract machine model for targeting sparse tensor algebra to reconfigurable and fixed-function spatial dataflow accelerators. SAM defines a streaming dataflow abstraction with sparse primitives that encompass a large space of scheduled tensor algebra expressions. SAM dataflow graphs naturally separate tensor formats from algorithms and are expressi…

    Submitted 23 March, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

    Comments: 18 pages, 17 figures, 3 tables

    Journal ref: ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems Volume 3 (2023) 710-726

  13. arXiv:2205.05826

    cs.AR cs.CV cs.DC

    Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling

    Authors: Yannan Nellie Wu, Po-An Tsai, Angshuman Parashar, Vivienne Sze, Joel S. Emer

    Abstract: In recent years, many accelerators have been proposed to efficiently process sparse tensor algebra applications (e.g., sparse neural networks). However, these proposals are single points in a large and diverse design space. The lack of systematic description and modeling support for these sparse tensor accelerators impedes hardware designers from efficient and effective design space exploration. T…

    Submitted 9 January, 2023; v1 submitted 11 May, 2022; originally announced May 2022.

    Comments: Update website link, update UOP format description

  14. arXiv:2006.13926

    cs.ET

    Freely scalable and reconfigurable optical hardware for deep learning

    Authors: Liane Bernstein, Alexander Sludds, Ryan Hamerly, Vivienne Sze, Joel Emer, Dirk Englund

    Abstract: As deep neural network (DNN) models grow ever-larger, they can achieve higher accuracy and solve more complex problems. This trend has been enabled by an increase in available compute power; however, efforts to continue to scale electronic processors are impeded by the costs of communication, thermal management, power delivery and clocking. To improve scalability, we propose a digital optical neur…

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: 19 pages (15 main and 4 supplementary), 11 figures (6 main and 5 supplementary)

  15. arXiv:2005.01445

    cs.DC cs.AR

    Estimating Silent Data Corruption Rates Using a Two-Level Model

    Authors: Siva Kumar Sastry Hari, Paolo Rech, Timothy Tsai, Mark Stephenson, Arslan Zulfiqar, Michael Sullivan, Philip Shirvani, Paul Racunas, Joel Emer, Stephen W. Keckler

    Abstract: High-performance and safety-critical system architects must accurately evaluate the application-level silent data corruption (SDC) rates of processors to soft errors. Such an evaluation requires error propagation all the way from particle strikes on low-level state up to the program output. Existing approaches that rely on low-level simulations with fault injection cannot evaluate full application…

    Submitted 27 April, 2020; originally announced May 2020.

  16. arXiv:1807.07928

    cs.DC cs.CV

    Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices

    Authors: Yu-Hsin Chen, Tien-Ju Yang, Joel Emer, Vivienne Sze

    Abstract: A recent trend in DNN development is to extend the reach of deep learning applications to platforms that are more resource and energy constrained, e.g., mobile devices. These endeavors aim to reduce the DNN model size and improve the hardware processing efficiency, and have resulted in DNNs that are much more compact in their structures and/or have high data sparsity. These compact or sparse model…

    Submitted 20 May, 2019; v1 submitted 10 July, 2018; originally announced July 2018.

    Comments: accepted for publication in IEEE Journal on Emerging and Selected Topics in Circuits and Systems. This extended version on arXiv also includes Eyexam in the appendix

  17. arXiv:1708.04485

    cs.NE cs.AR cs.LG

    SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks

    Authors: Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, William J. Dally

    Abstract: Convolutional Neural Networks (CNNs) have emerged as a fundamental technology for machine learning. High performance and extreme energy efficiency are critical for deployments of CNNs in a wide range of situations, especially mobile platforms such as autonomous vehicles, cameras, and electronic personal assistants. This paper introduces the Sparse CNN (SCNN) accelerator architecture, which improve…

    Submitted 23 May, 2017; originally announced August 2017.

  18. arXiv:1703.09039

    cs.CV

    Efficient Processing of Deep Neural Networks: A Tutorial and Survey

    Authors: Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, Joel Emer

    Abstract: Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, it comes at the cost of high computational complexity. Accordingly, techniques that enable efficient processing of DNNs to improve energy efficiency and throughput without…

    Submitted 13 August, 2017; v1 submitted 27 March, 2017; originally announced March 2017.

    Comments: Based on tutorial on DNN Hardware at eyeriss.mit.edu/tutorial.html

  19. arXiv:1703.05853

    cs.CV

    Towards Closing the Energy Gap Between HOG and CNN Features for Embedded Vision

    Authors: Amr Suleiman, Yu-Hsin Chen, Joel Emer, Vivienne Sze

    Abstract: Computer vision enables a wide range of applications in robotics/drones, self-driving cars, smart Internet of Things, and portable/wearable electronics. For many of these applications, local embedded processing is preferred due to privacy and/or latency concerns. Accordingly, energy-efficient embedded vision hardware delivering real-time and robust performance is crucial. While deep learning is ga…

    Submitted 16 March, 2017; originally announced March 2017.

  20. Hardware for Machine Learning: Challenges and Opportunities

    Authors: Vivienne Sze, Yu-Hsin Chen, Joel Emer, Amr Suleiman, Zhengdong Zhang

    Abstract: Machine learning plays a critical role in extracting meaningful information out of the zettabytes of sensor data collected every day. For some applications, the goal is to analyze and understand the data to identify trends (e.g., surveillance, portable/wearable electronics); in other applications, the goal is to take immediate action based on the data (e.g., robotics/drones, self-driving cars, smart I…

    Submitted 16 October, 2017; v1 submitted 22 December, 2016; originally announced December 2016.

    Comments: Published as an invited conference paper at CICC 2017