Showing 1–10 of 10 results for author: Falsafi, B

  1. arXiv:2405.20935

    cs.LG cs.AI

    Effective Interplay between Sparsity and Quantization: From Theory to Practice

    Authors: Simla Burcu Harma, Ayan Chakraborty, Elizaveta Kostenok, Danila Mishin, Dongho Ha, Babak Falsafi, Martin Jaggi, Ming Liu, Yunho Oh, Suvinay Subramanian, Amir Yazdanbakhsh

    Abstract: The increasing size of deep neural networks necessitates effective model compression to improve computational efficiency and reduce their memory footprint. Sparsity and quantization are two prominent compression methods that have individually demonstrated significant reduction in computational and memory footprints while preserving model accuracy. While effective, the interplay between these two m…

    Submitted 31 May, 2024; originally announced May 2024.

  2. arXiv:2211.10737

    cs.LG

    Accuracy Booster: Enabling 4-bit Fixed-point Arithmetic for DNN Training

    Authors: Simla Burcu Harma, Ayan Chakraborty, Nicholas Sperry, Babak Falsafi, Martin Jaggi, Yunho Oh

    Abstract: The unprecedented demand for computing resources to train DNN models has led to a search for minimal numerical encoding. Recent state-of-the-art (SOTA) proposals advocate for multi-level scaled narrow bitwidth numerical formats. In this paper, we show that single-level scaling is sufficient to maintain training accuracy while maximizing arithmetic density. We identify a previously proposed single-…

    Submitted 31 May, 2024; v1 submitted 19 November, 2022; originally announced November 2022.

  3. arXiv:2203.11540

    cs.AR cs.LG

    Scale-out Systolic Arrays

    Authors: Ahmet Caner Yüzügüler, Canberk Sönmez, Mario Drumond, Yunho Oh, Babak Falsafi, Pascal Frossard

    Abstract: Multi-pod systolic arrays are emerging as the architecture of choice in DNN inference accelerators. Despite their potential, designing multi-pod systolic arrays to maximize effective throughput/Watt (i.e., throughput/Watt adjusted when accounting for array utilization) poses a unique set of challenges. In this work, we study three key pillars in multi-pod systolic array designs, namely array granu…

    Submitted 22 March, 2022; originally announced March 2022.

  4. arXiv:2010.09330

    cs.AR

    Enabling High-Capacity, Latency-Tolerant, and Highly-Concurrent GPU Register Files via Software/Hardware Cooperation

    Authors: Mohammad Sadrosadati, Amirhossein Mirhosseini, Ali Hajiabadi, Seyed Borna Ehsani, Hajar Falahati, Hamid Sarbazi-Azad, Mario Drumond, Babak Falsafi, Rachata Ausavarungnirun, Onur Mutlu

    Abstract: Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes hierarchical register file to reduce the register file power consumption by caching reg…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: To Appear in ACM Transactions on Computer Systems (TOCS)

  5. arXiv:2001.07045

    cs.AR cs.OS

    SPARTA: A Divide and Conquer Approach to Address Translation for Accelerators

    Authors: Javier Picorel, Seyed Alireza Sanaee Kohroudi, Zi Yan, Abhishek Bhattacharjee, Babak Falsafi, Djordje Jevdjic

    Abstract: Virtual memory (VM) is critical to the usability and programmability of hardware accelerators. Unfortunately, implementing accelerator VM efficiently is challenging because the area and power constraints make it difficult to employ the large multi-level TLBs used in general-purpose CPUs. Recent research proposals advocate a number of restrictions on virtual-to-physical address mappings in order to…

    Submitted 20 January, 2020; originally announced January 2020.

  6. SMoTherSpectre: exploiting speculative execution through port contention

    Authors: Atri Bhattacharyya, Alexandra Sandulescu, Matthias Neugschwandtner, Alessandro Sorniotti, Babak Falsafi, Mathias Payer, Anil Kurmus

    Abstract: Spectre, Meltdown, and related attacks have demonstrated that kernels, hypervisors, trusted execution environments, and browsers are prone to information disclosure through micro-architectural weaknesses. However, it remains unclear as to what extent other applications, in particular those that do not load attacker-provided code, may be impacted. It also remains unclear as to what extent these att…

    Submitted 26 September, 2019; v1 submitted 5 March, 2019; originally announced March 2019.

  7. arXiv:1809.05859

    cs.AR cs.ET cs.PL

    Exploiting Errors for Efficiency: A Survey from Circuits to Algorithms

    Authors: Phillip Stanley-Marbell, Armin Alaghi, Michael Carbin, Eva Darulova, Lara Dolecek, Andreas Gerstlauer, Ghayoor Gillani, Djordje Jevdjic, Thierry Moreau, Mattia Cacciotti, Alexandros Daglis, Natalie Enright Jerger, Babak Falsafi, Sasa Misailovic, Adrian Sampson, Damien Zufferey

    Abstract: When a computational task tolerates a relaxation of its specification or when an algorithm tolerates the effects of noise in its execution, hardware, programming languages, and system software can trade deviations from correct behavior for lower resource usage. We present, for the first time, a synthesis of research results on computing systems that only make as many errors as their users can tole…

    Submitted 16 September, 2018; originally announced September 2018.

    Comments: 35 pages

  8. arXiv:1804.01526

    cs.LG math.NA stat.ML

    Training DNNs with Hybrid Block Floating Point

    Authors: Mario Drumond, Tao Lin, Martin Jaggi, Babak Falsafi

    Abstract: The wide adoption of DNNs has given birth to unrelenting computing requirements, forcing datacenter operators to adopt domain-specific accelerators to train them. These accelerators typically employ densely packed full precision floating-point arithmetic to maximize performance per area. Ongoing research efforts seek to further increase that performance density by replacing floating-point with fix…

    Submitted 2 December, 2018; v1 submitted 4 April, 2018; originally announced April 2018.

    Comments: 9 pages, 3 figures. Accepted in Neural Information Processing Systems 2018 (NeurIPS 2018)

  9. Design Guidelines for High-Performance SCM Hierarchies

    Authors: Dmitrii Ustiugov, Alexandros Daglis, Javier Picorel, Mark Sutherland, Edouard Bugnion, Babak Falsafi, Dionisios Pnevmatikatos

    Abstract: With emerging storage-class memory (SCM) nearing commercialization, there is evidence that it will deliver the much-anticipated high density and access latencies within only a few factors of DRAM. Nevertheless, the latency-sensitive nature of memory-resident services makes seamless integration of SCM in servers questionable. In this paper, we ask the question of how best to introduce SCM for such…

    Submitted 7 March, 2019; v1 submitted 20 January, 2018; originally announced January 2018.

    Comments: Published at MEMSYS'18

  10. arXiv:1612.00445

    cs.AR cs.OS

    Near-Memory Address Translation

    Authors: Javier Picorel, Djordje Jevdjic, Babak Falsafi

    Abstract: Memory and logic integration on the same chip is becoming increasingly cost effective, creating the opportunity to offload data-intensive functionality to processing units placed inside memory chips. The introduction of memory-side processing units (MPUs) into conventional systems faces virtual memory as the first big showstopper: without efficient hardware support for address translation MPUs hav…

    Submitted 21 August, 2017; v1 submitted 1 December, 2016; originally announced December 2016.

    Comments: 15 pages, 9 figures