Showing 1–3 of 3 results for author: Edo, I

  1. arXiv:2010.08065  [pdf, other]

    cs.AR cs.AI

    FPRaker: A Processing Element For Accelerating Neural Network Training

    Authors: Omar Mohamed Awad, Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Ciaran Bannon, Anand Jayarajan, Gennady Pekhimenko, Andreas Moshovos

    Abstract: We present FPRaker, a processing element for composing training accelerators. FPRaker processes several floating-point multiply-accumulation operations concurrently and accumulates their result into a higher precision accumulator. FPRaker boosts performance and energy efficiency during training by taking advantage of the values that naturally appear during training. Specifically, it processes the…

    Submitted 15 October, 2020; originally announced October 2020.

  2. TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference

    Authors: Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Omar Mohamed Awad, Gennady Pekhimenko, Jorge Albericio, Andreas Moshovos

    Abstract: TensorDash is a hardware-level technique for enabling data-parallel MAC units to take advantage of sparsity in their input operand streams. When used to compose a hardware accelerator for deep learning, TensorDash can speed up the training process while also increasing energy efficiency. TensorDash combines a low-cost, sparse input operand interconnect comprising an 8-input multiplexer per multipli…

    Submitted 1 September, 2020; originally announced September 2020.

  3. GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference

    Authors: Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos

    Abstract: Attention-based models have demonstrated remarkable success in various natural language understanding tasks. However, efficient execution remains a challenge for these models, which are memory-bound due to their massive number of parameters. We present GOBO, a model quantization technique that compresses the vast majority (typically 99.9%) of the 32-bit floating-point parameters of state-of-the-art…

    Submitted 26 September, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: Accepted at the 53rd IEEE/ACM International Symposium on Microarchitecture - MICRO 2020