Showing 1–6 of 6 results for author: Tayaranian, M

  1. arXiv:2402.01169

    cs.CV cs.AI

    Faster Inference of Integer SWIN Transformer by Removing the GELU Activation

    Authors: Mohammadreza Tayaranian, Seyyed Hasan Mozafari, James J. Clark, Brett Meyer, Warren Gross

    Abstract: SWIN transformer is a prominent vision transformer model that has state-of-the-art accuracy in image classification tasks. Despite this success, its unique architecture causes slower inference compared with similar deep neural networks. Integer quantization of the model is one of the methods used to improve its inference latency. However, state-of-the-art has not been able to fully quantize the mo…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 5 pages, 1 figure. Submitted to Edge Intelligence Workshop III, an AAAI 2024 workshop

  2. arXiv:2209.09815

    cs.LG

    Towards Fine-tuning Pre-trained Language Models with Integer Forward and Backward Propagation

    Authors: Mohammadreza Tayaranian, Alireza Ghaffari, Marzieh S. Tahaei, Mehdi Rezagholizadeh, Masoud Asgharian, Vahid Partovi Nia

    Abstract: The large number of parameters of some prominent language models, such as BERT, makes their fine-tuning on downstream tasks computationally intensive and energy hungry. Previously researchers were focused on lower bit-width integer data types for the forward propagation of language models to save memory and computation. As for the backward propagation, however, only 16-bit floating-point data type…

    Submitted 12 February, 2023; v1 submitted 20 September, 2022; originally announced September 2022.

    Comments: Accepted at EACL 2023

  3. arXiv:2208.02070

    cs.CL cs.LG

    Efficient Fine-Tuning of Compressed Language Models with Learners

    Authors: Danilo Vucetic, Mohammadreza Tayaranian, Maryam Ziaeefard, James J. Clark, Brett H. Meyer, Warren J. Gross

    Abstract: Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many prior works aim to improve inference efficiency via compression techniques, e.g., pruning, these works do not explicitly address the computational challenges of training to downstream tasks. We introduce Learner modules and priming, novel methods for fine-tuning that exploit the overparameterization of…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: 8 pages, 9 figures, 2 tables, presented at ICML 2022 workshop on Hardware-Aware Efficient Training (HAET 2022)

  4. arXiv:2207.08822

    cs.LG cs.CC

    Is Integer Arithmetic Enough for Deep Learning Training?

    Authors: Alireza Ghaffari, Marzieh S. Tahaei, Mohammadreza Tayaranian, Masoud Asgharian, Vahid Partovi Nia

    Abstract: The ever-increasing computational complexity of deep learning models makes their training and deployment difficult on various cloud and edge platforms. Replacing floating-point arithmetic with low-bit integer arithmetic is a promising approach to save energy, memory footprint, and latency of deep learning models. As such, quantization has attracted the attention of researchers in recent years. How…

    Submitted 4 January, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

    Comments: final camera ready submitted to NeurIPS

  5. Efficient Fine-Tuning of BERT Models on the Edge

    Authors: Danilo Vucetic, Mohammadreza Tayaranian, Maryam Ziaeefard, James J. Clark, Brett H. Meyer, Warren J. Gross

    Abstract: Resource-constrained devices are increasingly the deployment targets of machine learning applications. Static models, however, do not always suffice for dynamic environments. On-device training of models allows for quick adaptability to new scenarios. With the increasing size of deep neural networks, as noted with the likes of BERT and other natural language processing models, comes increased reso…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: 4 pages, 2 figures, 3 tables. To be published in ISCAS 2022 and made available on IEEE Xplore

  6. arXiv:2001.00053

    cs.LG cs.NE

    On the Resilience of Deep Learning for Reduced-voltage FPGAs

    Authors: Kamyar Givaki, Behzad Salami, Reza Hojabr, S. M. Reza Tayaranian, Ahmad Khonsari, Dara Rahmati, Saeid Gorgin, Adrian Cristal, Osman S. Unsal

    Abstract: Deep Neural Networks (DNNs) are inherently computation-intensive and also power-hungry. Hardware accelerators such as Field Programmable Gate Arrays (FPGAs) are a promising solution that can satisfy these requirements for both embedded and High-Performance Computing (HPC) systems. In FPGAs, as well as CPUs and GPUs, aggressive voltage scaling below the nominal level is an effective technique for p…

    Submitted 26 December, 2019; originally announced January 2020.