
Showing 1–8 of 8 results for author: Hajimolahoseini, H

Searching in archive cs.
  1. arXiv:2406.19995  [pdf, other]

    cs.CL cs.AI cs.LG

    Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model

    Authors: Habib Hajimolahoseini, Mohammad Hassanpour, Foozhan Ataiefard, Boxing Chen, Yang Liu

    Abstract: This paper introduces a novel method of Progressive Low Rank Decomposition (PLRD) tailored for the compression of large language models. Our approach leverages a pre-trained model, which is then incrementally decomposed to smaller sizes using progressively lower ranks. This method allows for significant reductions in computational overhead and energy consumption, as subsequent models are derived…

    Submitted 28 June, 2024; originally announced June 2024.

  2. arXiv:2401.15293  [pdf, other]

    cs.CV cs.AI cs.LG

    SkipViT: Speeding Up Vision Transformers with a Token-Level Skip Connection

    Authors: Foozhan Ataiefard, Walid Ahmed, Habib Hajimolahoseini, Saina Asani, Farnoosh Javadi, Mohammad Hassanpour, Omar Mohamed Awad, Austin Wen, Kangling Liu, Yang Liu

    Abstract: Vision transformers are known to be more computationally and data-intensive than CNN models. Transformer models such as ViT require all the input image tokens to learn the relationships among them. However, many of these tokens are not informative and may contain irrelevant information such as unrelated background or unimportant scenery. These tokens are overlooked by the multi-head self-att…

    Submitted 26 January, 2024; originally announced January 2024.

  3. arXiv:2311.15134  [pdf, other]

    cs.LG cs.AI

    SwiftLearn: A Data-Efficient Training Method of Deep Learning Models using Importance Sampling

    Authors: Habib Hajimolahoseini, Omar Mohamed Awad, Walid Ahmed, Austin Wen, Saina Asani, Mohammad Hassanpour, Farnoosh Javadi, Mehdi Ahmadi, Foozhan Ataiefard, Kangling Liu, Yang Liu

    Abstract: In this paper, we present SwiftLearn, a data-efficient approach to accelerate the training of deep learning models using a subset of data samples selected during the warm-up stages of training. This subset is selected based on an importance criterion measured over the entire dataset during warm-up stages, aiming to preserve the model performance with fewer examples during the rest of training. The impo…

    Submitted 25 November, 2023; originally announced November 2023.

  4. arXiv:2311.03426  [pdf, other]

    cs.LG cs.AI cs.CV

    GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values

    Authors: Farnoosh Javadi, Walid Ahmed, Habib Hajimolahoseini, Foozhan Ataiefard, Mohammad Hassanpour, Saina Asani, Austin Wen, Omar Mohamed Awad, Kangling Liu, Yang Liu

    Abstract: Massive transformer-based models face several challenges, including slow and computationally intensive pre-training and over-parametrization. This paper addresses these challenges by proposing a versatile method called GQKVA, which generalizes query, key, and value grouping techniques. GQKVA is designed to speed up transformer pre-training while reducing the model size. Our experiments with variou…

    Submitted 13 December, 2023; v1 submitted 6 November, 2023; originally announced November 2023.

  5. arXiv:2309.12412  [pdf, other]

    cs.CV cs.LG

    Speeding up Resnet Architecture with Layers Targeted Low Rank Decomposition

    Authors: Walid Ahmed, Habib Hajimolahoseini, Austin Wen, Yang Liu

    Abstract: Compression of a neural network can help in speeding up both the training and the inference of the network. In this research, we study applying compression using low rank decomposition on network layers. Our research demonstrates that to achieve a speedup, the compression methodology should be aware of the underlying hardware, since analysis is needed to choose which layers to compress. The adva…

    Submitted 21 September, 2023; originally announced September 2023.

  6. arXiv:2309.03965  [pdf, other]

    cs.LG cs.CV

    Improving Resnet-9 Generalization Trained on Small Datasets

    Authors: Omar Mohamed Awad, Habib Hajimolahoseini, Michael Lim, Gurpreet Gosal, Walid Ahmed, Yang Liu, Gordon Deng

    Abstract: This paper presents our proposed approach that won the first prize at the ICLR competition on Hardware Aware Efficient Training. The challenge is to achieve the highest possible accuracy in an image classification task in less than 10 minutes. The training is done on a small dataset of 5000 images picked randomly from the CIFAR-10 dataset. The evaluation is performed by the competition organizers on a…

    Submitted 7 September, 2023; originally announced September 2023.

  7. arXiv:2309.03824  [pdf, other]

    cs.LG cs.AI

    Training Acceleration of Low-Rank Decomposed Networks using Sequential Freezing and Rank Quantization

    Authors: Habib Hajimolahoseini, Walid Ahmed, Yang Liu

    Abstract: Low Rank Decomposition (LRD) is a model compression technique applied to the weight tensors of deep learning models in order to reduce the number of trainable parameters and computational complexity. However, due to the high number of new layers added to the architecture after applying LRD, it may not lead to a high training/inference acceleration if the decomposition ranks are not small enough. The i…

    Submitted 7 September, 2023; originally announced September 2023.

  8. arXiv:2110.08460  [pdf, other]

    cs.CL

    A Short Study on Compressing Decoder-Based Language Models

    Authors: Tianda Li, Yassir El Mesbahi, Ivan Kobyzev, Ahmad Rashid, Atif Mahmud, Nithin Anchuri, Habib Hajimolahoseini, Yang Liu, Mehdi Rezagholizadeh

    Abstract: Pre-trained Language Models (PLMs) have been successful for a wide range of natural language processing (NLP) tasks. State-of-the-art PLMs, however, are too large to be used on edge devices. As a result, the topic of model compression has attracted increasing attention in the NLP community. Most of the existing works focus on compressing encoder-based models (tiny-BERT, distilBERT, di…

    Submitted 15 October, 2021; originally announced October 2021.
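Several entries in this listing (Single Parent Family / PLRD, the layer-targeted ResNet speedup study, and the sequential freezing and rank quantization work) rely on low rank decomposition (LRD) of weight tensors. As a rough illustration only, and not code from any of these papers, the sketch below counts weights for a dense layer of shape m × n versus its rank-r factorization into two smaller layers (n → r, then r → m). The factorization stores r(m + n) weights instead of mn, so it only shrinks the layer when r < mn/(m + n). This is one concrete way to read the remark that LRD may not accelerate training or inference if the ranks are not small enough. The function names and the list of progressively lower ranks are hypothetical choices for the example.

```python
# Hypothetical sketch: weight counts for a dense layer W (m x n) versus a
# rank-r factorization W ~= A @ B with A of shape (m, r) and B of shape (r, n).
# Biases are ignored. This is not the implementation from any listed paper.


def dense_params(m: int, n: int) -> int:
    """Weights in the original m x n layer."""
    return m * n


def factored_params(m: int, n: int, r: int) -> int:
    """Weights in the two factor layers (n -> r and r -> m)."""
    return r * (m + n)


def break_even_rank(m: int, n: int) -> int:
    """Largest rank whose factorization is strictly smaller than the dense layer.

    r * (m + n) < m * n  <=>  r <= (m * n - 1) // (m + n) for positive integers.
    """
    return (m * n - 1) // (m + n)


if __name__ == "__main__":
    m, n = 1024, 1024
    print("break-even rank:", break_even_rank(m, n))  # 511
    # Hypothetical progressively lower ranks, loosely echoing the PLRD idea.
    for r in (512, 256, 128, 64):
        ratio = factored_params(m, n, r) / dense_params(m, n)
        print(f"rank {r:4d}: {factored_params(m, n, r):8d} weights ({ratio:.2%} of dense)")
```

Parameter count is only a proxy for speed: as the ResNet study in this listing notes, whether a factorized layer actually runs faster depends on the underlying hardware and on which layers are chosen.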
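The SwiftLearn entry describes training on a subset of samples chosen by an importance criterion measured over the entire dataset during warm-up. The listed abstract is cut off before that criterion is defined, so the sketch below assumes a hypothetical per-sample score (for example, a loss value) recorded in each warm-up epoch. It averages the scores and keeps the highest-scoring fraction of samples. This shows generic importance-based subset selection under that assumption, not SwiftLearn's actual criterion; `average_warmup_scores` and `select_important_subset` are illustrative names.

```python
from typing import Sequence


def average_warmup_scores(per_epoch_scores: Sequence[Sequence[float]]) -> list[float]:
    """Average each sample's (hypothetical) importance score over warm-up epochs.

    per_epoch_scores[e][i] is the score of sample i recorded in warm-up epoch e.
    """
    num_epochs = len(per_epoch_scores)
    return [sum(column) / num_epochs for column in zip(*per_epoch_scores)]


def select_important_subset(scores: Sequence[float], keep_fraction: float) -> list[int]:
    """Return the sorted indices of the top `keep_fraction` of samples by score."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, round(len(scores) * keep_fraction))
    # Rank sample indices from most to least important, then keep the first k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Hypothetical scores for 4 samples over 2 warm-up epochs.
    warmup = [[0.9, 0.1, 0.5, 0.3], [0.7, 0.2, 0.6, 0.1]]
    subset = select_important_subset(average_warmup_scores(warmup), keep_fraction=0.5)
    print(subset)  # [0, 2]
```

After warm-up, the remaining epochs would iterate only over the returned indices; how often (or whether) the subset is re-scored is a design choice this sketch leaves open.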
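The GQKVA entry says the method generalizes query, key, and value grouping techniques. As background only, the sketch below shows the familiar grouped-query arrangement such techniques start from: H query heads share G key/value heads, so each block of H/G consecutive query heads reads the same key/value head. Setting G = H gives standard multi-head attention and G = 1 gives multi-query attention. GQKVA's own grouping scheme is not reproduced here, and the function names and sizes are hypothetical.

```python
# Hypothetical sketch of grouped-query head sharing (not the GQKVA method itself).


def kv_group_for_query_head(h: int, num_query_heads: int, num_kv_groups: int) -> int:
    """Index of the key/value head that query head `h` attends with."""
    if num_kv_groups < 1 or num_query_heads % num_kv_groups != 0:
        raise ValueError("num_query_heads must be a positive multiple of num_kv_groups")
    if not 0 <= h < num_query_heads:
        raise ValueError("query head index out of range")
    heads_per_group = num_query_heads // num_kv_groups
    return h // heads_per_group


def kv_projection_params(d_model: int, head_dim: int, num_kv_groups: int) -> int:
    """Weights in the K and V projections, each mapping d_model -> num_kv_groups * head_dim."""
    return 2 * d_model * num_kv_groups * head_dim


if __name__ == "__main__":
    print([kv_group_for_query_head(h, 8, 2) for h in range(8)])  # [0, 0, 0, 0, 1, 1, 1, 1]
    for g in (8, 2, 1):  # multi-head, grouped, multi-query
        print(g, kv_projection_params(d_model=512, head_dim=64, num_kv_groups=g))
```

Shrinking G reduces the K and V projection weights in direct proportion, which is where this kind of grouping saves parameters and compute.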