Showing 1–3 of 3 results for author: Ziaeefard, M

  1. arXiv:2209.07606  [pdf, other]

    cs.CV cs.LG

    CES-KD: Curriculum-based Expert Selection for Guided Knowledge Distillation

    Authors: Ibtihel Amara, Maryam Ziaeefard, Brett H. Meyer, Warren Gross, James J. Clark

    Abstract: Knowledge distillation (KD) is an effective tool for compressing deep classification models for edge devices. However, the performance of KD is affected by the large capacity gap between the teacher and student networks. Recent methods have resorted to a multiple teacher assistant (TA) setting for KD, which sequentially decreases the size of the teacher model to relatively bridge the size gap betw…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: ICPR2022

  2. arXiv:2208.02070  [pdf, other]

    cs.CL cs.LG

    Efficient Fine-Tuning of Compressed Language Models with Learners

    Authors: Danilo Vucetic, Mohammadreza Tayaranian, Maryam Ziaeefard, James J. Clark, Brett H. Meyer, Warren J. Gross

    Abstract: Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many prior works aim to improve inference efficiency via compression techniques, e.g., pruning, these works do not explicitly address the computational challenges of training to downstream tasks. We introduce Learner modules and priming, novel methods for fine-tuning that exploit the overparameterization of…

    Submitted 3 August, 2022; originally announced August 2022.

    Comments: 8 pages, 9 figures, 2 tables, presented at ICML 2022 workshop on Hardware-Aware Efficient Training (HAET 2022)

  3. Efficient Fine-Tuning of BERT Models on the Edge

    Authors: Danilo Vucetic, Mohammadreza Tayaranian, Maryam Ziaeefard, James J. Clark, Brett H. Meyer, Warren J. Gross

    Abstract: Resource-constrained devices are increasingly the deployment targets of machine learning applications. Static models, however, do not always suffice for dynamic environments. On-device training of models allows for quick adaptability to new scenarios. With the increasing size of deep neural networks, as noted with the likes of BERT and other natural language processing models, comes increased reso…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: 4 pages, 2 figures, 3 tables. To be published in ISCAS 2022 and made available on IEEE Xplore