Showing 1–9 of 9 results for author: Maghraoui, K E

  1. arXiv:2405.16646

    cs.LG

    A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts

    Authors: Mohammed Nowaz Rabbani Chowdhury, Meng Wang, Kaoutar El Maghraoui, Naigang Wang, Pin-Yu Chen, Christopher Carothers

    Abstract: The sparsely gated mixture of experts (MoE) architecture sends different inputs to different subnetworks, i.e., experts, through trainable routers. MoE reduces the training computation significantly for large models, but its deployment can still be memory- or computation-expensive for some downstream tasks. Model pruning is a popular approach to reduce inference computation, but its application in…

    Submitted 30 May, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

    Journal ref: The 41st International Conference on Machine Learning, ICML 2024

  2. arXiv:2403.08796

    eess.IV cs.CV cs.NE

    Analog In-Memory Computing with Uncertainty Quantification for Efficient Edge-based Medical Imaging Segmentation

    Authors: Imane Hamzaoui, Hadjer Benmeziane, Zayneb Cherif, Kaoutar El Maghraoui

    Abstract: This work investigates the role of the emerging Analog In-memory computing (AIMC) paradigm in enabling Medical AI analysis and improving the certainty of these models at the edge. It contrasts AIMC's efficiency with traditional digital computing's limitations in power, speed, and scalability. Our comprehensive evaluation focuses on brain tumor analysis, spleen segmentation, and nuclei detection. T…

    Submitted 1 February, 2024; originally announced March 2024.

  3. arXiv:2309.11246

    cs.LG cs.NE

    Grassroots Operator Search for Model Edge Adaptation

    Authors: Hadjer Benmeziane, Kaoutar El Maghraoui, Hamza Ouarnoughi, Smail Niar

    Abstract: Hardware-aware Neural Architecture Search (HW-NAS) is increasingly being used to design efficient deep learning architectures. An efficient and flexible search space is crucial to the success of HW-NAS. Current approaches focus on designing a macro-architecture and searching for the architecture's hyperparameters based on a set of possible values. This approach is biased by the expertise of deep l…

    Submitted 20 September, 2023; originally announced September 2023.

  4. Using the IBM Analog In-Memory Hardware Acceleration Kit for Neural Network Training and Inference

    Authors: Manuel Le Gallo, Corey Lammie, Julian Buechel, Fabio Carta, Omobayode Fagbohungbe, Charles Mackin, Hsinyu Tsai, Vijay Narayanan, Abu Sebastian, Kaoutar El Maghraoui, Malte J. Rasch

    Abstract: Analog In-Memory Computing (AIMC) is a promising approach to reduce the latency and energy consumption of Deep Neural Network (DNN) inference and training. However, the noisy and non-linear device characteristics, and the non-ideal peripheral circuitry in AIMC chips, require adapting DNNs to be deployed on such hardware to achieve equivalent accuracy to digital computing. In this tutorial, we prov…

    Submitted 26 January, 2024; v1 submitted 18 July, 2023; originally announced July 2023.

    Journal ref: APL Machine Learning (2023) 1 (4): 041102

  5. arXiv:2305.10459

    cs.AR cs.CV cs.LG

    AnalogNAS: A Neural Network Design Framework for Accurate Inference with Analog In-Memory Computing

    Authors: Hadjer Benmeziane, Corey Lammie, Irem Boybat, Malte Rasch, Manuel Le Gallo, Hsinyu Tsai, Ramachandran Muralidhar, Smail Niar, Hamza Ouarnoughi, Vijay Narayanan, Abu Sebastian, Kaoutar El Maghraoui

    Abstract: The advancement of Deep Learning (DL) is driven by efficient Deep Neural Network (DNN) design and new hardware accelerators. Current DNN design is primarily tailored for general-purpose use and deployment on commercially viable platforms. Inference at the edge requires low latency, compact and power-efficient models, and must be cost-effective. Digital processors based on typical von Neumann archi…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted to IEEE Edge

  6. A flexible and fast PyTorch toolkit for simulating training and inference on analog crossbar arrays

    Authors: Malte J. Rasch, Diego Moreda, Tayfun Gokmen, Manuel Le Gallo, Fabio Carta, Cindy Goldberg, Kaoutar El Maghraoui, Abu Sebastian, Vijay Narayanan

    Abstract: We introduce the IBM Analog Hardware Acceleration Kit, a new, first-of-a-kind open-source toolkit to simulate analog crossbar arrays in a convenient fashion from within PyTorch (freely available at https://github.com/IBM/aihwkit). The toolkit is under active development and is centered around the concept of an "analog tile" which captures the computations performed on a crossbar array. Analog t…

    Submitted 5 April, 2021; originally announced April 2021.

    Comments: Submitted to AICAS2021

  7. arXiv:2103.10911

    cs.DC cs.AI

    Performance Analysis of Deep Learning Workloads on a Composable System

    Authors: Kaoutar El Maghraoui, Lorraine M. Herger, Chekuri Choudary, Kim Tran, Todd Deshane, David Hanson

    Abstract: A composable infrastructure is defined as resources, such as compute, storage, accelerators and networking, that are shared in a pool and that can be grouped in various configurations to meet application requirements. This freedom to 'mix and match' resources dynamically allows for experimentation early in the design cycle, prior to the final architectural design or hardware implementation of a sy…

    Submitted 19 March, 2021; originally announced March 2021.

    Comments: Submitted to IPDPS ScaDL 2021

  8. arXiv:2101.09336

    cs.LG cs.CC

    A Comprehensive Survey on Hardware-Aware Neural Architecture Search

    Authors: Hadjer Benmeziane, Kaoutar El Maghraoui, Hamza Ouarnoughi, Smail Niar, Martin Wistuba, Naigang Wang

    Abstract: Neural Architecture Search (NAS) methods have been growing in popularity. These techniques have been fundamental to automating and speeding up the time-consuming and error-prone process of synthesizing novel Deep Learning (DL) architectures. NAS has been extensively studied in the past few years. Arguably their most significant impact has been in image classification and object detection tasks where th…

    Submitted 22 January, 2021; originally announced January 2021.

    Comments: Submitted to Proceedings of IEEE

  9. arXiv:1805.06801

    cs.DC

    Dependability in a Multi-tenant Multi-framework Deep Learning as-a-Service Platform

    Authors: Scott Boag, Parijat Dube, Kaoutar El Maghraoui, Benjamin Herta, Waldemar Hummer, K. R. Jayaram, Rania Khalaf, Vinod Muthusamy, Michael Kalantar, Archit Verma

    Abstract: Deep learning (DL), a form of machine learning, is becoming increasingly popular in several application domains. As a result, cloud-based Deep Learning as a Service (DLaaS) platforms have become an essential infrastructure in many organizations. These systems accept, schedule, manage and execute DL training jobs at scale. This paper explores dependability in the context of a DLaaS platform used…

    Submitted 17 May, 2018; originally announced May 2018.