
Showing 1–10 of 10 results for author: Sarah, A

  1. arXiv:2405.18377

    cs.AI

    LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models

    Authors: Anthony Sarah, Sharath Nittur Sridhar, Maciej Szankin, Sairam Sundaresan

    Abstract: The abilities of modern large language models (LLMs) in solving natural language processing, complex reasoning, sentiment analysis and other tasks have been extraordinary, which has prompted their extensive adoption. Unfortunately, these abilities come with very high memory and computational costs, which preclude the use of LLMs on most hardware platforms. To mitigate this, we propose an effective…

    Submitted 28 May, 2024; originally announced May 2024.

  2. arXiv:2402.11780

    cs.AR cs.AI

    CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware

    Authors: Souvik Kundu, Anthony Sarah, Vinay Joshi, Om J Omer, Sreenivas Subramoney

    Abstract: With the recent growth in demand for large-scale deep neural networks, compute-in-memory (CiM) has emerged as a prominent solution to alleviate bandwidth and on-chip interconnect bottlenecks that constrain von Neumann architectures. However, the construction of CiM hardware poses a challenge as any specific memory hierarchy in terms of cache sizes and memory bandwidth at different interfaces may no…

    Submitted 18 March, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

    Comments: 6 pages, 4 figures, 5 tables; Accepted as a full paper by the tinyML Research Symposium 2024

  3. arXiv:2312.13301

    cs.LG cs.AI

    SimQ-NAS: Simultaneous Quantization Policy and Neural Architecture Search

    Authors: Sharath Nittur Sridhar, Maciej Szankin, Fang Chen, Sairam Sundaresan, Anthony Sarah

    Abstract: Recent one-shot Neural Architecture Search algorithms rely on training a hardware-agnostic super-network tailored to a specific task and then extracting efficient sub-networks for different hardware platforms. Popular approaches separate the training of super-networks from the search for sub-networks, often employing predictors to alleviate the computational overhead associated with search. Additi…

    Submitted 19 December, 2023; originally announced December 2023.

  4. arXiv:2309.02592

    eess.AS cs.SD

    BWSNet: Automatic Perceptual Assessment of Audio Signals

    Authors: Clément Le Moine Veillon, Victor Rosi, Pablo Arias Sarah, Léane Salais, Nicolas Obin

    Abstract: This paper introduces BWSNet, a model that can be trained from raw human judgements obtained through a Best-Worst scaling (BWS) experiment. It maps sound samples into an embedded space that represents the perception of a studied attribute. To this end, we propose a set of cost functions and constraints, interpreting trial-wise ordinal relations as distance comparisons in a metric learning task. We…

    Submitted 21 January, 2024; v1 submitted 5 September, 2023; originally announced September 2023.

  5. arXiv:2308.15609

    cs.LG cs.AI

    InstaTune: Instantaneous Neural Architecture Search During Fine-Tuning

    Authors: Sharath Nittur Sridhar, Souvik Kundu, Sairam Sundaresan, Maciej Szankin, Anthony Sarah

    Abstract: One-Shot Neural Architecture Search (NAS) algorithms often rely on training a hardware-agnostic super-network for a domain-specific task. Optimal sub-networks are then extracted from the trained super-network for different hardware platforms. However, training super-networks from scratch can be extremely time-consuming and compute-intensive, especially for large models that rely on a two-stage trai…

    Submitted 29 August, 2023; originally announced August 2023.

  6. arXiv:2205.10358

    cs.LG cs.NE

    A Hardware-Aware Framework for Accelerating Neural Architecture Search Across Modalities

    Authors: Daniel Cummings, Anthony Sarah, Sharath Nittur Sridhar, Maciej Szankin, Juan Pablo Munoz, Sairam Sundaresan

    Abstract: Recent advances in Neural Architecture Search (NAS) such as one-shot NAS offer the ability to extract specialized hardware-aware sub-network configurations from a task-specific super-network. While considerable effort has been employed towards improving the first stage, namely, the training of the super-network, the search for derivative high-performing sub-networks is still under-explored. Popula…

    Submitted 19 May, 2022; originally announced May 2022.

  7. arXiv:2202.12954

    cs.AI

    A Hardware-Aware System for Accelerating Deep Neural Network Optimization

    Authors: Anthony Sarah, Daniel Cummings, Sharath Nittur Sridhar, Sairam Sundaresan, Maciej Szankin, Tristan Webb, J. Pablo Munoz

    Abstract: Recent advances in Neural Architecture Search (NAS) that extract specialized hardware-aware configurations (a.k.a. "sub-networks") from a hardware-agnostic "super-network" have become increasingly popular. While considerable effort has been employed towards improving the first stage, namely, the training of the super-network, the search for derivative high-performing sub-networks is still largely…

    Submitted 25 February, 2022; originally announced February 2022.

  8. arXiv:2202.12934

    cs.NE

    Accelerating Neural Architecture Exploration Across Modalities Using Genetic Algorithms

    Authors: Daniel Cummings, Sharath Nittur Sridhar, Anthony Sarah, Maciej Szankin

    Abstract: Neural architecture search (NAS), the study of automating the discovery of optimal deep neural network architectures for tasks in domains such as computer vision and natural language processing, has seen rapid growth in the machine learning research community. While there have been many recent advancements in NAS, there is still a significant focus on reducing the computational cost incurred when…

    Submitted 25 February, 2022; originally announced February 2022.

  9. arXiv:2202.12411

    cs.CL

    TrimBERT: Tailoring BERT for Trade-offs

    Authors: Sharath Nittur Sridhar, Anthony Sarah, Sairam Sundaresan

    Abstract: Models based on BERT have been extremely successful in solving a variety of natural language processing (NLP) tasks. Unfortunately, many of these large models require a great deal of computational resources and/or time for pre-training and fine-tuning, which limits wider adoptability. While self-attention layers have been well-studied, a strong justification for inclusion of the intermediate layers…

    Submitted 24 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2012.11881

  10. arXiv:2012.11881

    cs.CL cs.AI

    Undivided Attention: Are Intermediate Layers Necessary for BERT?

    Authors: Sharath Nittur Sridhar, Anthony Sarah

    Abstract: In recent times, BERT-based models have been extremely successful in solving a variety of natural language processing (NLP) tasks such as reading comprehension, natural language inference, sentiment analysis, etc. All BERT-based architectures have a self-attention block followed by a block of intermediate layers as the basic building component. However, a strong justification for the inclusion of…

    Submitted 4 April, 2023; v1 submitted 22 December, 2020; originally announced December 2020.