Showing 1–11 of 11 results for author: Akhauri, Y

  1. arXiv:2406.16635  [pdf, other]

    cs.LG cs.AI cs.CL

    ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models

    Authors: Yash Akhauri, Ahmed F AbouElhamayed, Jordan Dotzel, Zhiru Zhang, Alexander M Rush, Safeen Huda, Mohamed S Abdelfattah

    Abstract: The high power consumption and latency-sensitive deployments of large language models (LLMs) have motivated techniques like quantization and sparsity. Contextual sparsity, where the sparsity pattern is input-dependent, is crucial in LLMs because the permanent removal of attention heads or neurons from LLMs can significantly degrade accuracy. Prior work has attempted to model contextual sparsity us…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2404.04900  [pdf, other]

    cs.CL

    Radial Networks: Dynamic Layer Routing for High-Performance Large Language Models

    Authors: Jordan Dotzel, Yash Akhauri, Ahmed S. AbouElhamayed, Carly Jiang, Mohamed Abdelfattah, Zhiru Zhang

    Abstract: Large language models (LLMs) often struggle with strict memory, latency, and power demands. To meet these demands, various forms of dynamic sparsity have been proposed that reduce compute on an input-by-input basis. These methods improve over static methods by exploiting the variance across individual inputs, which has steadily grown with the exponential increase in training data. Yet, the increas…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: First two authors have equal contribution

  3. arXiv:2403.02484  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE

    Encodings for Prediction-based Neural Architecture Search

    Authors: Yash Akhauri, Mohamed S. Abdelfattah

    Abstract: Predictor-based methods have substantially enhanced Neural Architecture Search (NAS) optimization. The efficacy of these predictors is largely influenced by the method of encoding neural network architectures. While traditional encodings used an adjacency matrix describing the graph structure of a neural network, novel encodings embrace a variety of approaches from unsupervised pretraining of late…

    Submitted 4 March, 2024; originally announced March 2024.

  4. arXiv:2403.02446  [pdf, other]

    cs.LG cs.AR cs.CV cs.PF

    On Latency Predictors for Neural Architecture Search

    Authors: Yash Akhauri, Mohamed S. Abdelfattah

    Abstract: Efficient deployment of neural networks (NN) requires the co-optimization of accuracy and latency. For example, hardware-aware neural architecture search has been used to automatically find NN architectures that satisfy a latency constraint on a specific hardware device. Central to these search algorithms is a prediction model that is designed to provide a hardware latency estimate for a candidate…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted at MLSys'24

  5. arXiv:2306.02459  [pdf, other]

    cs.LG cs.AR cs.CV cs.PF

    Multi-Predict: Few Shot Predictors For Efficient Neural Architecture Search

    Authors: Yash Akhauri, Mohamed S. Abdelfattah

    Abstract: Many hardware-aware neural architecture search (NAS) methods have been developed to optimize the topology of neural networks (NN) with the joint objectives of higher accuracy and lower latency. Recently, both accuracy and latency predictors have been used in NAS with great success, achieving high sample efficiency and accurate modeling of hardware (HW) device latency respectively. However, a new a…

    Submitted 4 June, 2023; originally announced June 2023.

  6. arXiv:2209.07413  [pdf, other]

    cs.LG cs.CV cs.NE

    EZNAS: Evolving Zero Cost Proxies For Neural Architecture Scoring

    Authors: Yash Akhauri, J. Pablo Munoz, Nilesh Jain, Ravi Iyer

    Abstract: Neural Architecture Search (NAS) has significantly improved productivity in the design and deployment of neural networks (NN). As NAS typically evaluates multiple models by training them partially or completely, the improved productivity comes at the cost of significant carbon footprint. To alleviate this expensive training routine, zero-shot/cost proxies analyze an NN at initialization to generat…

    Submitted 21 December, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

  7. arXiv:2112.10878  [pdf, other]

    cs.LG

    Enabling NAS with Automated Super-Network Generation

    Authors: J. Pablo Muñoz, Nikolay Lyalyushkin, Yash Akhauri, Anastasia Senina, Alexander Kozlov, Nilesh Jain

    Abstract: Recent Neural Architecture Search (NAS) solutions have produced impressive results training super-networks and then deriving subnetworks, a.k.a. child models that outperform expert-crafted models from a pre-defined search space. Efficient and robust subnetworks can be selected for resource-constrained edge devices, allowing them to perform well in the wild. However, constructing super-networks for…

    Submitted 20 December, 2021; originally announced December 2021.

    Comments: Accepted at AAAI2022 - Practical Deep Learning in the Wild

    ACM Class: I.2; D.0; I.2.2

  8. arXiv:2106.09180  [pdf, other]

    cs.LG cs.AR cs.NE

    RHNAS: Realizable Hardware and Neural Architecture Search

    Authors: Yash Akhauri, Adithya Niranjan, J. Pablo Muñoz, Suvadeep Banerjee, Abhijit Davare, Pasquale Cocchini, Anton A. Sorokin, Ravi Iyer, Nilesh Jain

    Abstract: The rapidly evolving field of Artificial Intelligence necessitates automated approaches to co-design neural network architecture and neural accelerators to maximize system efficiency and address productivity challenges. To enable joint optimization of this vast space, there has been growing interest in differentiable NN-HW co-design. Fully differentiable co-design has reduced the resource requirem…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: 15 pages

  9. Exposing Hardware Building Blocks to Machine Learning Frameworks

    Authors: Yash Akhauri

    Abstract: There are a plethora of applications that demand high throughput and low latency algorithms leveraging machine learning methods. This need for real time processing can be seen in industries ranging from developing neural network based pre-distortors for enhanced mobile broadband to designing FPGA-based triggers in major scientific efforts by CERN for particle physics. In this thesis, we explore ho…

    Submitted 10 April, 2020; originally announced April 2020.

    Comments: 62 pages, 22 figures, 14 tables

  10. arXiv:2004.03021  [pdf, other]

    eess.SP cs.AR cs.LG

    LogicNets: Co-Designed Neural Networks and Circuits for Extreme-Throughput Applications

    Authors: Yaman Umuroglu, Yash Akhauri, Nicholas J. Fraser, Michaela Blott

    Abstract: Deployment of deep neural networks for applications that require very high throughput or extremely low latency is a severe computational challenge, further exacerbated by inefficiencies in mapping the computation to hardware. We present a novel method for designing neural network topologies that directly map to a highly efficient FPGA implementation. By exploiting the equivalence of artificial neu…

    Submitted 6 April, 2020; originally announced April 2020.

  11. HadaNets: Flexible Quantization Strategies for Neural Networks

    Authors: Yash Akhauri

    Abstract: On-board processing elements on UAVs are currently inadequate for training and inference of Deep Neural Networks. This is largely due to the energy consumption of memory accesses in such a network. HadaNets introduce a flexible train-from-scratch tensor quantization scheme by pairing a full precision tensor to a binary tensor in the form of a Hadamard product. Unlike wider reduced precision neural…

    Submitted 26 May, 2019; originally announced May 2019.

    Comments: Accepted in CVPR 2019, UAVision 2019