Showing 1–1 of 1 results for author: Karle, M

  1. arXiv:2209.12127  [pdf, other]

    cs.LG

    SpeedLimit: Neural Architecture Search for Quantized Transformer Models

    Authors: Yuji Chai, Luke Bailey, Yunho **, Matthew Karle, Glenn G. Ko, David Brooks, Gu-Yeon Wei, H. T. Kung

    Abstract: While research in the field of transformer models has primarily focused on enhancing performance metrics such as accuracy and perplexity, practical applications in industry often necessitate a rigorous consideration of inference latency constraints. Addressing this challenge, we introduce SpeedLimit, a novel Neural Architecture Search (NAS) technique that optimizes accuracy whilst adhering to an u…

    Submitted 13 October, 2023; v1 submitted 24 September, 2022; originally announced September 2022.