Showing 1–1 of 1 results for author: Jabbour, W

Searching in archive cs.
  1. arXiv:2401.17574 [pdf, other]

    cs.CL cs.LG

    Scavenging Hyena: Distilling Transformers into Long Convolution Models

    Authors: Tokiniaina Raharison Ralambomihanta, Shahrad Mohammadzadeh, Mohammad Sami Nur Islam, Wassim Jabbour, Laurence Liang

    Abstract: The rapid evolution of Large Language Models (LLMs), epitomized by architectures like GPT-4, has reshaped the landscape of natural language processing. This paper introduces a pioneering approach to address the efficiency concerns associated with LLM pre-training, proposing the use of knowledge distillation for cross-architecture transfer. Leveraging insights from the efficient Hyena mechanism, ou…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 9 pages, 2 figures