Showing 1–1 of 1 results for author: Tomut, A

  1. arXiv:2401.14109

    cs.CL cs.AI cs.LG quant-ph

    CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks

    Authors: Andrei Tomut, Saeed S. Jahromi, Abhijoy Sarkar, Uygar Kurt, Sukhbinder Singh, Faysal Ishtiaq, Cesar Muñoz, Prabdeep Singh Bajaj, Ali Elborady, Gianni del Bimbo, Mehrazin Alizadeh, David Montero, Pablo Martin-Ramiro, Muhammad Ibrahim, Oussama Tahiri Alaoui, John Malcolm, Samuel Mugel, Roman Orus

    Abstract: Large Language Models (LLMs) such as ChatGPT and LlaMA are advancing rapidly in generative Artificial Intelligence (AI), but their immense size poses significant challenges, such as huge training and inference costs, substantial energy demands, and limitations for on-site deployment. Traditional compression methods such as pruning, distillation, and low-rank approximation focus on reducing the eff…

    Submitted 13 May, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures, 2 tables, and supplementary information of 2 pages and 1 figure. Revised version with new benchmarks for LlaMA2-7B