Showing 1–6 of 6 results for author: Mullins, R D

Searching in archive cs.
  1. arXiv:2406.14963  [pdf, other]

    cs.LG

    Optimised Grouped-Query Attention Mechanism for Transformers

    Authors: Yuang Chen, Cheng Zhang, Xitong Gao, Robert D. Mullins, George A. Constantinides, Yiren Zhao

    Abstract: Grouped-query attention (GQA) has been widely adopted in LLMs to mitigate the complexity of multi-head attention (MHA). To transform an MHA to a GQA, neighbour queries in MHA are evenly split into groups where each group shares the value and key layers. In this work, we propose AsymGQA, an activation-informed approach to asymmetrically grouping an MHA to a GQA for better model performance. Our Asy…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML2024 ES-FoMo-II Workshop

  2. arXiv:2406.14956  [pdf, other]

    cs.LG cs.CL

    Unlocking the Global Synergies in Low-Rank Adapters

    Authors: Zixi Zhang, Cheng Zhang, Xitong Gao, Robert D. Mullins, George A. Constantinides, Yiren Zhao

    Abstract: Low-rank Adaption (LoRA) has been the de-facto parameter-efficient fine-tuning technique for large language models. We present HeteroLoRA, a light-weight search algorithm that leverages zero-cost proxies to allocate the limited LoRA trainable parameters across the model for better fine-tuned performance. In addition to the allocation for the standard LoRA-adapted models, we also demonstrate the ef…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML2024 ES-FoMo-II Workshop

  3. arXiv:2210.02570  [pdf, other]

    cs.LG cs.AI cs.CL

    Revisiting Structured Dropout

    Authors: Yiren Zhao, Oluwatomisin Dada, Xitong Gao, Robert D Mullins

    Abstract: Large neural networks are often overparameterised and prone to overfitting. Dropout is a widely used regularization technique to combat overfitting and improve model generalization. However, unstructured Dropout is not always effective for specific network architectures and this has led to the formation of multiple structured Dropout approaches to improve model performance and, sometimes, reduce t…

    Submitted 5 October, 2022; originally announced October 2022.

  4. arXiv:2210.00641  [pdf, other]

    cs.LG

    DARTFormer: Finding The Best Type Of Attention

    Authors: Jason Ross Brown, Yiren Zhao, Ilia Shumailov, Robert D Mullins

    Abstract: Given the wide and ever growing range of different efficient Transformer attention mechanisms, it is important to identify which attention is most effective when given a task. In this work, we are also interested in combining different attention types to build heterogeneous Transformers. We first propose a DARTS-like Neural Architecture Search (NAS) method to find the best attention for a given ta…

    Submitted 2 October, 2022; originally announced October 2022.

    ACM Class: I.2.7; I.2.6

  5. arXiv:2210.00640  [pdf, other]

    cs.LG

    Wide Attention Is The Way Forward For Transformers?

    Authors: Jason Ross Brown, Yiren Zhao, Ilia Shumailov, Robert D Mullins

    Abstract: The Transformer is an extremely powerful and prominent deep learning architecture. In this work, we challenge the commonly held belief in deep learning that going deeper is better, and show an alternative design approach that is building wider attention Transformers. We demonstrate that wide single layer Transformer models can compete with or outperform deeper ones in a variety of Natural Language…

    Submitted 8 November, 2022; v1 submitted 2 October, 2022; originally announced October 2022.

    ACM Class: I.2.7

  6. arXiv:2209.09338  [pdf, other]

    cs.LG

    Revisiting Embeddings for Graph Neural Networks

    Authors: S. Purchase, A. Zhao, R. D. Mullins

    Abstract: Current graph representation learning techniques use Graph Neural Networks (GNNs) to extract features from dataset embeddings. In this work, we examine the quality of these embeddings and assess how changing them can affect the accuracy of GNNs. We explore different embedding extraction techniques for both images and texts, and find that the performance of different GNN architectures is dependent…

    Submitted 29 November, 2022; v1 submitted 19 September, 2022; originally announced September 2022.