-
Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead
Authors:
Rickard Brüel-Gabrielsson,
Jiacheng Zhu,
Onkar Bhardwaj,
Leshem Choshen,
Kristjan Greenewald,
Mikhail Yurochkin,
Justin Solomon
Abstract:
Fine-tuning large language models (LLMs) with low-rank adapters (LoRAs) has become common practice, often yielding numerous copies of the same LLM differing only in their LoRA updates. This paradigm presents challenges for systems that serve real-time responses to queries that each involve a different LoRA. Prior works optimize the design of such systems but still require continuous loading and offloading of LoRAs, as it is infeasible to store thousands of LoRAs in GPU memory. To mitigate this issue, we investigate the efficacy of compression when serving LoRA adapters. We consider compressing adapters individually via SVD and propose a method for joint compression of LoRAs into a shared basis paired with LoRA-specific scaling matrices. Our experiments with up to 500 LoRAs demonstrate that compressed LoRAs preserve performance while offering major throughput gains in realistic serving scenarios with over a thousand LoRAs, maintaining 75% of the throughput of serving a single LoRA.
Submitted 17 June, 2024;
originally announced July 2024.
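The per-adapter SVD compression mentioned in the abstract above can be illustrated with a minimal sketch. It assumes NumPy and a LoRA update stored as two factors `B` and `A`. The function name, shapes, and target rank are hypothetical and not taken from the paper, and the joint shared-basis method is not shown.

```python
import numpy as np

def compress_lora_svd(B, A, rank):
    """Re-factor the LoRA update B @ A at a lower rank via truncated SVD."""
    delta = B @ A  # full update, shape (d_out, d_in); fine for a small demo
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    B_small = U[:, :rank] * S[:rank]  # fold singular values into the left factor
    A_small = Vt[:rank, :]
    return B_small, A_small

rng = np.random.default_rng(0)
B = rng.standard_normal((64, 16))  # hypothetical rank-16 adapter factors
A = rng.standard_normal((16, 32))
B_small, A_small = compress_lora_svd(B, A, rank=4)

# Relative Frobenius error of the compressed update.
relative_error = np.linalg.norm(B @ A - B_small @ A_small) / np.linalg.norm(B @ A)
```

Truncated SVD gives the best rank-`rank` approximation in Frobenius norm, so `relative_error` is the floor for any factorization of that size. Random factors like these compress poorly; whether trained adapters do better is the empirical question the paper studies.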
-
Deep Augmentation: Self-Supervised Learning with Transformations in Activation Space
Authors:
Rickard Brüel-Gabrielsson,
Tongzhou Wang,
Manel Baradad,
Justin Solomon
Abstract:
We introduce Deep Augmentation, an approach to implicit data augmentation using dropout or PCA to transform a targeted layer within a neural network to improve performance and generalization. We demonstrate Deep Augmentation through extensive experiments on contrastive learning tasks in NLP, computer vision, and graph learning. We observe substantial performance gains with Transformers, ResNets, and Graph Neural Networks as the underlying models in contrastive learning, but observe inverse effects on the corresponding supervised problems. Our analysis suggests that Deep Augmentation alleviates co-adaption between layers, a form of "collapse." We use this observation to formulate a method for selecting which layer to target; in particular, our experimentation reveals that targeting deeper layers with Deep Augmentation outperforms augmenting the input data. The simple network- and modality-agnostic nature of this approach enables its integration into various machine learning pipelines.
Submitted 26 February, 2024; v1 submitted 25 March, 2023;
originally announced March 2023.
-
Relative Position Prediction as Pre-training for Text Encoders
Authors:
Rickard Brüel-Gabrielsson,
Chris Scarvelis
Abstract:
Meaning is defined by the company it keeps. However, company is two-fold: It's based on the identity of tokens and also on their position (topology). We argue that a position-centric perspective is more general and useful. The classic MLM and CLM objectives in NLP are easily phrased as position predictions over the whole vocabulary. Adapting the relative position encoding paradigm in NLP to create relative labels for self-supervised learning, we seek to show superior pre-training judged by performance on downstream tasks.
Submitted 2 February, 2022;
originally announced February 2022.
-
Rewiring with Positional Encodings for Graph Neural Networks
Authors:
Rickard Brüel-Gabrielsson,
Mikhail Yurochkin,
Justin Solomon
Abstract:
Several recent works use positional encodings to extend the receptive fields of graph neural network (GNN) layers equipped with attention mechanisms. These techniques, however, extend receptive fields to the complete graph, at substantial computational cost and risking a change in the inductive biases of conventional GNNs, or require complex architecture adjustments. As a conservative alternative, we use positional encodings to expand receptive fields to $r$-hop neighborhoods. More specifically, our method augments the input graph with additional nodes/edges and uses positional encodings as node and/or edge features. We thus modify graphs before inputting them to a downstream GNN model, instead of modifying the model itself. This makes our method model-agnostic, i.e., compatible with any of the existing GNN architectures. We also provide examples of positional encodings that are lossless with a one-to-one map between the original and the modified graphs. We demonstrate that extending receptive fields via positional encodings and a virtual fully-connected node significantly improves GNN performance and alleviates over-squashing using small $r$. We obtain improvements on a variety of models and datasets and reach competitive performance using traditional GNNs or graph Transformers.
Submitted 13 December, 2023; v1 submitted 29 January, 2022;
originally announced January 2022.
-
Universal Function Approximation on Graphs
Authors:
Rickard Brüel-Gabrielsson
Abstract:
In this work we produce a framework for constructing universal function approximators on graph isomorphism classes. We prove how this framework comes with a collection of theoretically desirable properties and enables novel analysis. We show how this allows us to achieve state-of-the-art performance on four different well-known datasets in graph classification and separate classes of graphs that other graph-learning methods cannot. Our approach is inspired by persistent homology, dependency parsing for NLP, and multivalued functions. The complexity of the underlying algorithm is O(#edges x #nodes) and code is publicly available (https://github.com/bruel-gabrielsson/universal-function-approximation-on-graphs).
Submitted 26 October, 2020; v1 submitted 14 March, 2020;
originally announced March 2020.
-
A Topology Layer for Machine Learning
Authors:
Rickard Brüel-Gabrielsson,
Bradley J. Nelson,
Anjan Dwaraknath,
Primoz Skraba,
Leonidas J. Guibas,
Gunnar Carlsson
Abstract:
Topology applied to real-world data using persistent homology has started to find applications within machine learning, including deep learning. We present a differentiable topology layer that computes persistent homology based on level set filtrations and edge-based filtrations. We present three novel applications: the topological layer can (i) regularize data reconstruction or the weights of machine learning models, (ii) construct a loss on the output of a deep generative network to incorporate topological priors, and (iii) perform topological adversarial attacks on deep networks trained with persistence features. The code (www.github.com/bruel-gabrielsson/TopologyLayer) is publicly available and we hope its availability will facilitate the use of persistent homology in deep learning and other gradient-based applications.
Submitted 24 April, 2020; v1 submitted 28 May, 2019;
originally announced May 2019.
-
Topology-Aware Surface Reconstruction for Point Clouds
Authors:
Rickard Brüel-Gabrielsson,
Vignesh Ganapathi-Subramanian,
Primoz Skraba,
Leonidas J. Guibas
Abstract:
We present an approach to inform the reconstruction of a surface from a point scan through topological priors. The reconstruction is based on basis functions which are optimized to provide a good fit to the point scan while satisfying predefined topological constraints. We optimize the parameters of a model to obtain a likelihood function over the reconstruction domain. The topological constraints are captured by persistence diagrams, which are incorporated in the optimization algorithm to promote the correct topology. The result is a novel topology-aware technique which can (1) weed out topological noise from point scans, and (2) capture certain nuanced properties of the underlying shape which could otherwise be lost while performing surface reconstruction. We showcase results reconstructing shapes with multiple potential topologies, compare to other classical surface construction techniques, and show the completion of real scan data.
Submitted 15 September, 2021; v1 submitted 29 November, 2018;
originally announced November 2018.