QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

Authors: Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian L. Croci, Bo Li, Martin Jaggi, Dan Alistarh, Torsten Hoefler, James Hensman

Abstract: We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot rotates LLMs in a way that removes outliers from the hidden state without changing the output, making quantization easier. This computational invariance is applied to the hidden state (residual) of the LLM, as well as to the activations of the feed-forward components, aspects of the attention mechanism and to the KV cache. The result is a quantized model where all matrix multiplications are performed in 4-bits, without any channels identified for retention in higher precision. Our quantized LLaMa2-70B model has losses of at most 0.29 WikiText-2 perplexity and retains 99% of the zero-shot performance. Code is available at: https://github.com/spcl/QuaRot.

Submitted 30 March, 2024; originally announced April 2024.

Comments: 19 pages, 6 figures

arXiv:2402.19364 [pdf, other]

doi 10.1145/3627535.3638496

Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication

Authors: Lukas Gianinazzi, Alexandros Nikolaos Ziogas, Langwen Huang, Piotr Luczynski, Saleh Ashkboos, Florian Scheidl, Armon Carigiet, Chio Ge, Nabil Abubaker, Maciej Besta, Tal Ben-Nun, Torsten Hoefler

Abstract: We propose a novel approach to iterated sparse matrix dense matrix multiplication, a fundamental computational kernel in scientific computing and graph neural network training. In cases where matrix sizes exceed the memory of a single compute node, data transfer becomes a bottleneck. An approach based on dense matrix multiplication algorithms leads to suboptimal scalability and fails to exploit the sparsity in the problem. To address these challenges, we propose decomposing the sparse matrix into a small number of highly structured matrices called arrow matrices, which are connected by permutations. Our approach enables communication-avoiding multiplications, achieving a polynomial reduction in communication volume per iteration for matrices corresponding to planar graphs and other minor-excluded families of graphs. Our evaluation demonstrates that our approach outperforms a state-of-the-art method for sparse matrix multiplication on matrices with hundreds of millions of rows, offering near-linear strong and weak scaling.

Submitted 20 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

ACM Class: F.2.1

Journal ref: PPoPP'24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (2024) 404-416

arXiv:2401.15024 [pdf, other]

SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Authors: Saleh Ashkboos, Maximilian L. Croci, Marcelo Gennari do Nascimento, Torsten Hoefler, James Hensman

Abstract: Large language models have become the cornerstone of natural language processing, but their use comes with substantial costs in terms of compute and memory resources. Sparsification provides a solution to alleviate these resource constraints, and recent works have shown that trained models can be sparsified post-hoc. Existing sparsification techniques face challenges as they need additional data structures and offer constrained speedup with current hardware. In this paper we present SliceGPT, a new post-training sparsification scheme which replaces each weight matrix with a smaller (dense) matrix, reducing the embedding dimension of the network. Through extensive experimentation, we show that SliceGPT can remove up to 25% of the model parameters (including embeddings) for LLAMA2-70B, OPT 66B and Phi-2 models while maintaining 99%, 99% and 90% zero-shot task performance of the dense model respectively. Our sliced models run on fewer GPUs and run faster without any additional code optimization: on 24GB consumer GPUs we reduce the total compute for inference on LLAMA2-70B to 64% of that of the dense model; on 40GB A100 GPUs we reduce it to 66%. We offer a new insight, computational invariance in transformer networks, which enables SliceGPT and we hope it will inspire and enable future avenues to reduce memory and computation demands for pre-trained models. Code is available at: https://github.com/microsoft/TransformerCompression

Submitted 9 February, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

Comments: 22 pages, 8 figures, accepted at ICLR24

arXiv:2310.09259 [pdf, other]

QUIK: Towards End-to-End 4-Bit Inference on Generative Large Language Models

Authors: Saleh Ashkboos, Ilia Markov, Elias Frantar, Tingxuan Zhong, Xincheng Wang, Jie Ren, Torsten Hoefler, Dan Alistarh

Abstract: Large Language Models (LLMs) from the GPT family have become extremely popular, leading to a race towards reducing their inference costs to allow for efficient local computation. Yet, the vast majority of existing work focuses on weight-only quantization, which can reduce runtime costs in the memory-bound one-token-at-a-time generative setting, but does not address them in compute-bound scenarios, such as batched inference or prompt processing. In this paper, we address the general quantization problem, where both weights and activations should be quantized. We show, for the first time, that the majority of inference computations for large generative models such as LLaMA, OPT, and Falcon can be performed with both weights and activations being cast to 4 bits, in a way that leads to practical speedups, while at the same time maintaining good accuracy. We achieve this via a hybrid quantization strategy called QUIK, which compresses most of the weights and activations to 4-bit, while keeping some outlier weights and activations in higher-precision. The key feature of our scheme is that it is designed with computational efficiency in mind: we provide GPU kernels matching the QUIK format with highly-efficient layer-wise runtimes, which lead to practical end-to-end throughput improvements of up to 3.4x relative to FP16 execution. We provide detailed studies for models from the OPT, LLaMA-2 and Falcon families, as well as a first instance of accurate inference using quantization plus 2:4 sparsity. Code is available at: https://github.com/IST-DASLab/QUIK.

Submitted 2 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: 16 pages

arXiv:2306.03078 [pdf, other]

SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression

Authors: Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, Dan Alistarh

Abstract: Recent advances in large language model (LLM) pretraining have led to high-quality LLMs with impressive abilities. By compressing such LLMs via quantization to 3-4 bits per parameter, they can fit into memory-limited devices such as laptops and mobile phones, enabling personalized use. However, quantization down to 3-4 bits per parameter usually leads to moderate-to-high accuracy losses, especially for smaller models in the 1-10B parameter range, which are well-suited for edge deployments. To address this accuracy issue, we introduce the Sparse-Quantized Representation (SpQR), a new compressed format and quantization technique which enables for the first time near-lossless compression of LLMs across model scales, while reaching similar compression levels to previous methods. SpQR works by identifying and isolating outlier weights, which cause particularly large quantization errors, and storing them in higher precision, while compressing all other weights to 3-4 bits, and achieves relative accuracy losses of less than 1% in perplexity for highly-accurate LLaMA and Falcon LLMs. This makes it possible to run a 33B-parameter LLM on a single 24 GB consumer GPU without any performance degradation and with a 15% speedup, thus making powerful LLMs available to consumers without any downsides. SpQR comes with efficient algorithms for both encoding weights into its format, as well as decoding them efficiently at runtime. Specifically, we provide an efficient GPU inference algorithm for SpQR which yields faster inference than 16-bit baselines at similar accuracy, while enabling memory compression gains of more than 4x.

Submitted 5 June, 2023; originally announced June 2023.

Comments: Extended preprint

arXiv:2304.07613 [pdf, other]

STen: Productive and Efficient Sparsity in PyTorch

Authors: Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Saleh Ashkboos, Torsten Hoefler

Abstract: As deep learning models grow, sparsity is becoming an increasingly critical component of deep neural networks, enabling improved performance and reduced storage. However, existing frameworks offer poor support for sparsity. Specialized sparsity engines focus exclusively on sparse inference, while general frameworks primarily focus on sparse tensors in classical formats and neglect the broader sparsification pipeline necessary for using sparse models, especially during training. Further, existing frameworks are not easily extensible: adding a new sparse tensor format or operator is challenging and time-consuming. To address this, we propose STen, a sparsity programming model and interface for PyTorch, which incorporates sparsity layouts, operators, and sparsifiers, in an efficient, customizable, and extensible framework that supports virtually all sparsification methods. We demonstrate this by developing a high-performance grouped n:m sparsity layout for CPU inference at moderate sparsity. STen brings high performance and ease of use to the ML community, making sparsity easily accessible.

Submitted 15 April, 2023; originally announced April 2023.

arXiv:2210.17323 [pdf, other]

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Authors: Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh

Abstract: Generative Pre-trained Transformer models, known as GPT or OPT, set themselves apart through breakthrough performance across complex language modelling tasks, but also by their extremely high computational and storage costs. Specifically, due to their massive size, even inference for large, highly-accurate GPT models may require multiple performant GPUs, which limits the usability of such models. While there is emerging work on relieving this pressure via model compression, the applicability and performance of existing compression techniques is limited by the scale and complexity of GPT models. In this paper, we address this challenge, and propose GPTQ, a new one-shot weight quantization method based on approximate second-order information, that is both highly-accurate and highly-efficient. Specifically, GPTQ can quantize GPT models with 175 billion parameters in approximately four GPU hours, reducing the bitwidth down to 3 or 4 bits per weight, with negligible accuracy degradation relative to the uncompressed baseline. Our method more than doubles the compression gains relative to previously-proposed one-shot quantization methods, preserving accuracy, allowing us for the first time to execute a 175 billion-parameter model inside a single GPU for generative inference. Moreover, we also show that our method can still provide reasonable accuracy in the extreme quantization regime, in which weights are quantized to 2-bit or even ternary quantization levels. We show experimentally that these improvements can be leveraged for end-to-end inference speedups over FP16, of around 3.25x when using high-end GPUs (NVIDIA A100) and 4.5x when using more cost-effective ones (NVIDIA A6000). The implementation is available at https://github.com/IST-DASLab/gptq.

Submitted 22 March, 2023; v1 submitted 31 October, 2022; originally announced October 2022.

Comments: ICLR 2023

arXiv:2208.11469 [pdf, other]

ProbGraph: High-Performance and High-Accuracy Graph Mining with Probabilistic Set Representations

Authors: Maciej Besta, Cesare Miglioli, Paolo Sylos Labini, Jakub Tětek, Patrick Iff, Raghavendra Kanakagiri, Saleh Ashkboos, Kacper Janda, Michal Podstawski, Grzegorz Kwasniewski, Niels Gleinig, Flavio Vella, Onur Mutlu, Torsten Hoefler

Abstract: Important graph mining problems such as Clustering are computationally demanding. To significantly accelerate these problems, we propose ProbGraph: a graph representation that enables simple and fast approximate parallel graph mining with strong theoretical guarantees on work, depth, and result accuracy. The key idea is to represent sets of vertices using probabilistic set representations such as Bloom filters. These representations are much faster to process than the original vertex sets thanks to vectorizability and small size. We use these representations as building blocks in important parallel graph mining algorithms such as Clique Counting or Clustering. When enhanced with ProbGraph, these algorithms significantly outperform tuned parallel exact baselines (up to nearly 50x on 32 cores) while ensuring accuracy of more than 90% for many input graph datasets. Our novel bounds and algorithms based on probabilistic set representations with desirable statistical properties are of separate interest for the data analytics community.

Submitted 21 November, 2022; v1 submitted 24 August, 2022; originally announced August 2022.

Comments: Best Paper Award at ACM/IEEE Supercomputing'22 (SC22)

Journal ref: Proceedings of the ACM/IEEE International Conference on High Performance Computing, Networking, Storage and Analysis, November 2022

arXiv:2206.14786 [pdf, other]

ENS-10: A Dataset For Post-Processing Ensemble Weather Forecasts

Authors: Saleh Ashkboos, Langwen Huang, Nikoli Dryden, Tal Ben-Nun, Peter Dueben, Lukas Gianinazzi, Luca Kummer, Torsten Hoefler

Abstract: Post-processing ensemble prediction systems can improve the reliability of weather forecasting, especially for extreme event prediction. In recent years, different machine learning models have been developed to improve the quality of weather post-processing. However, these models require a comprehensive dataset of weather simulations to produce high-accuracy results, which comes at a high computational cost to generate. This paper introduces the ENS-10 dataset, consisting of ten ensemble members spanning 20 years (1998-2017). The ensemble members are generated by perturbing numerical weather simulations to capture the chaotic behavior of the Earth. To represent the three-dimensional state of the atmosphere, ENS-10 provides the most relevant atmospheric variables at 11 distinct pressure levels and the surface at 0.5-degree resolution for forecast lead times T=0, 24, and 48 hours (two data points per week). We propose the ENS-10 prediction correction task for improving the forecast quality at a 48-hour lead time through ensemble post-processing. We provide a set of baselines and compare their skill at correcting the predictions of three important atmospheric variables. Moreover, we measure the baselines' skill at improving predictions of extreme weather events using our dataset. The ENS-10 dataset is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Submitted 7 November, 2022; v1 submitted 29 June, 2022; originally announced June 2022.

Comments: Accepted version of the paper

arXiv:2205.04934 [pdf, other]

The spatial computer: A model for energy-efficient parallel computation

Authors: Lukas Gianinazzi, Tal Ben-Nun, Maciej Besta, Saleh Ashkboos, Yves Baumann, Piotr Luczynski, Torsten Hoefler

Abstract: We present a new parallel model of computation suitable for spatial architectures, for which the energy used for communication heavily depends on the distance of the communicating processors. In our model, processors have locations on a conceptual two-dimensional grid, and their distance therein determines their communication cost. In particular, we introduce the energy cost of a spatial computation, which measures the total distance traveled by all messages, and study the depth of communication, which measures the largest number of hops of a chain of messages. We show matching energy lower- and upper bounds for many foundational problems, including sorting, median selection, and matrix multiplication. Our model does not depend on any parameters other than the input shape and size, simplifying algorithm analysis. We also show how to simulate PRAM algorithms in our model and how to obtain results for a more complex model that introduces the size of the local memories of the processors as a parameter.

Submitted 17 January, 2023; v1 submitted 10 May, 2022; originally announced May 2022.

ACM Class: F.2.0

arXiv:2106.15565 [pdf, other]

doi 10.1145/3458817.3476178

Flare: Flexible In-Network Allreduce

Authors: Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li, Torsten Hoefler

Abstract: The allreduce operation is one of the most commonly used communication routines in distributed applications. To improve its bandwidth and to reduce network traffic, this operation can be accelerated by offloading it to network switches, which aggregate the data received from the hosts and send the aggregated result back to them. However, existing solutions provide limited customization opportunities and might provide suboptimal performance when dealing with custom operators and data types, with sparse data, or when reproducibility of the aggregation is a concern. To deal with these problems, in this work we design a flexible programmable switch by using as a building block PsPIN, a RISC-V architecture implementing the sPIN programming model. We then design, model, and analyze different algorithms for executing the aggregation on this architecture, showing performance improvements compared to state-of-the-art approaches.

Submitted 29 June, 2021; originally announced June 2021.

ACM Class: C.2.4; C.2.1; B.4.3

Journal ref: Published in Proceedings of The International Conference for High Performance Computing Networking, Storage, and Analysis (SC '21) (2021)

arXiv:2106.00761 [pdf, other]

Motif Prediction with Graph Neural Networks

Authors: Maciej Besta, Raphael Grob, Cesare Miglioli, Nicola Bernold, Grzegorz Kwasniewski, Gabriel G**i, Raghavendra Kanakagiri, Saleh Ashkboos, Lukas Gianinazzi, Nikoli Dryden, Torsten Hoefler

Abstract: Link prediction is one of the central problems in graph mining. However, recent studies highlight the importance of higher-order network analysis, where complex structures called motifs are the first-class citizens. We first show that existing link prediction schemes fail to effectively predict motifs. To alleviate this, we establish a general motif prediction problem and we propose several heuristics that assess the chances for a specified motif to appear. To make the scores realistic, our heuristics consider - among others - correlations between links, i.e., the potential impact of some arriving links on the appearance of other links in a given motif. Finally, for highest accuracy, we develop a graph neural network (GNN) architecture for motif prediction. Our architecture offers vertex features and sampling schemes that capture the rich structural properties of motifs. While our heuristics are fast and do not need any training, GNNs ensure highest accuracy of predicting motifs, both for dense (e.g., k-cliques) and for sparse ones (e.g., k-stars). We consistently outperform the best available competitor by more than 10% on average and up to 32% in area under the curve. Importantly, the advantages of our approach over schemes based on uncorrelated link prediction increase with the increasing motif size and complexity. We also successfully apply our architecture for predicting more arbitrary clusters and communities, illustrating its potential for graph mining beyond motif analysis.

Submitted 21 May, 2022; v1 submitted 26 May, 2021; originally announced June 2021.

Journal ref: Proceedings of the 28th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'22), 2022

arXiv:2002.09268 [pdf, other]

New Bounds For Distributed Mean Estimation and Variance Reduction

Authors: Peter Davies, Vijaykrishna Gurunathan, Niusha Moshrefi, Saleh Ashkboos, Dan Alistarh

Abstract: We consider the problem of distributed mean estimation (DME), in which $n$ machines are each given a local $d$-dimensional vector $x_v \in \mathbb{R}^d$, and must cooperate to estimate the mean of their inputs $\mu = \frac{1}{n}\sum_{v=1}^n x_v$, while minimizing total communication cost. DME is a fundamental construct in distributed machine learning, and there has been considerable work on variants of this problem, especially in the context of distributed variance reduction for stochastic gradients in parallel SGD. Previous work typically assumes an upper bound on the norm of the input vectors, and achieves an error bound in terms of this norm. However, in many real applications, the input vectors are concentrated around the correct output $\mu$, but $\mu$ itself has large norm. In such cases, previous output error bounds perform poorly. In this paper, we show that output error bounds need not depend on input norm. We provide a method of quantization which allows distributed mean estimation to be performed with solution quality dependent only on the distance between inputs, not on input norm, and show an analogous result for distributed variance reduction. The technique is based on a new connection with lattice theory. We also provide lower bounds showing that the communication to error trade-off of our algorithms is asymptotically optimal. As the lattices achieving optimal bounds under $\ell_2$-norm can be computationally impractical, we also present an extension which leverages easy-to-use cubic lattices, and is loose only up to a logarithmic factor in $d$. We show experimentally that our method yields practical improvements for common applications, relative to prior approaches.

Submitted 7 April, 2021; v1 submitted 21 February, 2020; originally announced February 2020.

Comments: 42 pages, 16 figures

arXiv:1802.08021 [pdf, other]

SparCML: High-Performance Sparse Communication for Machine Learning

Authors: Cedric Renggli, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan Alistarh, Torsten Hoefler

Abstract: Applying machine learning techniques to the quickly growing data in science and industry requires highly-scalable algorithms. Large datasets are most commonly processed "data parallel" distributed across many nodes. Each node's contribution to the overall gradient is summed using a global allreduce. This allreduce is the single communication and thus scalability bottleneck for most machine learning workloads. We observe that frequently, many gradient values are (close to) zero, leading to sparse or sparsifiable communications. To exploit this insight, we analyze, design, and implement a set of communication-efficient protocols for sparse input data, in conjunction with efficient machine learning algorithms which can leverage these primitives. Our communication protocols generalize standard collective operations, by allowing processes to contribute arbitrary sparse input data vectors. Our generic communication library, SparCML, extends MPI to support additional features, such as non-blocking (asynchronous) operations and low-precision data representations. As such, SparCML and its techniques will form the basis of future highly-scalable machine learning frameworks.

Submitted 16 August, 2019; v1 submitted 22 February, 2018; originally announced February 2018.

arXiv:1702.05570 [pdf, ps, other]

Multi-way sparsest cut problem on trees with a control on the number of parts and outliers

Authors: Ramin Javadi, Saleh Ashkboos

Abstract: Given a graph, the sparsest cut problem asks for a subset of vertices whose edge expansion (the normalized cut given by the subset) is minimized. In this paper, we study a generalization of this problem that seeks $k$ disjoint subsets of vertices (clusters) whose edge expansions are all small and for which the number of vertices remaining outside the subsets (outliers) is also small. We prove that although this problem is NP-hard for trees, it can be solved in polynomial time for all weighted trees, provided that we restrict the search space to subsets which induce connected subgraphs. The proposed algorithm is based on dynamic programming and runs in the worst case in $O(k^2 n^3)$ time, where $n$ is the number of vertices and $k$ is the number of clusters. It also runs in linear time when the number of clusters and the number of outliers are bounded by a constant.

Submitted 17 February, 2017; originally announced February 2017.

Comments: 14 pages

MSC Class: 05C85; 68Q25; 68R10

arXiv:1702.04739 [pdf, other]

An Efficient Parallel Data Clustering Algorithm Using Isoperimetric Number of Trees

Authors: Ramin Javadi, Saleh Ashkboos

Abstract: We propose a parallel graph-based data clustering algorithm using CUDA GPU, based on exact clustering of the minimum spanning tree in terms of a minimum isoperimetric criterion. We also provide a comparative performance analysis of our algorithm with other related ones, which demonstrates the general superiority of this parallel algorithm over other competing algorithms in terms of accuracy and speed.

Submitted 15 February, 2017; originally announced February 2017.

Comments: 16 pages, 6 figures

arXiv:1702.01253 [pdf, other]

Minimum edge cuts of distance-regular and strongly regular digraphs

Authors: S. Ashkboos, G. R. Omidi, F. Shafiei, K. Tajbakhsh

Abstract: In this paper, we show that the edge connectivity of a distance-regular digraph $\Gamma$ with valency $k$ is $k$ and for $k>2$, any minimum edge cut of $\Gamma$ is the set of all edges going into (or coming out of) a single vertex. Moreover, we show that the same result holds for strongly regular digraphs. These results extend the same known results for the undirected case with quite different proofs.

Submitted 4 February, 2017; originally announced February 2017.

Comments: 9 pages, 1 figure

Showing 1–17 of 17 results for author: Ashkboos, S