Showing 1–14 of 14 results for author: Behdin, K

  1. arXiv:2406.07831  [pdf, other]

    cs.LG

    ALPS: Improved Optimization for Highly Sparse One-Shot Pruning for Large Language Models

    Authors: Xiang Meng, Kayhan Behdin, Haoyue Wang, Rahul Mazumder

    Abstract: The impressive performance of Large Language Models (LLMs) across various natural language processing tasks comes at the cost of vast computational resources and storage requirements. One-shot pruning techniques offer a way to alleviate these burdens by removing redundant weights without the need for retraining. Yet, the massive scale of LLMs often forces current pruning approaches to rely on heur…

    Submitted 11 June, 2024; originally announced June 2024.

  2. arXiv:2403.12983  [pdf, other]

    cs.CV cs.LG

    OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization

    Authors: Xiang Meng, Shibal Ibrahim, Kayhan Behdin, Hussein Hazimeh, Natalia Ponomareva, Rahul Mazumder

    Abstract: Structured pruning is a promising approach for reducing the inference costs of large vision and language models. By removing carefully chosen structures, e.g., neurons or attention heads, the improvements from this approach can be realized on standard deep learning hardware. In this work, we focus on structured pruning in the one-shot (post-training) setting, which does not require model retrainin…

    Submitted 2 March, 2024; originally announced March 2024.

  3. arXiv:2310.18542  [pdf, other]

    cs.LG

    End-to-end Feature Selection Approach for Learning Skinny Trees

    Authors: Shibal Ibrahim, Kayhan Behdin, Rahul Mazumder

    Abstract: Joint feature selection and tree ensemble learning is a challenging task. Popular tree ensemble toolkits e.g., Gradient Boosted Trees and Random Forests support feature selection post-training based on feature importances, which are known to be misleading, and can significantly hurt performance. We propose Skinny Trees: a toolkit for feature selection in tree ensembles, such that feature selection…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: Preprint

  4. arXiv:2309.01885  [pdf, other]

    stat.ML cs.CL cs.LG

    QuantEase: Optimization-based Quantization for Language Models

    Authors: Kayhan Behdin, Ayan Acharya, Aman Gupta, Qingquan Song, Siyu Zhu, Sathiya Keerthi, Rahul Mazumder

    Abstract: With the rising popularity of Large Language Models (LLMs), there has been an increasing interest in compression techniques that enable their efficient deployment. This study focuses on the Post-Training Quantization (PTQ) of LLMs. Drawing from recent advances, our work introduces QuantEase, a layer-wise quantization framework where individual layers undergo separate quantization. The problem is f…

    Submitted 1 December, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

  5. arXiv:2307.09366  [pdf, other]

    cs.LG stat.ME stat.ML

    Sparse Gaussian Graphical Models with Discrete Optimization: Computational and Statistical Perspectives

    Authors: Kayhan Behdin, Wenyu Chen, Rahul Mazumder

    Abstract: We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model, a key problem in statistical machine learning. Given $n$ samples from a multivariate Gaussian distribution with $p$ variables, the goal is to estimate the $p \times p$ inverse covariance matrix (aka precision matrix), assuming it is sparse (i.e., has a few nonzero entries). We propose GraphL0BnB,…

    Submitted 18 July, 2023; originally announced July 2023.

  6. arXiv:2302.11836  [pdf, other]

    stat.ML cs.LG

    On Statistical Properties of Sharpness-Aware Minimization: Provable Guarantees

    Authors: Kayhan Behdin, Rahul Mazumder

    Abstract: Sharpness-Aware Minimization (SAM) is a recent optimization framework aiming to improve the deep neural network generalization, through obtaining flatter (i.e. less sharp) solutions. As SAM has been numerically successful, recent papers have studied the theoretical aspects of the framework and have shown SAM solutions are indeed flat. However, there has been limited theoretical exploration regardi…

    Submitted 19 May, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

  7. arXiv:2302.09693  [pdf, other]

    stat.ML cs.LG

    mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization

    Authors: Kayhan Behdin, Qingquan Song, Aman Gupta, Sathiya Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, David Durfee, Rahul Mazumder

    Abstract: Modern deep learning models are over-parameterized, where different optima can result in widely varying generalization performance. The Sharpness-Aware Minimization (SAM) technique modifies the fundamental loss function that steers gradient descent methods toward flatter minima, which are believed to exhibit enhanced generalization prowess. Our study delves into a specific variant of SAM known as…

    Submitted 30 September, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2212.04343

  8. arXiv:2212.08697  [pdf, other]

    stat.ME stat.ML

    Multi-Task Learning for Sparsity Pattern Heterogeneity: Statistical and Computational Perspectives

    Authors: Kayhan Behdin, Gabriel Loewinger, Kenneth T. Kishida, Giovanni Parmigiani, Rahul Mazumder

    Abstract: We consider a problem in Multi-Task Learning (MTL) where multiple linear models are jointly trained on a collection of datasets ("tasks"). A key novelty of our framework is that it allows the sparsity pattern of regression coefficients and the values of non-zero coefficients to differ across tasks while still leveraging partially shared structure. Our methods encourage models to share information…

    Submitted 8 June, 2024; v1 submitted 16 December, 2022; originally announced December 2022.

  9. arXiv:2212.04343  [pdf, other]

    cs.LG math.OC

    Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization

    Authors: Kayhan Behdin, Qingquan Song, Aman Gupta, David Durfee, Ayan Acharya, Sathiya Keerthi, Rahul Mazumder

    Abstract: Modern deep learning models are over-parameterized, where the optimization setup strongly affects the generalization performance. A key element of reliable optimization for these systems is the modification of the loss function. Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods towards flatter minima, which arguably have better generalization abiliti…

    Submitted 6 December, 2022; originally announced December 2022.

  10. arXiv:2109.11142  [pdf, other]

    stat.ME math.ST

    Sparse PCA: A New Scalable Estimator Based On Integer Programming

    Authors: Kayhan Behdin, Rahul Mazumder

    Abstract: We consider the Sparse Principal Component Analysis (SPCA) problem under the well-known spiked covariance model. Recent work has shown that the SPCA problem can be reformulated as a Mixed Integer Program (MIP) and can be solved to global optimality, leading to estimators that are known to enjoy optimal statistical properties. However, current MIP algorithms for SPCA are unable to scale beyond inst…

    Submitted 26 September, 2021; v1 submitted 23 September, 2021; originally announced September 2021.

  11. arXiv:2104.03527  [pdf, other]

    stat.ML cs.LG

    Sparse NMF with Archetypal Regularization: Computational and Robustness Properties

    Authors: Kayhan Behdin, Rahul Mazumder

    Abstract: We consider the problem of sparse nonnegative matrix factorization (NMF) using archetypal regularization. The goal is to represent a collection of data points as nonnegative linear combinations of a few nonnegative sparse factors with appealing geometric properties, arising from the use of archetypal regularization. We generalize the notion of robustness studied in Javadi and Montanari (2019) (wit…

    Submitted 10 February, 2024; v1 submitted 8 April, 2021; originally announced April 2021.

  12. arXiv:1810.03222  [pdf, ps, other]

    stat.ML cs.LG

    Recovering Quantized Data with Missing Information Using Bilinear Factorization and Augmented Lagrangian Method

    Authors: Ashkan Esmaeili, Kayhan Behdin, Sina Al-E-Mohammad, Farokh Marvasti

    Abstract: In this paper, we propose a novel approach in order to recover a quantized matrix with missing information. We propose a regularized convex cost function composed of a log-likelihood term and a Trace norm term. The Bi-factorization approach and the Augmented Lagrangian Method (ALM) are applied to find the global minimizer of the cost function in order to recover the genuine data. We provide mathem…

    Submitted 7 October, 2018; originally announced October 2018.

  13. arXiv:1805.07561  [pdf, ps, other]

    cs.LG stat.ML

    Transduction with Matrix Completion Using Smoothed Rank Function

    Authors: Ashkan Esmaeili, Kayhan Behdin, Mohammad Amin Fakharian, Farokh Marvasti

    Abstract: In this paper, we propose two new algorithms for transduction with Matrix Completion (MC) problem. The joint MC and prediction tasks are addressed simultaneously to enhance the accuracy, i.e., the label matrix is concatenated to the data matrix forming a stacked matrix. Assuming the data matrix is of low rank, we propose new recommendation methods by posing the problem as a constrained minimizatio…

    Submitted 19 May, 2018; originally announced May 2018.

  14. arXiv:1704.02216  [pdf]

    cs.SD cs.IR cs.LG cs.MM

    OBTAIN: Real-Time Beat Tracking in Audio Signals

    Authors: Ali Mottaghi, Kayhan Behdin, Ashkan Esmaeili, Mohammadreza Heydari, Farokh Marvasti

    Abstract: In this paper, we design a system in order to perform the real-time beat tracking for an audio signal. We use Onset Strength Signal (OSS) to detect the onsets and estimate the tempos. Then, we form Cumulative Beat Strength Signal (CBSS) by taking advantage of OSS and estimated tempos. Next, we perform peak detection by extracting the periodic sequence of beats among all CBSS peaks. In simulations,…

    Submitted 27 October, 2017; v1 submitted 7 April, 2017; originally announced April 2017.