Showing 1–17 of 17 results for author: Hazimeh, H

  1. arXiv:2403.12983  [pdf, other]

    cs.CV cs.LG

    OSSCAR: One-Shot Structured Pruning in Vision and Language Models with Combinatorial Optimization

    Authors: Xiang Meng, Shibal Ibrahim, Kayhan Behdin, Hussein Hazimeh, Natalia Ponomareva, Rahul Mazumder

    Abstract: Structured pruning is a promising approach for reducing the inference costs of large vision and language models. By removing carefully chosen structures, e.g., neurons or attention heads, the improvements from this approach can be realized on standard deep learning hardware. In this work, we focus on structured pruning in the one-shot (post-training) setting, which does not require model retrainin…

    Submitted 2 March, 2024; originally announced March 2024.

  2. arXiv:2402.11120  [pdf, other]

    cs.LG cs.CV stat.ML

    DART: A Principled Approach to Adversarially Robust Unsupervised Domain Adaptation

    Authors: Yunjuan Wang, Hussein Hazimeh, Natalia Ponomareva, Alexey Kurakin, Ibrahim Hammoud, Raman Arora

    Abstract: Distribution shifts and adversarial examples are two major challenges for deploying machine learning models. While these challenges have been studied individually, their combination is an important topic that remains relatively under-explored. In this work, we study the problem of adversarial robustness under a common setting of distribution shift - unsupervised domain adaptation (UDA). Specifical…

    Submitted 16 February, 2024; originally announced February 2024.

  3. arXiv:2402.04177  [pdf, other]

    cs.CL cs.LG stat.ML

    Scaling Laws for Downstream Task Performance of Large Language Models

    Authors: Berivan Isik, Natalia Ponomareva, Hussein Hazimeh, Dimitris Paparas, Sergei Vassilvitskii, Sanmi Koyejo

    Abstract: Scaling laws provide important insights that can guide the design of large language models (LLMs). Existing work has primarily focused on studying scaling laws for pretraining (upstream) loss. However, in transfer learning settings, in which LLMs are pretrained on an unsupervised dataset and then finetuned on a downstream task, we often also care about the downstream performance. In this work, we…

    Submitted 6 February, 2024; originally announced February 2024.

  4. COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search

    Authors: Shibal Ibrahim, Wenyu Chen, Hussein Hazimeh, Natalia Ponomareva, Zhe Zhao, Rahul Mazumder

    Abstract: The sparse Mixture-of-Experts (Sparse-MoE) framework efficiently scales up model capacity in various domains, such as natural language processing and vision. Sparse-MoEs select a subset of the "experts" (thus, only a portion of the overall network) for each input sample using a sparse, trainable gate. Existing sparse gates are prone to convergence and performance issues when training with first-or…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted in KDD 2023

  5. arXiv:2303.00654  [pdf, other]

    cs.LG cs.CR stat.ML

    How to DP-fy ML: A Practical Guide to Machine Learning with Differential Privacy

    Authors: Natalia Ponomareva, Hussein Hazimeh, Alex Kurakin, Zheng Xu, Carson Denison, H. Brendan McMahan, Sergei Vassilvitskii, Steve Chien, Abhradeep Thakurta

    Abstract: ML models are ubiquitous in real world applications and are a constant focus of research. At the same time, the community has started to realize the importance of protecting the privacy of ML training data. Differential Privacy (DP) has become a gold standard for making formal statements about data anonymization. However, while some adoption of DP has happened in industry, attempts to apply DP t…

    Submitted 31 July, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Journal ref: Journal of Artificial Intelligence Research 77 (2023) 1113-1201

  6. arXiv:2302.14623  [pdf, other]

    cs.LG cs.CV math.OC

    Fast as CHITA: Neural Network Pruning with Combinatorial Optimization

    Authors: Riade Benbaki, Wenyu Chen, Xiang Meng, Hussein Hazimeh, Natalia Ponomareva, Zhe Zhao, Rahul Mazumder

    Abstract: The sheer size of modern neural networks makes model serving a serious computational challenge. A popular class of compression techniques overcomes this challenge by pruning or sparsifying the weights of pretrained networks. While useful, these techniques often face serious tradeoffs between computational requirements and compression quality. In this work, we propose a novel optimization-based pru…

    Submitted 28 February, 2023; originally announced February 2023.

  7. arXiv:2302.00089  [pdf, other]

    cs.LG cs.AI

    Mind the (optimality) Gap: A Gap-Aware Learning Rate Scheduler for Adversarial Nets

    Authors: Hussein Hazimeh, Natalia Ponomareva

    Abstract: Adversarial nets have proved to be powerful in various domains including generative modeling (GANs), transfer learning, and fairness. However, successfully training adversarial nets using first-order methods remains a major challenge. Typically, careful choices of the learning rates are needed to maintain the delicate balance between the competing networks. In this paper, we design a novel learnin…

    Submitted 31 January, 2023; originally announced February 2023.

    Comments: Accepted to AISTATS 2023

  8. arXiv:2301.12993  [pdf, other]

    cs.CV cs.LG

    Benchmarking Robustness to Adversarial Image Obfuscations

    Authors: Florian Stimberg, Ayan Chakrabarti, Chun-Ta Lu, Hussein Hazimeh, Otilia Stretcu, Wei Qiao, Yintao Liu, Merve Kaya, Cyrus Rashtchian, Ariel Fuxman, Mehmet Tek, Sven Gowal

    Abstract: Automated content filtering and moderation is an important tool that allows online platforms to build striving user communities that facilitate cooperation and prevent abuse. Unfortunately, resourceful actors try to bypass automated filters in a bid to post content that violate platform policies and codes of conduct. To reach this goal, these malicious actors may obfuscate policy violating images…

    Submitted 29 November, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

    ACM Class: I.2.10; I.4.0

  9. arXiv:2205.09717  [pdf, other]

    cs.LG stat.ML

    Flexible Modeling and Multitask Learning using Differentiable Tree Ensembles

    Authors: Shibal Ibrahim, Hussein Hazimeh, Rahul Mazumder

    Abstract: Decision tree ensembles are widely used and competitive learning models. Despite their success, popular toolkits for learning tree ensembles have limited modeling capabilities. For instance, these toolkits support a limited number of loss functions and are restricted to single task learning. We propose a flexible framework for learning tree ensembles, which goes beyond existing toolkits to support…

    Submitted 19 May, 2022; originally announced May 2022.

    Comments: Accepted at SIGKDD'2022

  10. arXiv:2202.04820  [pdf, ps, other]

    cs.LG cs.MS stat.CO stat.ML

    L0Learn: A Scalable Package for Sparse Learning using L0 Regularization

    Authors: Hussein Hazimeh, Rahul Mazumder, Tim Nonet

    Abstract: We present L0Learn: an open-source package for sparse linear regression and classification using $\ell_0$ regularization. L0Learn implements scalable, approximate algorithms, based on coordinate descent and local combinatorial optimization. The package is built using C++ and has user-friendly R and Python interfaces. L0Learn can address problems with millions of features, achieving competitive run…

    Submitted 9 June, 2023; v1 submitted 9 February, 2022; originally announced February 2022.

    Comments: Accepted to JMLR (MLOSS)

  11. arXiv:2106.03760  [pdf, other]

    cs.LG math.OC stat.ML

    DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning

    Authors: Hussein Hazimeh, Zhe Zhao, Aakanksha Chowdhery, Maheswaran Sathiamoorthy, Yihua Chen, Rahul Mazumder, Lichan Hong, Ed H. Chi

    Abstract: The Mixture-of-Experts (MoE) architecture is showing promising results in improving parameter sharing in multi-task learning (MTL) and in scaling high-capacity neural networks. State-of-the-art MoE models use a trainable sparse gate to select a subset of the experts for each input example. While conceptually appealing, existing sparse gates, such as Top-k, are not smooth. The lack of smoothness ca…

    Submitted 31 December, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: Appeared in NeurIPS 2021

  12. arXiv:2104.07084  [pdf, other]

    stat.ME cs.LG math.OC stat.CO stat.ML

    Grouped Variable Selection with Discrete Optimization: Computational and Statistical Perspectives

    Authors: Hussein Hazimeh, Rahul Mazumder, Peter Radchenko

    Abstract: We present a new algorithmic framework for grouped variable selection that is based on discrete mathematical optimization. While there exist several appealing approaches based on convex relaxations and nonconvex heuristics, we focus on optimal solutions for the $\ell_0$-regularized formulation, a problem that is relatively unexplored due to computational challenges. Our methodology covers both hig…

    Submitted 17 October, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

  13. arXiv:2004.06152  [pdf, other]

    stat.CO cs.LG math.OC stat.ML

    Sparse Regression at Scale: Branch-and-Bound rooted in First-Order Optimization

    Authors: Hussein Hazimeh, Rahul Mazumder, Ali Saab

    Abstract: We consider the least squares regression problem, penalized with a combination of the $\ell_{0}$ and squared $\ell_{2}$ penalty functions (a.k.a. $\ell_0 \ell_2$ regularization). Recent work shows that the resulting estimators are of key importance in many high-dimensional statistical settings. However, exact computation of these estimators remains a major challenge. Indeed, modern exact methods,…

    Submitted 14 April, 2021; v1 submitted 13 April, 2020; originally announced April 2020.

  14. arXiv:2002.07772  [pdf, other]

    cs.LG cs.CV stat.ML

    The Tree Ensemble Layer: Differentiability meets Conditional Computation

    Authors: Hussein Hazimeh, Natalia Ponomareva, Petros Mol, Zhenyu Tan, Rahul Mazumder

    Abstract: Neural networks and tree ensembles are state-of-the-art learners, each with its unique statistical and computational advantages. We aim to combine these advantages by introducing a new layer for neural networks, composed of an ensemble of differentiable decision trees (a.k.a. soft trees). While differentiable trees demonstrate promising results in the literature, they are typically slow in trainin…

    Submitted 10 July, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

    Comments: ICML 2020

  15. arXiv:2001.06471  [pdf, other]

    stat.ML cs.LG math.OC stat.CO

    Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives

    Authors: Antoine Dedieu, Hussein Hazimeh, Rahul Mazumder

    Abstract: We consider a discrete optimization formulation for learning sparse classifiers, where the outcome depends upon a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can be used to solve (to optimality) $\ell_0$-regularized regression problems at scales much larger than what was conventionally considered possible. Despite their usefulness, M…

    Submitted 6 June, 2021; v1 submitted 17 January, 2020; originally announced January 2020.

    Comments: To appear in JMLR

  16. arXiv:1902.01542  [pdf, other]

    stat.ML cs.LG math.OC stat.CO

    Learning Hierarchical Interactions at Scale: A Convex Optimization Approach

    Authors: Hussein Hazimeh, Rahul Mazumder

    Abstract: In many learning settings, it is beneficial to augment the main features with pairwise interactions. Such interaction models can be often enhanced by performing variable selection under the so-called strong hierarchy constraint: an interaction is non-zero only if its associated main features are non-zero. Existing convex optimization based algorithms face difficulties in handling problems where th…

    Submitted 13 July, 2020; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: AISTATS 2020

  17. arXiv:1803.01454  [pdf, other]

    stat.CO math.OC stat.ML

    Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms

    Authors: Hussein Hazimeh, Rahul Mazumder

    Abstract: The $L_0$-regularized least squares problem (a.k.a. best subsets) is central to sparse statistical learning and has attracted significant attention across the wider statistics, machine learning, and optimization communities. Recent work has shown that modern mixed integer optimization (MIO) solvers can be used to address small to moderate instances of this problem. In spite of the usefulness of…

    Submitted 24 January, 2020; v1 submitted 4 March, 2018; originally announced March 2018.

    Comments: To appear in Operations Research