
Showing 1–35 of 35 results for author: Mazumder, R

Searching in archive math.
  1. arXiv:2310.02535

    math.OC

    Linear programming using diagonal linear networks

    Authors: Haoyue Wang, Promit Ghosal, Rahul Mazumder

    Abstract: Linear programming has played a crucial role in shaping decision-making, resource allocation, and cost reduction in various domains. In this paper, we investigate the application of overparametrized neural networks and their implicit bias in solving linear programming problems. Specifically, our findings reveal that training diagonal linear networks with gradient descent, while optimizing the squa…

    Submitted 3 October, 2023; originally announced October 2023.

  2. arXiv:2303.07642

    math.OC

    A Cyclic Coordinate Descent Method for Convex Optimization on Polytopes

    Authors: Rahul Mazumder, Haoyue Wang

    Abstract: Coordinate descent algorithms are popular for huge-scale optimization problems due to their low cost per-iteration. Coordinate descent methods apply to problems where the constraint set is separable across coordinates. In this paper, we propose a new variant of the cyclic coordinate descent method that can handle polyhedral constraints provided that the polyhedral set does not have too many extrem…

    Submitted 26 April, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

  3. arXiv:2302.14623

    cs.LG cs.CV math.OC

    Fast as CHITA: Neural Network Pruning with Combinatorial Optimization

    Authors: Riade Benbaki, Wenyu Chen, Xiang Meng, Hussein Hazimeh, Natalia Ponomareva, Zhe Zhao, Rahul Mazumder

    Abstract: The sheer size of modern neural networks makes model serving a serious computational challenge. A popular class of compression techniques overcomes this challenge by pruning or sparsifying the weights of pretrained networks. While useful, these techniques often face serious tradeoffs between computational requirements and compression quality. In this work, we propose a novel optimization-based pru…

    Submitted 28 February, 2023; originally announced February 2023.

  4. arXiv:2212.04343

    cs.LG math.OC

    Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization

    Authors: Kayhan Behdin, Qingquan Song, Aman Gupta, David Durfee, Ayan Acharya, Sathiya Keerthi, Rahul Mazumder

    Abstract: Modern deep learning models are over-parameterized, where the optimization setup strongly affects the generalization performance. A key element of reliable optimization for these systems is the modification of the loss function. Sharpness-Aware Minimization (SAM) modifies the underlying loss function to guide descent methods towards flatter minima, which arguably have better generalization abiliti…

    Submitted 6 December, 2022; originally announced December 2022.

  5. arXiv:2211.12409

    math.OC

    A Light-speed Linear Program Solver for Personalized Recommendation with Diversity Constraints

    Authors: Haoyue Wang, Miao Cheng, Kinjal Basu, Aman Gupta, Keerthi Selvaraj, Rahul Mazumder

    Abstract: We study a structured linear program (LP) that emerges in the need of ranking candidates or items in personalized recommender systems. Since the candidate set is only known in real time, the LP also needs to be formed and solved in real time. Latency and user experience are major considerations, requiring the LP to be solved within just a few milliseconds. Although typical instances of the problem…

    Submitted 22 November, 2022; originally announced November 2022.

  6. arXiv:2109.11142

    stat.ME math.ST

    Sparse PCA: A New Scalable Estimator Based On Integer Programming

    Authors: Kayhan Behdin, Rahul Mazumder

    Abstract: We consider the Sparse Principal Component Analysis (SPCA) problem under the well-known spiked covariance model. Recent work has shown that the SPCA problem can be reformulated as a Mixed Integer Program (MIP) and can be solved to global optimality, leading to estimators that are known to enjoy optimal statistical properties. However, current MIP algorithms for SPCA are unable to scale beyond inst…

    Submitted 26 September, 2021; v1 submitted 23 September, 2021; originally announced September 2021.

  7. arXiv:2107.08535

    stat.CO math.OC

    Nonparametric Finite Mixture Models with Possible Shape Constraints: A Cubic Newton Approach

    Authors: Haoyue Wang, Shibal Ibrahim, Rahul Mazumder

    Abstract: We explore computational aspects of maximum likelihood estimation of the mixture proportions of a nonparametric finite mixture model -- a convex optimization problem with old roots in statistics and a key member of the modern data analysis toolkit. Motivated by problems in shape constrained inference, we consider structured variants of this problem with additional convex polyhedral constraints. We…

    Submitted 8 December, 2023; v1 submitted 18 July, 2021; originally announced July 2021.

    Comments: 31 pages, 6 figures

    MSC Class: 90C06; 90C25; 90C90; 62G07

  8. arXiv:2106.03760

    cs.LG math.OC stat.ML

    DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning

    Authors: Hussein Hazimeh, Zhe Zhao, Aakanksha Chowdhery, Maheswaran Sathiamoorthy, Yihua Chen, Rahul Mazumder, Lichan Hong, Ed H. Chi

    Abstract: The Mixture-of-Experts (MoE) architecture is showing promising results in improving parameter sharing in multi-task learning (MTL) and in scaling high-capacity neural networks. State-of-the-art MoE models use a trainable sparse gate to select a subset of the experts for each input example. While conceptually appealing, existing sparse gates, such as Top-k, are not smooth. The lack of smoothness ca…

    Submitted 31 December, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

    Comments: Appeared in NeurIPS 2021

  9. arXiv:2106.02175

    math.OC stat.CO stat.ME stat.ML

    Linear regression with partially mismatched data: local search with theoretical guarantees

    Authors: Rahul Mazumder, Haoyue Wang

    Abstract: Linear regression is a fundamental modeling tool in statistics and related fields. In this paper, we study an important variant of linear regression in which the predictor-response pairs are partially mismatched. We use an optimization formulation to simultaneously learn the underlying regression coefficients and the permutation corresponding to the mismatches. The combinatorial structure of the p…

    Submitted 31 October, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

  10. arXiv:2105.11387

    stat.CO math.OC stat.ME

    A new computational framework for log-concave density estimation

    Authors: Wenyu Chen, Rahul Mazumder, Richard J. Samworth

    Abstract: In Statistics, log-concave density estimation is a central problem within the field of nonparametric inference under shape constraints. Despite great progress in recent years on the statistical theory of the canonical estimator, namely the log-concave maximum likelihood estimator, adoption of this method has been hampered by the complexities of the non-smooth convex optimization problem that under…

    Submitted 28 February, 2023; v1 submitted 24 May, 2021; originally announced May 2021.

  11. arXiv:2104.07084

    stat.ME cs.LG math.OC stat.CO stat.ML

    Grouped Variable Selection with Discrete Optimization: Computational and Statistical Perspectives

    Authors: Hussein Hazimeh, Rahul Mazumder, Peter Radchenko

    Abstract: We present a new algorithmic framework for grouped variable selection that is based on discrete mathematical optimization. While there exist several appealing approaches based on convex relaxations and nonconvex heuristics, we focus on optimal solutions for the $\ell_0$-regularized formulation, a problem that is relatively unexplored due to computational challenges. Our methodology covers both hig…

    Submitted 17 October, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

  12. arXiv:2012.15361

    math.OC

    Frank-Wolfe Methods with an Unbounded Feasible Region and Applications to Structured Learning

    Authors: Haoyue Wang, Haihao Lu, Rahul Mazumder

    Abstract: The Frank-Wolfe (FW) method is a popular algorithm for solving large-scale convex optimization problems appearing in structured statistical learning. However, the traditional Frank-Wolfe method can only be applied when the feasible region is bounded, which limits its applicability in practice. Motivated by two applications in statistical learning, the $\ell_1$ trend filtering problem and matrix op…

    Submitted 7 October, 2021; v1 submitted 30 December, 2020; originally announced December 2020.

    Comments: 31 pages, 6 figures

    MSC Class: 90C25; 90C06; 90C90

  13. arXiv:2005.11588

    math.OC stat.CO stat.ML

    Subgradient Regularized Multivariate Convex Regression at Scale

    Authors: Wenyu Chen, Rahul Mazumder

    Abstract: We present new large-scale algorithms for fitting a subgradient regularized multivariate convex regression function to $n$ samples in $d$ dimensions -- a key problem in shape constrained nonparametric regression with applications in statistics, engineering and the applied sciences. The infinite-dimensional learning task can be expressed via a convex quadratic program (QP) with $O(nd)$ decision var…

    Submitted 4 December, 2023; v1 submitted 23 May, 2020; originally announced May 2020.

  14. arXiv:2004.06152

    stat.CO cs.LG math.OC stat.ML

    Sparse Regression at Scale: Branch-and-Bound rooted in First-Order Optimization

    Authors: Hussein Hazimeh, Rahul Mazumder, Ali Saab

    Abstract: We consider the least squares regression problem, penalized with a combination of the $\ell_{0}$ and squared $\ell_{2}$ penalty functions (a.k.a. $\ell_0 \ell_2$ regularization). Recent work shows that the resulting estimators are of key importance in many high-dimensional statistical settings. However, exact computation of these estimators remains a major challenge. Indeed, modern exact methods,…

    Submitted 14 April, 2021; v1 submitted 13 April, 2020; originally announced April 2020.

  15. arXiv:2001.06471

    stat.ML cs.LG math.OC stat.CO

    Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives

    Authors: Antoine Dedieu, Hussein Hazimeh, Rahul Mazumder

    Abstract: We consider a discrete optimization formulation for learning sparse classifiers, where the outcome depends upon a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can be used to solve (to optimality) $\ell_0$-regularized regression problems at scales much larger than what was conventionally considered possible. Despite their usefulness, M…

    Submitted 6 June, 2021; v1 submitted 17 January, 2020; originally announced January 2020.

    Comments: To appear in JMLR

  16. arXiv:1909.10143

    math.ST stat.ME

    Computing the degrees of freedom of rank-regularized estimators and cousins

    Authors: Rahul Mazumder, Haolei Weng

    Abstract: Estimating a low rank matrix from its linear measurements is a problem of central importance in contemporary statistical analysis. The choice of tuning parameters for estimators remains an important challenge from a theoretical and practical perspective. To this end, Stein's Unbiased Risk Estimate (SURE) framework provides a well-grounded statistical framework for degrees of freedom estimation. In…

    Submitted 22 September, 2019; originally announced September 2019.

  17. arXiv:1908.06515

    stat.CO math.OC stat.ML

    Computing Estimators of Dantzig Selector type via Column and Constraint Generation

    Authors: Rahul Mazumder, Stephen Wright, Andrew Zheng

    Abstract: We consider a class of linear-programming based estimators in reconstructing a sparse signal from linear measurements. Specific formulations of the reconstruction problem considered here include Dantzig selector, basis pursuit (for the case in which the measurements contain no errors), and the fused Dantzig selector (for the case in which the underlying signal is piecewise constant). In spite of b…

    Submitted 18 August, 2019; originally announced August 2019.

  18. arXiv:1902.01542

    stat.ML cs.LG math.OC stat.CO

    Learning Hierarchical Interactions at Scale: A Convex Optimization Approach

    Authors: Hussein Hazimeh, Rahul Mazumder

    Abstract: In many learning settings, it is beneficial to augment the main features with pairwise interactions. Such interaction models can be often enhanced by performing variable selection under the so-called strong hierarchy constraint: an interaction is non-zero only if its associated main features are non-zero. Existing convex optimization based algorithms face difficulties in handling problems where th…

    Submitted 13 July, 2020; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: AISTATS 2020

  19. arXiv:1810.10158

    cs.LG math.OC stat.ML

    Randomized Gradient Boosting Machine

    Authors: Haihao Lu, Rahul Mazumder

    Abstract: Gradient Boosting Machine (GBM) introduced by Friedman is a powerful supervised learning algorithm that is very widely used in practice: it routinely features as a leading algorithm in machine learning competitions such as Kaggle and the KDDCup. In spite of the usefulness of GBM in practice, our current theoretical understanding of this method is rather limited. In this work, we propose Randomize…

    Submitted 15 September, 2020; v1 submitted 23 October, 2018; originally announced October 2018.

  20. arXiv:1810.09062

    math.OC

    Using L1-relaxation and integer programming to obtain dual bounds for sparse PCA

    Authors: Santanu S. Dey, Rahul Mazumder, Guanyi Wang

    Abstract: Principal component analysis (PCA) is one of the most widely used dimensionality reduction tools in data analysis. The PCA direction is a linear combination of all features with nonzero loadings -- this impedes interpretability. Sparse PCA (SPCA) is a framework that enhances interpretability by incorporating an additional sparsity requirement in the feature weights. However, unlike PCA, the SPCA p…

    Submitted 17 August, 2021; v1 submitted 21 October, 2018; originally announced October 2018.

  21. arXiv:1810.08727

    math.OC cs.LG stat.CO stat.ML

    Condition Number Analysis of Logistic Regression, and its Implications for Standard First-Order Solution Methods

    Authors: Robert M. Freund, Paul Grigas, Rahul Mazumder

    Abstract: Logistic regression is one of the most popular methods in binary classification, wherein estimation of model parameters is carried out by solving the maximum likelihood (ML) optimization problem, and the ML estimator is defined to be the optimal solution of this problem. It is well known that the ML estimator exists when the data is non-separable, but fails to exist when the data is separable. Fir…

    Submitted 19 October, 2018; originally announced October 2018.

    Comments: 38 pages

  22. arXiv:1803.01454

    stat.CO math.OC stat.ML

    Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms

    Authors: Hussein Hazimeh, Rahul Mazumder

    Abstract: The $L_0$-regularized least squares problem (a.k.a. best subsets) is central to sparse statistical learning and has attracted significant attention across the wider statistics, machine learning, and optimization communities. Recent work has shown that modern mixed integer optimization (MIO) solvers can be used to address small to moderate instances of this problem. In spite of the usefulness of…

    Submitted 24 January, 2020; v1 submitted 4 March, 2018; originally announced March 2018.

    Comments: To appear in Operations Research

  23. arXiv:1801.05935

    math.OC stat.CO stat.ML

    Computation of the Maximum Likelihood estimator in low-rank Factor Analysis

    Authors: Koulik Khamaru, Rahul Mazumder

    Abstract: Factor analysis, a classical multivariate statistical technique, is popularly used as a fundamental tool for dimensionality reduction in statistics, econometrics and data science. Estimation is often carried out via the Maximum Likelihood (ML) principle, which seeks to maximize the likelihood under the assumption that the positive definite covariance matrix can be decomposed as the sum of a low ran…

    Submitted 17 January, 2018; originally announced January 2018.

    Comments: 22 pages, 4 figures

  24. arXiv:1712.00800

    math.OC

    Sparse principal component analysis and its $l_1$-relaxation

    Authors: Santanu S. Dey, Rahul Mazumder, Marco Molinaro, Guanyi Wang

    Abstract: Principal component analysis (PCA) is one of the most widely used dimensionality reduction methods in scientific data analysis. In many applications, for additional interpretability, it is desirable for the factor loadings to be sparse, that is, we solve PCA with an additional cardinality (l0) constraint. The resulting optimization problem is called the sparse principal component analysis (SPCA).…

    Submitted 3 December, 2017; originally announced December 2017.

  25. arXiv:1708.04527

    stat.ME math.OC math.ST stat.CO stat.ML

    The Trimmed Lasso: Sparsity and Robustness

    Authors: Dimitris Bertsimas, Martin S. Copenhaver, Rahul Mazumder

    Abstract: Nonconvex penalty methods for sparse modeling in linear regression have been a topic of fervent interest in recent years. Herein, we study a family of nonconvex penalty functions that we call the trimmed Lasso and that offers exact control over the desired level of sparsity of estimators. We analyze its structural properties and in doing so show the following: 1) Drawing parallels between robust…

    Submitted 15 August, 2017; originally announced August 2017.

    Comments: 32 pages (excluding appendix); 4 figures

  26. arXiv:1708.03288

    stat.ME math.OC math.ST stat.CO stat.ML

    Subset Selection with Shrinkage: Sparse Linear Modeling when the SNR is low

    Authors: Rahul Mazumder, Peter Radchenko, Antoine Dedieu

    Abstract: We study a seemingly unexpected and relatively less understood overfitting aspect of a fundamental tool in sparse linear modeling: best subset selection, which minimizes the residual sum of squares subject to a constraint on the number of nonzero coefficients. While the best subset selection procedure is often perceived as the "gold standard" in sparse learning when the signal to noise ratio (SNR…

    Submitted 7 January, 2022; v1 submitted 10 August, 2017; originally announced August 2017.

  27. arXiv:1604.06837

    stat.ME math.OC stat.CO

    Certifiably Optimal Low Rank Factor Analysis

    Authors: Dimitris Bertsimas, Martin S. Copenhaver, Rahul Mazumder

    Abstract: Factor Analysis (FA) is a technique of fundamental importance that is widely used in classical and modern multivariate statistics, psychometrics and econometrics. In this paper, we revisit the classical rank-constrained FA problem, which seeks to approximate an observed covariance matrix ($\boldsymbol{\Sigma}$), by the sum of a Positive Semidefinite (PSD) low-rank component ($\boldsymbol{\Theta}$) and a diagona…

    Submitted 22 April, 2016; originally announced April 2016.

    Journal ref: JMLR 18(29) (2017)

  28. arXiv:1511.02204

    math.OC stat.CO stat.ML

    An Extended Frank-Wolfe Method with "In-Face" Directions, and its Application to Low-Rank Matrix Completion

    Authors: Robert M. Freund, Paul Grigas, Rahul Mazumder

    Abstract: Motivated principally by the low-rank matrix completion problem, we present an extension of the Frank-Wolfe method that is designed to induce near-optimal solutions on low-dimensional faces of the feasible region. This is accomplished by a new approach to generating "in-face" directions at each iteration, as well as through new choice rules for selecting between in-face and "regular" Frank-Wolfe…

    Submitted 6 November, 2015; originally announced November 2015.

    Comments: 25 pages, 3 tables and 2 figures

    MSC Class: 90C25

    ACM Class: G.1.6

  29. arXiv:1509.08165

    stat.CO math.OC stat.ME

    A Computational Framework for Multivariate Convex Regression and its Variants

    Authors: Rahul Mazumder, Arkopal Choudhury, Garud Iyengar, Bodhisattva Sen

    Abstract: We study the nonparametric least squares estimator (LSE) of a multivariate convex regression function. The LSE, given as the solution to a quadratic program with $O(n^2)$ linear constraints ($n$ being the sample size), is difficult to compute for large problems. Exploiting problem specific structure, we propose a scalable algorithmic framework based on the augmented Lagrangian method to compute th…

    Submitted 27 September, 2015; originally announced September 2015.

  30. arXiv:1509.00426

    math.ST math.OC

    Scalable Computation of Regularized Precision Matrices via Stochastic Optimization

    Authors: Yves F. Atchadé, Rahul Mazumder, Jie Chen

    Abstract: We consider the problem of computing a positive definite $p \times p$ inverse covariance matrix a.k.a. precision matrix $\theta=(\theta_{ij})$ which optimizes a regularized Gaussian maximum likelihood problem, with the elastic-net regularizer $\sum_{i,j=1}^{p} \lambda\left(\alpha|\theta_{ij}| + \frac{1}{2}(1-\alpha)\theta_{ij}^2\right)$, with regularization parameters $\alpha \in [0,1]$ and $\lambda > 0$. The associated convex semidefinite optimization probl…

    Submitted 1 September, 2015; originally announced September 2015.

    Comments: 42 pages

  31. arXiv:1508.01922

    stat.ME math.OC math.ST stat.CO stat.ML

    The Discrete Dantzig Selector: Estimating Sparse Linear Models via Mixed Integer Linear Optimization

    Authors: Rahul Mazumder, Peter Radchenko

    Abstract: We propose a novel high-dimensional linear regression estimator: the Discrete Dantzig Selector, which minimizes the number of nonzero regression coefficients subject to a budget on the maximal absolute correlation between the features and residuals. Motivated by the significant advances in integer optimization over the past 10-15 years, we present a Mixed Integer Linear Optimization (MILO) approac…

    Submitted 19 January, 2017; v1 submitted 8 August, 2015; originally announced August 2015.

  32. arXiv:1507.03133

    stat.ME math.OC stat.CO stat.ML

    Best Subset Selection via a Modern Optimization Lens

    Authors: Dimitris Bertsimas, Angela King, Rahul Mazumder

    Abstract: In the last twenty-five years (1990-2014), algorithmic advances in integer optimization combined with hardware improvements have resulted in an astonishing 200 billion factor speedup in solving Mixed Integer Optimization (MIO) problems. We present a MIO approach for solving the classical best subset selection problem of choosing $k$ out of $p$ features in linear regression given $n$ observations.…

    Submitted 11 July, 2015; originally announced July 2015.

    Comments: This is a revised version (May, 2015) of the first submission in June 2014

  33. arXiv:1505.04243

    math.ST cs.LG math.OC stat.ML

    A New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives

    Authors: Robert M. Freund, Paul Grigas, Rahul Mazumder

    Abstract: In this paper we analyze boosting algorithms in linear regression from a new perspective: that of modern first-order methods in convex optimization. We show that classic boosting algorithms in linear regression, namely the incremental forward stagewise algorithm (FS$_\varepsilon$) and least squares boosting (LS-Boost($\varepsilon$)), can be viewed as subgradient descent to minimize the loss functi…

    Submitted 16 May, 2015; originally announced May 2015.

    MSC Class: 62J05; 62J07; 90C25

  34. arXiv:1307.1192

    stat.ML cs.LG math.OC

    AdaBoost and Forward Stagewise Regression are First-Order Convex Optimization Methods

    Authors: Robert M. Freund, Paul Grigas, Rahul Mazumder

    Abstract: Boosting methods are highly popular and effective supervised learning methods which combine weak learners into a single accurate model with good statistical performance. In this paper, we analyze two well-known boosting methods, AdaBoost and Incremental Forward Stagewise Regression (FS$_\varepsilon$), by establishing their precise connections to the Mirror Descent algorithm, which is a first-order…

    Submitted 3 July, 2013; originally announced July 2013.

    MSC Class: 68Q32; 68T05; 62J05; 90C25

    ACM Class: I.2.6; I.5.1; G.3; G.1.6

  35. Projected likelihood contrasts for testing homogeneity in finite mixture models with nuisance parameters

    Authors: Debapriya Sengupta, Rahul Mazumder

    Abstract: This paper develops a test for homogeneity in finite mixture models where the mixing proportions are known a priori (taken to be 0.5) and a common nuisance parameter is present. Statistical tests based on the notion of Projected Likelihood Contrasts (PLC) are considered. The PLC is a slight modification of the usual likelihood ratio statistic or the Wilks' $\Lambda$ and is similar in spirit to the Rao…

    Submitted 16 May, 2008; originally announced May 2008.

    Comments: Published at http://dx.doi.org/10.1214/193940307000000194 in the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Report number: IMS-COLL1-IMSCOLL120

    MSC Class: 62G08; 60G35 (Primary) 60J55 (Secondary)

    Journal ref: IMS Collections 2008, Vol. 1, 272-281
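The elastic-net regularizer written out in the abstract of entry 30 can be evaluated directly. The sketch below is a minimal illustration under stated assumptions: the function name `elastic_net_penalty` and the list-of-rows matrix format are hypothetical, and it computes only the penalty term, not the paper's stochastic optimization method for the regularized likelihood.

```python
def elastic_net_penalty(theta, lam, alpha):
    """Evaluate sum_{i,j} lam * (alpha*|theta_ij| + 0.5*(1 - alpha)*theta_ij**2).

    theta: p x p matrix given as a list of rows (hypothetical input format).
    lam:   regularization strength, must be > 0.
    alpha: mixing weight in [0, 1]; alpha = 1 gives a pure l1 penalty,
           alpha = 0 a pure squared-l2 penalty.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    total = 0.0
    for row in theta:
        for t in row:
            # Per-entry term lam * (alpha*|t| + (1/2)(1 - alpha) t^2).
            total += lam * (alpha * abs(t) + 0.5 * (1.0 - alpha) * t * t)
    return total


theta = [[2.0, -1.0],
         [-1.0, 2.0]]
print(elastic_net_penalty(theta, lam=0.5, alpha=0.5))  # 2.75
```

Setting `alpha` to its endpoints recovers the two familiar special cases, which is a quick sanity check on the formula.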
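Entries 33 and 34 refer to the incremental forward stagewise algorithm FS$_\varepsilon$. The sketch below shows the textbook form of that algorithm for least squares, not the papers' analysis of it as subgradient or mirror descent: at each step, the coefficient whose column has the largest absolute inner product with the current residual is moved by $\pm\varepsilon$. The function name and defaults are hypothetical, and predictors are usually standardized first, which this small sketch does not do.

```python
def forward_stagewise(X, y, eps=0.01, n_steps=1000):
    """Incremental forward stagewise regression (FS_eps) for least squares.

    X: n rows of p features (list of lists); y: n responses.
    Each iteration moves the coefficient most correlated with the current
    residual by +/- eps.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # r = y - X @ beta, and beta starts at zero
    for _ in range(n_steps):
        # Inner products X_j^T r for every feature j.
        corr = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        j_star = max(range(p), key=lambda j: abs(corr[j]))
        if corr[j_star] == 0.0:
            break  # residual is orthogonal to every column; nothing to do
        step = eps if corr[j_star] > 0 else -eps
        beta[j_star] += step
        # Keep r = y - X @ beta up to date without recomputing X @ beta.
        for i in range(n):
            residual[i] -= step * X[i][j_star]
    return beta


X = [[1.0, 0.0],
     [0.0, 2.0]]
y = [1.0, -1.0]
print(forward_stagewise(X, y, eps=0.01, n_steps=1000))
```

For this small design the least-squares solution is $(1.0, -0.5)$, and the printed coefficients should land within roughly $\varepsilon$ of it; with a fixed step size the iterates oscillate near the solution rather than converging exactly.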
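Entry 14 considers least squares penalized by a combination of the $\ell_0$ and squared $\ell_2$ penalties. As a sketch of what that objective measures (not of the paper's branch-and-bound solver), the hypothetical function below evaluates it for a given coefficient vector. The factor $\frac{1}{2}$ on the squared loss and the names `lam0` and `lam2` are assumptions of this sketch, not taken from the abstract.

```python
def l0l2_objective(X, y, beta, lam0, lam2):
    """Evaluate 0.5*||y - X beta||_2^2 + lam0*||beta||_0 + lam2*||beta||_2^2.

    The 0.5 scaling on the loss is an assumed convention for illustration.
    """
    residuals = [yi - sum(xij * bj for xij, bj in zip(row, beta))
                 for row, yi in zip(X, y)]
    loss = 0.5 * sum(r * r for r in residuals)
    l0 = sum(1 for b in beta if b != 0.0)  # number of nonzero coefficients
    l2_sq = sum(b * b for b in beta)       # squared l2 norm
    return loss + lam0 * l0 + lam2 * l2_sq


X = [[1.0, 0.0],
     [0.0, 1.0]]
y = [1.0, 2.0]
print(l0l2_objective(X, y, [1.0, 0.0], lam0=0.5, lam2=0.1))  # sparse fit
print(l0l2_objective(X, y, [1.0, 2.0], lam0=0.5, lam2=0.1))  # dense fit
```

Comparing the two calls shows the trade-off the $\ell_0$ term controls: the dense vector fits exactly but pays `lam0` for each extra nonzero coefficient, so raising `lam0` eventually makes the sparser vector preferable.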