Showing 1–50 of 86 results for author: Pilanci, M

Searching in archive cs.
  1. arXiv:2406.19328  [pdf, other]

    cs.SD cs.LG eess.AS

    Subtractive Training for Music Stem Insertion using Latent Diffusion Models

    Authors: Ivan Villa-Renteria, Mason L. Wang, Zachary Shah, Zhe Li, Soohyun Kim, Neelesh Ramachandran, Mert Pilanci

    Abstract: We present Subtractive Training, a simple and novel method for synthesizing individual musical instrument stems given other instruments as context. This method pairs a dataset of complete music mixes with 1) a variant of the dataset lacking a specific stem, and 2) LLM-generated instructions describing how the missing stem should be reintroduced. We then fine-tune a pretrained text-to-audio diffusi…

    Submitted 27 June, 2024; originally announced June 2024.

  2. arXiv:2406.10254  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Towards Signal Processing In Large Language Models

    Authors: Prateek Verma, Mert Pilanci

    Abstract: This paper introduces the idea of applying signal processing inside a Large Language Model (LLM). With the recent explosion of generative AI, our work can help bridge two fields together, namely the field of signal processing and large language models. We draw parallels between classical Fourier-Transforms and Fourier Transform-like learnable time-frequency representations for every intermediate a…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 12 pages, 3 figures

  3. arXiv:2406.08904  [pdf, other]

    cs.LG cs.SD eess.AS

    AdaPTwin: Low-Cost Adaptive Compression of Product Twins in Transformers

    Authors: Emil Biju, Anirudh Sriram, Mert Pilanci

    Abstract: While large transformer-based models have exhibited remarkable performance in speaker-independent speech recognition, their large size and computational requirements make them expensive or impractical to use in resource-constrained settings. In this work, we propose a low-rank adaptive compression technique called AdaPTwin that jointly compresses product-dependent pairs of weight matrices in the t…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 12 pages, 3 figures, submitted to NeurIPS 2024

  4. arXiv:2406.02806  [pdf, other]

    cs.LG math.OC stat.ML

    Randomized Geometric Algebra Methods for Convex Neural Networks

    Authors: Yifei Wang, Sungyoon Kim, Paul Chu, Indu Subramaniam, Mert Pilanci

    Abstract: We introduce randomized algorithms to Clifford's Geometric Algebra, generalizing randomized linear algebra to hypercomplex vector spaces. This novel approach has many implications in machine learning, including training neural networks to global optimality via convex optimization. Additionally, we consider fine-tuning large language model (LLM) embeddings as a key application area, exploring the i…

    Submitted 8 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  5. arXiv:2405.18886  [pdf, ps, other]

    cs.LG cs.AI math.OC stat.ML

    Compressing Large Language Models using Low Rank and Low Precision Decomposition

    Authors: Rajarshi Saha, Naomi Sagan, Varun Srivastava, Andrea J. Goldsmith, Mert Pilanci

    Abstract: The prohibitive sizes of Large Language Models (LLMs) today make it difficult to deploy them on memory-constrained edge devices. This work introduces $\rm CALDERA$ -- a new post-training LLM compression algorithm that harnesses the inherent low-rank structure of a weight matrix $\mathbf{W}$ by approximating it via a low-rank, low-precision decomposition as…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 30 pages, 9 figures, 7 tables

  6. arXiv:2405.14033  [pdf, other]

    cs.LG math.OC

    Adversarial Training of Two-Layer Polynomial and ReLU Activation Networks via Convex Optimization

    Authors: Daniel Kuelbs, Sanjay Lall, Mert Pilanci

    Abstract: Training neural networks which are robust to adversarial attacks remains an important problem in deep learning, especially as heavily overparameterized models are adopted in safety-critical settings. Drawing from recent work which reformulates the training problems for two-layer ReLU and polynomial activation networks as convex programs, we devise a convex semidefinite program (SDP) for adversaria…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 6 pages, 4 figures

  7. arXiv:2405.13952  [pdf, other]

    cs.LG cs.AI

    Spectral Adapter: Fine-Tuning in Spectral Space

    Authors: Fangzhao Zhang, Mert Pilanci

    Abstract: Recent developments in Parameter-Efficient Fine-Tuning (PEFT) methods for pretrained deep neural networks have captured widespread interest. In this work, we study the enhancement of current PEFT methods by incorporating the spectral information of pretrained weight matrices into the fine-tuning procedure. We investigate two spectral adaptation mechanisms, namely additive tuning and orthogonal rot…

    Submitted 22 May, 2024; originally announced May 2024.

  8. arXiv:2404.02378  [pdf, ps, other]

    math.OC cs.LG

    Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation

    Authors: Aaron Mishkin, Mert Pilanci, Mark Schmidt

    Abstract: We prove new convergence rates for a generalized version of stochastic Nesterov acceleration under interpolation conditions. Unlike previous analyses, our approach accelerates any stochastic gradient method which makes sufficient progress in expectation. The proof, which proceeds using the estimating sequences framework, applies to both convex and strongly convex functions and is easily specialize…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Results extend work from Aaron Mishkin's master's thesis

  9. arXiv:2403.01046  [pdf, other]

    cs.LG cs.AI cs.NE math.OC stat.ML

    A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features

    Authors: Emi Zeger, Yifei Wang, Aaron Mishkin, Tolga Ergen, Emmanuel Candès, Mert Pilanci

    Abstract: We prove that training neural networks on 1-D data is equivalent to solving a convex Lasso problem with a fixed, explicitly defined dictionary matrix of features. The specific dictionary depends on the activation and depth. We consider 2-layer networks with piecewise linear activations, deep narrow ReLU networks with up to 4 layers, and rectangular and tree networks with sign activation and arbitr…

    Submitted 18 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  10. arXiv:2402.04359  [pdf, other]


    Adaptive Inference: Theoretical Limits and Unexplored Opportunities

    Authors: Soheil Hor, Ying Qian, Mert Pilanci, Amin Arbabian

    Abstract: This paper introduces the first theoretical framework for quantifying the efficiency and performance gain opportunity size of adaptive inference algorithms. We provide new approximate and exact bounds for the achievable efficiency and performance gains, supported by empirical evidence demonstrating the potential for 10-100x efficiency improvements in both Computer Vision and Natural Language Proce…

    Submitted 6 February, 2024; originally announced February 2024.

  11. arXiv:2402.03625  [pdf, other]

    cs.LG math.OC

    Convex Relaxations of ReLU Neural Networks Approximate Global Optima in Polynomial Time

    Authors: Sungyoon Kim, Mert Pilanci

    Abstract: In this paper, we study the optimality gap between two-layer ReLU networks regularized with weight decay and their convex relaxations. We show that when the training data is random, the relative optimality gap between the original problem and its relaxation can be bounded by a factor of $O(\sqrt{\log n})$, where $n$ is the number of training samples. A simple application leads to a tractable polynomial-ti…

    Submitted 5 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

  12. arXiv:2402.02347  [pdf, other]

    cs.LG math.NA math.OC

    Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models

    Authors: Fangzhao Zhang, Mert Pilanci

    Abstract: Low-Rank Adaptation (LoRA) emerges as a popular parameter-efficient fine-tuning (PEFT) method, which proposes to freeze pretrained model weights and update an additive low-rank trainable matrix. In this work, we study the enhancement of LoRA training by introducing an $r \times r$ preconditioner in each gradient step where $r$ is the LoRA rank. We theoretically verify that the proposed preconditio…

    Submitted 5 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  13. arXiv:2402.01965  [pdf, other]

    cs.LG math.OC

    Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization

    Authors: Fangzhao Zhang, Mert Pilanci

    Abstract: Diffusion models are gaining widespread use in cutting-edge image, video, and audio generation. Score-based diffusion models stand out among these methods, necessitating the estimation of score function of the input data distribution. In this study, we present a theoretical framework to analyze two-layer neural network-based diffusion models by reframing score matching and denoising score matching…

    Submitted 22 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  14. arXiv:2401.15838  [pdf, other]

    stat.ML cs.LG cs.MA math.OC stat.CO

    Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers

    Authors: Alexandros E. Tzikas, Licio Romao, Mert Pilanci, Alessandro Abate, Mykel J. Kochenderfer

    Abstract: Many machine learning applications require operating on a spatially distributed dataset. Despite technological advances, privacy considerations and communication constraints may prevent gathering the entire dataset in a central unit. In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers, which is commonly used in the optimization literatur…

    Submitted 28 January, 2024; originally announced January 2024.

  15. arXiv:2312.12657  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models

    Authors: Tolga Ergen, Mert Pilanci

    Abstract: Due to the non-convex nature of training Deep Neural Network (DNN) models, their effectiveness relies on the use of non-convex optimization heuristics. Traditional methods for training DNNs often require costly empirical methods to produce successful models and do not have a clear theoretical foundation. In this study, we examine the use of convex optimization theory and sparse recovery models to…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: A preliminary version of part of this work was published at ICML 2020 with the title "Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-layer Networks"

  16. arXiv:2311.13177  [pdf, other]

    cs.CV

    Volumetric Reconstruction Resolves Off-Resonance Artifacts in Static and Dynamic PROPELLER MRI

    Authors: Annesha Ghosh, Gordon Wetzstein, Mert Pilanci, Sara Fridovich-Keil

    Abstract: Off-resonance artifacts in magnetic resonance imaging (MRI) are visual distortions that occur when the actual resonant frequencies of spins within the imaging volume differ from the expected frequencies used to encode spatial information. These discrepancies can be caused by a variety of factors, including magnetic field inhomogeneities, chemical shifts, or susceptibility differences within the ti…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Code is available at

  17. arXiv:2311.10972  [pdf, other]

    cs.LG cs.CC stat.ML

    Polynomial-Time Solutions for ReLU Network Training: A Complexity Classification via Max-Cut and Zonotopes

    Authors: Yifei Wang, Mert Pilanci

    Abstract: We investigate the complexity of training a two-layer ReLU neural network with weight decay regularization. Previous research has shown that the optimal solution of this problem can be found by solving a standard cone-constrained convex program. Using this convex formulation, we prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but als…

    Submitted 17 November, 2023; originally announced November 2023.

  18. arXiv:2310.11028  [pdf, other]

    cs.LG cs.IT stat.ML

    Matrix Compression via Randomized Low Rank and Low Precision Factorization

    Authors: Rajarshi Saha, Varun Srivastava, Mert Pilanci

    Abstract: Matrices are exceptionally useful in various fields of study as they provide a convenient framework to organize and manipulate data in a structured manner. However, modern matrices can involve billions of elements, making their storage and processing quite demanding in terms of computational resources and memory usage. Although prohibitively large, such matrices are often approximately low rank. W…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted to the 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  19. arXiv:2309.16512  [pdf, other]

    cs.LG cs.AI cs.NE math.OC stat.ML

    From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford's Geometric Algebra and Convexity

    Authors: Mert Pilanci

    Abstract: In this paper, we introduce a novel analysis of neural networks based on geometric (Clifford) algebra and convex optimization. We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss. Furthermore, the training problem reduces to convex optimization over wedge product features, which encode the geometri…

    Submitted 22 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

  20. arXiv:2309.15096  [pdf, other]

    cs.LG stat.ML

    Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs

    Authors: Rajat Vadiraj Dwaraknath, Tolga Ergen, Mert Pilanci

    Abstract: Recently, theoretical analyses of deep neural networks have broadly focused on two directions: 1) Providing insight into neural network training by SGD in the limit of infinite hidden-layer width and infinitesimally small learning rate (also known as gradient flow) via the Neural Tangent Kernel (NTK), and 2) Globally optimizing the regularized training objective via cone-constrained convex reformu…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted to NeurIPS 2023

  21. arXiv:2309.00682  [pdf, other]

    cs.DC cs.IT cs.LG

    Randomized Polar Codes for Anytime Distributed Machine Learning

    Authors: Burak Bartan, Mert Pilanci

    Abstract: We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations. The proposed mechanism integrates the concepts of randomized sketching and polar codes in the context of coded computation. We propose a sequential decoding algorithm designed to handle real valued data while maintaining low computa…

    Submitted 1 September, 2023; originally announced September 2023.

  22. arXiv:2308.04185  [pdf, other]

    cs.IT cs.CR cs.DC cs.LG math.NA

    Iterative Sketching for Secure Coded Regression

    Authors: Neophytos Charalambides, Hessam Mahdavifar, Mert Pilanci, Alfred O. Hero III

    Abstract: Linear regression is a fundamental and primitive problem in supervised machine learning, with applications ranging from epidemiology to finance. In this work, we propose methods for speeding up distributed linear regression. We do so by leveraging randomized techniques, while also ensuring security and straggler resiliency in asynchronous distributed computing systems. Specifically, we randomly ro…

    Submitted 31 March, 2024; v1 submitted 8 August, 2023; originally announced August 2023.

    Comments: 29 pages, 8 figures. arXiv admin note: substantial text overlap with arXiv:2201.08522

    MSC Class: 65B99; 68P20; 68P25; 68P27; 68P30; 94-10; 94A11; 94A16; 94B60 ACM Class: E.3; E.4; F.2.1; G.1.3

  23. arXiv:2308.03096  [pdf, other]

    cs.IT cs.DC cs.IR cs.LG math.NA

    Gradient Coding with Iterative Block Leverage Score Sampling

    Authors: Neophytos Charalambides, Mert Pilanci, Alfred Hero

    Abstract: We generalize the leverage score sampling sketch for $\ell_2$-subspace embeddings, to accommodate sampling subsets of the transformed data, so that the sketching approach is appropriate for distributed settings. This is then used to derive an approximate coded computing approach for first-order methods; known as gradient coding, to accelerate linear regression in the presence of failures in distri…

    Submitted 25 June, 2024; v1 submitted 6 August, 2023; originally announced August 2023.

    Comments: 26 pages, 6 figures, 1 table

    MSC Class: 65B99; 65F10; 65F20; 65F45; 65F55; 68W20; 68W25; 94A20; 68P30; 68P20 ACM Class: G.1.2; G.1.3; G.1.6; G.3; E.4

  24. arXiv:2306.00119  [pdf, other]


    Optimal Sets and Solution Paths of ReLU Networks

    Authors: Aaron Mishkin, Mert Pilanci

    Abstract: We develop an analytical framework to characterize the set of optimal ReLU neural networks by reformulating the non-convex training problem as a convex program. We show that the global optima of the convex parameterization are given by a polyhedral set and then extend this characterization to the optimal set of the non-convex training objective. Since all stationary points of the ReLU training pro…

    Submitted 19 January, 2024; v1 submitted 31 May, 2023; originally announced June 2023.

    Comments: Minor updates and corrections to clarify the role of merge/split symmetries in formation of ReLU optimal set and add missing sufficient conditions for all minimal models to have the same cardinality

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR 202:24888-24924, 2023

  25. arXiv:2303.03382  [pdf, other]

    cs.LG stat.ML

    Globally Optimal Training of Neural Networks with Threshold Activation Functions

    Authors: Tolga Ergen, Halil Ibrahim Gulluk, Jonathan Lacotte, Mert Pilanci

    Abstract: Threshold activation functions are highly preferable in neural networks due to their efficiency in hardware implementations. Moreover, their mode of operation is more interpretable and resembles that of biological neurons. However, traditional gradient based algorithms such as Gradient Descent cannot be used to train the parameters of neural networks with threshold activations since the activation…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted to ICLR 2023

  26. arXiv:2302.13527  [pdf]

    eess.AS cs.SD eess.SP

    Complex Clipping for Improved Generalization in Machine Learning

    Authors: Les Atlas, Nicholas Rasmussen, Felix Schwock, Mert Pilanci

    Abstract: For many machine learning applications, a common input representation is a spectrogram. The underlying representation for a spectrogram is a short time Fourier transform (STFT) which gives complex values. The spectrogram uses the magnitude of these complex values, a commonly used detector. Modern machine learning systems are commonly overparameterized, where possible ill-conditioning problems are…

    Submitted 27 February, 2023; originally announced February 2023.

    Comments: Submitted to IEEE Signal Processing Letters

  27. arXiv:2301.03539  [pdf, other]

    cs.IT cs.CR math.NA

    Securely Aggregated Coded Matrix Inversion

    Authors: Neophytos Charalambides, Mert Pilanci, Alfred Hero

    Abstract: Coded computing is a method for mitigating straggling workers in a centralized computing network, by using erasure-coding techniques. Federated learning is a decentralized model for training data distributed across client devices. In this work we propose approximating the inverse of an aggregated data matrix, where the data is generated by clients; similar to the federated learning paradigm, while…

    Submitted 4 September, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2207.06271

    MSC Class: 12-08; 15A29; 94B05; 94A15 ACM Class: E.4; E.3; G.1.2; G.1.3

  28. arXiv:2209.15265  [pdf, other]

    cs.LG cs.IT math.OC stat.ML

    Overparameterized ReLU Neural Networks Learn the Simplest Models: Neural Isometry and Exact Recovery

    Authors: Yifei Wang, Yixuan Hua, Emmanuel Candès, Mert Pilanci

    Abstract: The practice of deep learning has shown that neural networks generalize remarkably well even with an extreme number of learned parameters. This appears to contradict traditional statistical wisdom, in which a trade-off between model complexity and fit to the data is essential. We aim to address this discrepancy by adopting a convex optimization and sparse recovery perspective. We consider the trai…

    Submitted 17 February, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

  29. arXiv:2207.08393  [pdf, other]

    eess.IV cs.CV

    GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction

    Authors: Batu Ozturkler, Arda Sahiner, Tolga Ergen, Arjun D Desai, Christopher M Sandino, Shreyas Vasanawala, John M Pauly, Morteza Mardani, Mert Pilanci

    Abstract: Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction. These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization. However, they require several iterations of a large neural network to handle high-dimensional imaging tasks such as 3D MRI. This limits traditional training…

    Submitted 18 July, 2022; originally announced July 2022.

  30. arXiv:2207.06271  [pdf, other]

    cs.IT cs.CR math.NA

    Secure Linear MDS Coded Matrix Inversion

    Authors: Neophytos Charalambides, Mert Pilanci, Alfred Hero

    Abstract: A cumbersome operation in many scientific fields, is inverting large full-rank matrices. In this paper, we propose a coded computing approach for recovering matrix inverse approximations. We first present an approximate matrix inversion algorithm which does not require a matrix factorization, but uses a black-box least squares optimization solver as a subroutine, to give an estimate of the inverse…

    Submitted 20 December, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

    MSC Class: 12-08; 15A29; 94B05; 94A15 ACM Class: G.1.2; G.1.6; E.4

  31. arXiv:2205.13098  [pdf, other]

    cs.LG math.OC stat.ML

    Optimal Neural Network Approximation of Wasserstein Gradient Direction via Convex Optimization

    Authors: Yifei Wang, Peng Chen, Mert Pilanci, Wuchen Li

    Abstract: The computation of Wasserstein gradient direction is essential for posterior sampling problems and scientific computing. The approximation of the Wasserstein gradient with finite samples requires solving a variational problem. We study the variational problem in the family of two-layer networks with squared-ReLU activations, towards which we derive a semi-definite programming (SDP) relaxation. Thi…

    Submitted 25 May, 2022; originally announced May 2022.

  32. arXiv:2205.08078  [pdf, other]

    cs.LG cs.CV math.OC

    Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers

    Authors: Arda Sahiner, Tolga Ergen, Batu Ozturkler, John Pauly, Morteza Mardani, Mert Pilanci

    Abstract: Vision transformers using self-attention or its proposed alternatives have demonstrated promising results in many image related tasks. However, the underpinning inductive bias of attention is not well understood. To address this issue, this paper analyzes attention through the lens of convex duality. For the non-linear dot-product self-attention, and alternative mechanisms such as MLP-mixer and Fo…

    Submitted 20 May, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

    Comments: 38 pages, 2 figures. To appear in ICML 2022

  33. arXiv:2204.10436  [pdf, other]

    eess.IV cs.CV cs.LG

    Scale-Equivariant Unrolled Neural Networks for Data-Efficient Accelerated MRI Reconstruction

    Authors: Beliz Gunel, Arda Sahiner, Arjun D. Desai, Akshay S. Chaudhari, Shreyas Vasanawala, Mert Pilanci, John Pauly

    Abstract: Unrolled neural networks have enabled state-of-the-art reconstruction performance and fast inference times for the accelerated magnetic resonance imaging (MRI) reconstruction task. However, these approaches depend on fully-sampled scans as ground truth data which is either costly or not possible to acquire in many clinical medical imaging applications; hence, reducing dependence on data is desirab…

    Submitted 21 April, 2022; originally announced April 2022.

  34. arXiv:2203.10124  [pdf, other]


    Approximate Function Evaluation via Multi-Armed Bandits

    Authors: Tavor Z. Baharav, Gary Cheng, Mert Pilanci, David Tse

    Abstract: We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle. Sampling more frequently components of $\boldsymbol{\mu}$ corresponding to directions of the function with larger directional derivatives is more sample-efficient. However, as $\boldsymbol{\mu}$ is unknown, the optima…

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: To appear in AISTATS 2022

  35. arXiv:2203.09755  [pdf, other]

    math.OC cs.DC cs.IT cs.LG

    Distributed Sketching for Randomized Optimization: Exact Characterization, Concentration and Lower Bounds

    Authors: Burak Bartan, Mert Pilanci

    Abstract: We consider distributed optimization methods for problems where forming the Hessian is computationally challenging and communication is a significant bottleneck. We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems. We derive novel approximation guarantees for classical sketching met…

    Submitted 18 March, 2022; originally announced March 2022.

    Comments: arXiv admin note: text overlap with arXiv:2002.06540

  36. arXiv:2202.11277  [pdf, other]

    cs.IT cs.LG eess.SP stat.ML

    Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms

    Authors: Rajarshi Saha, Mert Pilanci, Andrea J. Goldsmith

    Abstract: High-dimensional models often have a large memory footprint and must be quantized after training before being deployed on resource-constrained edge devices for inference tasks. In this work, we develop an information-theoretic framework for the problem of quantizing a linear regressor learned from training data $(\mathbf{X}, \mathbf{y})$, for some underlying statistical relationship…

    Submitted 30 August, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: 50 pages, 31 figures, 9 tables

  37. arXiv:2202.01331  [pdf, other]


    Fast Convex Optimization for Two-Layer ReLU Networks: Equivalent Model Classes and Cone Decompositions

    Authors: Aaron Mishkin, Arda Sahiner, Mert Pilanci

    Abstract: We develop fast algorithms and robust software for convex optimization of two-layer neural networks with ReLU activation functions. Our work leverages a convex reformulation of the standard weight-decay penalized training problem as a set of group-$\ell_1$-regularized data-local models, where locality is enforced by polyhedral cone constraints. In the special case of zero-regularization, we show t…

    Submitted 31 August, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

    Comments: Camera ready version for ICML 2022

  38. arXiv:2201.11109  [pdf, other]

    cs.AI cs.LG

    Using a Novel COVID-19 Calculator to Measure Positive U.S. Socio-Economic Impact of a COVID-19 Pre-Screening Solution (AI/ML)

    Authors: Richard Swartzbaugh, Amil Khanzada, Praveen Govindan, Mert Pilanci, Ayomide Owoyemi, Les Atlas, Hugo Estrada, Richard Nall, Michael Lotito, Rich Falcone, Jennifer Ranjani J

    Abstract: The COVID-19 pandemic has been a scourge upon humanity, claiming the lives of more than 5.1 million people worldwide; the global economy contracted by 3.5% in 2020. This paper presents a COVID-19 calculator, synthesizing existing published calculators and data points, to measure the positive U.S. socio-economic impact of a COVID-19 AI/ML pre-screening solution (algorithm & application).

    Submitted 4 April, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

  39. arXiv:2201.08522  [pdf, other]

    cs.IT cs.CR eess.SP math.NA

    Orthonormal Sketches for Secure Coded Regression

    Authors: Neophytos Charalambides, Hessam Mahdavifar, Mert Pilanci, Alfred O. Hero III

    Abstract: In this work, we propose a method for speeding up linear regression distributively, while ensuring security. We leverage randomized sketching techniques, and improve straggler resilience in asynchronous systems. Specifically, we apply a random orthonormal matrix and then subsample in \textit{blocks}, to simultaneously secure the information and reduce the dimension of the regression problem. In ou…

    Submitted 22 February, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

    Comments: 3 figures, 5 pages excluding appendices

    MSC Class: 65F10; 65F45; 68W15; 68W20; 68W25; 68P27; 68P30; ACM Class: E.3; E.4; G.1.2; G.1.3

  40. arXiv:2201.01669  [pdf, other]

    eess.AS cs.LG cs.SD

    Using Deep Learning with Large Aggregated Datasets for COVID-19 Classification from Cough

    Authors: Esin Darici Haritaoglu, Nicholas Rasmussen, Daniel C. H. Tan, Jennifer Ranjani J., Jaclyn Xiao, Gunvant Chaudhari, Akanksha Rajput, Praveen Govindan, Christian Canham, Wei Chen, Minami Yamaura, Laura Gomezjurado, Aaron Broukhim, Amil Khanzada, Mert Pilanci

    Abstract: The Covid-19 pandemic has been one of the most devastating events in recent history, claiming the lives of more than 5 million people worldwide. Even with the worldwide distribution of vaccines, there is an apparent need for affordable, reliable, and accessible screening techniques to serve parts of the World that do not have access to Western medicine. Artificial Intelligence can provide a soluti…

    Submitted 29 March, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

  41. arXiv:2110.09548  [pdf, other]

    cs.LG cs.AI stat.ML

    Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks

    Authors: Tolga Ergen, Mert Pilanci

    Abstract: Understanding the fundamental principles behind the success of deep neural networks is one of the most important open questions in the current literature. To this end, we study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape. We consider a deep parallel ReLU network architecture, which also includes standard d…

    Submitted 25 September, 2023; v1 submitted 18 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2023

  42. arXiv:2110.06488  [pdf, other]

    cs.LG math.OC

    The Convex Geometry of Backpropagation: Neural Network Gradient Flows Converge to Extreme Points of the Dual Convex Program

    Authors: Yifei Wang, Mert Pilanci

    Abstract: We study non-convex subgradient flows for training two-layer ReLU neural networks from a convex geometry and duality perspective. We characterize the implicit bias of unregularized non-convex gradient flow as convex regularization of an equivalent convex model. We then show that the limit points of non-convex subgradient flows can be identified via primal-dual correspondence in this convex optimiz…

    Submitted 13 October, 2021; originally announced October 2021.

  43. arXiv:2110.06482  [pdf, other]

    cs.LG math.OC

    Parallel Deep Neural Networks Have Zero Duality Gap

    Authors: Yifei Wang, Tolga Ergen, Mert Pilanci

    Abstract: Training deep neural networks is a challenging non-convex optimization problem. Recent work has proven that strong duality holds (which means zero duality gap) for regularized finite-width two-layer ReLU networks and consequently provided an equivalent convex training problem. However, extending this result to deeper networks remains an open problem. In this paper, we prove that the dual…

    Submitted 6 March, 2023; v1 submitted 13 October, 2021; originally announced October 2021.

  44. arXiv:2110.05518  [pdf, other]

    cs.LG cs.AI cs.CC stat.ML

    Global Optimality Beyond Two Layers: Training Deep ReLU Networks via Convex Programs

    Authors: Tolga Ergen, Mert Pilanci

    Abstract: Understanding the fundamental mechanism behind the success of deep neural networks is one of the key challenges in the modern machine learning literature. Despite numerous attempts, a solid theoretical analysis is yet to be developed. In this paper, we develop a novel unified framework to reveal a hidden regularization mechanism through the lens of convex optimization. We first show that the train…

    Submitted 12 January, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted to ICML 2021

  45. arXiv:2109.03877  [pdf, other]

    cs.IT cs.DC math.PR

    Computational Polarization: An Information-theoretic Method for Resilient Computing

    Authors: Mert Pilanci

    Abstract: We introduce an error resilient distributed computing method based on an extension of the channel polarization phenomenon to distributed algorithms. The method leverages an algorithmic split operation that transforms two identical compute nodes to slow and fast workers, which parallels the channel split operation in Polar Codes. This operation preserves the average runtime, analogous to the conser…

    Submitted 8 September, 2021; originally announced September 2021.

  46. arXiv:2107.07480  [pdf, other]

    math.OC cs.DS cs.LG stat.ML

    Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update

    Authors: Michał Dereziński, Jonathan Lacotte, Mert Pilanci, Michael W. Mahoney

    Abstract: In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration. Randomized sketching has emerged as a powerful technique for constructing estimates of the Hessian which can be used to perform approximate Newton steps. This involves multiplication by a random sketching matrix, which introduces a trade-off between the computation…

    Submitted 15 July, 2021; originally announced July 2021.

  47. arXiv:2107.05680  [pdf, other]

    cs.LG cs.CV eess.IV math.OC stat.ML

    Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions

    Authors: Arda Sahiner, Tolga Ergen, Batu Ozturkler, Burak Bartan, John Pauly, Morteza Mardani, Mert Pilanci

    Abstract: Generative Adversarial Networks (GANs) are commonly used for modeling complex distributions of data. Both the generators and discriminators of GANs are often modeled by neural networks, posing a non-transparent optimization problem which is non-convex and non-concave over the generator and discriminator, respectively. Such networks are often heuristically optimized with gradient descent-ascent (GD…

    Submitted 21 March, 2022; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: Published as paper in ICLR 2022. First two authors contributed equally to this work; 34 pages, 11 figures

  48. arXiv:2105.07291  [pdf, other]

    math.OC cs.LG

    Adaptive Newton Sketch: Linear-time Optimization with Quadratic Convergence and Effective Hessian Dimensionality

    Authors: Jonathan Lacotte, Yifei Wang, Mert Pilanci

    Abstract: We propose a randomized algorithm with quadratic convergence rate for convex optimization problems with a self-concordant, composite, strongly convex objective function. Our method is based on performing an approximate Newton step using a random projection of the Hessian. Our first contribution is to show that, at each iteration, the embedding dimension (or sketch size) can be as small as the effe…

    Submitted 15 May, 2021; originally announced May 2021.

  49. arXiv:2105.01420  [pdf, ps, other]

    cs.LG stat.ML

    Training Quantized Neural Networks to Global Optimality via Semidefinite Programming

    Authors: Burak Bartan, Mert Pilanci

    Abstract: Neural networks (NNs) have been extremely successful across many tasks in machine learning. Quantization of NN weights has become an important topic due to its impact on their energy efficiency, inference time and deployment on hardware. Although post-training quantization is well-studied, training optimal quantized NNs involves combinatorial non-convex optimization problems which appear intractab…

    Submitted 5 May, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: v2: Minor edits in the text. The results are unchanged

  50. arXiv:2104.14101  [pdf, other]


    Fast Convex Quadratic Optimization Solvers with Adaptive Sketching-based Preconditioners

    Authors: Jonathan Lacotte, Mert Pilanci

    Abstract: We consider least-squares problems with quadratic regularization and propose novel sketching-based iterative methods with an adaptive sketch size. The sketch size can be as small as the effective dimension of the data matrix to guarantee linear convergence. However, a major difficulty in choosing the sketch size in terms of the effective dimension lies in the fact that the latter is usually unknow…

    Submitted 29 April, 2021; originally announced April 2021.