Showing 1–19 of 19 results for author: Ergen, T

  1. arXiv:2406.06357  [pdf, other]

    cs.CL cs.AI

    MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows

    Authors: Xingjian Zhang, Yutong Xie, Jin Huang, Jinge Ma, Zhaoying Pan, Qijia Liu, Ziyang Xiong, Tolga Ergen, Dongsub Shim, Honglak Lee, Qiaozhu Mei

    Abstract: Scientific innovation relies on detailed workflows, which include critical steps such as analyzing literature, generating ideas, validating these ideas, interpreting results, and inspiring follow-up research. However, scientific publications that document these workflows are extensive and unstructured. This makes it difficult for both human researchers and AI systems to effectively navigate and ex…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: text overlap with arXiv:1706.03762 by other authors

  2. arXiv:2403.01046  [pdf, other]

    cs.LG cs.AI cs.NE math.OC stat.ML

    A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features

    Authors: Emi Zeger, Yifei Wang, Aaron Mishkin, Tolga Ergen, Emmanuel Candès, Mert Pilanci

    Abstract: We prove that training neural networks on 1-D data is equivalent to solving a convex Lasso problem with a fixed, explicitly defined dictionary matrix of features. The specific dictionary depends on the activation and depth. We consider 2 and 3-layer networks with piecewise linear activations, and rectangular and tree networks with sign activation and arbitrary depth. Interestingly in absolute valu…

    Submitted 29 June, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  3. arXiv:2312.12657  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    The Convex Landscape of Neural Networks: Characterizing Global Optima and Stationary Points via Lasso Models

    Authors: Tolga Ergen, Mert Pilanci

    Abstract: Due to the non-convex nature of training Deep Neural Network (DNN) models, their effectiveness relies on the use of non-convex optimization heuristics. Traditional methods for training DNNs often require costly empirical methods to produce successful models and do not have a clear theoretical foundation. In this study, we examine the use of convex optimization theory and sparse recovery models to…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: A preliminary version of part of this work was published at ICML 2020 with the title "Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-layer Networks"

  4. arXiv:2309.15096  [pdf, other]

    cs.LG stat.ML

    Fixing the NTK: From Neural Network Linearizations to Exact Convex Programs

    Authors: Rajat Vadiraj Dwaraknath, Tolga Ergen, Mert Pilanci

    Abstract: Recently, theoretical analyses of deep neural networks have broadly focused on two directions: 1) Providing insight into neural network training by SGD in the limit of infinite hidden-layer width and infinitesimally small learning rate (also known as gradient flow) via the Neural Tangent Kernel (NTK), and 2) Globally optimizing the regularized training objective via cone-constrained convex reformu…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted to NeurIPS 2023

  5. arXiv:2303.03382  [pdf, other]

    cs.LG stat.ML

    Globally Optimal Training of Neural Networks with Threshold Activation Functions

    Authors: Tolga Ergen, Halil Ibrahim Gulluk, Jonathan Lacotte, Mert Pilanci

    Abstract: Threshold activation functions are highly preferable in neural networks due to their efficiency in hardware implementations. Moreover, their mode of operation is more interpretable and resembles that of biological neurons. However, traditional gradient based algorithms such as Gradient Descent cannot be used to train the parameters of neural networks with threshold activations since the activation…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted to ICLR 2023

  6. arXiv:2211.11052  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Convexifying Transformers: Improving optimization and understanding of transformer networks

    Authors: Tolga Ergen, Behnam Neyshabur, Harsh Mehta

    Abstract: Understanding the fundamental mechanism behind the success of transformer networks is still an open problem in the deep learning literature. Although their remarkable performance has been mostly attributed to the self-attention mechanism, the literature still lacks a solid analysis of these networks and interpretation of the functions learned by them. To this end, we study the training problem of…

    Submitted 20 November, 2022; originally announced November 2022.

  7. arXiv:2207.08393  [pdf, other]

    eess.IV cs.CV

    GLEAM: Greedy Learning for Large-Scale Accelerated MRI Reconstruction

    Authors: Batu Ozturkler, Arda Sahiner, Tolga Ergen, Arjun D Desai, Christopher M Sandino, Shreyas Vasanawala, John M Pauly, Morteza Mardani, Mert Pilanci

    Abstract: Unrolled neural networks have recently achieved state-of-the-art accelerated MRI reconstruction. These networks unroll iterative optimization algorithms by alternating between physics-based consistency and neural-network based regularization. However, they require several iterations of a large neural network to handle high-dimensional imaging tasks such as 3D MRI. This limits traditional training…

    Submitted 18 July, 2022; originally announced July 2022.

  8. arXiv:2205.08078  [pdf, other]

    cs.LG cs.CV math.OC

    Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers

    Authors: Arda Sahiner, Tolga Ergen, Batu Ozturkler, John Pauly, Morteza Mardani, Mert Pilanci

    Abstract: Vision transformers using self-attention or its proposed alternatives have demonstrated promising results in many image related tasks. However, the underpinning inductive bias of attention is not well understood. To address this issue, this paper analyzes attention through the lens of convex duality. For the non-linear dot-product self-attention, and alternative mechanisms such as MLP-mixer and Fo…

    Submitted 20 May, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

    Comments: 38 pages, 2 figures. To appear in ICML 2022

  9. arXiv:2110.09548  [pdf, other]

    cs.LG cs.AI stat.ML

    Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks

    Authors: Tolga Ergen, Mert Pilanci

    Abstract: Understanding the fundamental principles behind the success of deep neural networks is one of the most important open questions in the current literature. To this end, we study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape. We consider a deep parallel ReLU network architecture, which also includes standard d…

    Submitted 25 September, 2023; v1 submitted 18 October, 2021; originally announced October 2021.

    Comments: Accepted to NeurIPS 2023

  10. arXiv:2110.06482  [pdf, other]

    cs.LG math.OC

    Parallel Deep Neural Networks Have Zero Duality Gap

    Authors: Yifei Wang, Tolga Ergen, Mert Pilanci

    Abstract: Training deep neural networks is a challenging non-convex optimization problem. Recent work has proven that the strong duality holds (which means zero duality gap) for regularized finite-width two-layer ReLU networks and consequently provided an equivalent convex training problem. However, extending this result to deeper networks remains to be an open problem. In this paper, we prove that the dual…

    Submitted 6 March, 2023; v1 submitted 13 October, 2021; originally announced October 2021.

  11. arXiv:2110.05518  [pdf, other]

    cs.LG cs.AI cs.CC stat.ML

    Global Optimality Beyond Two Layers: Training Deep ReLU Networks via Convex Programs

    Authors: Tolga Ergen, Mert Pilanci

    Abstract: Understanding the fundamental mechanism behind the success of deep neural networks is one of the key challenges in the modern machine learning literature. Despite numerous attempts, a solid theoretical analysis is yet to be developed. In this paper, we develop a novel unified framework to reveal a hidden regularization mechanism through the lens of convex optimization. We first show that the train…

    Submitted 12 January, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Comments: Accepted to ICML 2021

  12. arXiv:2107.05680  [pdf, other]

    cs.LG cs.CV eess.IV math.OC stat.ML

    Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions

    Authors: Arda Sahiner, Tolga Ergen, Batu Ozturkler, Burak Bartan, John Pauly, Morteza Mardani, Mert Pilanci

    Abstract: Generative Adversarial Networks (GANs) are commonly used for modeling complex distributions of data. Both the generators and discriminators of GANs are often modeled by neural networks, posing a non-transparent optimization problem which is non-convex and non-concave over the generator and discriminator, respectively. Such networks are often heuristically optimized with gradient descent-ascent (GD…

    Submitted 21 March, 2022; v1 submitted 12 July, 2021; originally announced July 2021.

    Comments: Published as paper in ICLR 2022. First two authors contributed equally to this work; 34 pages, 11 figures

  13. arXiv:2103.01499  [pdf, other]

    cs.LG math.OC stat.ML

    Demystifying Batch Normalization in ReLU Networks: Equivalent Convex Optimization Models and Implicit Regularization

    Authors: Tolga Ergen, Arda Sahiner, Batu Ozturkler, John Pauly, Morteza Mardani, Mert Pilanci

    Abstract: Batch Normalization (BN) is a commonly used technique to accelerate and stabilize training of deep neural networks. Despite its empirical success, a full theoretical understanding of BN is yet to be developed. In this work, we analyze BN through the lens of convex optimization. We introduce an analytic framework based on convex duality to obtain exact convex representations of weight-decay regular…

    Submitted 21 March, 2022; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: Accepted to ICLR 2022. First two authors contributed equally to this work; 36 pages, 13 figures

  14. arXiv:2012.13329  [pdf, other]

    cs.LG cs.AI cs.CC stat.ML

    Vector-output ReLU Neural Network Problems are Copositive Programs: Convex Analysis of Two Layer Networks and Polynomial-time Algorithms

    Authors: Arda Sahiner, Tolga Ergen, John Pauly, Mert Pilanci

    Abstract: We describe the convex semi-infinite dual of the two-layer vector-output ReLU neural network training problem. This semi-infinite dual admits a finite dimensional representation, but its support is over a convex set which is difficult to characterize. In particular, we demonstrate that the non-convex neural network training problem is equivalent to a finite-dimensional convex copositive program. O…

    Submitted 20 December, 2021; v1 submitted 24 December, 2020; originally announced December 2020.

    Comments: 25 pages, 6 figures

  15. arXiv:2006.14798  [pdf, other]

    cs.LG cs.CC stat.ML

    Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time

    Authors: Tolga Ergen, Mert Pilanci

    Abstract: We study training of Convolutional Neural Networks (CNNs) with ReLU activations and introduce exact convex optimization formulations with a polynomial complexity with respect to the number of data samples, the number of neurons, and data dimension. More specifically, we develop a convex analytic framework utilizing semi-infinite duality to obtain equivalent convex optimization problems for several…

    Submitted 18 March, 2021; v1 submitted 26 June, 2020; originally announced June 2020.

    Comments: Accepted for Spotlight Presentation at ICLR 2021

    Journal ref: International Conference on Learning Representations (ICLR), 2021

  16. arXiv:2002.11219  [pdf, ps, other]

    cs.LG stat.ML

    Convex Geometry and Duality of Over-parameterized Neural Networks

    Authors: Tolga Ergen, Mert Pilanci

    Abstract: We develop a convex analytic approach to analyze finite width two-layer ReLU networks. We first prove that an optimal solution to the regularized training problem can be characterized as extreme points of a convex set, where simple solutions are encouraged via its convex geometrical properties. We then leverage this characterization to show that an optimal set of parameters yield linear spline int…

    Submitted 30 August, 2021; v1 submitted 25 February, 2020; originally announced February 2020.

    Comments: Accepted to the Journal of Machine Learning Research (JMLR)

  17. arXiv:2002.10553  [pdf, other]

    cs.LG cs.CC stat.ML

    Neural Networks are Convex Regularizers: Exact Polynomial-time Convex Optimization Formulations for Two-layer Networks

    Authors: Mert Pilanci, Tolga Ergen

    Abstract: We develop exact representations of training two-layer neural networks with rectified linear units (ReLUs) in terms of a single convex program with number of variables polynomial in the number of training samples and the number of hidden neurons. Our theory utilizes semi-infinite duality and minimum norm regularization. We show that ReLU networks trained with standard weight decay are equivalent t…

    Submitted 15 August, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

  18. arXiv:2002.09773  [pdf, other]

    cs.LG stat.ML

    Revealing the Structure of Deep Neural Networks via Convex Duality

    Authors: Tolga Ergen, Mert Pilanci

    Abstract: We study regularized deep neural networks (DNNs) and introduce a convex analytic framework to characterize the structure of the hidden layers. We show that a set of optimal hidden layer weights for a norm regularized DNN training problem can be explicitly found as the extreme points of a convex set. For the special case of deep linear networks, we prove that each optimal weight matrix aligns with…

    Submitted 11 June, 2021; v1 submitted 22 February, 2020; originally announced February 2020.

    Comments: Accepted to ICML 2021

  19. arXiv:1710.09207  [pdf, other]

    eess.SP cs.LG stat.ML

    Unsupervised and Semi-supervised Anomaly Detection with LSTM Neural Networks

    Authors: Tolga Ergen, Ali Hassan Mirza, Suleyman Serdar Kozat

    Abstract: We investigate anomaly detection in an unsupervised framework and introduce Long Short Term Memory (LSTM) neural network based algorithms. In particular, given variable length data sequences, we first pass these sequences through our LSTM based structure and obtain fixed length sequences. We then find a decision function for our anomaly detectors based on the One Class Support Vector Machines (OC-…

    Submitted 25 October, 2017; originally announced October 2017.