Showing 1–16 of 16 results for author: Serra, T

  1. arXiv:2401.03451  [pdf, other]

    math.OC cs.LG

    Optimization Over Trained Neural Networks: Taking a Relaxing Walk

    Authors: Jiatai Tong, Junyang Cai, Thiago Serra

    Abstract: Besides training, mathematical optimization is also used in deep learning to model and solve formulations over trained neural networks for purposes such as verification, compression, and optimization with learned constraints. However, solving these formulations soon becomes difficult as the network size grows due to the weak linear relaxation and dense constraint matrix. We have seen improvements…

    Submitted 28 January, 2024; v1 submitted 7 January, 2024; originally announced January 2024.

  2. arXiv:2312.16699  [pdf, other]

    math.OC cs.LG

    Computational Tradeoffs of Optimization-Based Bound Tightening in ReLU Networks

    Authors: Fabian Badilla, Marcos Goycoolea, Gonzalo Muñoz, Thiago Serra

    Abstract: The use of Mixed-Integer Linear Programming (MILP) models to represent neural networks with Rectified Linear Unit (ReLU) activations has become increasingly widespread in the last decade. This has enabled the use of MILP technology to test (or stress) their behavior, to adversarially improve their training, and to embed them in optimization models leveraging their predictive power. Many of these MIL…

    Submitted 30 January, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

    Comments: 8 pages, 4 figures

  3. arXiv:2305.00241  [pdf, other]

    math.OC cs.LG

    When Deep Learning Meets Polyhedral Theory: A Survey

    Authors: Joey Huchette, Gonzalo Muñoz, Thiago Serra, Calvin Tsay

    Abstract: In the past decade, deep learning became the prevalent methodology for predictive modeling thanks to the remarkable accuracy of deep neural networks in tasks such as computer vision and natural language processing. Meanwhile, the structure of neural networks converged back to simpler representations based on piecewise constant and piecewise linear functions such as the Rectified Linear Unit (ReLU)…

    Submitted 31 August, 2023; v1 submitted 29 April, 2023; originally announced May 2023.

  4. arXiv:2301.07966  [pdf, ps, other]

    cs.LG math.OC

    Getting Away with More Network Pruning: From Sparsity to Geometry and Linear Regions

    Authors: Junyang Cai, Khai-Nguyen Nguyen, Nishant Shrestha, Aidan Good, Ruisen Tu, Xin Yu, Shandian Zhe, Thiago Serra

    Abstract: One surprising trait of neural networks is the extent to which their connections can be pruned with little to no effect on accuracy. But when we cross a critical level of parameter sparsity, pruning any further leads to a sudden drop in accuracy. This drop plausibly reflects a loss in model complexity, which we aim to avoid. In this work, we explore how sparsity also affects the geometry of the li…

    Submitted 19 January, 2023; originally announced January 2023.

    Comments: (Under review)

  5. arXiv:2206.02976  [pdf, other]

    cs.LG

    Recall Distortion in Neural Network Pruning and the Undecayed Pruning Algorithm

    Authors: Aidan Good, Jiaqi Lin, Hannah Sieg, Mikey Ferguson, Xin Yu, Shandian Zhe, Jerzy Wieczorek, Thiago Serra

    Abstract: Pruning techniques have been successfully used in neural networks to trade accuracy for sparsity. However, the impact of network pruning is not uniform: prior work has shown that the recall for underrepresented classes in a dataset may be more negatively affected. In this work, we study such relative distortions in recall by hypothesizing an intensification effect that is inherent to the model. Na…

    Submitted 12 November, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  6. arXiv:2205.14500  [pdf, other]

    cs.LG

    Optimal Decision Diagrams for Classification

    Authors: Alexandre M. Florio, Pedro Martins, Maximilian Schiffer, Thiago Serra, Thibaut Vidal

    Abstract: Decision diagrams for classification have some notable advantages over decision trees, as their internal connections can be determined at training time and their width is not bound to grow exponentially with their depth. Accordingly, decision diagrams are usually less prone to data fragmentation in internal nodes. However, the inherent complexity of training these classifiers acted as a long-stand…

    Submitted 28 May, 2022; originally announced May 2022.

    MSC Class: 68T99; ACM Class: I.2.6

  7. arXiv:2203.04466  [pdf, other]

    cs.LG cs.CV

    The Combinatorial Brain Surgeon: Pruning Weights That Cancel One Another in Neural Networks

    Authors: Xin Yu, Thiago Serra, Srikumar Ramalingam, Shandian Zhe

    Abstract: Neural networks tend to achieve better accuracy with training if they are larger -- even if the resulting models are overparameterized. Nevertheless, carefully removing such excess parameters before, during, or after training may also produce models with similar or even improved accuracy. In many cases, that can be curiously achieved by heuristics as simple as removing a percentage of the weights…

    Submitted 19 June, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

  8. arXiv:2201.12795  [pdf, other]

    cs.LG math.OC

    Training Thinner and Deeper Neural Networks: Jumpstart Regularization

    Authors: Carles Riera, Camilo Rey, Thiago Serra, Eloi Puertas, Oriol Pujol

    Abstract: Neural networks are more expressive when they have multiple layers. In turn, conventional training methods are only successful if the depth does not lead to numerical issues such as exploding or vanishing gradients, which occur less frequently when the layers are sufficiently wide. However, increasing width to attain greater depth entails the use of heavier computational resources and leads to ove…

    Submitted 5 June, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

    Comments: CPAIOR 2022 (to appear)

  9. arXiv:2102.07804  [pdf, other]

    cs.LG math.OC

    Scaling Up Exact Neural Network Compression by ReLU Stability

    Authors: Thiago Serra, Xin Yu, Abhinav Kumar, Srikumar Ramalingam

    Abstract: We can compress a rectifier network while exactly preserving its underlying functionality with respect to a given input domain if some of its neurons are stable. However, current approaches to determine the stability of neurons with Rectified Linear Unit (ReLU) activations require solving or finding a good approximation to multiple discrete optimization problems. In this work, we introduce an algo…

    Submitted 28 October, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021

  10. arXiv:2001.00218  [pdf, other]

    cs.LG cs.DS math.OC stat.ML

    Lossless Compression of Deep Neural Networks

    Authors: Thiago Serra, Abhinav Kumar, Srikumar Ramalingam

    Abstract: Deep neural networks have been successful in many predictive modeling tasks, such as image and language recognition, where large neural networks are often used to obtain good accuracy. Consequently, it is challenging to deploy these networks under limited computational resources, such as in mobile devices. In this work, we introduce an algorithm that removes units and layers of a neural network wh…

    Submitted 22 February, 2020; v1 submitted 1 January, 2020; originally announced January 2020.

    Comments: CPAIOR 2020 (to appear)

  11. arXiv:1910.02179  [pdf, other]

    cs.DS math.OC

    Template-based Minor Embedding for Adiabatic Quantum Optimization

    Authors: Thiago Serra, Teng Huang, Arvind Raghunathan, David Bergman

    Abstract: Quantum Annealing (QA) can be used to quickly obtain near-optimal solutions for Quadratic Unconstrained Binary Optimization (QUBO) problems. In QA hardware, each decision variable of a QUBO should be mapped to one or more adjacent qubits in such a way that pairs of variables defining a quadratic term in the objective function are mapped to some pair of adjacent qubits. However, qubits have limited…

    Submitted 19 January, 2021; v1 submitted 4 October, 2019; originally announced October 2019.

    Comments: INFORMS Journal on Computing (to appear)

  12. arXiv:1905.11428  [pdf, other]

    cs.LG cs.CV stat.ML

    Equivalent and Approximate Transformations of Deep Neural Networks

    Authors: Abhinav Kumar, Thiago Serra, Srikumar Ramalingam

    Abstract: Two networks are equivalent if they produce the same output for any given input. In this paper, we study the possibility of transforming a deep neural network to another network with a different number of units or layers, which can be either equivalent, a local exact approximation, or a global linear approximation of the original network. On the practical side, we show that certain rectified linea…

    Submitted 27 May, 2019; originally announced May 2019.

  13. arXiv:1810.03370  [pdf, other]

    cs.LG cs.AI math.OC stat.ML

    Empirical Bounds on Linear Regions of Deep Rectifier Networks

    Authors: Thiago Serra, Srikumar Ramalingam

    Abstract: We can compare the expressiveness of neural networks that use rectified linear units (ReLUs) by the number of linear regions, which reflect the number of pieces of the piecewise linear functions modeled by such networks. However, enumerating these regions is prohibitive and the known analytical bounds are identical for networks with same dimensions. In this work, we approximate the number of linea…

    Submitted 14 December, 2019; v1 submitted 8 October, 2018; originally announced October 2018.

    Comments: AAAI 2020

  14. arXiv:1809.05794  [pdf, other]

    math.OC cs.MS

    When Lift-and-Project Cuts are Different

    Authors: Egon Balas, Thiago Serra

    Abstract: In this paper, we present a method to determine if a lift-and-project cut for a mixed-integer linear program is irregular, in which case the cut is not equivalent to any intersection cut from the bases of the linear relaxation. This is an important question due to the intense research activity for the past decade on cuts from multiple rows of simplex tableau as well as on lift-and-project cuts fro…

    Submitted 24 January, 2020; v1 submitted 15 September, 2018; originally announced September 2018.

    Comments: INFORMS Journal on Computing (to appear)

  15. arXiv:1806.06365  [pdf, ps, other]

    math.OC cs.LG stat.ML

    How Could Polyhedral Theory Harness Deep Learning?

    Authors: Thiago Serra, Christian Tjandraatmadja, Srikumar Ramalingam

    Abstract: The holy grail of deep learning is to come up with an automatic method to design optimal architectures for different applications. In other words, how can we effectively dimension and organize neurons along the network layers based on the computational resources, input size, and amount of training data? We outline promising research directions based on polyhedral theory and mixed-integer represent…

    Submitted 17 June, 2018; originally announced June 2018.

    Journal ref: Scientific Machine Learning Workshop, U.S. Department of Energy Office of Advanced Scientific Computing Research, January 30 -- February 1, 2018

  16. arXiv:1711.02114  [pdf, other]

    cs.LG cs.AI cs.NE math.OC stat.ML

    Bounding and Counting Linear Regions of Deep Neural Networks

    Authors: Thiago Serra, Christian Tjandraatmadja, Srikumar Ramalingam

    Abstract: We investigate the complexity of deep neural networks (DNN) that represent piecewise linear (PWL) functions. In particular, we study the number of linear regions, i.e. pieces, that a PWL function represented by a DNN can attain, both theoretically and empirically. We present (i) tighter upper and lower bounds for the maximum number of linear regions on rectifier networks, which are exact for input…

    Submitted 15 September, 2018; v1 submitted 6 November, 2017; originally announced November 2017.

    Comments: ICML 2018