A Margin-based Multiclass Generalization Bound via Geometric Complexity

Authors: Michael Munn, Benoit Dherin, Javier Gonzalvo

Abstract: There has been considerable effort to better understand the generalization capabilities of deep neural networks both as a means to unlock a theoretical understanding of their success as well as providing directions for further improvements. In this paper, we investigate margin-based multiclass generalization bounds for neural networks which rely on a recent complexity measure, the geometric complexity, developed for neural networks. We derive a new upper bound on the generalization error which scales with the margin-normalized geometric complexity of the network and which holds for a broad family of data distributions and model classes. Our generalization bound is empirically investigated for a ResNet-18 model trained with SGD on the CIFAR-10 and CIFAR-100 datasets with both original and random labels.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Accepted as an ICML 2023 workshop paper (Topology, Algebra and Geometry in Machine Learning)

Journal ref: Proceedings of 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML), PMLR 221:189-205, 2023

arXiv:2405.15706 [pdf, other]

The Impact of Geometric Complexity on Neural Collapse in Transfer Learning

Authors: Michael Munn, Benoit Dherin, Javier Gonzalvo

Abstract: Many of the recent remarkable advances in computer vision and language models can be attributed to the success of transfer learning via the pre-training of large foundation models. However, a theoretical framework which explains this empirical success is incomplete and remains an active area of research. Flatness of the loss surface and neural collapse have recently emerged as useful pre-training metrics which shed light on the implicit biases underlying pre-training. In this paper, we explore the geometric complexity of a model's learned representations as a fundamental mechanism that relates these two concepts. We show through experiments and theory that mechanisms which affect the geometric complexity of the pre-trained network also influence the neural collapse. Furthermore, we show how this effect of the geometric complexity generalizes to the neural collapse of new classes as well, thus encouraging better performance on downstream tasks, particularly in the few-shot setting.

Submitted 28 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

arXiv:1906.01550 [pdf, other]

Towards Task and Architecture-Independent Generalization Gap Predictors

Authors: Scott Yak, Javier Gonzalvo, Hanna Mazzawi

Abstract: Can we use deep learning to predict when deep learning works? Our results suggest the affirmative. We created a dataset by training 13,500 neural networks with different architectures, on different variations of spiral datasets, and using different optimization parameters. We used this dataset to train task-independent and architecture-independent generalization gap predictors for those neural networks. We extend Jiang et al. (2018) to also use DNNs and RNNs and show that they outperform the linear model, obtaining $R^2=0.965$. We also show results for architecture-independent, task-independent, and out-of-distribution generalization gap prediction tasks. Both DNNs and RNNs consistently and significantly outperform linear models, with RNNs obtaining $R^2=0.584$.

Submitted 4 June, 2019; originally announced June 2019.

Comments: 8 pages, 6 figures, 2 tables. To be presented at ICML 2019 "Understanding and Improving Generalization in Deep Learning" Workshop (poster)

arXiv:1905.00080 [pdf, other]

AdaNet: A Scalable and Flexible Framework for Automatically Learning Ensembles

Authors: Charles Weill, Javier Gonzalvo, Vitaly Kuznetsov, Scott Yang, Scott Yak, Hanna Mazzawi, Eugen Hotaj, Ghassen Jerfel, Vladimir Macko, Ben Adlam, Mehryar Mohri, Corinna Cortes

Abstract: AdaNet is a lightweight TensorFlow-based (Abadi et al., 2015) framework for automatically learning high-quality ensembles with minimal expert intervention. Our framework is inspired by the AdaNet algorithm (Cortes et al., 2017) which learns the structure of a neural network as an ensemble of subnetworks. We designed it to: (1) integrate with the existing TensorFlow ecosystem, (2) offer sensible default search spaces to perform well on novel datasets, (3) present a flexible API to utilize expert information when available, and (4) efficiently accelerate training with distributed CPU, GPU, and TPU hardware. The code is open-source and available at: https://github.com/tensorflow/adanet.

Submitted 30 April, 2019; originally announced May 2019.

arXiv:1903.06236 [pdf, other]

Improving Neural Architecture Search Image Classifiers via Ensemble Learning

Authors: Vladimir Macko, Charles Weill, Hanna Mazzawi, Javier Gonzalvo

Abstract: Finding the best neural network architecture requires significant time, resources, and human expertise. These challenges are partially addressed by neural architecture search (NAS) which is able to find the best convolutional layer or cell that is then used as a building block for the network. However, once a good building block is found, manual design is still required to assemble the final architecture as a combination of multiple blocks under a predefined parameter budget constraint. A common solution is to stack these blocks into a single tower and adjust the width and depth to fill the parameter budget. However, these single tower architectures may not be optimal. Instead, in this paper we present the AdaNAS algorithm, that uses ensemble techniques to compose a neural network as an ensemble of smaller networks automatically. Additionally, we introduce a novel technique based on knowledge distillation to iteratively train the smaller networks using the previous ensemble as a teacher. Our experiments demonstrate that ensembles of networks improve accuracy upon a single neural network while keeping the same number of parameters. Our models achieve comparable results with the state-of-the-art on CIFAR-10 and set a new state-of-the-art on CIFAR-100.

Submitted 14 March, 2019; originally announced March 2019.

arXiv:1812.06018 [pdf, other]

doi 10.23731/CYRM-2018-002

The Compact Linear Collider (CLIC) - 2018 Summary Report

Authors: The CLIC, CLICdp collaborations: T. K. Charles, P. J. Giansiracusa, T. G. Lucas, R. P. Rassool, M. Volpi, C. Balazs, K. Afanaciev, V. Makarenko, A. Patapenka, I. Zhuk, C. Collette, M. J. Boland, A. C. Abusleme Hoffman, M. A. Diaz, F. Garay, Y. Chi, X. He, G. Pei, S. Pei, G. Shu, X. Wang, J. Zhang, et al. (671 additional authors not shown)

Abstract: The Compact Linear Collider (CLIC) is a TeV-scale high-luminosity linear $e^+e^-$ collider under development at CERN. Following the CLIC conceptual design published in 2012, this report provides an overview of the CLIC project, its current status, and future developments. It presents the CLIC physics potential and reports on design, technology, and implementation aspects of the accelerator and the detector. CLIC is foreseen to be built and operated in stages, at centre-of-mass energies of 380 GeV, 1.5 TeV and 3 TeV, respectively. CLIC uses a two-beam acceleration scheme, in which 12 GHz accelerating structures are powered via a high-current drive beam. For the first stage, an alternative with X-band klystron powering is also considered. CLIC accelerator optimisation, technical developments and system tests have resulted in an increased energy efficiency (power around 170 MW) for the 380 GeV stage, together with a reduced cost estimate at the level of 6 billion CHF. The detector concept has been refined using improved software tools. Significant progress has been made on detector technology developments for the tracking and calorimetry systems. A wide range of CLIC physics studies has been conducted, both through full detector simulations and parametric studies, together providing a broad overview of the CLIC physics potential. Each of the three energy stages adds cornerstones of the full CLIC physics programme, such as Higgs width and couplings, top-quark properties, Higgs self-coupling, direct searches, and many precision electroweak measurements. The interpretation of the combined results gives crucial and accurate insight into new physics, largely complementary to LHC and HL-LHC. The construction of the first CLIC energy stage could start by 2026. First beams would be available by 2035, marking the beginning of a broad CLIC physics programme spanning 25-30 years.

Submitted 6 May, 2019; v1 submitted 14 December, 2018; originally announced December 2018.

Comments: 112 pages, 59 figures; published as CERN Yellow Report Monograph Vol. 2/2018; corresponding editors: Philip N. Burrows, Nuria Catalan Lasheras, Lucie Linssen, Marko Petrič, Aidan Robson, Daniel Schulte, Eva Sicking, Steinar Stapnes

Report number: CERN-2018-005-M

Showing 1–6 of 6 results for author: Gonzalvo, J