Showing 1–26 of 26 results for author: Sagun, L

Searching in archive cs.
  1. arXiv:2309.17417

    cs.LG cs.CY cs.SI

    Networked Inequality: Preferential Attachment Bias in Graph Neural Network Link Prediction

    Authors: Arjun Subramonian, Levent Sagun, Yizhou Sun

    Abstract: Graph neural network (GNN) link prediction is increasingly deployed in citation, collaboration, and online social networks to recommend academic literature, collaborators, and friends. While prior research has investigated the dyadic fairness of GNN link prediction, the within-group (e.g., queer women) fairness and "rich get richer" dynamics of link prediction remain underexplored. However, these…

    Submitted 4 June, 2024; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Accepted to ICML 2024

  2. arXiv:2307.05775

    cs.LG cs.SI

    Weisfeiler and Leman Go Measurement Modeling: Probing the Validity of the WL Test

    Authors: Arjun Subramonian, Adina Williams, Maximilian Nickel, Yizhou Sun, Levent Sagun

    Abstract: The expressive power of graph neural networks is usually measured by comparing how many pairs of graphs or nodes an architecture can possibly distinguish as non-isomorphic to those distinguishable by the $k$-dimensional Weisfeiler-Leman ($k$-WL) test. In this paper, we uncover misalignments between graph machine learning practitioners' conceptualizations of expressive power and $k$-WL through a sy…

    Submitted 31 March, 2024; v1 submitted 11 July, 2023; originally announced July 2023.

  3. Simplicity Bias Leads to Amplified Performance Disparities

    Authors: Samuel J. Bell, Levent Sagun

    Abstract: Which parts of a dataset will a given model find difficult? Recent work has shown that SGD-trained models have a bias towards simplicity, leading them to prioritize learning a majority class, or to rely upon harmful spurious correlations. Here, we show that the preference for "easy" runs far deeper: A model may prioritize any class or group of the dataset that it finds simple, at the expense of wha…

    Submitted 8 June, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: In 2023 ACM Conference on Fairness, Accountability, and Transparency (FAccT '23). ACM

  4. arXiv:2207.09960

    stat.ML cs.CY cs.LG

    Measuring and signing fairness as performance under multiple stakeholder distributions

    Authors: David Lopez-Paz, Diane Bouchacourt, Levent Sagun, Nicolas Usunier

    Abstract: As learning machines increase their influence on decisions concerning human lives, analyzing their fairness properties becomes a subject of central importance. Yet, our best tools for measuring the fairness of learning systems are rigid fairness metrics encapsulated as mathematical one-liners, offer limited power to the stakeholders involved in the prediction task, and are easy to manipulate when…

    Submitted 20 July, 2022; originally announced July 2022.

  5. arXiv:2203.15100

    cs.LG cs.CV

    Understanding out-of-distribution accuracies through quantifying difficulty of test samples

    Authors: Berfin Simsek, Melissa Hall, Levent Sagun

    Abstract: Existing works show that although modern neural networks achieve remarkable generalization performance on the in-distribution (ID) dataset, the accuracy drops significantly on the out-of-distribution (OOD) datasets \cite{recht2018cifar, recht2019imagenet}. To understand why a variety of models consistently make more mistakes in the OOD datasets, we propose a new metric to quantify the difficulty o…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: 18 pages, 15 figures

  6. arXiv:2202.08360

    cs.CV cs.AI cs.CY

    Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision

    Authors: Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski

    Abstract: Discriminative self-supervised learning allows training models on any random group of internet images, and possibly recover salient information that helps differentiate between the images. Applied to ImageNet, this leads to object centric features that perform on par with supervised features on most object-centric downstream tasks. In this work, we question if using this ability, we can learn any…

    Submitted 22 February, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

  7. arXiv:2202.07603

    cs.CV cs.AI cs.CY

    Fairness Indicators for Systematic Assessments of Visual Feature Extractors

    Authors: Priya Goyal, Adriana Romero Soriano, Caner Hazirbas, Levent Sagun, Nicolas Usunier

    Abstract: Does everyone equally benefit from computer vision systems? Answers to this question become more and more important as computer vision systems are deployed at large scale, and can spark major concerns when they exhibit vast performance discrepancies between people from various demographic and social backgrounds. Systematic diagnosis of fairness, harms, and biases of computer vision systems is an i…

    Submitted 15 February, 2022; originally announced February 2022.

  8. arXiv:2106.05795

    cs.LG

    Transformed CNNs: recasting pre-trained convolutional layers with self-attention

    Authors: Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Ari Morcos

    Abstract: Vision Transformers (ViT) have recently emerged as a powerful alternative to convolutional networks (CNNs). Although hybrid models attempt to bridge the gap between these two architectures, the self-attention layers they rely on induce a strong computational bottleneck, especially at large spatial resolutions. In this work, we explore the idea of reducing the time spent training these layers by in…

    Submitted 10 June, 2021; originally announced June 2021.

  9. arXiv:2103.10697

    cs.CV cs.LG stat.ML

    ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases

    Authors: Stéphane d'Ascoli, Hugo Touvron, Matthew Leavitt, Ari Morcos, Giulio Biroli, Levent Sagun

    Abstract: Convolutional architectures have proven extremely successful for vision tasks. Their hard inductive biases enable sample-efficient learning, but come at the cost of a potentially lower performance ceiling. Vision Transformers (ViTs) rely on more flexible self-attention layers, and have recently outperformed CNNs for image classification. However, they require costly pre-training on large external…

    Submitted 10 June, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

  10. arXiv:2103.05524

    cs.LG cond-mat.dis-nn stat.ML

    On the interplay between data structure and loss function in classification problems

    Authors: Stéphane d'Ascoli, Marylou Gabrié, Levent Sagun, Giulio Biroli

    Abstract: One of the central puzzles in modern machine learning is the ability of heavily overparametrized models to generalize well. Although the low-dimensional structure of typical datasets is key to this behavior, most theoretical studies of overparametrization focus on isotropic inputs. In this work, we instead consider an analytically tractable model of structured data, where the input covariance is b…

    Submitted 12 October, 2021; v1 submitted 9 March, 2021; originally announced March 2021.

  11. arXiv:2007.13483

    cs.LG cs.AI

    Post-Workshop Report on Science meets Engineering in Deep Learning, NeurIPS 2019, Vancouver

    Authors: Levent Sagun, Caglar Gulcehre, Adriana Romero, Negar Rostamzadeh, Stefano Sarao Mannelli

    Abstract: Science meets Engineering in Deep Learning took place in Vancouver as part of the Workshop section of NeurIPS 2019. As organizers of the workshop, we created the following report in an attempt to isolate emerging topics and recurring themes that have been presented throughout the event. Deep learning can still be a complex mix of art and engineering despite its tremendous success in recent years.…

    Submitted 29 July, 2020; v1 submitted 25 June, 2020; originally announced July 2020.

    Comments: Report of NeurIPS 2019 workshop SEDL

  12. arXiv:2006.03509

    cs.LG cond-mat.dis-nn stat.ML

    Triple descent and the two kinds of overfitting: Where & why do they appear?

    Authors: Stéphane d'Ascoli, Levent Sagun, Giulio Biroli

    Abstract: A recent line of research has highlighted the existence of a "double descent" phenomenon in deep learning, whereby increasing the number of training examples $N$ causes the generalization error of neural networks to peak when $N$ is of the same order as the number of parameters $P$. In earlier works, a similar phenomenon was shown to exist in simpler models such as linear regression, where the pea…

    Submitted 13 October, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

  13. arXiv:1912.00018

    stat.ML cs.LG math.CA

    On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks

    Authors: Umut Şimşekli, Mert Gürbüzbalaban, Thanh Huy Nguyen, Gaël Richard, Levent Sagun

    Abstract: The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Ga…

    Submitted 29 November, 2019; originally announced December 2019.

    Comments: 32 pages. arXiv admin note: substantial text overlap with arXiv:1901.06053

  14. arXiv:1906.06766

    cs.LG cond-mat.dis-nn cond-mat.stat-mech stat.ML

    Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias

    Authors: Stéphane d'Ascoli, Levent Sagun, Joan Bruna, Giulio Biroli

    Abstract: Despite the phenomenal success of deep neural networks in a broad range of learning tasks, there is a lack of theory to understand the way they work. In particular, Convolutional Neural Networks (CNNs) are known to perform much better than Fully-Connected Networks (FCNs) on spatially structured data: the architectural structure of CNNs benefits from prior knowledge on the features of the data, for…

    Submitted 4 February, 2020; v1 submitted 16 June, 2019; originally announced June 2019.

    Comments: Update for the camera ready version - NeurIPS 2019

  15. arXiv:1901.06053

    cs.LG stat.ML

    A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks

    Authors: Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban

    Abstract: The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussiani…

    Submitted 17 January, 2019; originally announced January 2019.

  16. arXiv:1901.01608

    cond-mat.dis-nn cs.LG

    Scaling description of generalization with number of parameters in deep learning

    Authors: Mario Geiger, Arthur Jacot, Stefano Spigler, Franck Gabriel, Levent Sagun, Stéphane d'Ascoli, Giulio Biroli, Clément Hongler, Matthieu Wyart

    Abstract: Supervised deep learning involves the training of neural networks with a large number $N$ of parameters. For large enough $N$, in the so-called over-parametrized regime, one can essentially fit the training data points. Sparsity-based arguments would suggest that the generalization error increases as $N$ grows past a certain threshold $N^{*}$. Instead, empirical studies have shown that in the over…

    Submitted 8 October, 2019; v1 submitted 6 January, 2019; originally announced January 2019.

    Comments: The clarity of the text has been improved: the section "Related works" has been updated and the section "3.1 Regression task" has been added

  17. arXiv:1810.09665

    cs.LG cond-mat.dis-nn stat.ML

    A jamming transition from under- to over-parametrization affects loss landscape and generalization

    Authors: Stefano Spigler, Mario Geiger, Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Matthieu Wyart

    Abstract: We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved. Under some general conditions, we show that this transition is sharp for the hinge loss. In the whole over-parametrized regime, poor minima of the loss are not encountered during training since the number of constraints to satisfy is too small to h…

    Submitted 18 June, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

    Comments: arXiv admin note: text overlap with arXiv:1809.09349

  18. arXiv:1809.09349

    cond-mat.dis-nn cs.LG

    The jamming transition as a paradigm to understand the loss landscape of deep neural networks

    Authors: Mario Geiger, Stefano Spigler, Stéphane d'Ascoli, Levent Sagun, Marco Baity-Jesi, Giulio Biroli, Matthieu Wyart

    Abstract: Deep learning has been immensely successful at a variety of tasks, ranging from classification to AI. Learning corresponds to fitting training data, which is implemented by descending a very high-dimensional loss function. Understanding under which conditions neural networks do not get stuck in poor minima of the loss, and how the landscape of that loss evolves as depth is increased remains a chal…

    Submitted 17 June, 2019; v1 submitted 25 September, 2018; originally announced September 2018.

    Journal ref: Phys. Rev. E 100, 012115 (2019)

  19. arXiv:1803.06969

    stat.ML cond-mat.dis-nn cs.LG

    Comparing Dynamics: Deep Neural Networks versus Glassy Systems

    Authors: M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous, C. Cammarota, Y. LeCun, M. Wyart, G. Biroli

    Abstract: We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that dur…

    Submitted 7 June, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: 10 pages, 5 figures. Version accepted at ICML 2018

    Journal ref: PMLR 80:324-333, 2018; Republication with DOI (cite this one): J. Stat. Mech. (2019) 124013

  20. arXiv:1706.04454

    cs.LG

    Empirical Analysis of the Hessian of Over-Parametrized Neural Networks

    Authors: Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, Leon Bottou

    Abstract: We study the properties of common loss surfaces through their Hessian matrix. In particular, in the context of deep learning, we empirically show that the spectrum of the Hessian is composed of two parts: (1) the bulk centered near zero, (2) and outliers away from the bulk. We present numerical evidence and mathematical justifications to the following conjectures laid out by Sagun et al. (2016): F…

    Submitted 7 May, 2018; v1 submitted 14 June, 2017; originally announced June 2017.

    Comments: Minor update for ICLR 2018 Workshop Track presentation

  21. arXiv:1704.05179

    cs.CL

    SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

    Authors: Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, Kyunghyun Cho

    Abstract: We publicly release a new large-scale dataset, called SearchQA, for machine comprehension, or question-answering. Unlike recently released datasets, such as DeepMind CNN/DailyMail and SQuAD, the proposed SearchQA was constructed to reflect a full pipeline of general question-answering. That is, we start not from an existing article and generate a question-answer pair, but start from an existing qu…

    Submitted 11 June, 2017; v1 submitted 17 April, 2017; originally announced April 2017.

  22. arXiv:1703.07915

    stat.ML cond-mat.dis-nn cs.CV cs.LG hep-th

    Perspective: Energy Landscapes for Machine Learning

    Authors: Andrew J. Ballard, Ritankar Das, Stefano Martiniani, Dhagash Mehta, Levent Sagun, Jacob D. Stevenson, David J. Wales

    Abstract: Machine learning techniques are being increasingly used as flexible non-linear fitting and prediction tools in the physical sciences. Fitting functions that exhibit multiple solutions as local minima can be analysed in terms of the corresponding machine learning landscape. Methods to explore and visualise molecular potential energy landscapes can be applied to these machine learning landscapes to…

    Submitted 22 March, 2017; originally announced March 2017.

    Comments: 41 pages, 25 figures. Accepted for publication in Physical Chemistry Chemical Physics, 2017

  23. arXiv:1611.07476

    cs.LG

    Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond

    Authors: Levent Sagun, Leon Bottou, Yann LeCun

    Abstract: We look at the eigenvalues of the Hessian of a loss function before and after training. The eigenvalue distribution is seen to be composed of two parts, the bulk which is concentrated around zero, and the edges which are scattered away from zero. We present empirical evidence for the bulk indicating how over-parametrized the system is, and for the edges that depend on the input data.

    Submitted 5 October, 2017; v1 submitted 22 November, 2016; originally announced November 2016.

    Comments: ICLR submission, 2016 - updated to match the openreview.net version

  24. arXiv:1611.01838

    cs.LG stat.ML

    Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

    Authors: Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina

    Abstract: This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage upon this observation to construct a local-entropy-based object…

    Submitted 21 April, 2017; v1 submitted 6 November, 2016; originally announced November 2016.

    Comments: ICLR '17

  25. arXiv:1511.06444

    cs.LG math.NA math.PR

    Universal halting times in optimization and machine learning

    Authors: Levent Sagun, Thomas Trogdon, Yann LeCun

    Abstract: The authors present empirical distributions for the halting time (measured by the number of iterations to reach a given accuracy) of optimization algorithms applied to two random systems: spin glasses and deep learning. Given an algorithm, which we take to be both the optimization routine and the form of the random landscape, the fluctuations of the halting time follow a distribution that, after c…

    Submitted 20 February, 2017; v1 submitted 19 November, 2015; originally announced November 2015.

    MSC Class: 65K10; 82D30; 37E20

    Journal ref: Quart. Appl. Math. 76 (2018), 289-301

  26. arXiv:1412.6615

    stat.ML cs.LG

    Explorations on high dimensional landscapes

    Authors: Levent Sagun, V. Ugur Guney, Gerard Ben Arous, Yann LeCun

    Abstract: Finding minima of a real valued non-convex function over a high dimensional space is a major challenge in science. We provide evidence that some such functions that are defined on high dimensional domains have a narrow band of values whose pre-image contains the bulk of its critical points. This is in contrast with the low dimensional picture in which this band is wide. Our simulations agree with…

    Submitted 6 April, 2015; v1 submitted 20 December, 2014; originally announced December 2014.

    Comments: 11 pages, 8 figures, workshop contribution at ICLR 2015