
Showing 1–18 of 18 results for author: Dauphin, Y N

Searching in archive cs.
  1. arXiv:2401.10809  [pdf, other]

    cs.LG

    Neglected Hessian component explains mysteries in Sharpness regularization

    Authors: Yann N. Dauphin, Atish Agarwala, Hossein Mobahi

    Abstract: Recent work has shown that methods like SAM which either explicitly or implicitly penalize second order information can improve generalization in deep learning. Seemingly similar methods like weight noise and gradient penalties often fail to provide such benefits. We show that these differences can be explained by the structure of the Hessian of the loss. First, we show that a common decomposition…

    Submitted 24 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

  2. arXiv:2306.03262  [pdf, other]

    cs.LG cs.DL

    Has the Machine Learning Review Process Become More Arbitrary as the Field Has Grown? The NeurIPS 2021 Consistency Experiment

    Authors: Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan

    Abstract: We present the NeurIPS 2021 consistency experiment, a larger-scale variant of the 2014 NeurIPS experiment in which 10% of conference submissions were reviewed by two independent committees to quantify the randomness in the review process. We observe that the two committees disagree on their accept/reject recommendations for 23% of the papers and that, consistent with the results from 2014, approxi…

    Submitted 5 June, 2023; originally announced June 2023.

  3. arXiv:2302.08692  [pdf, other]

    cs.LG

    SAM operates far from home: eigenvalue regularization as a dynamical phenomenon

    Authors: Atish Agarwala, Yann N. Dauphin

    Abstract: The Sharpness Aware Minimization (SAM) optimization algorithm has been shown to control large eigenvalues of the loss Hessian and provide generalization benefits in a variety of settings. The original motivation for SAM was a modified loss function which penalized sharp minima; subsequent analyses have also focused on the behavior near minima. However, our work reveals that SAM provides a strong r…

    Submitted 16 February, 2023; originally announced February 2023.

  4. arXiv:2211.12966  [pdf, other]

    cs.LG cs.DB cs.DL

    How do Authors' Perceptions of their Papers Compare with Co-authors' Perceptions and Peer-review Decisions?

    Authors: Charvi Rastogi, Ivan Stelmakh, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan, Zhenyu Xue, Hal Daumé III, Emma Pierson, Nihar B. Shah

    Abstract: How do author perceptions match up to the outcomes of the peer-review process and perceptions of others? In a top-tier computer science conference (NeurIPS 2021) with more than 23,000 submitting authors and 9,000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based…

    Submitted 22 November, 2022; originally announced November 2022.

  5. arXiv:1908.05731  [pdf, ps, other]

    cs.CL

    Simple and Effective Noisy Channel Modeling for Neural Machine Translation

    Authors: Kyra Yee, Nathan Ng, Yann N. Dauphin, Michael Auli

    Abstract: Previous work on neural noisy channel modeling relied on latent variable models that incrementally process the source and target sentence. This makes decoding decisions based on partial source prefixes even though the full source is available. We pursue an alternative approach based on standard sequence to sequence models which utilize the entire source. These models perform remarkably well as cha…

    Submitted 15 August, 2019; originally announced August 2019.

    Comments: EMNLP 2019

  6. arXiv:1901.10430  [pdf, other]

    cs.CL

    Pay Less Attention with Lightweight and Dynamic Convolutions

    Authors: Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli

    Abstract: Self-attention is a useful mechanism to build generative models for language and images. It determines the importance of context elements by comparing each element to the current time step. In this paper, we show that a very lightweight convolution can perform competitively to the best reported self-attention results. Next, we introduce dynamic convolutions which are simpler and more efficient tha…

    Submitted 22 February, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: 14 pages, ICLR oral

  7. arXiv:1901.09321  [pdf, other]

    cs.LG cs.CV stat.ML

    Fixup Initialization: Residual Learning Without Normalization

    Authors: Hongyi Zhang, Yann N. Dauphin, Tengyu Ma

    Abstract: Normalization layers are a staple in state-of-the-art deep neural network architectures. They are widely believed to stabilize training, enable higher learning rate, accelerate convergence and improve generalization, though the reason for their effectiveness is still an active research topic. In this work, we challenge the commonly-held beliefs by showing that none of the perceived benefits is uni…

    Submitted 11 March, 2019; v1 submitted 27 January, 2019; originally announced January 2019.

    Comments: Updating reference. Accepted for publication at ICLR 2019; see https://openreview.net/forum?id=H1gsz30cKX

  8. arXiv:1710.09412  [pdf, other]

    cs.LG stat.ML

    mixup: Beyond Empirical Risk Minimization

    Authors: Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz

    Abstract: Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear…

    Submitted 27 April, 2018; v1 submitted 25 October, 2017; originally announced October 2017.

    Comments: ICLR camera ready version. Changes vs V1: fix repo URL; add ablation studies; add mixup + dropout etc

  9. arXiv:1706.05125  [pdf, ps, other]

    cs.AI cs.CL

    Deal or No Deal? End-to-End Learning for Negotiation Dialogues

    Authors: Mike Lewis, Denis Yarats, Yann N. Dauphin, Devi Parikh, Dhruv Batra

    Abstract: Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions. Negotiations require complex communication and reasoning skills, but success is easy to measure, making this an interesting task for AI. We gather a large dataset of human-human negotiations on a multi-issue bargaining task, where agents who cannot observe each other'…

    Submitted 15 June, 2017; originally announced June 2017.

  10. arXiv:1705.03122  [pdf, other]

    cs.CL

    Convolutional Sequence to Sequence Learning

    Authors: Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

    Abstract: The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training and optimization is easier since the number of non-linearities is fixed…

    Submitted 24 July, 2017; v1 submitted 8 May, 2017; originally announced May 2017.

  11. arXiv:1612.08083  [pdf, other]

    cs.CL

    Language Modeling with Gated Convolutional Networks

    Authors: Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier

    Abstract: The pre-dominant approach to language modeling to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we develop a finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. We propose a novel simplified gating mechanism tha…

    Submitted 8 September, 2017; v1 submitted 23 December, 2016; originally announced December 2016.

  12. arXiv:1611.02344  [pdf, other]

    cs.CL

    A Convolutional Encoder Model for Neural Machine Translation

    Authors: Jonas Gehring, Michael Auli, David Grangier, Yann N. Dauphin

    Abstract: The prevalent approach to neural machine translation relies on bi-directional LSTMs to encode the source sentence. In this paper we present a faster and simpler architecture based on a succession of convolutional layers. This allows to encode the entire source sentence simultaneously compared to recurrent networks for which computation is constrained by temporal dependencies. On WMT'16 English-Rom…

    Submitted 24 July, 2017; v1 submitted 7 November, 2016; originally announced November 2016.

    Comments: 13 pages

  13. arXiv:1605.02688  [pdf, other]

    cs.SC cs.LG cs.MS

    Theano: A Python framework for fast computation of mathematical expressions

    Authors: The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, et al. (88 additional authors not shown)

    Abstract: Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, mu…

    Submitted 9 May, 2016; originally announced May 2016.

    Comments: 19 pages, 5 figures

  14. arXiv:1511.05622  [pdf, other]

    cs.LG cs.CV

    Predicting distributions with Linearizing Belief Networks

    Authors: Yann N. Dauphin, David Grangier

    Abstract: Conditional belief networks introduce stochastic binary variables in neural networks. Contrary to a classical neural network, a belief network can predict more than the expected value of the output $Y$ given the input $X$. It can predict a distribution of outputs $Y$ which is useful when an input can admit multiple outputs whose average is not necessarily a valid answer. Such networks are particul…

    Submitted 1 May, 2016; v1 submitted 17 November, 2015; originally announced November 2015.

  15. arXiv:1502.04390  [pdf, other]

    cs.LG math.NA

    Equilibrated adaptive learning rates for non-convex optimization

    Authors: Yann N. Dauphin, Harm de Vries, Yoshua Bengio

    Abstract: Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks. Following recent work that strongly suggests that most of the critical points encountered when training such networks are saddle points, we find how considering the presence of negative eigenvalues of the Hessian could help us d…

    Submitted 29 August, 2015; v1 submitted 15 February, 2015; originally announced February 2015.

  16. arXiv:1405.4604  [pdf, other]

    cs.LG cs.NE

    On the saddle point problem for non-convex optimization

    Authors: Razvan Pascanu, Yann N. Dauphin, Surya Ganguli, Yoshua Bengio

    Abstract: A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for the ability of these local methods to find the global minimum is the proliferation of l…

    Submitted 27 May, 2014; v1 submitted 19 May, 2014; originally announced May 2014.

    Comments: 11 pages, 8 figures

  17. arXiv:1401.0509  [pdf, other]

    cs.CL cs.LG

    Zero-Shot Learning for Semantic Utterance Classification

    Authors: Yann N. Dauphin, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck

    Abstract: We propose a novel zero-shot learning method for semantic utterance classification (SUC). It learns a classifier $f: X \to Y$ for problems where none of the semantic categories $Y$ are present in the training set. The framework uncovers the link between categories and utterances using a semantic space. We show that this semantic space can be learned by deep neural networks trained on large amounts…

    Submitted 7 March, 2014; v1 submitted 20 December, 2013; originally announced January 2014.

  18. arXiv:1301.3583  [pdf, other]

    cs.LG cs.CV

    Big Neural Networks Waste Capacity

    Authors: Yann N. Dauphin, Yoshua Bengio

    Abstract: This article exposes the failure of some big neural networks to leverage added capacity to reduce underfitting. Past research suggest diminishing returns when increasing the size of neural networks. Our experiments on ImageNet LSVRC-2010 show that this may be due to the fact there are highly diminishing returns for capacity in terms of training error, leading to underfitting. This suggests that th…

    Submitted 14 March, 2013; v1 submitted 15 January, 2013; originally announced January 2013.