Showing 1–39 of 39 results for author: Dauphin, Y

Searching in archive cs.
  1. arXiv:2401.10809  [pdf, other]

    cs.LG

    Neglected Hessian component explains mysteries in Sharpness regularization

    Authors: Yann N. Dauphin, Atish Agarwala, Hossein Mobahi

    Abstract: Recent work has shown that methods like SAM which either explicitly or implicitly penalize second order information can improve generalization in deep learning. Seemingly similar methods like weight noise and gradient penalties often fail to provide such benefits. We show that these differences can be explained by the structure of the Hessian of the loss. First, we show that a common decomposition…

    Submitted 24 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.
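
    The abstract is cut off before it names "a common decomposition". A natural candidate, assumed here rather than confirmed by the listing, is the standard Gauss-Newton split of the Hessian of a loss $\ell$ applied to network outputs $z(\theta) \in \mathbb{R}^k$:

```latex
\nabla^2_{\theta}\,\ell\big(z(\theta)\big)
  = \underbrace{J^{\top}\,\big(\nabla^2_{z}\ell\big)\,J}_{\text{Gauss-Newton term}}
  \;+\; \sum_{i=1}^{k} \frac{\partial \ell}{\partial z_i}\,\nabla^2_{\theta} z_i ,
\qquad J = \frac{\partial z}{\partial \theta}.
```

    The second term depends on the model's own curvature and is exactly what Gauss-Newton approximations drop; reading it as the "neglected" component of the title is our interpretation, not something the truncated abstract states.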

  2. arXiv:2311.14115  [pdf, other]

    cs.LG cs.AI cs.CL

    A density estimation perspective on learning from pairwise human preferences

    Authors: Vincent Dumoulin, Daniel D. Johnson, Pablo Samuel Castro, Hugo Larochelle, Yann Dauphin

    Abstract: Learning from human feedback (LHF) -- and in particular learning from pairwise preferences -- has recently become a crucial ingredient in training large language models (LLMs), and has been the subject of much research. Most recent works frame it as a reinforcement learning problem, where a reward function is learned from pairwise preference data and the LLM is treated as a policy which is adapted…

    Submitted 10 January, 2024; v1 submitted 23 November, 2023; originally announced November 2023.
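
    The reinforcement-learning framing the abstract describes starts by fitting a reward function to preference pairs. A minimal sketch of that reward-modeling step, assuming the widely used Bradley-Terry likelihood (the paper's own density-estimation view is not reproduced); `bradley_terry_loss` and `mean_preference_loss` are hypothetical names, and the rewards are plain scalars standing in for a learned model's outputs:

```python
import math


def bradley_terry_loss(r_chosen: float, r_rejected: float) -> float:
    """Negative log-likelihood that the chosen response is preferred.

    Bradley-Terry: P(chosen preferred) = sigmoid(r_chosen - r_rejected),
    so the loss is -log(sigmoid(m)) = log(1 + exp(-m)) for margin m.
    """
    m = r_chosen - r_rejected
    # Branch on the sign of m so math.exp never overflows.
    if m >= 0:
        return math.log1p(math.exp(-m))
    return -m + math.log1p(math.exp(m))


def mean_preference_loss(pairs):
    """Average loss over a list of (r_chosen, r_rejected) pairs."""
    return sum(bradley_terry_loss(c, r) for c, r in pairs) / len(pairs)
```

    Equal rewards give a loss of log 2, and the loss shrinks as the chosen response's reward pulls ahead of the rejected one.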

  3. arXiv:2306.03262  [pdf, other]

    cs.LG cs.DL

    Has the Machine Learning Review Process Become More Arbitrary as the Field Has Grown? The NeurIPS 2021 Consistency Experiment

    Authors: Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan

    Abstract: We present the NeurIPS 2021 consistency experiment, a larger-scale variant of the 2014 NeurIPS experiment in which 10% of conference submissions were reviewed by two independent committees to quantify the randomness in the review process. We observe that the two committees disagree on their accept/reject recommendations for 23% of the papers and that, consistent with the results from 2014, approxi…

    Submitted 5 June, 2023; originally announced June 2023.

  4. arXiv:2305.13520  [pdf, other]

    cs.CV cs.AI cs.LG

    Tied-Augment: Controlling Representation Similarity Improves Data Augmentation

    Authors: Emirhan Kurtulus, Zichao Li, Yann Dauphin, Ekin Dogus Cubuk

    Abstract: Data augmentation methods have played an important role in the recent advance of deep learning models, and have become an indispensable component of state-of-the-art models in semi-supervised, self-supervised, and supervised training for vision. Despite incurring no additional latency at test time, data augmentation often requires more epochs of training to be effective. For example, even the simp…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: 14 pages, 2 figures, ICML 2023

  5. arXiv:2304.14082  [pdf, other]

    cs.LG cs.SE

    JaxPruner: A concise library for sparsity research

    Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci

    Abstract: This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the…

    Submitted 18 December, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: Jaxpruner is hosted at http://github.com/google-research/jaxpruner
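
    JaxPruner's own API is not described in this listing, so the following is only a library-independent NumPy sketch of one popular algorithm of the kind such libraries implement: one-shot, per-tensor magnitude pruning. The function name `magnitude_prune` is hypothetical and is not part of JaxPruner:

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the entries.

    Returns (pruned_weights, keep_mask). Ties at the threshold are broken
    arbitrarily by np.argpartition.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n_prune = int(round(sparsity * weights.size))
    flat = np.abs(weights).ravel()
    mask = np.ones(weights.size, dtype=bool)
    if n_prune > 0:
        # The first n_prune positions of the partition hold the smallest magnitudes.
        prune_idx = np.argpartition(flat, n_prune - 1)[:n_prune]
        mask[prune_idx] = False
    mask = mask.reshape(weights.shape)
    return weights * mask, mask
```

    In sparse training the returned mask would typically be kept and reapplied after each update so pruned weights stay at zero; that bookkeeping is left out here.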

  6. arXiv:2304.02847  [pdf, other]

    cs.CV cs.AI cs.LG

    Robustmix: Improving Robustness by Regularizing the Frequency Bias of Deep Nets

    Authors: Jonas Ngnawe, Marianne ABEMGNIGNI NJIFON, Jonathan Heek, Yann Dauphin

    Abstract: Deep networks have achieved impressive results on a range of well-curated benchmark datasets. Surprisingly, their performance remains sensitive to perturbations that have little effect on human performance. In this work, we propose a novel extension of Mixup called Robustmix that regularizes networks to classify based on lower-frequency spatial features. We show that this type of regularization im…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: Accepted at: Workshop on Distribution Shifts, 36th Conference on Neural Information Processing Systems (NeurIPS 2022). https://openreview.net/forum?id=Na64z0YpOx

  7. arXiv:2302.08692  [pdf, other]

    cs.LG

    SAM operates far from home: eigenvalue regularization as a dynamical phenomenon

    Authors: Atish Agarwala, Yann N. Dauphin

    Abstract: The Sharpness Aware Minimization (SAM) optimization algorithm has been shown to control large eigenvalues of the loss Hessian and provide generalization benefits in a variety of settings. The original motivation for SAM was a modified loss function which penalized sharp minima; subsequent analyses have also focused on the behavior near minima. However, our work reveals that SAM provides a strong r…

    Submitted 16 February, 2023; originally announced February 2023.
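
    For reference, a SAM step first moves the weights to the first-order worst-case point inside an L2 ball of radius ρ, then descends using the gradient taken there. A minimal NumPy sketch on a hypothetical two-dimensional quadratic with one sharp and one flat direction; the learning rate, `rho` and the loss are illustrative choices, not settings from the paper:

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.1, rho=0.05, eps=1e-12):
    """One Sharpness-Aware Minimization step.

    1. Ascend: e = rho * g / ||g||, the first-order worst case in an L2 ball.
    2. Descend with the gradient evaluated at the perturbed point w + e.
    """
    g = grad_fn(w)
    e = rho * g / (np.linalg.norm(g) + eps)
    return w - lr * grad_fn(w + e)


# Toy loss L(w) = 0.5 * w^T A w: eigenvalue 10 is sharp, 0.1 is flat.
A = np.diag([10.0, 0.1])
grad_fn = lambda w: A @ w

w = np.array([1.0, 1.0])
for _ in range(50):
    w = sam_step(w, grad_fn)
```

    With `lr * 10 = 1`, plain gradient descent would jump straight to zero along the sharp coordinate; under SAM that coordinate instead keeps oscillating within about ρ of zero, while the flat coordinate decays slowly.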

  8. arXiv:2211.12966  [pdf, other]

    cs.LG cs.DB cs.DL

    How do Authors' Perceptions of their Papers Compare with Co-authors' Perceptions and Peer-review Decisions?

    Authors: Charvi Rastogi, Ivan Stelmakh, Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan, Zhenyu Xue, Hal Daumé III, Emma Pierson, Nihar B. Shah

    Abstract: How do author perceptions match up to the outcomes of the peer-review process and perceptions of others? In a top-tier computer science conference (NeurIPS 2021) with more than 23,000 submitting authors and 9,000 submitted papers, we survey the authors on three questions: (i) their predicted probability of acceptance for each of their papers, (ii) their perceived ranking of their own papers based…

    Submitted 22 November, 2022; originally announced November 2022.

  9. arXiv:2110.12899  [pdf, other]

    cs.LG

    No One Representation to Rule Them All: Overlapping Features of Training Methods

    Authors: Raphael Gontijo-Lopes, Yann Dauphin, Ekin D. Cubuk

    Abstract: Despite being able to capture a range of features of the data, high accuracy models trained with supervision tend to make similar predictions. This seemingly implies that high-performing models share similar biases regardless of training methodology, which would limit ensembling benefits and render low-accuracy models as having little practical use. Against this backdrop, recent work has developed…

    Submitted 25 April, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

    Journal ref: International Conference on Learning Representations (ICLR) 2022

  10. arXiv:2108.11346  [pdf, other]

    cs.LG

    Auxiliary Task Update Decomposition: The Good, The Bad and The Neutral

    Authors: Lucio M. Dery, Yann Dauphin, David Grangier

    Abstract: While deep learning has been very beneficial in data-rich settings, tasks with smaller training sets often resort to pre-training or multitask learning to leverage data from other tasks. In this case, careful consideration is needed to select tasks and model parameterizations such that updates from the auxiliary tasks actually help the primary task. We seek to alleviate this burden by formulating a…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: 15 pages, 3 figures, Accepted to International Conference on Learning Representations (ICLR) 2021. See https://github.com/ldery/ATTITTUD for associated code

  11. arXiv:2107.12283  [pdf, other]

    cs.CV

    Continental-Scale Building Detection from High Resolution Satellite Imagery

    Authors: Wojciech Sirko, Sergii Kashubin, Marvin Ritter, Abigail Annkah, Yasser Salah Eddine Bouchareb, Yann Dauphin, Daniel Keysers, Maxim Neumann, Moustapha Cisse, John Quinn

    Abstract: Identifying the locations and footprints of buildings is vital for many practical and scientific purposes. Such information can be particularly useful in developing regions where alternative data sources may be scarce. In this work, we describe a model training pipeline for detecting buildings across the entire continent of Africa, using 50 cm satellite imagery. Starting with the U-Net model, wide…

    Submitted 29 July, 2021; v1 submitted 26 July, 2021; originally announced July 2021.

  12. arXiv:2010.07344  [pdf, other]

    cs.LG cs.AI

    Temperature check: theory and practice for training models with softmax-cross-entropy losses

    Authors: Atish Agarwala, Jeffrey Pennington, Yann Dauphin, Sam Schoenholz

    Abstract: The softmax function combined with a cross-entropy loss is a principled approach to modeling probability distributions that has become ubiquitous in deep learning. The softmax function is defined by a lone hyperparameter, the temperature, that is commonly set to one or regarded as a way to tune model confidence after training; however, less is known about how the temperature impacts training dynam…

    Submitted 14 October, 2020; originally announced October 2020.
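
    A minimal NumPy sketch of softmax cross-entropy with an explicit temperature `T`, assuming the common convention softmax(z / T) (some works use an inverse temperature β = 1/T instead). Because the gradient with respect to the logits is (p - y) / T, the temperature rescales both the predicted distribution and the size of the updates; `softmax_xent` is a hypothetical name:

```python
import numpy as np


def softmax_xent(logits, label, temperature=1.0):
    """Cross-entropy of softmax(logits / temperature) against an integer label."""
    z = logits / temperature
    z = z - z.max()                        # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])
```

    Raising `T` flattens the distribution, so for a correctly ranked example the loss rises toward log K for K classes.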

  13. arXiv:2010.03533  [pdf, other]

    cs.LG cs.CV

    Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win

    Authors: Utku Evci, Yani A. Ioannou, Cem Keskin, Yann Dauphin

    Abstract: Sparse Neural Networks (NNs) can match the generalization of dense NNs using a fraction of the compute/storage for inference, and also have the potential to enable efficient training. However, naively training unstructured sparse NNs from random initialization results in significantly worse generalization, with the notable exceptions of Lottery Tickets (LTs) and Dynamic Sparse Training (DST). Thro…

    Submitted 15 March, 2022; v1 submitted 7 October, 2020; originally announced October 2020.

    Comments: Published in AAAI 2022. Code can be found at https://github.com/google-research/rigl/tree/master/rigl/rigl_tf2

    MSC Class: 68T07

  14. arXiv:2003.10647  [pdf, other]

    cs.LG cs.CV eess.IV

    Robust and On-the-fly Dataset Denoising for Image Classification

    Authors: Jiaming Song, Lunjia Hu, Michael Auli, Yann Dauphin, Tengyu Ma

    Abstract: Memorization in over-parameterized neural networks could severely hurt generalization in the presence of mislabeled examples. However, mislabeled examples are hard to avoid in extremely large datasets collected with weak supervision. We address this problem by reasoning counterfactually about the loss distribution of examples with uniform random labels had they been trained with the real examples,…

    Submitted 9 April, 2020; v1 submitted 23 March, 2020; originally announced March 2020.

  15. arXiv:1911.05248  [pdf, other]

    cs.LG cs.AI cs.CV cs.HC stat.ML

    What Do Compressed Deep Neural Networks Forget?

    Authors: Sara Hooker, Aaron Courville, Gregory Clark, Yann Dauphin, Andrea Frome

    Abstract: Deep neural network pruning and quantization techniques have demonstrated it is possible to achieve high levels of compression with surprisingly little degradation to test set accuracy. However, this measure of performance conceals significant differences in how different classes and images are impacted by model compression techniques. We find that models with radically different numbers of weight…

    Submitted 5 September, 2021; v1 submitted 12 November, 2019; originally announced November 2019.

  16. arXiv:1908.05731  [pdf, ps, other]

    cs.CL

    Simple and Effective Noisy Channel Modeling for Neural Machine Translation

    Authors: Kyra Yee, Nathan Ng, Yann N. Dauphin, Michael Auli

    Abstract: Previous work on neural noisy channel modeling relied on latent variable models that incrementally process the source and target sentence. This makes decoding decisions based on partial source prefixes even though the full source is available. We pursue an alternative approach based on standard sequence to sequence models which utilize the entire source. These models perform remarkably well as cha…

    Submitted 15 August, 2019; originally announced August 2019.

    Comments: EMNLP 2019

  17. arXiv:1903.05168  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    On the Pitfalls of Measuring Emergent Communication

    Authors: Ryan Lowe, Jakob Foerster, Y-Lan Boureau, Joelle Pineau, Yann Dauphin

    Abstract: How do we know if communication is emerging in a multi-agent system? The vast majority of recent papers on emergent communication show that adding a communication channel leads to an increase in reward or task success. This is a useful indicator, but provides only a coarse measure of the agent's learned communication abilities. As we move towards more complex environments, it becomes imperative to…

    Submitted 12 March, 2019; originally announced March 2019.

    Comments: AAMAS 2019. 13 pages

  18. arXiv:1902.01109  [pdf, other]

    cs.CL

    Strategies for Structuring Story Generation

    Authors: Angela Fan, Mike Lewis, Yann Dauphin

    Abstract: Writers generally rely on plans or sketches to write long stories, but most current language models generate word by word from left to right. We explore coarse-to-fine models for creating narrative texts of several hundred words, and introduce new models which decompose stories by abstracting over actions and entities. The model first generates the predicate-argument structure of the text, where d…

    Submitted 15 June, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

  19. arXiv:1901.10430  [pdf, other]

    cs.CL

    Pay Less Attention with Lightweight and Dynamic Convolutions

    Authors: Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli

    Abstract: Self-attention is a useful mechanism to build generative models for language and images. It determines the importance of context elements by comparing each element to the current time step. In this paper, we show that a very lightweight convolution can perform competitively to the best reported self-attention results. Next, we introduce dynamic convolutions which are simpler and more efficient tha…

    Submitted 22 February, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: 14 pages, ICLR oral
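
    The abstract does not spell out what makes the convolution "lightweight". As we understand the paper, it is a depthwise convolution whose kernel taps are softmax-normalized and whose weights are shared across groups ("heads") of channels; dynamic convolutions additionally predict the kernel from the current time step. A minimal NumPy sketch of the fixed (non-dynamic) causal variant, with illustrative shapes and hypothetical names:

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def light_conv(x, weight):
    """Causal lightweight convolution.

    x:      (T, d) sequence of d-dimensional features.
    weight: (H, k) raw kernels, one per head; channel c uses head c * H // d.
    Each kernel is softmax-normalized over its k taps, and output step t only
    sees inputs t-k+1 .. t (zero padding on the left).
    """
    T, d = x.shape
    H, k = weight.shape
    w = softmax(weight, axis=-1)                  # (H, k), rows sum to 1
    head_of_channel = np.arange(d) * H // d       # head index for each channel
    w_ch = w[head_of_channel]                     # (d, k), shared within a head
    x_pad = np.vstack([np.zeros((k - 1, d)), x])  # causal left padding
    out = np.empty((T, d))
    for t in range(T):
        window = x_pad[t:t + k]                   # (k, d); last row is step t
        out[t] = (window * w_ch.T).sum(axis=0)
    return out
```

    A dynamic variant would replace the fixed `weight` with kernels computed from `x[t]` at every step (for example by a linear map), which this sketch does not attempt.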

  20. arXiv:1901.09321  [pdf, other]

    cs.LG cs.CV stat.ML

    Fixup Initialization: Residual Learning Without Normalization

    Authors: Hongyi Zhang, Yann N. Dauphin, Tengyu Ma

    Abstract: Normalization layers are a staple in state-of-the-art deep neural network architectures. They are widely believed to stabilize training, enable higher learning rate, accelerate convergence and improve generalization, though the reason for their effectiveness is still an active research topic. In this work, we challenge the commonly-held beliefs by showing that none of the perceived benefits is uni…

    Submitted 11 March, 2019; v1 submitted 27 January, 2019; originally announced January 2019.

    Comments: Updating reference. Accepted for publication at ICLR 2019; see https://openreview.net/forum?id=H1gsz30cKX
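
    Fixup replaces normalization with a rescaled initialization. As we understand the paper's recipe, the last layer of each residual branch and the classifier start at zero, the remaining branch layers are scaled by $L^{-1/(2m-2)}$ for $L$ branches of $m$ layers, and scalar biases and multipliers are added. The sketch below applies this to a toy fully connected residual net; the layer layout, one scalar bias per layer and the function names are simplifying assumptions:

```python
import numpy as np


def fixup_init(num_branches, layers_per_branch, width, num_classes, rng):
    """Fixup-style initialization for a toy fully connected residual network.

    * Inner branch layers: He init scaled by L ** (-1 / (2m - 2)).
    * Last layer of each branch and the classifier: zeros.
    * One scalar bias (init 0) per layer, one scalar multiplier (init 1) per branch.
    """
    L, m = num_branches, layers_per_branch
    if m < 2:
        raise ValueError("the scaling exponent needs at least two layers per branch")
    scale = L ** (-1.0 / (2 * m - 2))
    branches = []
    for _ in range(L):
        weights = [rng.normal(0.0, np.sqrt(2.0 / width), (width, width)) * scale
                   for _ in range(m - 1)]
        weights.append(np.zeros((width, width)))  # zero-init last layer of the branch
        branches.append({"weights": weights, "biases": [0.0] * m, "multiplier": 1.0})
    classifier = np.zeros((width, num_classes))   # zero-init classification layer
    return branches, classifier


def residual_forward(x, branches):
    """x <- x + multiplier * branch(x) per branch; ReLU between branch layers."""
    for br in branches:
        h = x
        for i, (w, bias) in enumerate(zip(br["weights"], br["biases"])):
            h = (h + bias) @ w
            if i < len(br["weights"]) - 1:
                h = np.maximum(h, 0.0)
        x = x + br["multiplier"] * h
    return x
```

    Because every branch ends in a zero layer, the network computes the identity at initialization, which is what lets the residual stack train without normalization in this scheme.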

  21. arXiv:1805.04833  [pdf, other]

    cs.CL

    Hierarchical Neural Story Generation

    Authors: Angela Fan, Mike Lewis, Yann Dauphin

    Abstract: We explore story generation: creative systems that can build coherent and fluent passages of text about a topic. We collect a large dataset of 300K human-written stories paired with writing prompts from an online forum. Our dataset enables hierarchical story generation, where the model first generates a premise, and then transforms it into a passage of text. We gain further improvements with a nov…

    Submitted 13 May, 2018; originally announced May 2018.

  22. arXiv:1710.09412  [pdf, other]

    cs.LG stat.ML

    mixup: Beyond Empirical Risk Minimization

    Authors: Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz

    Abstract: Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear…

    Submitted 27 April, 2018; v1 submitted 25 October, 2017; originally announced October 2017.

    Comments: ICLR camera ready version. Changes vs V1: fix repo URL; add ablation studies; add mixup + dropout etc
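
    The convex-combination rule the abstract describes fits in a few lines. A minimal NumPy sketch, assuming the mixing coefficient is drawn as λ ~ Beta(α, α) and each example is paired with a random partner from the same shuffled batch; `mixup_batch` and `alpha=0.2` are illustrative choices:

```python
import numpy as np


def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Return (x_mix, y_mix, lam): convex combinations of example pairs and labels.

    One lam ~ Beta(alpha, alpha) is shared by the whole batch, and example i is
    mixed with partner perm[i] using the same lam for inputs and one-hot labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix, lam
```

    The mixed labels remain valid probability vectors, so the model can be trained on `(x_mix, y_mix)` with the usual cross-entropy loss.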

  23. arXiv:1706.05125  [pdf, ps, other]

    cs.AI cs.CL

    Deal or No Deal? End-to-End Learning for Negotiation Dialogues

    Authors: Mike Lewis, Denis Yarats, Yann N. Dauphin, Devi Parikh, Dhruv Batra

    Abstract: Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions. Negotiations require complex communication and reasoning skills, but success is easy to measure, making this an interesting task for AI. We gather a large dataset of human-human negotiations on a multi-issue bargaining task, where agents who cannot observe each other'…

    Submitted 15 June, 2017; originally announced June 2017.

  24. arXiv:1706.04454  [pdf, other]

    cs.LG

    Empirical Analysis of the Hessian of Over-Parametrized Neural Networks

    Authors: Levent Sagun, Utku Evci, V. Ugur Guney, Yann Dauphin, Leon Bottou

    Abstract: We study the properties of common loss surfaces through their Hessian matrix. In particular, in the context of deep learning, we empirically show that the spectrum of the Hessian is composed of two parts: (1) the bulk centered near zero, and (2) outliers away from the bulk. We present numerical evidence and mathematical justifications to the following conjectures laid out by Sagun et al. (2016): F…

    Submitted 7 May, 2018; v1 submitted 14 June, 2017; originally announced June 2017.

    Comments: Minor update for ICLR 2018 Workshop Track presentation

  25. arXiv:1706.03643  [pdf, other]

    cs.LG

    Tackling Over-pruning in Variational Autoencoders

    Authors: Serena Yeung, Anitha Kannan, Yann Dauphin, Li Fei-Fei

    Abstract: Variational autoencoders (VAE) are directed generative models that learn factorial latent variables. As noted by Burda et al. (2015), these models exhibit the problem of factor over-pruning where a significant number of stochastic factors fail to learn anything and become inactive. This can limit their modeling power and their ability to learn diverse and meaningful latent representations. In this…

    Submitted 6 August, 2017; v1 submitted 9 June, 2017; originally announced June 2017.

  26. arXiv:1705.03122  [pdf, other]

    cs.CL

    Convolutional Sequence to Sequence Learning

    Authors: Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, Yann N. Dauphin

    Abstract: The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training and optimization is easier since the number of non-linearities is fixed…

    Submitted 24 July, 2017; v1 submitted 8 May, 2017; originally announced May 2017.

  27. arXiv:1704.08847  [pdf, other]

    stat.ML cs.AI cs.CR cs.LG

    Parseval Networks: Improving Robustness to Adversarial Examples

    Authors: Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, Nicolas Usunier

    Abstract: We introduce Parseval networks, a form of deep neural networks in which the Lipschitz constant of linear, convolutional and aggregation layers is constrained to be smaller than 1. Parseval networks are empirically and theoretically motivated by an analysis of the robustness of the predictions made by deep neural networks when their input is subject to an adversarial perturbation. The most importan…

    Submitted 1 May, 2017; v1 submitted 28 April, 2017; originally announced April 2017.

    Comments: submitted
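
    One way to keep a linear layer's Lipschitz constant near 1 is to push its weight matrix toward a Parseval tight frame. As we understand the paper, this is done with the retraction W ← (1 + β)W − βWWᵀW applied after gradient updates with a small β. A minimal NumPy sketch of the retraction alone; the large `beta=0.5` and the random starting matrix are assumptions chosen only so the standalone demo converges quickly:

```python
import numpy as np


def parseval_retraction(w, beta):
    """One retraction step W <- (1 + beta) W - beta W W^T W.

    In the SVD W = U S V^T this maps each singular value s to
    (1 + beta) s - beta s^3, whose stable fixed point is s = 1, so repeated
    steps pull W toward spectral norm 1 (a Parseval tight frame).
    """
    return (1.0 + beta) * w - beta * (w @ w.T @ w)


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 6))
w *= 1.5 / np.linalg.norm(w, 2)   # start with largest singular value 1.5
for _ in range(100):
    w = parseval_retraction(w, beta=0.5)
```

    After the loop every singular value of `w` is 1 to numerical precision, so `w @ w.T` is the 4×4 identity and the layer is 1-Lipschitz in the L2 norm.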

  28. arXiv:1612.08083  [pdf, other]

    cs.CL

    Language Modeling with Gated Convolutional Networks

    Authors: Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier

    Abstract: The predominant approach to language modeling to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we develop a finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. We propose a novel simplified gating mechanism tha…

    Submitted 8 September, 2017; v1 submitted 23 December, 2016; originally announced December 2016.
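
    The gating mechanism proposed in this paper is the gated linear unit (GLU), h(X) = (X∗W + b) ⊗ σ(X∗V + c), computed over causal convolutions so that no position sees future tokens. A minimal NumPy sketch of a single such layer; the shapes and function names are illustrative assumptions, and how the full model stacks and trains these layers is omitted:

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def causal_conv1d(x, w, b):
    """x: (T, d_in); w: (k, d_in, d_out); b: (d_out,).

    Left zero-padding makes output step t depend only on inputs t-k+1 .. t.
    """
    k = w.shape[0]
    T = x.shape[0]
    x_pad = np.vstack([np.zeros((k - 1, x.shape[1])), x])
    return np.stack([np.einsum("kd,kde->e", x_pad[t:t + k], w) for t in range(T)]) + b


def glu_layer(x, w, b, v, c):
    """Gated linear unit: (X*W + b) elementwise-times sigmoid(X*V + c)."""
    return causal_conv1d(x, w, b) * sigmoid(causal_conv1d(x, v, c))
```

    The linear path carries the signal while the sigmoid path decides how much of it passes; perturbing a future input leaves all earlier outputs unchanged.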

  29. arXiv:1611.02344  [pdf, other]

    cs.CL

    A Convolutional Encoder Model for Neural Machine Translation

    Authors: Jonas Gehring, Michael Auli, David Grangier, Yann N. Dauphin

    Abstract: The prevalent approach to neural machine translation relies on bi-directional LSTMs to encode the source sentence. In this paper we present a faster and simpler architecture based on a succession of convolutional layers. This makes it possible to encode the entire source sentence simultaneously, compared to recurrent networks for which computation is constrained by temporal dependencies. On WMT'16 English-Rom…

    Submitted 24 July, 2017; v1 submitted 7 November, 2016; originally announced November 2016.

    Comments: 13 pages

  30. arXiv:1605.02688  [pdf, other]

    cs.SC cs.LG cs.MS

    Theano: A Python framework for fast computation of mathematical expressions

    Authors: The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, et al. (88 additional authors not shown)

    Abstract: Theano is a Python library that allows one to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano has been actively and continuously developed since 2008, mu…

    Submitted 9 May, 2016; originally announced May 2016.

    Comments: 19 pages, 5 figures

  31. arXiv:1511.05622  [pdf, other]

    cs.LG cs.CV

    Predicting distributions with Linearizing Belief Networks

    Authors: Yann N. Dauphin, David Grangier

    Abstract: Conditional belief networks introduce stochastic binary variables in neural networks. Contrary to a classical neural network, a belief network can predict more than the expected value of the output $Y$ given the input $X$. It can predict a distribution of outputs $Y$ which is useful when an input can admit multiple outputs whose average is not necessarily a valid answer. Such networks are particul…

    Submitted 1 May, 2016; v1 submitted 17 November, 2015; originally announced November 2015.

  32. arXiv:1503.01800  [pdf, other]

    cs.LG cs.CV

    EmoNets: Multimodal deep learning approaches for emotion recognition in video

    Authors: Samira Ebrahimi Kahou, Xavier Bouthillier, Pascal Lamblin, Caglar Gulcehre, Vincent Michalski, Kishore Konda, Sébastien Jean, Pierre Froumenty, Yann Dauphin, Nicolas Boulanger-Lewandowski, Raul Chandias Ferrari, Mehdi Mirza, David Warde-Farley, Aaron Courville, Pascal Vincent, Roland Memisevic, Christopher Pal, Yoshua Bengio

    Abstract: The task of the Emotion Recognition in the Wild (EmotiW) Challenge is to assign one of seven emotions to short video clips extracted from Hollywood style movies. The videos depict acted-out emotions under realistic conditions with a large degree of variation in attributes such as pose and illumination, making it worthwhile to explore approaches which consider combinations of features from multiple…

    Submitted 29 March, 2015; v1 submitted 5 March, 2015; originally announced March 2015.

  33. arXiv:1502.04390  [pdf, other]

    cs.LG math.NA

    Equilibrated adaptive learning rates for non-convex optimization

    Authors: Yann N. Dauphin, Harm de Vries, Yoshua Bengio

    Abstract: Parameter-specific adaptive learning rate methods are computationally efficient ways to reduce the ill-conditioning problems encountered when training large deep networks. Following recent work that strongly suggests that most of the critical points encountered when training such networks are saddle points, we find how considering the presence of negative eigenvalues of the Hessian could help us d…

    Submitted 29 August, 2015; v1 submitted 15 February, 2015; originally announced February 2015.
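
    The "equilibrated" learning rates of the title scale each coordinate by the norm of the corresponding Hessian row, which stays positive even when the Hessian is indefinite (a Jacobi-style diagonal preconditioner can be negative there). A minimal NumPy sketch of the Monte Carlo estimate of those row norms from Hessian-vector products; `hvp`, `esgd_step`, the sample count and the damping value are hypothetical names and settings, not the paper's code:

```python
import numpy as np


def equilibration_estimate(hvp, dim, num_samples, rng):
    """Monte Carlo estimate of the row norms ||H[i, :]||_2 of a Hessian.

    For v ~ N(0, I), E[(H v)_i^2] = sum_j H[i, j]^2, so averaging (Hv)^2 over
    samples and taking a square root estimates each row norm. Only
    Hessian-vector products hvp(v) are needed, never H itself.
    """
    acc = np.zeros(dim)
    for _ in range(num_samples):
        v = rng.normal(size=dim)
        acc += hvp(v) ** 2
    return np.sqrt(acc / num_samples)


def esgd_step(w, grad, precond, lr=0.1, damping=1e-4):
    """Preconditioned step: divide each gradient coordinate by its row-norm estimate."""
    return w - lr * grad / (precond + damping)
```

    For an indefinite example such as H = [[3, 1], [1, -2]], the estimate approaches the row norms √10 and √5, both positive, whereas the diagonal entry -2 could not be used directly as a step-size denominator.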

  34. arXiv:1406.2572  [pdf, other]

    cs.LG math.OC stat.ML

    Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

    Authors: Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

    Abstract: A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods to find the global minimum is the proliferation of local minima wit…

    Submitted 10 June, 2014; originally announced June 2014.

    Comments: The theoretical review and analysis in this article draw heavily from arXiv:1405.4604 [cs.LG]
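
    The saddle-free Newton method associated with this work rescales the gradient by the inverse of |H|, the Hessian with each eigenvalue replaced by its absolute value. A minimal NumPy sketch using an exact eigendecomposition, which is practical only for small problems (as we understand it, the paper works in a low-dimensional Krylov subspace instead); `damping` is an illustrative regularizer:

```python
import numpy as np


def saddle_free_newton_step(grad, hess, damping=1e-3):
    """Return -|H|^{-1} g, where |H| has the absolute eigenvalues of H.

    Newton's step -H^{-1} g moves toward a saddle along negative-curvature
    directions; dividing by |lambda| instead keeps the step a descent
    direction, so those directions are escaped rather than approached.
    """
    eigvals, eigvecs = np.linalg.eigh(hess)
    abs_h_inv = eigvecs @ np.diag(1.0 / (np.abs(eigvals) + damping)) @ eigvecs.T
    return -abs_h_inv @ grad
```

    For f(x, y) = x² − y² at (1, 0.1), the gradient is (2, −0.2) and H = diag(2, −2). Newton's step (−1, −0.1) lands exactly on the saddle at the origin, while the saddle-free step (−1, 0.1) moves to (0, 0.2), away from it.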

  35. arXiv:1405.4604  [pdf, other]

    cs.LG cs.NE

    On the saddle point problem for non-convex optimization

    Authors: Razvan Pascanu, Yann N. Dauphin, Surya Ganguli, Yoshua Bengio

    Abstract: A central challenge to many fields of science and engineering involves minimizing non-convex error functions over continuous, high dimensional spaces. Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for the ability of these local methods to find the global minimum is the proliferation of l…

    Submitted 27 May, 2014; v1 submitted 19 May, 2014; originally announced May 2014.

    Comments: 11 pages, 8 figures

  36. arXiv:1401.0509  [pdf, other]

    cs.CL cs.LG

    Zero-Shot Learning for Semantic Utterance Classification

    Authors: Yann N. Dauphin, Gokhan Tur, Dilek Hakkani-Tur, Larry Heck

    Abstract: We propose a novel zero-shot learning method for semantic utterance classification (SUC). It learns a classifier $f: X \to Y$ for problems where none of the semantic categories $Y$ are present in the training set. The framework uncovers the link between categories and utterances using a semantic space. We show that this semantic space can be learned by deep neural networks trained on large amounts…

    Submitted 7 March, 2014; v1 submitted 20 December, 2013; originally announced January 2014.
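
    Because utterances and category names share one semantic space in this setting, a category that never appeared in training can still be scored. A minimal sketch, assuming embeddings are already available (the toy vectors and category names are hypothetical) and using cosine similarity as the scoring rule, which is an illustrative choice rather than the paper's exact formulation:

```python
import numpy as np


def zero_shot_classify(utterance_vec, category_vecs):
    """Return (best_category, scores) using cosine similarity in a shared space.

    category_vecs maps category name -> embedding; no category needs to have
    been seen with labeled utterances.
    """
    names = list(category_vecs)
    mat = np.stack([category_vecs[n] for n in names])
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    u = utterance_vec / np.linalg.norm(utterance_vec)
    scores = mat @ u
    return names[int(np.argmax(scores))], scores
```

    Adding a new category then only requires an embedding for its name or description, not new labeled utterances.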

  37. arXiv:1301.3583  [pdf, other]

    cs.LG cs.CV

    Big Neural Networks Waste Capacity

    Authors: Yann N. Dauphin, Yoshua Bengio

    Abstract: This article exposes the failure of some big neural networks to leverage added capacity to reduce underfitting. Past research suggests diminishing returns when increasing the size of neural networks. Our experiments on ImageNet LSVRC-2010 show that this may be due to the fact that there are highly diminishing returns for capacity in terms of training error, leading to underfitting. This suggests that th…

    Submitted 14 March, 2013; v1 submitted 15 January, 2013; originally announced January 2013.

  38. arXiv:1207.4404  [pdf, other]

    cs.LG

    Better Mixing via Deep Representations

    Authors: Yoshua Bengio, Grégoire Mesnil, Yann Dauphin, Salah Rifai

    Abstract: It has previously been hypothesized, and supported with some experimental evidence, that deeper representations, when well trained, tend to do a better job at disentangling the underlying factors of variation. We study the following related conjecture: better representations, in the sense of better disentangling, can be exploited to produce faster-mixing Markov chains. Consequently, mixing would b…

    Submitted 18 July, 2012; originally announced July 2012.

  39. arXiv:1206.6434  [pdf]

    cs.LG stat.ML

    A Generative Process for Sampling Contractive Auto-Encoders

    Authors: Salah Rifai, Yoshua Bengio, Yann Dauphin, Pascal Vincent

    Abstract: The contractive auto-encoder learns a representation of the input data that captures the local manifold structure around each data point, through the leading singular vectors of the Jacobian of the transformation from input to representation. The corresponding singular values specify how much local variation is plausible in directions associated with the corresponding singular vectors, while remai…

    Submitted 27 June, 2012; originally announced June 2012.

    Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
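
    For a sigmoid encoder h = σ(Wx + b), the Jacobian the abstract refers to is J = diag(h ⊙ (1 − h)) W. Its squared Frobenius norm is the contractive penalty, and its SVD gives the local directions (right singular vectors) and magnitudes (singular values) of variation. A minimal NumPy sketch with hypothetical names; the paper's sampling procedure itself is not reproduced:

```python
import numpy as np


def encoder_jacobian(x, W, b):
    """Return (J, h) for h = sigmoid(W x + b), with J = dh/dx of shape (n_hidden, n_in)."""
    h = 1.0 / (1.0 + np.exp(-(W @ x + b)))
    return (h * (1.0 - h))[:, None] * W, h


def contractive_penalty(x, W, b):
    """||J||_F^2 in closed form: sum_j (h_j (1 - h_j))^2 * ||W[j, :]||^2."""
    _, h = encoder_jacobian(x, W, b)
    return float(np.sum((h * (1.0 - h)) ** 2 * np.sum(W ** 2, axis=1)))


# Usage: leading right singular vectors of J are input-space directions of
# local variation around x; the singular values say how much variation each allows.
rng = np.random.default_rng(0)
W, b, x = rng.normal(size=(3, 5)), np.zeros(3), rng.normal(size=5)
J, _ = encoder_jacobian(x, W, b)
_, s, vt = np.linalg.svd(J)
```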