
Showing 51–100 of 156 results for author: LeCun, Y

  1. arXiv:2206.08954  [pdf, other]

    cs.CV cs.LG

    Bag of Image Patch Embedding Behind the Success of Self-Supervised Learning

    Authors: Yubei Chen, Adrien Bardes, Zengyi Li, Yann LeCun

    Abstract: Self-supervised learning (SSL) has recently achieved tremendous empirical advancements in learning image representation. However, our understanding of the principle behind learning such a representation is still limited. This work shows that joint-embedding SSL approaches primarily learn a representation of image patches, which reflects their co-occurrence. Such a connection to co-occurrence model…

    Submitted 12 June, 2023; v1 submitted 17 June, 2022; originally announced June 2022.

  2. arXiv:2206.07700  [pdf, other]

    cs.CV cs.AI cs.LG

    Masked Siamese ConvNets

    Authors: Li Jing, Jiachen Zhu, Yann LeCun

    Abstract: Self-supervised learning has shown superior performances over supervised methods on various vision benchmarks. The siamese network, which encourages embeddings to be invariant to distortions, is one of the most successful self-supervised visual representation learning approaches. Among all the augmentation methods, masking is the most general and straightforward method that has the potential to be…

    Submitted 15 June, 2022; originally announced June 2022.

  3. arXiv:2206.07643  [pdf, other]

    cs.CV cs.CL cs.LG

    Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

    Authors: Zi-Yi Dou, Aishwarya Kamath, Zhe Gan, Pengchuan Zhang, Jianfeng Wang, Linjie Li, Zicheng Liu, Ce Liu, Yann LeCun, Nanyun Peng, Jianfeng Gao, Lijuan Wang

    Abstract: Vision-language (VL) pre-training has recently received considerable attention. However, most existing end-to-end pre-training approaches either only aim to tackle VL tasks such as image-text retrieval, visual question answering (VQA) and image captioning that test high-level understanding of images, or only target region-level understanding for tasks such as phrase grounding and object detection.…

    Submitted 18 November, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022. Project Website: https://ashkamath.github.io/FIBER_page

  4. On the duality between contrastive and non-contrastive self-supervised learning

    Authors: Quentin Garrido, Yubei Chen, Adrien Bardes, Laurent Najman, Yann LeCun

    Abstract: Recent approaches in self-supervised learning of image representations can be categorized into different families of methods and, in particular, can be divided into contrastive and non-contrastive approaches. While differences between the two families have been thoroughly discussed to motivate new approaches, we focus more on the theoretical similarities between them. By designing contrastive and…

    Submitted 26 June, 2023; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: The Eleventh International Conference on Learning Representations, 2023, Kigali, Rwanda

  5. arXiv:2205.11508  [pdf, other]

    cs.LG cs.AI cs.CV math.SP stat.ML

    Contrastive and Non-Contrastive Self-Supervised Learning Recover Global and Local Spectral Embedding Methods

    Authors: Randall Balestriero, Yann LeCun

    Abstract: Self-Supervised Learning (SSL) surmises that inputs and pairwise positive relationships are enough to learn meaningful representations. Although SSL has recently reached a milestone: outperforming supervised methods in many modalities... the theoretical foundations are limited, method-specific, and fail to provide principled design guidelines to practitioners. In this paper, we propose a unifyin…

    Submitted 10 June, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

  6. arXiv:2205.10279  [pdf, other]

    cs.LG cs.CV

    Pre-Train Your Loss: Easy Bayesian Transfer Learning with Informative Priors

    Authors: Ravid Shwartz-Ziv, Micah Goldblum, Hossein Souri, Sanyam Kapoor, Chen Zhu, Yann LeCun, Andrew Gordon Wilson

    Abstract: Deep learning is increasingly moving towards a transfer learning paradigm whereby large foundation models are fine-tuned on downstream tasks, starting from an initialization learned on the source task. But an initialization contains relatively little information about the source task. Instead, we show that we can learn highly informative posteriors from the source task, through supervised or self-…

    Submitted 20 May, 2022; originally announced May 2022.

    Comments: Code available at https://github.com/hsouri/BayesianTransferLearning

  7. arXiv:2204.07184  [pdf, other]

    cs.RO

    Separating the World and Ego Models for Self-Driving

    Authors: Vlad Sobal, Alfredo Canziani, Nicolas Carion, Kyunghyun Cho, Yann LeCun

    Abstract: Training self-driving systems to be robust to the long-tail of driving scenarios is a critical problem. Model-based approaches leverage simulation to emulate a wide range of scenarios without putting users at risk in the real world. One promising path to faithful simulation is to train a forward model of the world to predict the future states of both the environment and the ego-vehicle given past…

    Submitted 14 April, 2022; originally announced April 2022.

    Comments: 8 pages main content, 14 with references and appendix. 5 figures in total. Submitted and accepted to ICLR 2022 workshop on Generalizable Policy Learning in the Physical World (https://ai-workshops.github.io/generalizable-policy-learning-in-the-physical-world/)

  8. arXiv:2204.03632  [pdf, other]

    cs.LG cs.CV stat.ML

    The Effects of Regularization and Data Augmentation are Class Dependent

    Authors: Randall Balestriero, Leon Bottou, Yann LeCun

    Abstract: Regularization is a fundamental technique to prevent over-fitting and to improve generalization performances by constraining a model's complexity. Current Deep Networks heavily rely on regularizers such as Data-Augmentation (DA) or weight-decay, and employ structural risk minimization, i.e. cross-validation, to select the optimal regularization hyper-parameters. In this study, we demonstrate that…

    Submitted 8 April, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

  9. arXiv:2203.05483  [pdf, other]

    cs.LG cs.AI quant-ph

    projUNN: efficient method for training deep networks with unitary matrices

    Authors: Bobak Kiani, Randall Balestriero, Yann LeCun, Seth Lloyd

    Abstract: In learning with recurrent or very deep feed-forward networks, employing unitary matrices in each layer can be very effective at maintaining long-range stability. However, restricting network parameters to be unitary typically comes at the cost of expensive parameterizations or increased training runtime. We propose instead an efficient method based on rank-$k$ updates -- or their rank-$k$ approxi…

    Submitted 13 October, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

  10. arXiv:2202.08325  [pdf, other]

    cs.LG cs.CV

    A Data-Augmentation Is Worth A Thousand Samples: Exact Quantification From Analytical Augmented Sample Moments

    Authors: Randall Balestriero, Ishan Misra, Yann LeCun

    Abstract: Data-Augmentation (DA) is known to improve performance across tasks and datasets. We propose a method to theoretically analyze the effect of DA and study questions such as: how many augmented samples are needed to correctly estimate the information encoded by that DA? How does the augmentation policy impact the final parameters of a model? We derive several quantities in close-form, such as the ex…

    Submitted 16 February, 2022; originally announced February 2022.

  11. arXiv:2201.10000  [pdf, other]

    cs.LG cs.CV

    Neural Manifold Clustering and Embedding

    Authors: Zengyi Li, Yubei Chen, Yann LeCun, Friedrich T. Sommer

    Abstract: Given a union of non-linear manifolds, non-linear subspace clustering or manifold clustering aims to cluster data points based on manifold structures and also learn to parameterize each manifold as a linear subspace in a feature space. Deep neural networks have the potential to achieve this goal under highly non-linear settings given their large capacity and flexibility. We argue that achieving ma…

    Submitted 24 January, 2022; originally announced January 2022.

  12. arXiv:2112.09214  [pdf, other]

    cs.CV cs.LG

    Sparse Coding with Multi-Layer Decoders using Variance Regularization

    Authors: Katrina Evtimova, Yann LeCun

    Abstract: Sparse representations of images are useful in many computer vision applications. Sparse coding with an $l_1$ penalty and a learned linear dictionary requires regularization of the dictionary to prevent a collapse in the $l_1$ norms of the codes. Typically, this regularization entails bounding the Euclidean norms of the dictionary's elements. In this work, we propose a novel sparse coding protocol…

    Submitted 7 September, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

  13. arXiv:2110.09485  [pdf, other]

    cs.LG cs.CV

    Learning in High Dimension Always Amounts to Extrapolation

    Authors: Randall Balestriero, Jerome Pesenti, Yann LeCun

    Abstract: The notion of interpolation and extrapolation is fundamental in various fields from deep learning to function approximation. Interpolation occurs for a sample $x$ whenever this sample falls inside or on the boundary of the given dataset's convex hull. Extrapolation occurs when $x$ falls outside of that convex hull. One fundamental (mis)conception is that state-of-the-art algorithms work so well be…

    Submitted 29 October, 2021; v1 submitted 18 October, 2021; originally announced October 2021.

  14. arXiv:2110.09348  [pdf, other]

    cs.CV cs.AI cs.LG

    Understanding Dimensional Collapse in Contrastive Self-supervised Learning

    Authors: Li Jing, Pascal Vincent, Yann LeCun, Yuandong Tian

    Abstract: Self-supervised visual representation learning aims to learn useful representations without relying on human annotations. Joint embedding approach bases on maximizing the agreement between embedding vectors from different views of the same image. Various methods have been proposed to solve the collapsing problem where all embedding vectors collapse to a trivial constant solution. Among these metho…

    Submitted 23 April, 2022; v1 submitted 18 October, 2021; originally announced October 2021.

    Comments: In Proceedings of the 10th International Conference on Learning Representations (ICLR) 2022

    Journal ref: ICLR 2022

  15. arXiv:2110.06848  [pdf, other]

    cs.LG cs.CV

    Decoupled Contrastive Learning

    Authors: Chun-Hsiao Yeh, Cheng-Yao Hong, Yen-Chi Hsu, Tyng-Luh Liu, Yubei Chen, Yann LeCun

    Abstract: Contrastive learning (CL) is one of the most successful paradigms for self-supervised learning (SSL). In a principled way, it considers two augmented "views" of the same image as positive to be pulled closer, and all other images as negative to be pushed further apart. However, behind the impressive success of CL-based techniques, their formulation often relies on heavy-computation settings, inclu…

    Submitted 29 July, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: Accepted by ECCV2022

  16. arXiv:2107.07110  [pdf, other]

    cs.CV cs.LG

    Compact and Optimal Deep Learning with Recurrent Parameter Generators

    Authors: Jiayun Wang, Yubei Chen, Stella X. Yu, Brian Cheung, Yann LeCun

    Abstract: Deep learning has achieved tremendous success by training increasingly large models, which are then compressed for practical deployment. We propose a drastically different approach to compact and optimal deep learning: We decouple the Degrees of freedom (DoF) and the actual number of parameters of a model, optimize a small DoF with predefined random linear constraints for a large model of arbitrar…

    Submitted 26 October, 2022; v1 submitted 15 July, 2021; originally announced July 2021.

    Journal ref: WACV 2023

  17. arXiv:2105.04906  [pdf, other]

    cs.CV cs.AI cs.LG

    VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

    Authors: Adrien Bardes, Jean Ponce, Yann LeCun

    Abstract: Recent self-supervised methods for image representation learning are based on maximizing the agreement between embedding vectors from different views of the same image. A trivial solution is obtained when the encoder outputs constant vectors. This collapse problem is often avoided through implicit biases in the learning architecture, that often lack a clear justification or interpretation. In this…

    Submitted 28 January, 2022; v1 submitted 11 May, 2021; originally announced May 2021.

    Comments: Accepted at ICLR 2022

  18. arXiv:2104.12763  [pdf, other]

    cs.CV cs.CL cs.LG

    MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding

    Authors: Aishwarya Kamath, Mannat Singh, Yann LeCun, Gabriel Synnaeve, Ishan Misra, Nicolas Carion

    Abstract: Multi-modal reasoning systems rely on a pre-trained object detector to extract regions of interest from the image. However, this crucial module is typically used as a black box, trained independently of the downstream task and on a fixed vocabulary of objects and attributes. This makes it challenging for such systems to capture the long tail of visual concepts expressed in free form text. In this…

    Submitted 11 October, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

  19. arXiv:2103.15949  [pdf, other]

    cs.CL cs.LG

    Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors

    Authors: Zeyu Yun, Yubei Chen, Bruno A Olshausen, Yann LeCun

    Abstract: Transformer networks have revolutionized NLP representation learning since they were introduced. Though a great effort has been made to explain the representation in transformers, it is widely recognized that our understanding is not sufficient. One important reason is that there lack enough visualization tools for detailed analysis. In this paper, we propose to use dictionary learning to open up…

    Submitted 4 April, 2023; v1 submitted 29 March, 2021; originally announced March 2021.

    Comments: This paper is published at DeeLIO Workshop@NAACL 2021

  20. arXiv:2103.03230  [pdf, other]

    cs.CV cs.AI cs.LG q-bio.NC

    Barlow Twins: Self-Supervised Learning via Redundancy Reduction

    Authors: Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, Stéphane Deny

    Abstract: Self-supervised learning (SSL) is rapidly closing the gap with supervised methods on large computer vision benchmarks. A successful approach to SSL is to learn embeddings which are invariant to distortions of the input sample. However, a recurring issue with this approach is the existence of trivial constant solutions. Most current methods avoid such solutions by careful implementation details. We…

    Submitted 14 June, 2021; v1 submitted 4 March, 2021; originally announced March 2021.

    Comments: 13 pages, 6 figures, to appear at ICML 2021

  21. arXiv:2010.00679  [pdf, other]

    cs.LG cs.CV stat.ML

    Implicit Rank-Minimizing Autoencoder

    Authors: Li Jing, Jure Zbontar, Yann LeCun

    Abstract: An important component of autoencoders is the method by which the information capacity of the latent representation is minimized or limited. In this work, the rank of the covariance matrix of the codes is implicitly minimized by relying on the fact that gradient descent learning in multi-layer linear networks leads to minimum-rank solutions. By inserting a number of extra linear layers between the…

    Submitted 14 October, 2020; v1 submitted 1 October, 2020; originally announced October 2020.

  22. arXiv:1906.11661  [pdf, other]

    cs.CV cs.LG stat.ML

    Inspirational Adversarial Image Generation

    Authors: Baptiste Rozière, Morgane Riviere, Olivier Teytaud, Jérémy Rapin, Yann LeCun, Camille Couprie

    Abstract: The task of image generation started to receive some attention from artists and designers to inspire them in new creations. However, exploiting the results of deep generative models such as Generative Adversarial Networks can be long and tedious given the lack of existing tools. In this work, we propose a simple strategy to inspire creators with new generations learned from a dataset of their choi…

    Submitted 2 April, 2021; v1 submitted 17 June, 2019; originally announced June 2019.

    Journal ref: TIP 2021

  23. arXiv:1904.03148  [pdf, other]

    cs.CV

    Unsupervised Image Matching and Object Discovery as Optimization

    Authors: Huy V. Vo, Francis Bach, Minsu Cho, Kai Han, Yann LeCun, Patrick Perez, Jean Ponce

    Abstract: Learning with complete or partial supervision is powerful but relies on ever-growing human annotation efforts. As a way to mitigate this serious problem, as well as to serve specific applications, unsupervised learning has emerged as an important field of research. In computer vision, unsupervised learning comes in various guises. We focus here on the unsupervised discovery and matching of object…

    Submitted 5 April, 2019; originally announced April 2019.

    Comments: Accepted to CVPR 2019

  24. arXiv:1902.08401  [pdf, other]

    cs.LG stat.ML

    Learning about an exponential amount of conditional distributions

    Authors: Mohamed Ishmael Belghazi, Maxime Oquab, Yann LeCun, David Lopez-Paz

    Abstract: We introduce the Neural Conditioner (NC), a self-supervised machine able to learn about all the conditional distributions of a random vector $X$. The NC is a function $NC(x \cdot a, a, r)$ that leverages adversarial training to match each conditional distribution $P(X_r|X_a=x_a)$. After training, the NC generalizes to sample from conditional distributions never seen, including the joint distributi…

    Submitted 22 February, 2019; originally announced February 2019.

    Comments: 8 pages, 7 figures

  25. arXiv:1901.02705  [pdf, other]

    cs.LG cs.AI stat.ML

    Model-Predictive Policy Learning with Uncertainty Regularization for Driving in Dense Traffic

    Authors: Mikael Henaff, Alfredo Canziani, Yann LeCun

    Abstract: Learning a policy using only observational data is challenging because the distribution of states it induces at execution time may differ from the distribution observed during training. We propose to train a policy by unrolling a learned model of the environment dynamics over multiple time steps while explicitly penalizing two costs: the original cost the policy seeks to optimize, and an uncertain…

    Submitted 7 January, 2019; originally announced January 2019.

  26. arXiv:1812.01161  [pdf, other]

    stat.ML cs.AI cs.LG

    A Spectral Regularizer for Unsupervised Disentanglement

    Authors: Aditya Ramesh, Youngduck Choi, Yann LeCun

    Abstract: A generative model with a disentangled representation allows for independent control over different aspects of the output. Learning disentangled representations has been a recent topic of great interest, but it remains poorly understood. We show that even for GANs that do not possess disentangled representations, one can find curved trajectories in latent space over which local disentanglement occ…

    Submitted 5 February, 2019; v1 submitted 3 December, 2018; originally announced December 2018.

  27. arXiv:1811.04201  [pdf, other]

    cs.CL cs.LG

    Adversarially-Trained Normalized Noisy-Feature Auto-Encoder for Text Generation

    Authors: Xiang Zhang, Yann LeCun

    Abstract: This article proposes Adversarially-Trained Normalized Noisy-Feature Auto-Encoder (ATNNFAE) for byte-level text generation. An ATNNFAE consists of an auto-encoder where the internal code is normalized on the unit sphere and corrupted by additive noise. Simultaneously, a replica of the decoder (sharing the same parameters as the AE decoder) is used as the generator and fed with random latent vector…

    Submitted 10 November, 2018; originally announced November 2018.

  28. arXiv:1806.05662  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    GLoMo: Unsupervisedly Learned Relational Graphs as Transferable Representations

    Authors: Zhilin Yang, Jake Zhao, Bhuwan Dhingra, Kaiming He, William W. Cohen, Ruslan Salakhutdinov, Yann LeCun

    Abstract: Modern deep transfer learning approaches have mainly focused on learning generic feature vectors from one task that are transferable to other tasks, such as word embeddings in language and pretrained convolutional features in vision. However, these approaches usually transfer unary features and largely ignore more structured graphical representations. This work explores the possibility of learning…

    Submitted 2 July, 2018; v1 submitted 14 June, 2018; originally announced June 2018.

  29. arXiv:1806.00499  [pdf, other]

    cs.LG cs.AI stat.ML

    Backpropagation for Implicit Spectral Densities

    Authors: Aditya Ramesh, Yann LeCun

    Abstract: Most successful machine intelligence systems rely on gradient-based learning, which is made possible by backpropagation. Some systems are designed to aid us in interpreting data when explicit goals cannot be provided. These unsupervised systems are commonly trained by backpropagating through a likelihood function. We introduce a tool that allows us to do this even when the likelihood is not explic…

    Submitted 1 June, 2018; originally announced June 2018.

  30. arXiv:1805.12076  [pdf, other]

    cs.LG stat.ML

    Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks

    Authors: Behnam Neyshabur, Zhiyuan Li, Srinadh Bhojanapalli, Yann LeCun, Nathan Srebro

    Abstract: Despite existing work on ensuring generalization of neural networks in terms of scale sensitive complexity measures, such as norms, margin and sharpness, these complexity measures do not offer an explanation of why neural networks generalize better with over-parametrization. In this work we suggest a novel complexity measure based on unit-wise capacities resulting in a tighter generalization bound…

    Submitted 30 May, 2018; originally announced May 2018.

    Comments: 19 pages, 8 figures

  31. arXiv:1804.00921  [pdf, other]

    cs.LG stat.ML

    DeSIGN: Design Inspiration from Generative Networks

    Authors: Othman Sbai, Mohamed Elhoseiny, Antoine Bordes, Yann LeCun, Camille Couprie

    Abstract: Can an algorithm create original and compelling fashion designs to serve as an inspirational assistant? To help answer this question, we design and investigate different image generation models associated with different loss functions to boost creativity in fashion generation. The dimensions of our explorations include: (i) different Generative Adversarial Networks architectures that start from no…

    Submitted 14 September, 2018; v1 submitted 3 April, 2018; originally announced April 2018.

  32. arXiv:1803.11496  [pdf, other]

    cs.CV

    Predicting Future Instance Segmentation by Forecasting Convolutional Features

    Authors: Pauline Luc, Camille Couprie, Yann LeCun, Jakob Verbeek

    Abstract: Anticipating future events is an important prerequisite towards intelligent behavior. Video forecasting has been studied as a proxy task towards this goal. Recent work has shown that to predict semantic segmentation of future frames, forecasting at the semantic level is more effective than forecasting RGB frames and then segmenting these. In this paper we consider the more challenging problem of f…

    Submitted 3 October, 2018; v1 submitted 30 March, 2018; originally announced March 2018.

  33. arXiv:1803.06969  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Comparing Dynamics: Deep Neural Networks versus Glassy Systems

    Authors: M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous, C. Cammarota, Y. LeCun, M. Wyart, G. Biroli

    Abstract: We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that dur…

    Submitted 7 June, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

    Comments: 10 pages, 5 figures. Version accepted at ICML 2018

    Journal ref: PMLR 80:324-333, 2018; Republication with DOI (cite this one): J. Stat. Mech. (2019) 124013

  34. arXiv:1802.01817  [pdf, other]

    cs.CL

    Byte-Level Recursive Convolutional Auto-Encoder for Text

    Authors: Xiang Zhang, Yann LeCun

    Abstract: This article proposes to auto-encode text at byte-level using convolutional networks with a recursive architecture. The motivation is to explore whether it is possible to have scalable and homogeneous text generation at byte-level in a non-sequential fashion through the simple task of auto-encoding. We show that non-sequential text generation from a fixed-length representation is not only possible…

    Submitted 6 February, 2018; originally announced February 2018.

    Comments: Rejected from ICLR 2018

  35. arXiv:1711.11248  [pdf, other]

    cs.CV

    A Closer Look at Spatiotemporal Convolutions for Action Recognition

    Authors: Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri

    Abstract: In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of the video have remained solid performers in action recognition. In this work we empirically demonstrate the accuracy advantages of 3D CNNs over 2D CNNs within the framework of r…

    Submitted 11 April, 2018; v1 submitted 30 November, 2017; originally announced November 2017.

  36. arXiv:1711.04994  [pdf, other]

    cs.AI

    Prediction Under Uncertainty with Error-Encoding Networks

    Authors: Mikael Henaff, Junbo Zhao, Yann LeCun

    Abstract: In this work we introduce a new framework for performing temporal predictions in the presence of uncertainty. It is based on a simple idea of disentangling components of the future state which are predictable from those which are inherently unpredictable, and encoding the unpredictable components into a low-dimensional latent variable which is fed into a forward model. Our method uses a supervised…

    Submitted 30 November, 2017; v1 submitted 14 November, 2017; originally announced November 2017.

  37. arXiv:1709.01062  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    A hierarchical loss and its problems when classifying non-hierarchically

    Authors: Cinna Wu, Mark Tygert, Yann LeCun

    Abstract: Failing to distinguish between a sheepdog and a skyscraper should be worse and penalized more than failing to distinguish between a sheepdog and a poodle; after all, sheepdogs and poodles are both breeds of dogs. However, existing metrics of failure (so-called "loss" or "win") used in textual or visual classification/recognition via neural networks seldom leverage a-priori information, such as a s…

    Submitted 9 December, 2019; v1 submitted 1 September, 2017; originally announced September 2017.

    Comments: 19 pages, 4 figures, 7 tables

    Journal ref: PLOS ONE, 14 (12): 1-17, 2019

  38. arXiv:1708.02657  [pdf, other]

    cs.CL cs.LG

    Which Encoding is the Best for Text Classification in Chinese, English, Japanese and Korean?

    Authors: Xiang Zhang, Yann LeCun

    Abstract: This article offers an empirical study on the different ways of encoding Chinese, Japanese, Korean (CJK) and English languages for text classification. Different encoding levels are studied, including UTF-8 bytes, characters, words, romanized characters and romanized words. For all encoding levels, whenever applicable, we provide comparisons with linear models, fastText and convolutional networks.…

    Submitted 16 August, 2017; v1 submitted 8 August, 2017; originally announced August 2017.

  39. arXiv:1706.04223  [pdf, other]

    cs.LG cs.CL cs.NE

    Adversarially Regularized Autoencoders

    Authors: Jake Zhao, Yoon Kim, Kelly Zhang, Alexander M. Rush, Yann LeCun

    Abstract: Deep latent variable models, trained using variational autoencoders or generative adversarial networks, are now a key technique for representation learning of continuous structures. However, applying similar methods to discrete structures, such as text sequences or discretized images, has proven to be more challenging. In this work, we propose a flexible method for training deep latent variable mo…

    Submitted 28 June, 2018; v1 submitted 13 June, 2017; originally announced June 2017.

    Comments: ICML 2018

  40. arXiv:1705.07177  [pdf, other]

    cs.AI

    Model-Based Planning with Discrete and Continuous Actions

    Authors: Mikael Henaff, William F. Whitney, Yann LeCun

    Abstract: Action planning using learned and differentiable forward models of the world is a general approach which has a number of desirable properties, including improved sample complexity over model-free RL methods, reuse of learned models across different tasks, and the ability to perform efficient gradient-based optimization in continuous action spaces. However, this approach does not apply straightforw…

    Submitted 4 April, 2018; v1 submitted 19 May, 2017; originally announced May 2017.

  41. arXiv:1703.07684  [pdf, other]

    cs.CV cs.LG

    Predicting Deeper into the Future of Semantic Segmentation

    Authors: Pauline Luc, Natalia Neverova, Camille Couprie, Jakob Verbeek, Yann LeCun

    Abstract: The ability to predict and therefore to anticipate the future is an important attribute of intelligence. It is also of utmost importance in real-time systems, e.g. in robotics or autonomous driving, which depend on visual scene understanding for decision making. While prediction of the raw RGB pixel values in future video frames has been studied in previous work, here we introduce the novel task o…

    Submitted 8 August, 2017; v1 submitted 22 March, 2017; originally announced March 2017.

    Comments: Accepted to ICCV 2017. Supplementary material available on the authors' webpages

  42. arXiv:1612.05231  [pdf, other]

    cs.LG cs.NE stat.ML

    Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNNs

    Authors: Li Jing, Yichen Shen, Tena Dubček, John Peurifoy, Scott Skirlo, Yann LeCun, Max Tegmark, Marin Soljačić

    Abstract: Using unitary (instead of general) matrices in artificial neural networks (ANNs) is a promising way to solve the gradient explosion/vanishing problem, as well as to enable ANNs to learn long-term correlations in the data. This approach appears particularly promising for Recurrent Neural Networks (RNNs). In this work, we present a new architecture for implementing an Efficient Unitary Neural Networ…

    Submitted 3 April, 2017; v1 submitted 15 December, 2016; originally announced December 2016.

    Comments: 9 pages, 4 figures

  43. arXiv:1612.03969  [pdf, ps, other]

    cs.CL

    Tracking the World State with Recurrent Entity Networks

    Authors: Mikael Henaff, Jason Weston, Arthur Szlam, Antoine Bordes, Yann LeCun

    Abstract: We introduce a new model, the Recurrent Entity Network (EntNet). It is equipped with a dynamic long-term memory which allows it to maintain and update a representation of the state of the world as it receives new data. For language understanding tasks, it can reason on-the-fly as it reads text, not just when it is required to answer a question or respond as is the case for a Memory Network (Sukhba…

    Submitted 10 May, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

    Journal ref: ICLR 2017

  44. Geometric deep learning: going beyond Euclidean data

    Authors: Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, Pierre Vandergheynst

    Abstract: Many scientific fields study data with an underlying structure that is a non-Euclidean space. Some examples include social networks in computational social sciences, sensor networks in communications, functional networks in brain imaging, regulatory networks in genetics, and meshed surfaces in computer graphics. In many applications, such geometric data are large and complex (in the case of social…

    Submitted 3 May, 2017; v1 submitted 24 November, 2016; originally announced November 2016.

  45. arXiv:1611.07476  [pdf, other]

    cs.LG

    Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond

    Authors: Levent Sagun, Leon Bottou, Yann LeCun

    Abstract: We look at the eigenvalues of the Hessian of a loss function before and after training. The eigenvalue distribution is seen to be composed of two parts, the bulk which is concentrated around zero, and the edges which are scattered away from zero. We present empirical evidence for the bulk indicating how over-parametrized the system is, and for the edges that depend on the input data.

    Submitted 5 October, 2017; v1 submitted 22 November, 2016; originally announced November 2016.

    Comments: ICLR submission, 2016 - updated to match the openreview.net version

  46. arXiv:1611.03383  [pdf, other]

    cs.LG stat.ML

    Disentangling factors of variation in deep representations using adversarial training

    Authors: Michael Mathieu, Junbo Zhao, Pablo Sprechmann, Aditya Ramesh, Yann LeCun

    Abstract: We introduce a conditional generative model for learning to disentangle the hidden factors of variation within a set of labeled observations, and separate them into complementary codes. One code summarizes the specified factors of variation associated with the labels. The other summarizes the remaining unspecified variability. During training, the only available source of supervision comes from ou…

    Submitted 10 November, 2016; originally announced November 2016.

    Comments: Conference paper in NIPS 2016

  47. arXiv:1611.01838  [pdf, other]

    cs.LG stat.ML

    Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

    Authors: Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina

    Abstract: This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage upon this observation to construct a local-entropy-based object…

    Submitted 21 April, 2017; v1 submitted 6 November, 2016; originally announced November 2016.

    Comments: ICLR '17

  48. arXiv:1609.03126  [pdf, other]

    cs.LG stat.ML

    Energy-based Generative Adversarial Network

    Authors: Junbo Zhao, Michael Mathieu, Yann LeCun

    Abstract: We introduce the "Energy-based Generative Adversarial Network" model (EBGAN) which views the discriminator as an energy function that attributes low energies to the regions near the data manifold and higher energies to other regions. Similar to the probabilistic GANs, a generator is seen as being trained to produce contrastive samples with minimal energies, while the discriminator is trained to as…

    Submitted 6 March, 2017; v1 submitted 11 September, 2016; originally announced September 2016.

    Comments: Submitted to ICLR 2017

  49. arXiv:1606.08057  [pdf]

    cs.RO

    Fast Incremental Learning for Off-Road Robot Navigation

    Authors: Artem Provodin, Liila Torabi, Beat Flepp, Yann LeCun, Michael Sergio, L. D. Jackel, Urs Muller, Jure Zbontar

    Abstract: A promising approach to autonomous driving is machine learning. In such systems, training datasets are created that capture the sensory input to a vehicle as well as the desired response. A disadvantage of using a learned navigation system is that the learning process itself may require a huge number of training examples and a large amount of computing. To avoid the need to collect a large trainin…

    Submitted 26 June, 2016; originally announced June 2016.

  50. arXiv:1606.01781  [pdf, ps, other]

    cs.CL cs.LG cs.NE

    Very Deep Convolutional Networks for Text Classification

    Authors: Alexis Conneau, Holger Schwenk, Loïc Barrault, Yann LeCun

    Abstract: The dominant approaches for many NLP tasks are recurrent neural networks, in particular LSTMs, and convolutional neural networks. However, these architectures are rather shallow in comparison to the deep convolutional networks which have pushed the state-of-the-art in computer vision. We present a new architecture (VDCNN) for text processing which operates directly at the character level and uses on…

    Submitted 27 January, 2017; v1 submitted 6 June, 2016; originally announced June 2016.

    Comments: 10 pages, EACL 2017, camera-ready