Showing 1–50 of 53 results for author: Ranzato, M

Searching in archive cs.
  1. arXiv:2403.10616  [pdf, other]

    cs.LG cs.CL

    DiPaCo: Distributed Path Composition

    Authors: Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Adhiguna Kuncoro, Yani Donchev, Rachita Chhaparia, Ionel Gog, Marc'Aurelio Ranzato, Jiajun Shen, Arthur Szlam

    Abstract: Progress in machine learning (ML) has been fueled by scaling neural network models. This scaling has been enabled by ever more heroic feats of engineering, necessary for accommodating ML approaches that require high bandwidth communication between devices working in parallel. In this work, we propose a co-designed modular architecture and training approach for ML models, dubbed DIstributed PAth CO…

    Submitted 15 March, 2024; originally announced March 2024.

  2. arXiv:2401.09135  [pdf, other]

    cs.LG cs.CL

    Asynchronous Local-SGD Training for Language Modeling

    Authors: Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato

    Abstract: Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication. This work presents an empirical study of asynchronous Local-SGD for training language models; that is, each worker updates the global parameters as soon as it has finished its SGD steps. We co…

    Submitted 17 January, 2024; originally announced January 2024.

  3. arXiv:2311.08105  [pdf, other]

    cs.LG cs.CL

    DiLoCo: Distributed Low-Communication Training of Language Models

    Authors: Arthur Douillard, Qixuan Feng, Andrei A. Rusu, Rachita Chhaparia, Yani Donchev, Adhiguna Kuncoro, Marc'Aurelio Ranzato, Arthur Szlam, Jiajun Shen

    Abstract: Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of tightly interconnected accelerators, with devices exchanging gradients and other intermediate states at each optimization step. While it is difficult to build and maintain a single computing cluster hosting many accelerators,…

    Submitted 2 December, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

  4. arXiv:2307.05741  [pdf, other]

    cs.CL

    Towards Robust and Efficient Continual Language Learning

    Authors: Adam Fisch, Amal Rannen-Triki, Razvan Pascanu, Jörg Bornschein, Angeliki Lazaridou, Elena Gribovskaya, Marc'Aurelio Ranzato

    Abstract: As the application space of language models continues to evolve, a natural question to ask is how we can quickly adapt models to new tasks. We approach this classic question from a continual learning perspective, in which we aim to continue fine-tuning models trained on past tasks on new tasks, with the goal of "transferring" relevant knowledge. However, this strategy also runs the risk of doing m…

    Submitted 11 July, 2023; originally announced July 2023.

  5. arXiv:2304.13164  [pdf, other]

    cs.LG cs.AI

    Towards Compute-Optimal Transfer Learning

    Authors: Massimo Caccia, Alexandre Galashov, Arthur Douillard, Amal Rannen-Triki, Dushyant Rao, Michela Paganini, Laurent Charlin, Marc'Aurelio Ranzato, Razvan Pascanu

    Abstract: The field of transfer learning is undergoing a significant shift with the introduction of large pretrained models which have demonstrated strong adaptability to a variety of downstream tasks. However, the high computational and memory requirements to finetune or use these models can be a hindrance to their widespread use. In this study, we present a solution to this issue by proposing a simple yet…

    Submitted 25 April, 2023; originally announced April 2023.

  6. arXiv:2211.11747  [pdf, other]

    cs.LG cs.CV

    NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research

    Authors: Jorg Bornschein, Alexandre Galashov, Ross Hemsley, Amal Rannen-Triki, Yutian Chen, Arslan Chaudhry, Xu Owen He, Arthur Douillard, Massimo Caccia, Qixuang Feng, Jiajun Shen, Sylvestre-Alvise Rebuffi, Kitty Stacpoole, Diego de las Casas, Will Hawkins, Angeliki Lazaridou, Yee Whye Teh, Andrei A. Rusu, Razvan Pascanu, Marc'Aurelio Ranzato

    Abstract: A shared goal of several machine learning communities like continual learning, meta-learning and transfer learning, is to design algorithms and models that efficiently and robustly adapt to unseen tasks. An even more ambitious goal is to build models that never stop adapting, and that become increasingly more efficient through time by suitably transferring the accrued knowledge. Beyond the study o…

    Submitted 16 May, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

  7. arXiv:2210.04971  [pdf, other]

    cs.LG cs.AI

    Multi-step Planning for Automated Hyperparameter Optimization with OptFormer

    Authors: Lucio M. Dery, Abram L. Friesen, Nando De Freitas, Marc'Aurelio Ranzato, Yutian Chen

    Abstract: As machine learning permeates more industries and models become more expensive and time consuming to train, the need for efficient automated hyperparameter optimization (HPO) has never been more pressing. Multi-step planning based approaches to hyperparameter optimization promise improved efficiency over myopic alternatives by more effectively balancing out exploration and exploitation. However, t…

    Submitted 16 November, 2022; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: 8 pages, 7 figures

  8. arXiv:2205.13320  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards Learning Universal Hyperparameter Optimizers with Transformers

    Authors: Yutian Chen, Xingyou Song, Chansoo Lee, Zi Wang, Qiuyi Zhang, David Dohan, Kazuya Kawakami, Greg Kochanski, Arnaud Doucet, Marc'aurelio Ranzato, Sagi Perel, Nando de Freitas

    Abstract: Meta-learning hyperparameter optimization (HPO) algorithms from prior experiments is a promising approach to improve optimization efficiency over objective functions from a similar distribution. However, existing methods are restricted to learning from experiments sharing the same set of hyperparameters. In this paper, we introduce the OptFormer, the first text-based Transformer HPO framework that…

    Submitted 13 October, 2022; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: Published as a conference paper in Neural Information Processing Systems (NeurIPS) 2022. Code can be found in https://github.com/google-research/optformer and Google AI Blog can be found in https://ai.googleblog.com/2022/08/optformer-towards-universal.html

  9. arXiv:2106.09563  [pdf, other]

    cs.LG cs.CV

    On Anytime Learning at Macroscale

    Authors: Lucas Caccia, Jing Xu, Myle Ott, Marc'Aurelio Ranzato, Ludovic Denoyer

    Abstract: In many practical applications of machine learning data arrives sequentially over time in large chunks. Practitioners have then to decide how to allocate their computational budget in order to obtain the best performance at any point in time. Online learning theory for convex optimization suggests that the best strategy is to use data as soon as it arrives. However, this might not be the best stra…

    Submitted 2 August, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted at the Conference on Lifelong Learning Agents (CoLLAs) 2022

  10. arXiv:2106.03193  [pdf, other]

    cs.CL cs.AI

    The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

    Authors: Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume Wenzek, Da Ju, Sanjana Krishnan, Marc'Aurelio Ranzato, Francisco Guzman, Angela Fan

    Abstract: One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either lack good coverage of low-resource languages, consider only restricted domains, or are low quality because they are constructed using semi-automatic procedures. In this work, we introduce the FLORES-101 evaluation benc…

    Submitted 6 June, 2021; originally announced June 2021.

  11. arXiv:2012.12631  [pdf, other]

    cs.LG

    Efficient Continual Learning with Modular Networks and Task-Driven Priors

    Authors: Tom Veniat, Ludovic Denoyer, Marc'Aurelio Ranzato

    Abstract: Existing literature in Continual Learning (CL) has focused on overcoming catastrophic forgetting, the inability of the learner to recall how to perform tasks observed in the past. There are however other desirable properties of a CL system, such as the ability to transfer knowledge from previous tasks and to scale memory and compute sub-linearly with the number of tasks. Since most current benchma…

    Submitted 12 February, 2021; v1 submitted 23 December, 2020; originally announced December 2020.

    Comments: Accepted as a conference paper at ICLR 2021

  12. arXiv:2012.09543  [pdf, other]

    cs.LG

    Few-shot Sequence Learning with Transformers

    Authors: Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc'Aurelio Ranzato, Arthur Szlam

    Abstract: Few-shot algorithms aim at learning new tasks provided only a handful of training examples. In this work we investigate few-shot learning in the setting where the data points are sequences of tokens and propose an efficient learning algorithm based on Transformers. In the simplest setting, we append a token to an input sequence which represents the particular task to be undertaken, and show that t…

    Submitted 17 December, 2020; originally announced December 2020.

    Comments: NeurIPS Meta-Learning Workshop 2020

  13. arXiv:2005.00581  [pdf, other]

    cs.CL cs.LG

    Multi-scale Transformer Language Models

    Authors: Sandeep Subramanian, Ronan Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau

    Abstract: We investigate multi-scale transformer language models that learn representations of text at multiple scales, and present three different architectures that have an inductive bias to handle the hierarchical nature of language. Experiments on large-scale language modeling benchmarks empirically demonstrate favorable likelihood vs memory footprint trade-offs, e.g. we show that it is possible to trai…

    Submitted 1 May, 2020; originally announced May 2020.

  14. arXiv:2004.11714  [pdf, other]

    cs.CL cs.LG

    Residual Energy-Based Models for Text Generation

    Authors: Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, Marc'Aurelio Ranzato

    Abstract: Text generation is ubiquitous in many NLP tasks, from summarization, to dialogue and machine translation. The dominant parametric approach is based on locally normalized models which predict one word at a time. While these work remarkably well, they are plagued by exposure bias due to the greedy nature of the generation process. In this work, we investigate un-normalized energy-based models (EBMs)…

    Submitted 22 April, 2020; originally announced April 2020.

    Comments: published at ICLR 2020. arXiv admin note: substantial text overlap with arXiv:2004.10188

    Journal ref: ICLR 2020

  15. arXiv:2004.10188  [pdf, other]

    cs.CL cs.LG stat.ML

    Residual Energy-Based Models for Text

    Authors: Anton Bakhtin, Yuntian Deng, Sam Gross, Myle Ott, Marc'Aurelio Ranzato, Arthur Szlam

    Abstract: Current large-scale auto-regressive language models display impressive fluency and can generate convincing text. In this work we start by asking the question: Can the generations of these models be reliably distinguished from real text by statistical discriminators? We find experimentally that the answer is affirmative when we have access to the training data for the model, and guardedly affirmati…

    Submitted 21 December, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

    Comments: long journal version

    Journal ref: Journal of Machine Learning Research 21 (2020) 1-41

  16. arXiv:1910.06848  [pdf, other]

    cs.CL

    Facebook AI's WAT19 Myanmar-English Translation Task Submission

    Authors: Peng-Jen Chen, Jiajun Shen, Matt Le, Vishrav Chaudhary, Ahmed El-Kishky, Guillaume Wenzek, Myle Ott, Marc'Aurelio Ranzato

    Abstract: This paper describes Facebook AI's submission to the WAT 2019 Myanmar-English translation task. Our baseline systems are BPE-based transformer models. We explore methods to leverage monolingual data to improve generalization, including self-training, back-translation and their combination. We further improve results by using noisy channel re-ranking and ensembling. We demonstrate that these techni…

    Submitted 15 October, 2019; originally announced October 2019.

    Comments: The 6th Workshop on Asian Translation

  17. arXiv:1909.13788  [pdf, other]

    cs.LG cs.CL stat.ML

    Revisiting Self-Training for Neural Sequence Generation

    Authors: Junxian He, Jiatao Gu, Jiajun Shen, Marc'Aurelio Ranzato

    Abstract: Self-training is one of the earliest and simplest semi-supervised methods. The key idea is to augment the original labeled dataset with unlabeled data paired with the model's prediction (i.e. the pseudo-parallel data). While self-training has been extensively studied on classification problems, in complex sequence generation tasks (e.g. machine translation) it is still unclear how self-training wo…

    Submitted 18 October, 2020; v1 submitted 30 September, 2019; originally announced September 2019.

    Comments: ICLR 2020. The first two authors contributed equally. Updated to fix typos

  18. arXiv:1909.13151  [pdf, other]

    cs.CL

    The Source-Target Domain Mismatch Problem in Machine Translation

    Authors: Jiajun Shen, Peng-Jen Chen, Matt Le, Junxian He, Jiatao Gu, Myle Ott, Michael Auli, Marc'Aurelio Ranzato

    Abstract: While we live in an increasingly interconnected world, different places still exhibit strikingly different cultures and many events we experience in our every day life pertain only to the specific place we live in. As a result, people often talk about different things in different parts of the world. In this work we study the effect of local context in machine translation and postulate that partic…

    Submitted 16 June, 2020; v1 submitted 28 September, 2019; originally announced September 2019.

  19. arXiv:1908.05204  [pdf, other]

    cs.CL

    On The Evaluation of Machine Translation Systems Trained With Back-Translation

    Authors: Sergey Edunov, Myle Ott, Marc'Aurelio Ranzato, Michael Auli

    Abstract: Back-translation is a widely used data augmentation technique which leverages target monolingual data. However, its effectiveness has been challenged since automatic metrics such as BLEU only show significant improvements for test examples where the source itself is a translation, or translationese. This is believed to be due to translationese inputs better matching the back-translated training da…

    Submitted 18 August, 2020; v1 submitted 14 August, 2019; originally announced August 2019.

    Comments: ACL 2020

  20. arXiv:1907.05242  [pdf, other]

    cs.CL cs.LG

    Large Memory Layers with Product Keys

    Authors: Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou

    Abstract: This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern is based on product keys, which enable fast and exact nearest neighbor search. The ability to increase th…

    Submitted 15 December, 2019; v1 submitted 10 July, 2019; originally announced July 2019.

    Comments: Advances in Neural Information Processing Systems, 2019

  21. arXiv:1906.03351  [pdf, other]

    cs.LG cs.CL stat.ML

    Real or Fake? Learning to Discriminate Machine from Human Generated Text

    Authors: Anton Bakhtin, Sam Gross, Myle Ott, Yuntian Deng, Marc'Aurelio Ranzato, Arthur Szlam

    Abstract: Energy-based models (EBMs), a.k.a. un-normalized models, have had recent successes in continuous spaces. However, they have not been successfully applied to model text sequences. While decreasing the energy at training samples is straightforward, mining (negative) samples where the energy should be increased is difficult. In part, this is because standard gradient-based methods are not readily app…

    Submitted 25 November, 2019; v1 submitted 7 June, 2019; originally announced June 2019.

  22. arXiv:1905.05908  [pdf, other]

    cs.CV

    Task-Driven Modular Networks for Zero-Shot Compositional Learning

    Authors: Senthil Purushwalkam, Maximilian Nickel, Abhinav Gupta, Marc'Aurelio Ranzato

    Abstract: One of the hallmarks of human intelligence is the ability to compose learned knowledge into novel concepts which can be recognized without a single training example. In contrast, current state-of-the-art methods require hundreds of training examples for each possible category to build reliable and accurate classifiers. To alleviate this striking difference in efficiency, we propose a task-driven m…

    Submitted 14 May, 2019; originally announced May 2019.

    Comments: http://www.cs.cmu.edu/~spurushw/projects/compositional.html

  23. arXiv:1902.10486  [pdf, other]

    cs.LG stat.ML

    On Tiny Episodic Memories in Continual Learning

    Authors: Arslan Chaudhry, Marcus Rohrbach, Mohamed Elhoseiny, Thalaiyasingam Ajanthan, Puneet K. Dokania, Philip H. S. Torr, Marc'Aurelio Ranzato

    Abstract: In continual learning (CL), an agent learns from a stream of tasks leveraging prior experience to transfer knowledge to future tasks. It is an ideal framework to decrease the amount of supervision in the existing learning algorithms. But for a successful knowledge transfer, the learner needs to remember how to perform previous tasks. One way to endow the learner the ability to perform tasks seen i…

    Submitted 4 June, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

    Comments: Making the main point of the paper more clear

  24. arXiv:1902.07816  [pdf, other]

    cs.CL cs.LG

    Mixture Models for Diverse Machine Translation: Tricks of the Trade

    Authors: Tianxiao Shen, Myle Ott, Michael Auli, Marc'Aurelio Ranzato

    Abstract: Mixture models trained via EM are among the simplest, most widely used and well understood latent variable models in the machine learning literature. Surprisingly, these models have been hardly explored in text generation applications such as machine translation. In principle, they provide a latent variable to control generation and produce a diverse set of hypotheses. In practice, however, mixtur…

    Submitted 24 May, 2019; v1 submitted 20 February, 2019; originally announced February 2019.

    Comments: ICML 2019 camera-ready

  25. arXiv:1902.01382  [pdf, other]

    cs.CL

    The FLoRes Evaluation Datasets for Low-Resource Machine Translation: Nepali-English and Sinhala-English

    Authors: Francisco Guzmán, Peng-Jen Chen, Myle Ott, Juan Pino, Guillaume Lample, Philipp Koehn, Vishrav Chaudhary, Marc'Aurelio Ranzato

    Abstract: For machine translation, a vast majority of language pairs in the world are considered low-resource because they have little parallel data available. Besides the technical challenges of learning with limited supervision, it is difficult to evaluate methods trained on low-resource language pairs because of the lack of freely and publicly available benchmarks. In this work, we introduce the FLoRes e…

    Submitted 14 September, 2019; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: EMNLP 2019

  26. arXiv:1812.00420  [pdf, other]

    cs.LG stat.ML

    Efficient Lifelong Learning with A-GEM

    Authors: Arslan Chaudhry, Marc'Aurelio Ranzato, Marcus Rohrbach, Mohamed Elhoseiny

    Abstract: In lifelong learning, the learner is presented with a sequence of tasks, incrementally building a data-driven prior which may be leveraged to speed up learning of a new task. In this work, we investigate the efficiency of current lifelong approaches, in terms of sample complexity, computational and memory cost. Towards this end, we first introduce a new and a more realistic evaluation protocol, wh…

    Submitted 9 January, 2019; v1 submitted 2 December, 2018; originally announced December 2018.

    Comments: Published as a conference paper at ICLR 2019

  27. arXiv:1811.00552  [pdf, other]

    cs.CL cs.LG

    Multiple-Attribute Text Style Transfer

    Authors: Sandeep Subramanian, Guillaume Lample, Eric Michael Smith, Ludovic Denoyer, Marc'Aurelio Ranzato, Y-Lan Boureau

    Abstract: The dominant approach to unsupervised "style transfer" in text is based on the idea of learning a latent representation, which is independent of the attributes specifying its "style". In this paper, we show that this condition is not necessary and is not always met in practice, even with domain adversarial training that explicitly aims at learning such disentangled representations. We thus propose…

    Submitted 20 September, 2019; v1 submitted 1 November, 2018; originally announced November 2018.

  28. arXiv:1804.07755  [pdf, other]

    cs.CL

    Phrase-Based & Neural Unsupervised Machine Translation

    Authors: Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato

    Abstract: Machine translation systems achieve near human-level performance on some languages, yet their effectiveness strongly relies on the availability of large amounts of parallel sentences, which hinders their applicability to the majority of language pairs. This work investigates how to learn to translate when having access to only large monolingual corpora in each language. We propose two model varian…

    Submitted 13 August, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

    Comments: EMNLP 2018

  29. arXiv:1804.07705  [pdf, other]

    cs.CL

    Lightweight Adaptive Mixture of Neural and N-gram Language Models

    Authors: Anton Bakhtin, Arthur Szlam, Marc'Aurelio Ranzato, Edouard Grave

    Abstract: It is often the case that the best performing language model is an ensemble of a neural language model with n-grams. In this work, we propose a method to improve how these two models are combined. By using a small network which predicts the mixture weight between the two models, we adapt their relative importance at each time step. Because the gating network is small, it trains quickly on small am…

    Submitted 26 October, 2018; v1 submitted 20 April, 2018; originally announced April 2018.

  30. arXiv:1803.00047  [pdf, other]

    cs.CL

    Analyzing Uncertainty in Neural Machine Translation

    Authors: Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

    Abstract: Machine translation is a popular test bed for research in neural sequence-to-sequence models but despite much recent research, there is still a lack of understanding of these models. Practitioners report performance degradation with large beams, the under-estimation of rare words and a lack of diversity in the final translations. Our study relates some of these issues to the inherent uncertainty o…

    Submitted 13 August, 2018; v1 submitted 28 February, 2018; originally announced March 2018.

    Comments: ICML 2018

  31. arXiv:1711.04956  [pdf, other]

    cs.CL

    Classical Structured Prediction Losses for Sequence to Sequence Learning

    Authors: Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato

    Abstract: There has been much recent work on training neural attention models at the sequence-level using either reinforcement learning-style methods or by optimizing the beam. In this paper, we survey a range of classical objective functions that have been widely used to train linear models for structured prediction and apply them to neural sequence to sequence models. Our experiments show that these losse…

    Submitted 5 October, 2018; v1 submitted 14 November, 2017; originally announced November 2017.

    Comments: 10 pages, NAACL 2018

  32. arXiv:1711.00043  [pdf, other]

    cs.CL cs.AI

    Unsupervised Machine Translation Using Monolingual Corpora Only

    Authors: Guillaume Lample, Alexis Conneau, Ludovic Denoyer, Marc'Aurelio Ranzato

    Abstract: Machine translation has recently achieved impressive performance thanks to recent advances in deep learning and the availability of large-scale parallel corpora. There have been numerous attempts to extend these successes to low-resource language pairs, yet requiring tens of thousands of parallel sentences. In this work, we take this research direction to the extreme and investigate whether it is…

    Submitted 13 April, 2018; v1 submitted 31 October, 2017; originally announced November 2017.

    Comments: ICLR 2018

  33. arXiv:1710.04087  [pdf, other]

    cs.CL

    Word Translation Without Parallel Data

    Authors: Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou

    Abstract: State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can be alleviated with character-level information. While these methods showed encouraging results, they are not on par with their supervised counterparts and are limited to pairs of languages sharing a comm…

    Submitted 30 January, 2018; v1 submitted 11 October, 2017; originally announced October 2017.

    Comments: ICLR 2018

  34. arXiv:1706.08840  [pdf, other]

    cs.LG cs.AI

    Gradient Episodic Memory for Continual Learning

    Authors: David Lopez-Paz, Marc'Aurelio Ranzato

    Abstract: One major obstacle towards AI is the poor ability of models to solve new problems quicker, and without forgetting previously acquired knowledge. To better understand this issue, we study the problem of continual learning, where the model observes, once and one by one, examples concerning a sequence of tasks. First, we propose a set of metrics to evaluate models learning over a continuum of data. T…

    Submitted 13 September, 2022; v1 submitted 26 June, 2017; originally announced June 2017.

    Comments: Published at NIPS 2017

  35. arXiv:1706.00409  [pdf, other]

    cs.CV

    Fader Networks: Manipulating Images by Sliding Attributes

    Authors: Guillaume Lample, Neil Zeghidour, Nicolas Usunier, Antoine Bordes, Ludovic Denoyer, Marc'Aurelio Ranzato

    Abstract: This paper introduces a new encoder-decoder architecture that is trained to reconstruct images by disentangling the salient information of the image and the values of attributes directly in the latent space. As a result, after training, our model can generate different realistic versions of an input image by varying the attribute values. By using continuous attribute values, we can choose how much…

    Submitted 28 January, 2018; v1 submitted 1 June, 2017; originally announced June 2017.

    Comments: NIPS 2017

  36. arXiv:1704.06363  [pdf, other]

    cs.CV stat.ML

    Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

    Authors: Sam Gross, Marc'Aurelio Ranzato, Arthur Szlam

    Abstract: Training convolutional networks (CNN's) that fit on a single GPU with minibatch stochastic gradient descent has become effective in practice. However, there is still no effective method for training large CNN's that do not fit in the memory of a few GPU cards, or for parallelizing CNN training. In this work we show that a simple hard mixture of experts model can be efficiently trained to good effe…

    Submitted 20 April, 2017; originally announced April 2017.

    Comments: Appearing in CVPR 2017

  37. arXiv:1702.04770  [pdf, other]

    cs.CL cs.LG cs.NE

    Training Language Models Using Target-Propagation

    Authors: Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache

    Abstract: While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps. We investigate whether Target Propagation (TPROP) style approaches can address these shortcomings. Unfortunately, extensive experim…

    Submitted 15 February, 2017; originally announced February 2017.

  38. arXiv:1701.08435  [pdf, other]

    cs.LG cs.CV

    Transformation-Based Models of Video Sequences

    Authors: Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala

    Abstract: In this work we propose a simple unsupervised approach for next frame prediction in video. Instead of directly predicting the pixels in a frame given past frames, we predict the transformations needed for generating the next frame in a sequence, given the transformations of the past frames. This leads to sharper results, while using a smaller prediction model. In order to enable a fair comparison…

    Submitted 6 February, 2023; v1 submitted 29 January, 2017; originally announced January 2017.

  39. arXiv:1612.04936  [pdf, other]

    cs.CL cs.AI

    Learning through Dialogue Interactions by Asking Questions

    Authors: Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston

    Abstract: A good dialogue agent should have the ability to interact with users by both responding to questions and by asking questions, and importantly to learn from both types of interaction. In this work, we explore this direction by designing a simulator and a set of synthetic tasks in the movie domain that allow such interactions between a learner and a teacher. We investigate how a learner can benefit…

    Submitted 13 February, 2017; v1 submitted 15 December, 2016; originally announced December 2016.

  40. arXiv:1611.09823  [pdf, other]

    cs.AI cs.CL

    Dialogue Learning With Human-In-The-Loop

    Authors: Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, Jason Weston

    Abstract: An important aspect of developing conversational agents is to give a bot the ability to improve through communicating with humans and to learn from the mistakes that it makes. Most research has focused on learning from fixed training sets of labeled data rather than interacting with a dialogue partner in an online fashion. In this paper we explore this direction in a reinforcement learning setting…

    Submitted 13 January, 2017; v1 submitted 29 November, 2016; originally announced November 2016.

  41. arXiv:1511.06732  [pdf, other]

    cs.LG cs.CL

    Sequence Level Training with Recurrent Neural Networks

    Authors: Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, Wojciech Zaremba

    Abstract: Many natural language processing applications use language models to generate text. These models are typically trained to predict the next word in a sequence, given the previous words and some context such as an image. However, at test time the model is expected to generate the entire sequence from scratch. This discrepancy makes generation brittle, as errors may accumulate along the way. We addre…

    Submitted 6 May, 2016; v1 submitted 20 November, 2015; originally announced November 2015.

  42. arXiv:1506.08230  [pdf, other]

    cs.LG cs.NE

    Convolutional networks and learning invariant to homogeneous multiplicative scalings

    Authors: Mark Tygert, Arthur Szlam, Soumith Chintala, Marc'Aurelio Ranzato, Yuandong Tian, Wojciech Zaremba

    Abstract: The conventional classification schemes -- notably multinomial logistic regression -- used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation. In the specific application to supervised learning for convnets, a simple scale-invariant classification st…

    Submitted 16 February, 2016; v1 submitted 26 June, 2015; originally announced June 2015.

    Comments: 12 pages, 6 figures, 4 tables

    Journal ref: Appl. Comput. Harmon. Anal., 42 (1): 154-166, 2017

  43. arXiv:1412.7753  [pdf, other]

    cs.NE cs.LG

    Learning Longer Memory in Recurrent Neural Networks

    Authors: Tomas Mikolov, Armand Joulin, Sumit Chopra, Michael Mathieu, Marc'Aurelio Ranzato

    Abstract: Recurrent neural network is a powerful model that learns temporal patterns in sequential data. For a long time, it was believed that recurrent networks are difficult to train using simple optimizers, such as stochastic gradient descent, due to the so-called vanishing gradient problem. In this paper, we show that learning longer term patterns in real data, such as in natural language, is perfectly…

    Submitted 16 April, 2015; v1 submitted 24 December, 2014; originally announced December 2014.

  44. arXiv:1412.6604  [pdf, ps, other]

    cs.LG cs.CV

    Video (language) modeling: a baseline for generative models of natural videos

    Authors: MarcAurelio Ranzato, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, Sumit Chopra

    Abstract: We propose a strong baseline model for unsupervised feature learning using video data. By learning to predict missing frames or extrapolate future frames from an input video sequence, the model discovers both spatial and temporal correlations which are useful to represent complex deformations and motion patterns. The models we propose are largely borrowed from the language modeling literature, and…

    Submitted 4 May, 2016; v1 submitted 20 December, 2014; originally announced December 2014.

  45. arXiv:1412.5335  [pdf, ps, other]

    cs.CL cs.IR cs.LG cs.NE

    Ensemble of Generative and Discriminative Techniques for Sentiment Analysis of Movie Reviews

    Authors: Grégoire Mesnil, Tomas Mikolov, Marc'Aurelio Ranzato, Yoshua Bengio

    Abstract: Sentiment analysis is a common task in natural language processing that aims to detect polarity of a text document (typically a consumer review). In the simplest settings, we discriminate only between positive and negative sentiment, turning the task into a standard binary classification problem. We compare several machine learning approaches to this problem, and combine them to achieve the best…

    Submitted 27 May, 2015; v1 submitted 17 December, 2014; originally announced December 2014.

  46. arXiv:1406.5266  [pdf, other]

    cs.CV

    Web-Scale Training for Face Identification

    Authors: Yaniv Taigman, Ming Yang, Marc'Aurelio Ranzato, Lior Wolf

    Abstract: Scaling machine learning methods to very large datasets has attracted considerable attention in recent years, thanks to easy access to ubiquitous sensing and data from the web. We study face recognition and show that three distinct properties have surprising effects on the transferability of deep convolutional networks (CNN): (1) The bottleneck of the network serves as an important transfer learni…

    Submitted 18 April, 2015; v1 submitted 19 June, 2014; originally announced June 2014.

  47. arXiv:1405.5488  [pdf, other]

    cs.CV cs.LG

    On Learning Where To Look

    Authors: Marc'Aurelio Ranzato

    Abstract: Current automatic vision systems face two major challenges: scalability and extreme variability of appearance. First, the computational time required to process an image typically scales linearly with the number of pixels in the image, therefore limiting the resolution of input images to thumbnail size. Second, variability in appearance and pose of the objects constitute a major hurdle for robust…

    Submitted 23 April, 2014; originally announced May 2014.

    Comments: deep learning, vision

  48. arXiv:1312.5853  [pdf, other]

    cs.LG cs.NE

    Multi-GPU Training of ConvNets

    Authors: Omry Yadan, Keith Adams, Yaniv Taigman, Marc'Aurelio Ranzato

    Abstract: In this work we evaluate different approaches to parallelize computation of convolutional neural networks across several GPUs.

    Submitted 18 February, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

    Comments: Machine Learning, Deep Learning, Convolutional Networks, Computer Vision, GPU, CUDA

  49. arXiv:1312.4314  [pdf, other]

    cs.LG

    Learning Factored Representations in a Deep Mixture of Experts

    Authors: David Eigen, Marc'Aurelio Ranzato, Ilya Sutskever

    Abstract: Mixtures of Experts combine the outputs of several "expert" networks, each of which specializes in a different part of the input space. This is achieved by training a "gating" network that maps each input to a distribution over the experts. Such models show promise for building larger networks that are still cheap to compute at test time, and more parallelizable at training time. In this work…

    Submitted 9 March, 2014; v1 submitted 16 December, 2013; originally announced December 2013.

  50. arXiv:1311.5591  [pdf, other]

    cs.CV

    PANDA: Pose Aligned Networks for Deep Attribute Modeling

    Authors: Ning Zhang, Manohar Paluri, Marc'Aurelio Ranzato, Trevor Darrell, Lubomir Bourdev

    Abstract: We propose a method for inferring human attributes (such as gender, hair style, clothes style, expression, action) from images of people under large variation of viewpoint, pose, appearance, articulation and occlusion. Convolutional Neural Nets (CNN) have been shown to perform very well on large scale object recognition problems. In the context of attribute classification, however, the signal is o…

    Submitted 5 May, 2014; v1 submitted 21 November, 2013; originally announced November 2013.

    Comments: 8 pages