
Showing 51–85 of 85 results for author: Szlam, A

  1. arXiv:1711.07950  [pdf, other]

    cs.CL

    Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent

    Authors: Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston

    Abstract: Contrary to most natural language processing research, which makes use of static datasets, humans learn language interactively, grounded in an environment. In this work we propose an interactive learning procedure called Mechanical Turker Descent (MTD) and use it to train agents to execute natural language commands grounded in a fantasy text adventure game. In MTD, Turkers compete to train better…

    Submitted 16 April, 2018; v1 submitted 21 November, 2017; originally announced November 2017.

  2. arXiv:1707.05776  [pdf, other]

    stat.ML cs.CV cs.LG

    Optimizing the Latent Space of Generative Networks

    Authors: Piotr Bojanowski, Armand Joulin, David Lopez-Paz, Arthur Szlam

    Abstract: Generative Adversarial Networks (GANs) have achieved remarkable results in the task of generating realistic natural images. In most successful applications, GAN models share two common aspects: solving a challenging saddle point optimization problem, interpreted as an adversarial game between a generator and a discriminator functions; and parameterizing the generator and the discriminator as deep…

    Submitted 20 May, 2019; v1 submitted 18 July, 2017; originally announced July 2017.

  3. arXiv:1706.02332  [pdf, other]

    cs.CV cs.LG stat.ML

    Low-shot learning with large-scale diffusion

    Authors: Matthijs Douze, Arthur Szlam, Bharath Hariharan, Hervé Jégou

    Abstract: This paper considers the problem of inferring image labels from images when only a few annotated examples are available at training time. This setup is often referred to as low-shot learning, where a standard approach is to re-train the last few layers of a convolutional neural network learned on separate classes for which training examples are abundant. We consider a semi-supervised setting based…

    Submitted 15 June, 2018; v1 submitted 7 June, 2017; originally announced June 2017.

  4. arXiv:1704.06363  [pdf, other]

    cs.CV stat.ML

    Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

    Authors: Sam Gross, Marc'Aurelio Ranzato, Arthur Szlam

    Abstract: Training convolutional networks (CNN's) that fit on a single GPU with minibatch stochastic gradient descent has become effective in practice. However, there is still no effective method for training large CNN's that do not fit in the memory of a few GPU cards, or for parallelizing CNN training. In this work we show that a simple hard mixture of experts model can be efficiently trained to good effe…

    Submitted 20 April, 2017; originally announced April 2017.

    Comments: Appearing in CVPR 2017

  5. arXiv:1703.05407  [pdf, other]

    cs.LG

    Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play

    Authors: Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus

    Abstract: We describe a simple scheme that allows an agent to learn about its environment in an unsupervised manner. Our scheme pits two versions of the same agent, Alice and Bob, against one another. Alice proposes a task for Bob to complete; and then Bob attempts to complete the task. In this work we will focus on two kinds of environments: (nearly) reversible environments and environments that can be res…

    Submitted 27 April, 2018; v1 submitted 15 March, 2017; originally announced March 2017.

    Comments: Published in ICLR 2018

  6. arXiv:1702.04770  [pdf, other]

    cs.CL cs.LG cs.NE

    Training Language Models Using Target-Propagation

    Authors: Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache

    Abstract: While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps. We investigate whether Target Propagation (TPROP) style approaches can address these shortcomings. Unfortunately, extensive experim…

    Submitted 15 February, 2017; originally announced February 2017.

  7. arXiv:1702.02540  [pdf, ps, other]

    cs.CL cs.AI cs.NE stat.ML

    Automatic Rule Extraction from Long Short Term Memory Networks

    Authors: W. James Murdoch, Arthur Szlam

    Abstract: Although deep learning models have proven effective at solving problems in natural language processing, the mechanism by which they come to their conclusions is often unclear. As a result, these models are generally treated as black boxes, yielding no insight of the underlying learned patterns. In this paper we consider Long Short Term Memory networks (LSTMs) and demonstrate a new approach for tra…

    Submitted 24 February, 2017; v1 submitted 8 February, 2017; originally announced February 2017.

    Comments: ICLR 2017 accepted paper

  8. arXiv:1701.08435  [pdf, other]

    cs.LG cs.CV

    Transformation-Based Models of Video Sequences

    Authors: Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala

    Abstract: In this work we propose a simple unsupervised approach for next frame prediction in video. Instead of directly predicting the pixels in a frame given past frames, we predict the transformations needed for generating the next frame in a sequence, given the transformations of the past frames. This leads to sharper results, while using a smaller prediction model. In order to enable a fair comparison…

    Submitted 6 February, 2023; v1 submitted 29 January, 2017; originally announced January 2017.

  9. arXiv:1612.03969  [pdf, ps, other]

    cs.CL

    Tracking the World State with Recurrent Entity Networks

    Authors: Mikael Henaff, Jason Weston, Arthur Szlam, Antoine Bordes, Yann LeCun

    Abstract: We introduce a new model, the Recurrent Entity Network (EntNet). It is equipped with a dynamic long-term memory which allows it to maintain and update a representation of the state of the world as it receives new data. For language understanding tasks, it can reason on-the-fly as it reads text, not just when it is required to answer a question or respond as is the case for a Memory Network (Sukhba…

    Submitted 10 May, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

    Journal ref: ICLR 2017

  10. Geometric deep learning: going beyond Euclidean data

    Authors: Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, Pierre Vandergheynst

    Abstract: Many scientific fields study data with an underlying structure that is a non-Euclidean space. Some examples include social networks in computational social sciences, sensor networks in communications, functional networks in brain imaging, regulatory networks in genetics, and meshed surfaces in computer graphics. In many applications, such geometric data are large and complex (in the case of social…

    Submitted 3 May, 2017; v1 submitted 24 November, 2016; originally announced November 2016.

  11. arXiv:1605.07736  [pdf, other]

    cs.LG cs.AI

    Learning Multiagent Communication with Backpropagation

    Authors: Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus

    Abstract: Many tasks in AI require the collaboration of multiple agents. Typically, the communication protocol between agents is manually specified and not altered during training. In this paper we explore a simple neural model, called CommNet, that uses continuous communication for fully cooperative tasks. The model consists of multiple agents and the communication between them is learned alongside their p…

    Submitted 31 October, 2016; v1 submitted 25 May, 2016; originally announced May 2016.

    Comments: Accepted to NIPS 2016

  12. arXiv:1603.01765  [pdf, ps, other]

    math.NA stat.CO

    Accurate principal component analysis via a few iterations of alternating least squares

    Authors: Arthur Szlam, Andrew Tulloch, Mark Tygert

    Abstract: A few iterations of alternating least squares with a random starting point provably suffice to produce nearly optimal spectral- and Frobenius-norm accuracies of low-rank approximations to a matrix; iterating to convergence is unnecessary. Thus, software implementing alternating least squares can be retrofitted via appropriate setting of parameters to calculate nearly optimally accurate low-rank ap…

    Submitted 5 March, 2016; originally announced March 2016.

    Comments: 9 pages, 3 tables

    Journal ref: SIAM Journal on Matrix Analysis and Applications, 38 (2): 425-433, 2017

  13. arXiv:1602.06662  [pdf, other]

    cs.NE cs.AI cs.LG stat.ML

    Recurrent Orthogonal Networks and Long-Memory Tasks

    Authors: Mikael Henaff, Arthur Szlam, Yann LeCun

    Abstract: Although RNNs have been shown to be powerful tools for processing sequential data, finding architectures or optimization strategies that allow them to model very long term dependencies is still an active area of research. In this work, we carefully analyze two synthetic datasets originally outlined in (Hochreiter and Schmidhuber, 1997) which are used to evaluate the ability of RNNs to store inform…

    Submitted 15 March, 2017; v1 submitted 22 February, 2016; originally announced February 2016.

  14. arXiv:1512.02167  [pdf, other]

    cs.CV cs.CL

    Simple Baseline for Visual Question Answering

    Authors: Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus

    Abstract: We describe a very simple bag-of-words baseline for visual question answering. This baseline concatenates the word features from the question and CNN features from the image to predict the answer. When evaluated on the challenging VQA dataset [2], it shows comparable performance to many recent approaches using recurrent neural networks. To explore the strength and weakness of the trained model, we…

    Submitted 15 December, 2015; v1 submitted 7 December, 2015; originally announced December 2015.

    Comments: One comparison method's scores are put into the correct column, and a new experiment of generating attention map is added

  15. arXiv:1511.07401  [pdf, other]

    cs.LG cs.AI cs.NE

    MazeBase: A Sandbox for Learning from Games

    Authors: Sainbayar Sukhbaatar, Arthur Szlam, Gabriel Synnaeve, Soumith Chintala, Rob Fergus

    Abstract: This paper introduces MazeBase: an environment for simple 2D games, designed as a sandbox for machine learning approaches to reasoning and planning. Within it, we create 10 simple games embodying a range of algorithmic tasks (e.g. if-then statements or set negation). A variety of neural models (fully connected, convolutional network, memory network) are deployed via reinforcement learning on these…

    Submitted 7 January, 2016; v1 submitted 23 November, 2015; originally announced November 2015.

  16. arXiv:1511.06931  [pdf, ps, other]

    cs.CL cs.LG

    Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems

    Authors: Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander Miller, Arthur Szlam, Jason Weston

    Abstract: A long-term goal of machine learning is to build intelligent conversational agents. One recent popular approach is to train end-to-end models on a large amount of real dialog transcripts between humans (Sordoni et al., 2015; Vinyals & Le, 2015; Shang et al., 2015). However, this approach leaves many questions unanswered as an understanding of the precise successes and shortcomings of each model is…

    Submitted 19 April, 2016; v1 submitted 21 November, 2015; originally announced November 2015.

  17. arXiv:1506.08230  [pdf, other]

    cs.LG cs.NE

    Convolutional networks and learning invariant to homogeneous multiplicative scalings

    Authors: Mark Tygert, Arthur Szlam, Soumith Chintala, Marc'Aurelio Ranzato, Yuandong Tian, Wojciech Zaremba

    Abstract: The conventional classification schemes -- notably multinomial logistic regression -- used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation. In the specific application to supervised learning for convnets, a simple scale-invariant classification st…

    Submitted 16 February, 2016; v1 submitted 26 June, 2015; originally announced June 2015.

    Comments: 12 pages, 6 figures, 4 tables

    Journal ref: Appl. Comput. Harmon. Anal., 42 (1): 154-166, 2017

  18. arXiv:1506.05751  [pdf, other]

    cs.CV

    Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

    Authors: Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus

    Abstract: In this paper we introduce a generative parametric model capable of producing high quality samples of natural images. Our approach uses a cascade of convolutional networks within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion. At each level of the pyramid, a separate generative convnet model is trained using the Generative Adversarial Nets (GAN) approach (Goodfellow e…

    Submitted 18 June, 2015; originally announced June 2015.

  19. arXiv:1503.08895  [pdf, other]

    cs.NE cs.CL

    End-To-End Memory Networks

    Authors: Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus

    Abstract: We introduce a neural network with a recurrent attention model over a possibly large external memory. The architecture is a form of Memory Network (Weston et al., 2015) but unlike the model in that work, it is trained end-to-end, and hence requires significantly less supervision during training, making it more generally applicable in realistic settings. It can also be seen as an extension of RNNse…

    Submitted 24 November, 2015; v1 submitted 30 March, 2015; originally announced March 2015.

    Comments: Accepted to NIPS 2015

  20. arXiv:1503.03438  [pdf, ps, other]

    cs.LG cs.NE stat.ML

    A mathematical motivation for complex-valued convolutional networks

    Authors: Joan Bruna, Soumith Chintala, Yann LeCun, Serkan Piantino, Arthur Szlam, Mark Tygert

    Abstract: A complex-valued convolutional network (convnet) implements the repeated application of the following composition of three operations, recursively applying the composition to an input vector of nonnegative real numbers: (1) convolution with complex-valued vectors followed by (2) taking the absolute value of every entry of the resulting vectors followed by (3) local averaging. For processing real-v…

    Submitted 12 December, 2015; v1 submitted 11 March, 2015; originally announced March 2015.

    Comments: 11 pages, 3 figures; this is the retitled version submitted to the journal, "Neural Computation"

    Journal ref: Neural Computation, 28 (5): 815-825, May 2016

  21. arXiv:1412.6604  [pdf, ps, other]

    cs.LG cs.CV

    Video (language) modeling: a baseline for generative models of natural videos

    Authors: Marc'Aurelio Ranzato, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, Sumit Chopra

    Abstract: We propose a strong baseline model for unsupervised feature learning using video data. By learning to predict missing frames or extrapolate future frames from an input video sequence, the model discovers both spatial and temporal correlations which are useful to represent complex deformations and motion patterns. The models we propose are largely borrowed from the language modeling literature, and…

    Submitted 4 May, 2016; v1 submitted 20 December, 2014; originally announced December 2014.

  22. arXiv:1412.3510  [pdf, other]

    stat.CO cs.MS

    An implementation of a randomized algorithm for principal component analysis

    Authors: Arthur Szlam, Yuval Kluger, Mark Tygert

    Abstract: Recent years have witnessed intense development of randomized methods for low-rank approximation. These methods target principal component analysis (PCA) and the calculation of truncated singular value decompositions (SVD). The present paper presents an essentially black-box, fool-proof implementation for Mathworks' MATLAB, a popular software platform for numerical computation. As illustrated via…

    Submitted 10 December, 2014; originally announced December 2014.

    Comments: 13 pages, 4 figures

    Journal ref: ACM TOMS, 43(3): 28:1-28:14, 2016

  23. arXiv:1406.3837  [pdf, other]

    stat.ML cs.LG

    An Incremental Reseeding Strategy for Clustering

    Authors: Xavier Bresson, Huiyi Hu, Thomas Laurent, Arthur Szlam, James von Brecht

    Abstract: In this work we propose a simple and easily parallelizable algorithm for multiway graph partitioning. The algorithm alternates between three basic components: diffusing seed vertices over the graph, thresholding the diffused seeds, and then randomly reseeding the thresholded clusters. We demonstrate experimentally that the proper combination of these ingredients leads to an algorithm that achieves…

    Submitted 15 June, 2014; originally announced June 2014.

  24. Better Feature Tracking Through Subspace Constraints

    Authors: Bryan Poling, Gilad Lerman, Arthur Szlam

    Abstract: Feature tracking in video is a crucial task in computer vision. Usually, the tracking problem is handled one feature at a time, using a single-feature tracker like the Kanade-Lucas-Tomasi algorithm, or one of its derivatives. While this approach works quite well when dealing with high-quality video and "strong" features, it often falters when faced with dark and noisy video containing low-quality…

    Submitted 9 May, 2014; originally announced May 2014.

    Comments: 8 pages, 2 figures. CVPR 2014

    Journal ref: Proceedings of CVPR, 2014, pages 3454-3461

  25. arXiv:1312.6203  [pdf, other]

    cs.LG cs.CV cs.NE

    Spectral Networks and Locally Connected Networks on Graphs

    Authors: Joan Bruna, Wojciech Zaremba, Arthur Szlam, Yann LeCun

    Abstract: Convolutional Neural Networks are extremely efficient architectures in image and audio recognition tasks, thanks to their ability to exploit the local translational invariance of signal classes over their domain. In this paper we consider possible generalizations of CNNs to signals defined on more general domains without the action of a translation group. In particular, we propose two construction…

    Submitted 21 May, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

    Comments: 14 pages

  26. arXiv:1312.5783  [pdf, other]

    cs.LG cs.CV cs.NE

    Unsupervised Feature Learning by Deep Sparse Coding

    Authors: Yunlong He, Koray Kavukcuoglu, Yun Wang, Arthur Szlam, Yanjun Qi

    Abstract: In this paper, we propose a new unsupervised feature learning framework, namely Deep Sparse Coding (DeepSC), that extends sparse coding to a multi-layer architecture for visual object recognition tasks. The main innovation of the framework is that it connects the sparse-encoders from different layers by a sparse-to-dense module. The sparse-to-dense module is a composition of a local spatial poolin…

    Submitted 19 December, 2013; originally announced December 2013.

    Comments: 9 pages, submitted to ICLR

  27. arXiv:1311.4025  [pdf, ps, other]

    stat.ML

    Signal Recovery from Pooling Representations

    Authors: Joan Bruna, Arthur Szlam, Yann LeCun

    Abstract: In this work we compute lower Lipschitz bounds of $\ell_p$ pooling operators for $p=1, 2, \infty$ as well as $\ell_p$ pooling operators preceded by half-rectification layers. These give sufficient conditions for the design of invertible neural network layers. Numerical experiments on MNIST and image patches confirm that pooling layers can be inverted with phase recovery algorithms. Moreover, the r…

    Submitted 27 February, 2014; v1 submitted 16 November, 2013; originally announced November 2013.

    Comments: 17 pages, 3 figures

  28. arXiv:1301.3590  [pdf, ps, other]

    cs.IT cs.CV

    Tree structured sparse coding on cubes

    Authors: Arthur Szlam

    Abstract: A brief description of tree structured sparse coding on the binary cube.

    Submitted 16 January, 2013; originally announced January 2013.

  29. arXiv:1301.3537  [pdf, ps, other]

    cs.AI math.NA

    Learning Stable Group Invariant Representations with Convolutional Networks

    Authors: Joan Bruna, Arthur Szlam, Yann LeCun

    Abstract: Transformation groups, such as translations or rotations, effectively express part of the variability observed in many recognition problems. The group structure enables the construction of invariant signal representations with appealing mathematical properties, where convolutions, together with pooling operators, bring stability to additive and geometric perturbations of the input. Whereas physica…

    Submitted 15 January, 2013; originally announced January 2013.

    Comments: 4 pages

  30. arXiv:1202.6384  [pdf, ps, other]

    cs.CV

    Fast approximations to structured sparse coding and applications to object classification

    Authors: Arthur Szlam, Karol Gregor, Yann LeCun

    Abstract: We describe a method for fast approximation of sparse coding. The input space is subdivided by a binary decision tree, and we simultaneously learn a dictionary and assignment of allowed dictionary elements for each leaf of the tree. We store a lookup table with the assignments and the pseudoinverses for each node, allowing for very fast inference. We give an algorithm for learning the tree, the di…

    Submitted 28 February, 2012; originally announced February 2012.

  31. Hybrid Linear Modeling via Local Best-fit Flats

    Authors: Teng Zhang, Arthur Szlam, Yi Wang, Gilad Lerman

    Abstract: We present a simple and fast geometric method for modeling data by a union of affine subspaces. The method begins by forming a collection of local best-fit affine subspaces, i.e., subspaces approximating the data in local neighborhoods. The correct sizes of the local neighborhoods are determined automatically by the Jones' $β_2$ numbers (we prove under certain geometric conditions that our method…

    Submitted 1 May, 2012; v1 submitted 17 October, 2010; originally announced October 2010.

    Comments: This version adds some clarifications and numerical experiments as well as strengthens the previous theorem. For face experiments, we use here the Extended Yale Face Database B (cropped faces unlike previous version). This database points to a failure mode of our algorithms, but we suggest and successfully test a workaround

    Journal ref: International Journal of Computer Vision Volume 100, Issue 3 (2012), Page 217-240

  32. arXiv:1010.0422  [pdf, ps, other]

    cs.CV

    Convolutional Matching Pursuit and Dictionary Training

    Authors: Arthur Szlam, Koray Kavukcuoglu, Yann LeCun

    Abstract: Matching pursuit and K-SVD are demonstrated in the translation-invariant setting.

    Submitted 3 October, 2010; originally announced October 2010.

  33. Randomized hybrid linear modeling by local best-fit flats

    Authors: Teng Zhang, Arthur Szlam, Yi Wang, Gilad Lerman

    Abstract: The hybrid linear modeling problem is to identify a set of d-dimensional affine sets in a D-dimensional Euclidean space. It arises, for example, in object tracking and structure from motion. The hybrid linear model can be considered as the second simplest (behind linear) manifold model of data. In this paper we will present a very simple geometric method for hybrid linear modeling based on selecti…

    Submitted 5 May, 2010; originally announced May 2010.

    Comments: To appear in the proceedings of CVPR 2010

    Journal ref: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (13-18 June 2010), pp. 1927-1934

  34. Median K-flats for hybrid linear modeling with many outliers

    Authors: Teng Zhang, Arthur Szlam, Gilad Lerman

    Abstract: We describe the Median K-Flats (MKF) algorithm, a simple online method for hybrid linear modeling, i.e., for approximating data by a mixture of flats. This algorithm simultaneously partitions the data into clusters while finding their corresponding best approximating l1 d-flats, so that the cumulative l1 error is minimized. The current implementation restricts d-flats to be d-dimensional linear…

    Submitted 16 September, 2009; originally announced September 2009.

    Journal ref: Proc. of 2nd IEEE International Workshop on Subspace Methods (Subspace 2009), pp. 234-241 (2009)

  35. arXiv:0809.2274  [pdf, ps, other]

    stat.CO

    A randomized algorithm for principal component analysis

    Authors: Vladimir Rokhlin, Arthur Szlam, Mark Tygert

    Abstract: Principal component analysis (PCA) requires the computation of a low-rank approximation to a matrix containing the data being analyzed. In many applications of PCA, the best possible accuracy of any rank-deficient approximation is at most a few digits (measured in the spectral norm, relative to the spectral norm of the matrix being approximated). In such circumstances, efficient algorithms have…

    Submitted 5 July, 2009; v1 submitted 12 September, 2008; originally announced September 2008.

    Comments: 26 pages, 6 tables, 1 figure; to appear in the SIAM Journal on Matrix Analysis and Applications

    Report number: UCLA Computational and Applied Math Technical Report 08-60

    Journal ref: A randomized algorithm for principal component analysis, SIAM Journal on Matrix Analysis and Applications, 31 (3): 1100-1124, 2009