Search | arXiv e-print repository

Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent

Authors: Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston

Abstract: Contrary to most natural language processing research, which makes use of static datasets, humans learn language interactively, grounded in an environment. In this work we propose an interactive learning procedure called Mechanical Turker Descent (MTD) and use it to train agents to execute natural language commands grounded in a fantasy text adventure game. In MTD, Turkers compete to train better… ▽ More Contrary to most natural language processing research, which makes use of static datasets, humans learn language interactively, grounded in an environment. In this work we propose an interactive learning procedure called Mechanical Turker Descent (MTD) and use it to train agents to execute natural language commands grounded in a fantasy text adventure game. In MTD, Turkers compete to train better agents in the short term, and collaborate by sharing their agents' skills in the long term. This results in a gamified, engaging experience for the Turkers and a better quality teaching signal for the agents compared to static datasets, as the Turkers naturally adapt the training data to the agent's abilities. △ Less

Submitted 16 April, 2018; v1 submitted 21 November, 2017; originally announced November 2017.

arXiv:1707.05776 [pdf, other]

Optimizing the Latent Space of Generative Networks

Authors: Piotr Bojanowski, Armand Joulin, David Lopez-Paz, Arthur Szlam

Abstract: Generative Adversarial Networks (GANs) have achieved remarkable results in the task of generating realistic natural images. In most successful applications, GAN models share two common aspects: solving a challenging saddle point optimization problem, interpreted as an adversarial game between a generator and a discriminator functions; and parameterizing the generator and the discriminator as deep… ▽ More Generative Adversarial Networks (GANs) have achieved remarkable results in the task of generating realistic natural images. In most successful applications, GAN models share two common aspects: solving a challenging saddle point optimization problem, interpreted as an adversarial game between a generator and a discriminator functions; and parameterizing the generator and the discriminator as deep convolutional neural networks. The goal of this paper is to disentangle the contribution of these two factors to the success of GANs. In particular, we introduce Generative Latent Optimization (GLO), a framework to train deep convolutional generators using simple reconstruction losses. Throughout a variety of experiments, we show that GLO enjoys many of the desirable properties of GANs: synthesizing visually-appealing samples, interpolating meaningfully between samples, and performing linear arithmetic with noise vectors; all of this without the adversarial optimization scheme. △ Less

Submitted 20 May, 2019; v1 submitted 18 July, 2017; originally announced July 2017.

arXiv:1706.02332 [pdf, other]

Low-shot learning with large-scale diffusion

Authors: Matthijs Douze, Arthur Szlam, Bharath Hariharan, Hervé Jégou

Abstract: This paper considers the problem of inferring image labels from images when only a few annotated examples are available at training time. This setup is often referred to as low-shot learning, where a standard approach is to re-train the last few layers of a convolutional neural network learned on separate classes for which training examples are abundant. We consider a semi-supervised setting based… ▽ More This paper considers the problem of inferring image labels from images when only a few annotated examples are available at training time. This setup is often referred to as low-shot learning, where a standard approach is to re-train the last few layers of a convolutional neural network learned on separate classes for which training examples are abundant. We consider a semi-supervised setting based on a large collection of images to support label propagation. This is possible by leveraging the recent advances on large-scale similarity graph construction. We show that despite its conceptual simplicity, scaling label propagation up to hundred millions of images leads to state of the art accuracy in the low-shot learning regime. △ Less

Submitted 15 June, 2018; v1 submitted 7 June, 2017; originally announced June 2017.

arXiv:1704.06363 [pdf, other]

Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

Authors: Sam Gross, Marc'Aurelio Ranzato, Arthur Szlam

Abstract: Training convolutional networks (CNN's) that fit on a single GPU with minibatch stochastic gradient descent has become effective in practice. However, there is still no effective method for training large CNN's that do not fit in the memory of a few GPU cards, or for parallelizing CNN training. In this work we show that a simple hard mixture of experts model can be efficiently trained to good effe… ▽ More Training convolutional networks (CNN's) that fit on a single GPU with minibatch stochastic gradient descent has become effective in practice. However, there is still no effective method for training large CNN's that do not fit in the memory of a few GPU cards, or for parallelizing CNN training. In this work we show that a simple hard mixture of experts model can be efficiently trained to good effect on large scale hashtag (multilabel) prediction tasks. Mixture of experts models are not new (Jacobs et. al. 1991, Collobert et. al. 2003), but in the past, researchers have had to devise sophisticated methods to deal with data fragmentation. We show empirically that modern weakly supervised data sets are large enough to support naive partitioning schemes where each data point is assigned to a single expert. Because the experts are independent, training them in parallel is easy, and evaluation is cheap for the size of the model. Furthermore, we show that we can use a single decoding layer for all the experts, allowing a unified feature embedding space. We demonstrate that it is feasible (and in fact relatively painless) to train far larger models than could be practically trained with standard CNN architectures, and that the extra capacity can be well used on current datasets. △ Less

Submitted 20 April, 2017; originally announced April 2017.

Comments: Appearing in CVPR 2017

arXiv:1703.05407 [pdf, other]

Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play

Authors: Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus

Abstract: We describe a simple scheme that allows an agent to learn about its environment in an unsupervised manner. Our scheme pits two versions of the same agent, Alice and Bob, against one another. Alice proposes a task for Bob to complete; and then Bob attempts to complete the task. In this work we will focus on two kinds of environments: (nearly) reversible environments and environments that can be res… ▽ More We describe a simple scheme that allows an agent to learn about its environment in an unsupervised manner. Our scheme pits two versions of the same agent, Alice and Bob, against one another. Alice proposes a task for Bob to complete; and then Bob attempts to complete the task. In this work we will focus on two kinds of environments: (nearly) reversible environments and environments that can be reset. Alice will "propose" the task by doing a sequence of actions and then Bob must undo or repeat them, respectively. Via an appropriate reward structure, Alice and Bob automatically generate a curriculum of exploration, enabling unsupervised training of the agent. When Bob is deployed on an RL task within the environment, this unsupervised training reduces the number of supervised episodes needed to learn, and in some cases converges to a higher reward. △ Less

Submitted 27 April, 2018; v1 submitted 15 March, 2017; originally announced March 2017.

Comments: Published in ICLR 2018

arXiv:1702.04770 [pdf, other]

Training Language Models Using Target-Propagation

Authors: Sam Wiseman, Sumit Chopra, Marc'Aurelio Ranzato, Arthur Szlam, Ruoyu Sun, Soumith Chintala, Nicolas Vasilache

Abstract: While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps. We investigate whether Target Propagation (TPROP) style approaches can address these shortcomings. Unfortunately, extensive experim… ▽ More While Truncated Back-Propagation through Time (BPTT) is the most popular approach to training Recurrent Neural Networks (RNNs), it suffers from being inherently sequential (making parallelization difficult) and from truncating gradient flow between distant time-steps. We investigate whether Target Propagation (TPROP) style approaches can address these shortcomings. Unfortunately, extensive experiments suggest that TPROP generally underperforms BPTT, and we end with an analysis of this phenomenon, and suggestions for future work. △ Less

Submitted 15 February, 2017; originally announced February 2017.

arXiv:1702.02540 [pdf, ps, other]

Automatic Rule Extraction from Long Short Term Memory Networks

Authors: W. James Murdoch, Arthur Szlam

Abstract: Although deep learning models have proven effective at solving problems in natural language processing, the mechanism by which they come to their conclusions is often unclear. As a result, these models are generally treated as black boxes, yielding no insight of the underlying learned patterns. In this paper we consider Long Short Term Memory networks (LSTMs) and demonstrate a new approach for tra… ▽ More Although deep learning models have proven effective at solving problems in natural language processing, the mechanism by which they come to their conclusions is often unclear. As a result, these models are generally treated as black boxes, yielding no insight of the underlying learned patterns. In this paper we consider Long Short Term Memory networks (LSTMs) and demonstrate a new approach for tracking the importance of a given input to the LSTM for a given output. By identifying consistently important patterns of words, we are able to distill state of the art LSTMs on sentiment analysis and question answering into a set of representative phrases. This representation is then quantitatively validated by using the extracted phrases to construct a simple, rule-based classifier which approximates the output of the LSTM. △ Less

Submitted 24 February, 2017; v1 submitted 8 February, 2017; originally announced February 2017.

Comments: ICLR 2017 accepted paper

arXiv:1701.08435 [pdf, other]

Transformation-Based Models of Video Sequences

Authors: Joost van Amersfoort, Anitha Kannan, Marc'Aurelio Ranzato, Arthur Szlam, Du Tran, Soumith Chintala

Abstract: In this work we propose a simple unsupervised approach for next frame prediction in video. Instead of directly predicting the pixels in a frame given past frames, we predict the transformations needed for generating the next frame in a sequence, given the transformations of the past frames. This leads to sharper results, while using a smaller prediction model. In order to enable a fair comparison… ▽ More In this work we propose a simple unsupervised approach for next frame prediction in video. Instead of directly predicting the pixels in a frame given past frames, we predict the transformations needed for generating the next frame in a sequence, given the transformations of the past frames. This leads to sharper results, while using a smaller prediction model. In order to enable a fair comparison between different video frame prediction models, we also propose a new evaluation protocol. We use generated frames as input to a classifier trained with ground truth sequences. This criterion guarantees that models scoring high are those producing sequences which preserve discriminative features, as opposed to merely penalizing any deviation, plausible or not, from the ground truth. Our proposed approach compares favourably against more sophisticated ones on the UCF-101 data set, while also being more efficient in terms of the number of parameters and computational cost. △ Less

Submitted 6 February, 2023; v1 submitted 29 January, 2017; originally announced January 2017.

arXiv:1612.03969 [pdf, ps, other]

Tracking the World State with Recurrent Entity Networks

Authors: Mikael Henaff, Jason Weston, Arthur Szlam, Antoine Bordes, Yann LeCun

Abstract: We introduce a new model, the Recurrent Entity Network (EntNet). It is equipped with a dynamic long-term memory which allows it to maintain and update a representation of the state of the world as it receives new data. For language understanding tasks, it can reason on-the-fly as it reads text, not just when it is required to answer a question or respond as is the case for a Memory Network (Sukhba… ▽ More We introduce a new model, the Recurrent Entity Network (EntNet). It is equipped with a dynamic long-term memory which allows it to maintain and update a representation of the state of the world as it receives new data. For language understanding tasks, it can reason on-the-fly as it reads text, not just when it is required to answer a question or respond as is the case for a Memory Network (Sukhbaatar et al., 2015). Like a Neural Turing Machine or Differentiable Neural Computer (Graves et al., 2014; 2016) it maintains a fixed size memory and can learn to perform location and content-based read and write operations. However, unlike those models it has a simple parallel architecture in which several memory locations can be updated simultaneously. The EntNet sets a new state-of-the-art on the bAbI tasks, and is the first method to solve all the tasks in the 10k training examples setting. We also demonstrate that it can solve a reasoning task which requires a large number of supporting facts, which other methods are not able to solve, and can generalize past its training horizon. It can also be practically used on large scale datasets such as Children's Book Test, where it obtains competitive performance, reading the story in a single pass. △ Less

Submitted 10 May, 2017; v1 submitted 12 December, 2016; originally announced December 2016.

Journal ref: ICLR 2017

arXiv:1611.08097 [pdf, other]

doi 10.1109/MSP.2017.2693418

Geometric deep learning: going beyond Euclidean data

Authors: Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, Pierre Vandergheynst

Abstract: Many scientific fields study data with an underlying structure that is a non-Euclidean space. Some examples include social networks in computational social sciences, sensor networks in communications, functional networks in brain imaging, regulatory networks in genetics, and meshed surfaces in computer graphics. In many applications, such geometric data are large and complex (in the case of social… ▽ More Many scientific fields study data with an underlying structure that is a non-Euclidean space. Some examples include social networks in computational social sciences, sensor networks in communications, functional networks in brain imaging, regulatory networks in genetics, and meshed surfaces in computer graphics. In many applications, such geometric data are large and complex (in the case of social networks, on the scale of billions), and are natural targets for machine learning techniques. In particular, we would like to use deep neural networks, which have recently proven to be powerful tools for a broad range of problems from computer vision, natural language processing, and audio analysis. However, these tools have been most successful on data with an underlying Euclidean or grid-like structure, and in cases where the invariances of these structures are built into networks used to model them. Geometric deep learning is an umbrella term for emerging techniques attempting to generalize (structured) deep neural models to non-Euclidean domains such as graphs and manifolds. The purpose of this paper is to overview different examples of geometric deep learning problems and present available solutions, key difficulties, applications, and future research directions in this nascent field. △ Less

Submitted 3 May, 2017; v1 submitted 24 November, 2016; originally announced November 2016.

arXiv:1605.07736 [pdf, other]

Learning Multiagent Communication with Backpropagation

Authors: Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus

Abstract: Many tasks in AI require the collaboration of multiple agents. Typically, the communication protocol between agents is manually specified and not altered during training. In this paper we explore a simple neural model, called CommNet, that uses continuous communication for fully cooperative tasks. The model consists of multiple agents and the communication between them is learned alongside their p… ▽ More Many tasks in AI require the collaboration of multiple agents. Typically, the communication protocol between agents is manually specified and not altered during training. In this paper we explore a simple neural model, called CommNet, that uses continuous communication for fully cooperative tasks. The model consists of multiple agents and the communication between them is learned alongside their policy. We apply this model to a diverse set of tasks, demonstrating the ability of the agents to learn to communicate amongst themselves, yielding improved performance over non-communicative agents and baselines. In some cases, it is possible to interpret the language devised by the agents, revealing simple but effective strategies for solving the task at hand. △ Less

Submitted 31 October, 2016; v1 submitted 25 May, 2016; originally announced May 2016.

Comments: Accepted to NIPS 2016

arXiv:1603.01765 [pdf, ps, other]

Accurate principal component analysis via a few iterations of alternating least squares

Authors: Arthur Szlam, Andrew Tulloch, Mark Tygert

Abstract: A few iterations of alternating least squares with a random starting point provably suffice to produce nearly optimal spectral- and Frobenius-norm accuracies of low-rank approximations to a matrix; iterating to convergence is unnecessary. Thus, software implementing alternating least squares can be retrofitted via appropriate setting of parameters to calculate nearly optimally accurate low-rank ap… ▽ More A few iterations of alternating least squares with a random starting point provably suffice to produce nearly optimal spectral- and Frobenius-norm accuracies of low-rank approximations to a matrix; iterating to convergence is unnecessary. Thus, software implementing alternating least squares can be retrofitted via appropriate setting of parameters to calculate nearly optimally accurate low-rank approximations highly efficiently, with no need for convergence. △ Less

Submitted 5 March, 2016; originally announced March 2016.

Comments: 9 pages, 3 tables

Journal ref: SIAM Journal on Matrix Analysis and Applications, 38 (2): 425-433, 2017

arXiv:1602.06662 [pdf, other]

Recurrent Orthogonal Networks and Long-Memory Tasks

Authors: Mikael Henaff, Arthur Szlam, Yann LeCun

Abstract: Although RNNs have been shown to be powerful tools for processing sequential data, finding architectures or optimization strategies that allow them to model very long term dependencies is still an active area of research. In this work, we carefully analyze two synthetic datasets originally outlined in (Hochreiter and Schmidhuber, 1997) which are used to evaluate the ability of RNNs to store inform… ▽ More Although RNNs have been shown to be powerful tools for processing sequential data, finding architectures or optimization strategies that allow them to model very long term dependencies is still an active area of research. In this work, we carefully analyze two synthetic datasets originally outlined in (Hochreiter and Schmidhuber, 1997) which are used to evaluate the ability of RNNs to store information over many time steps. We explicitly construct RNN solutions to these problems, and using these constructions, illuminate both the problems themselves and the way in which RNNs store different types of information in their hidden states. These constructions furthermore explain the success of recent methods that specify unitary initializations or constraints on the transition matrices. △ Less

Submitted 15 March, 2017; v1 submitted 22 February, 2016; originally announced February 2016.

arXiv:1512.02167 [pdf, other]

Simple Baseline for Visual Question Answering

Authors: Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus

Abstract: We describe a very simple bag-of-words baseline for visual question answering. This baseline concatenates the word features from the question and CNN features from the image to predict the answer. When evaluated on the challenging VQA dataset [2], it shows comparable performance to many recent approaches using recurrent neural networks. To explore the strength and weakness of the trained model, we… ▽ More We describe a very simple bag-of-words baseline for visual question answering. This baseline concatenates the word features from the question and CNN features from the image to predict the answer. When evaluated on the challenging VQA dataset [2], it shows comparable performance to many recent approaches using recurrent neural networks. To explore the strength and weakness of the trained model, we also provide an interactive web demo and open-source code. . △ Less

Submitted 15 December, 2015; v1 submitted 7 December, 2015; originally announced December 2015.

Comments: One comparison method's scores are put into the correct column, and a new experiment of generating attention map is added

arXiv:1511.07401 [pdf, other]

MazeBase: A Sandbox for Learning from Games

Authors: Sainbayar Sukhbaatar, Arthur Szlam, Gabriel Synnaeve, Soumith Chintala, Rob Fergus

Abstract: This paper introduces MazeBase: an environment for simple 2D games, designed as a sandbox for machine learning approaches to reasoning and planning. Within it, we create 10 simple games embodying a range of algorithmic tasks (e.g. if-then statements or set negation). A variety of neural models (fully connected, convolutional network, memory network) are deployed via reinforcement learning on these… ▽ More This paper introduces MazeBase: an environment for simple 2D games, designed as a sandbox for machine learning approaches to reasoning and planning. Within it, we create 10 simple games embodying a range of algorithmic tasks (e.g. if-then statements or set negation). A variety of neural models (fully connected, convolutional network, memory network) are deployed via reinforcement learning on these games, with and without a procedurally generated curriculum. Despite the tasks' simplicity, the performance of the models is far from optimal, suggesting directions for future development. We also demonstrate the versatility of MazeBase by using it to emulate small combat scenarios from StarCraft. Models trained on the MazeBase version can be directly applied to StarCraft, where they consistently beat the in-game AI. △ Less

Submitted 7 January, 2016; v1 submitted 23 November, 2015; originally announced November 2015.

arXiv:1511.06931 [pdf, ps, other]

Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems

Authors: Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit Chopra, Alexander Miller, Arthur Szlam, Jason Weston

Abstract: A long-term goal of machine learning is to build intelligent conversational agents. One recent popular approach is to train end-to-end models on a large amount of real dialog transcripts between humans (Sordoni et al., 2015; Vinyals & Le, 2015; Shang et al., 2015). However, this approach leaves many questions unanswered as an understanding of the precise successes and shortcomings of each model is… ▽ More A long-term goal of machine learning is to build intelligent conversational agents. One recent popular approach is to train end-to-end models on a large amount of real dialog transcripts between humans (Sordoni et al., 2015; Vinyals & Le, 2015; Shang et al., 2015). However, this approach leaves many questions unanswered as an understanding of the precise successes and shortcomings of each model is hard to assess. A contrasting recent proposal are the bAbI tasks (Weston et al., 2015b) which are synthetic data that measure the ability of learning machines at various reasoning tasks over toy language. Unfortunately, those tests are very small and hence may encourage methods that do not scale. In this work, we propose a suite of new tasks of a much larger scale that attempt to bridge the gap between the two regimes. Choosing the domain of movies, we provide tasks that test the ability of models to answer factual questions (utilizing OMDB), provide personalization (utilizing MovieLens), carry short conversations about the two, and finally to perform on natural dialogs from Reddit. We provide a dataset covering 75k movie entities and with 3.5M training examples. We present results of various models on these tasks, and evaluate their performance. △ Less

Submitted 19 April, 2016; v1 submitted 21 November, 2015; originally announced November 2015.

arXiv:1506.08230 [pdf, other]

Convolutional networks and learning invariant to homogeneous multiplicative scalings

Authors: Mark Tygert, Arthur Szlam, Soumith Chintala, Marc'Aurelio Ranzato, Yuandong Tian, Wojciech Zaremba

Abstract: The conventional classification schemes -- notably multinomial logistic regression -- used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation. In the specific application to supervised learning for convnets, a simple scale-invariant classification st… ▽ More The conventional classification schemes -- notably multinomial logistic regression -- used in conjunction with convolutional networks (convnets) are classical in statistics, designed without consideration for the usual coupling with convnets, stochastic gradient descent, and backpropagation. In the specific application to supervised learning for convnets, a simple scale-invariant classification stage turns out to be more robust than multinomial logistic regression, appears to result in slightly lower errors on several standard test sets, has similar computational costs, and features precise control over the actual rate of learning. "Scale-invariant" means that multiplying the input values by any nonzero scalar leaves the output unchanged. △ Less

Submitted 16 February, 2016; v1 submitted 26 June, 2015; originally announced June 2015.

Comments: 12 pages, 6 figures, 4 tables

Journal ref: Appl. Comput. Harmon. Anal., 42 (1): 154-166, 2017

arXiv:1506.05751 [pdf, other]

Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

Authors: Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus

Abstract: In this paper we introduce a generative parametric model capable of producing high quality samples of natural images. Our approach uses a cascade of convolutional networks within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion. At each level of the pyramid, a separate generative convnet model is trained using the Generative Adversarial Nets (GAN) approach (Goodfellow e… ▽ More In this paper we introduce a generative parametric model capable of producing high quality samples of natural images. Our approach uses a cascade of convolutional networks within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion. At each level of the pyramid, a separate generative convnet model is trained using the Generative Adversarial Nets (GAN) approach (Goodfellow et al.). Samples drawn from our model are of significantly higher quality than alternate approaches. In a quantitative assessment by human evaluators, our CIFAR10 samples were mistaken for real images around 40% of the time, compared to 10% for samples drawn from a GAN baseline model. We also show samples from models trained on the higher resolution images of the LSUN scene dataset. △ Less

Submitted 18 June, 2015; originally announced June 2015.

arXiv:1503.08895 [pdf, other]

End-To-End Memory Networks

Authors: Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus

Abstract: We introduce a neural network with a recurrent attention model over a possibly large external memory. The architecture is a form of Memory Network (Weston et al., 2015) but unlike the model in that work, it is trained end-to-end, and hence requires significantly less supervision during training, making it more generally applicable in realistic settings. It can also be seen as an extension of RNNse… ▽ More We introduce a neural network with a recurrent attention model over a possibly large external memory. The architecture is a form of Memory Network (Weston et al., 2015) but unlike the model in that work, it is trained end-to-end, and hence requires significantly less supervision during training, making it more generally applicable in realistic settings. It can also be seen as an extension of RNNsearch to the case where multiple computational steps (hops) are performed per output symbol. The flexibility of the model allows us to apply it to tasks as diverse as (synthetic) question answering and to language modeling. For the former our approach is competitive with Memory Networks, but with less supervision. For the latter, on the Penn TreeBank and Text8 datasets our approach demonstrates comparable performance to RNNs and LSTMs. In both cases we show that the key concept of multiple computational hops yields improved results. △ Less

Submitted 24 November, 2015; v1 submitted 30 March, 2015; originally announced March 2015.

Comments: Accepted to NIPS 2015

arXiv:1503.03438 [pdf, ps, other]

A mathematical motivation for complex-valued convolutional networks

Authors: Joan Bruna, Soumith Chintala, Yann LeCun, Serkan Piantino, Arthur Szlam, Mark Tygert

Abstract: A complex-valued convolutional network (convnet) implements the repeated application of the following composition of three operations, recursively applying the composition to an input vector of nonnegative real numbers: (1) convolution with complex-valued vectors followed by (2) taking the absolute value of every entry of the resulting vectors followed by (3) local averaging. For processing real-v… ▽ More A complex-valued convolutional network (convnet) implements the repeated application of the following composition of three operations, recursively applying the composition to an input vector of nonnegative real numbers: (1) convolution with complex-valued vectors followed by (2) taking the absolute value of every entry of the resulting vectors followed by (3) local averaging. For processing real-valued random vectors, complex-valued convnets can be viewed as "data-driven multiscale windowed power spectra," "data-driven multiscale windowed absolute spectra," "data-driven multiwavelet absolute values," or (in their most general configuration) "data-driven nonlinear multiwavelet packets." Indeed, complex-valued convnets can calculate multiscale windowed spectra when the convnet filters are windowed complex-valued exponentials. Standard real-valued convnets, using rectified linear units (ReLUs), sigmoidal (for example, logistic or tanh) nonlinearities, max. pooling, etc., do not obviously exhibit the same exact correspondence with data-driven wavelets (whereas for complex-valued convnets, the correspondence is much more than just a vague analogy). Courtesy of the exact correspondence, the remarkably rich and rigorous body of mathematical analysis for wavelets applies directly to (complex-valued) convnets. △ Less

Submitted 12 December, 2015; v1 submitted 11 March, 2015; originally announced March 2015.

Comments: 11 pages, 3 figures; this is the retitled version submitted to the journal, "Neural Computation"

Journal ref: Neural Computation, 28 (5): 815-825, May 2016

arXiv:1412.6604 [pdf, ps, other]

Video (language) modeling: a baseline for generative models of natural videos

Authors: MarcAurelio Ranzato, Arthur Szlam, Joan Bruna, Michael Mathieu, Ronan Collobert, Sumit Chopra

Abstract: We propose a strong baseline model for unsupervised feature learning using video data. By learning to predict missing frames or extrapolate future frames from an input video sequence, the model discovers both spatial and temporal correlations which are useful to represent complex deformations and motion patterns. The models we propose are largely borrowed from the language modeling literature, and… ▽ More We propose a strong baseline model for unsupervised feature learning using video data. By learning to predict missing frames or extrapolate future frames from an input video sequence, the model discovers both spatial and temporal correlations which are useful to represent complex deformations and motion patterns. The models we propose are largely borrowed from the language modeling literature, and adapted to the vision domain by quantizing the space of image patches into a large dictionary. We demonstrate the approach on both a filling and a generation task. For the first time, we show that, after training on natural videos, such a model can predict non-trivial motions over short video sequences. △ Less

Submitted 4 May, 2016; v1 submitted 20 December, 2014; originally announced December 2014.

arXiv:1412.3510 [pdf, other]

An implementation of a randomized algorithm for principal component analysis

Authors: Arthur Szlam, Yuval Kluger, Mark Tygert

Abstract: Recent years have witnessed intense development of randomized methods for low-rank approximation. These methods target principal component analysis (PCA) and the calculation of truncated singular value decompositions (SVD). The present paper presents an essentially black-box, fool-proof implementation for Mathworks' MATLAB, a popular software platform for numerical computation. As illustrated via… ▽ More Recent years have witnessed intense development of randomized methods for low-rank approximation. These methods target principal component analysis (PCA) and the calculation of truncated singular value decompositions (SVD). The present paper presents an essentially black-box, fool-proof implementation for Mathworks' MATLAB, a popular software platform for numerical computation. As illustrated via several tests, the randomized algorithms for low-rank approximation outperform or at least match the classical techniques (such as Lanczos iterations) in basically all respects: accuracy, computational efficiency (both speed and memory usage), ease-of-use, parallelizability, and reliability. However, the classical procedures remain the methods of choice for estimating spectral norms, and are far superior for calculating the least singular values and corresponding singular vectors (or singular subspaces). △ Less

Submitted 10 December, 2014; originally announced December 2014.

Comments: 13 pages, 4 figures

Journal ref: ACM TOMS, 43(3): 28:1-28:14, 2016

arXiv:1406.3837 [pdf, other]

An Incremental Reseeding Strategy for Clustering

Authors: Xavier Bresson, Huiyi Hu, Thomas Laurent, Arthur Szlam, James von Brecht

Abstract: In this work we propose a simple and easily parallelizable algorithm for multiway graph partitioning. The algorithm alternates between three basic components: diffusing seed vertices over the graph, thresholding the diffused seeds, and then randomly reseeding the thresholded clusters. We demonstrate experimentally that the proper combination of these ingredients leads to an algorithm that achieves… ▽ More In this work we propose a simple and easily parallelizable algorithm for multiway graph partitioning. The algorithm alternates between three basic components: diffusing seed vertices over the graph, thresholding the diffused seeds, and then randomly reseeding the thresholded clusters. We demonstrate experimentally that the proper combination of these ingredients leads to an algorithm that achieves state-of-the-art performance in terms of cluster purity on standard benchmarks datasets. Moreover, the algorithm runs an order of magnitude faster than the other algorithms that achieve comparable results in terms of accuracy. We also describe a coarsen, cluster and refine approach similar to GRACLUS and METIS that removes an additional order of magnitude from the runtime of our algorithm while still maintaining competitive accuracy. △ Less

Submitted 15 June, 2014; originally announced June 2014.

arXiv:1405.2316 [pdf, other]

doi 10.1109/CVPR.2014.441

Better Feature Tracking Through Subspace Constraints

Authors: Bryan Poling, Gilad Lerman, Arthur Szlam

Abstract: Feature tracking in video is a crucial task in computer vision. Usually, the tracking problem is handled one feature at a time, using a single-feature tracker like the Kanade-Lucas-Tomasi algorithm, or one of its derivatives. While this approach works quite well when dealing with high-quality video and "strong" features, it often falters when faced with dark and noisy video containing low-quality… ▽ More Feature tracking in video is a crucial task in computer vision. Usually, the tracking problem is handled one feature at a time, using a single-feature tracker like the Kanade-Lucas-Tomasi algorithm, or one of its derivatives. While this approach works quite well when dealing with high-quality video and "strong" features, it often falters when faced with dark and noisy video containing low-quality features. We present a framework for jointly tracking a set of features, which enables sharing information between the different features in the scene. We show that our method can be employed to track features for both rigid and nonrigid motions (possibly of few moving bodies) even when some features are occluded. Furthermore, it can be used to significantly improve tracking results in poorly-lit scenes (where there is a mix of good and bad features). Our approach does not require direct modeling of the structure or the motion of the scene, and runs in real time on a single CPU core. △ Less

Submitted 9 May, 2014; originally announced May 2014.

Comments: 8 pages, 2 figures. CVPR 2014

Journal ref: Proceedings of CVPR, 2014, pages 3454-3461

arXiv:1312.6203 [pdf, other]

Spectral Networks and Locally Connected Networks on Graphs

Authors: Joan Bruna, Wojciech Zaremba, Arthur Szlam, Yann LeCun

Abstract: Convolutional Neural Networks are extremely efficient architectures in image and audio recognition tasks, thanks to their ability to exploit the local translational invariance of signal classes over their domain. In this paper we consider possible generalizations of CNNs to signals defined on more general domains without the action of a translation group. In particular, we propose two construction… ▽ More Convolutional Neural Networks are extremely efficient architectures in image and audio recognition tasks, thanks to their ability to exploit the local translational invariance of signal classes over their domain. In this paper we consider possible generalizations of CNNs to signals defined on more general domains without the action of a translation group. In particular, we propose two constructions, one based upon a hierarchical clustering of the domain, and another based on the spectrum of the graph Laplacian. We show through experiments that for low-dimensional graphs it is possible to learn convolutional layers with a number of parameters independent of the input size, resulting in efficient deep architectures. △ Less

Submitted 21 May, 2014; v1 submitted 20 December, 2013; originally announced December 2013.

Comments: 14 pages

arXiv:1312.5783 [pdf, other]

Unsupervised Feature Learning by Deep Sparse Coding

Authors: Yunlong He, Koray Kavukcuoglu, Yun Wang, Arthur Szlam, Yanjun Qi

Abstract: In this paper, we propose a new unsupervised feature learning framework, namely Deep Sparse Coding (DeepSC), that extends sparse coding to a multi-layer architecture for visual object recognition tasks. The main innovation of the framework is that it connects the sparse-encoders from different layers by a sparse-to-dense module. The sparse-to-dense module is a composition of a local spatial poolin… ▽ More In this paper, we propose a new unsupervised feature learning framework, namely Deep Sparse Coding (DeepSC), that extends sparse coding to a multi-layer architecture for visual object recognition tasks. The main innovation of the framework is that it connects the sparse-encoders from different layers by a sparse-to-dense module. The sparse-to-dense module is a composition of a local spatial pooling step and a low-dimensional embedding process, which takes advantage of the spatial smoothness information in the image. As a result, the new method is able to learn several levels of sparse representation of the image which capture features at a variety of abstraction levels and simultaneously preserve the spatial smoothness between the neighboring image patches. Combining the feature representations from multiple layers, DeepSC achieves the state-of-the-art performance on multiple object recognition tasks. △ Less

Submitted 19 December, 2013; originally announced December 2013.

Comments: 9 pages, submitted to ICLR

arXiv:1311.4025 [pdf, ps, other]

Signal Recovery from Pooling Representations

Authors: Joan Bruna, Arthur Szlam, Yann LeCun

Abstract: In this work we compute lower Lipschitz bounds of $\ell_p$ pooling operators for $p=1, 2, \infty$ as well as $\ell_p$ pooling operators preceded by half-rectification layers. These give sufficient conditions for the design of invertible neural network layers. Numerical experiments on MNIST and image patches confirm that pooling layers can be inverted with phase recovery algorithms. Moreover, the r… ▽ More In this work we compute lower Lipschitz bounds of $\ell_p$ pooling operators for $p=1, 2, \infty$ as well as $\ell_p$ pooling operators preceded by half-rectification layers. These give sufficient conditions for the design of invertible neural network layers. Numerical experiments on MNIST and image patches confirm that pooling layers can be inverted with phase recovery algorithms. Moreover, the regularity of the inverse pooling, controlled by the lower Lipschitz constant, is empirically verified with a nearest neighbor regression. △ Less

Submitted 27 February, 2014; v1 submitted 16 November, 2013; originally announced November 2013.

Comments: 17 pages, 3 figures

arXiv:1301.3590 [pdf, ps, other]

Tree structured sparse coding on cubes

Authors: Arthur Szlam

Abstract: A brief description of tree structured sparse coding on the binary cube. A brief description of tree structured sparse coding on the binary cube. △ Less

Submitted 16 January, 2013; originally announced January 2013.

arXiv:1301.3537 [pdf, ps, other]

Learning Stable Group Invariant Representations with Convolutional Networks

Authors: Joan Bruna, Arthur Szlam, Yann LeCun

Abstract: Transformation groups, such as translations or rotations, effectively express part of the variability observed in many recognition problems. The group structure enables the construction of invariant signal representations with appealing mathematical properties, where convolutions, together with pooling operators, bring stability to additive and geometric perturbations of the input. Whereas physica… ▽ More Transformation groups, such as translations or rotations, effectively express part of the variability observed in many recognition problems. The group structure enables the construction of invariant signal representations with appealing mathematical properties, where convolutions, together with pooling operators, bring stability to additive and geometric perturbations of the input. Whereas physical transformation groups are ubiquitous in image and audio applications, they do not account for all the variability of complex signal classes. We show that the invariance properties built by deep convolutional networks can be cast as a form of stable group invariance. The network wiring architecture determines the invariance group, while the trainable filter coefficients characterize the group action. We give explanatory examples which illustrate how the network architecture controls the resulting invariance group. We also explore the principle by which additional convolutional layers induce a group factorization enabling more abstract, powerful invariant representations. △ Less

Submitted 15 January, 2013; originally announced January 2013.

Comments: 4 pages

arXiv:1202.6384 [pdf, ps, other]

Fast approximations to structured sparse coding and applications to object classification

Authors: Arthur Szlam, Karol Gregor, Yann LeCun

Abstract: We describe a method for fast approximation of sparse coding. The input space is subdivided by a binary decision tree, and we simultaneously learn a dictionary and assignment of allowed dictionary elements for each leaf of the tree. We store a lookup table with the assignments and the pseudoinverses for each node, allowing for very fast inference. We give an algorithm for learning the tree, the di… ▽ More We describe a method for fast approximation of sparse coding. The input space is subdivided by a binary decision tree, and we simultaneously learn a dictionary and assignment of allowed dictionary elements for each leaf of the tree. We store a lookup table with the assignments and the pseudoinverses for each node, allowing for very fast inference. We give an algorithm for learning the tree, the dictionary and the dictionary element assignment, and In the process of describing this algorithm, we discuss the more general problem of learning the groups in group structured sparse modelling. We show that our method creates good sparse representations by using it in the object recognition framework of \cite{lazebnik06,yang-cvpr-09}. Implementing our own fast version of the SIFT descriptor the whole system runs at 20 frames per second on $321 \times 481$ sized images on a laptop with a quad-core cpu, while sacrificing very little accuracy on the Caltech 101 and 15 scenes benchmarks. △ Less

Submitted 28 February, 2012; originally announced February 2012.

arXiv:1010.3460 [pdf, ps, other]

doi 10.1007/s11263-012-0535-6

Hybrid Linear Modeling via Local Best-fit Flats

Authors: Teng Zhang, Arthur Szlam, Yi Wang, Gilad Lerman

Abstract: We present a simple and fast geometric method for modeling data by a union of affine subspaces. The method begins by forming a collection of local best-fit affine subspaces, i.e., subspaces approximating the data in local neighborhoods. The correct sizes of the local neighborhoods are determined automatically by the Jones' $β_2$ numbers (we prove under certain geometric conditions that our method… ▽ More We present a simple and fast geometric method for modeling data by a union of affine subspaces. The method begins by forming a collection of local best-fit affine subspaces, i.e., subspaces approximating the data in local neighborhoods. The correct sizes of the local neighborhoods are determined automatically by the Jones' $β_2$ numbers (we prove under certain geometric conditions that our method finds the optimal local neighborhoods). The collection of subspaces is further processed by a greedy selection procedure or a spectral method to generate the final model. We discuss applications to tracking-based motion segmentation and clustering of faces under different illuminating conditions. We give extensive experimental evidence demonstrating the state of the art accuracy and speed of the suggested algorithms on these problems and also on synthetic hybrid linear data as well as the MNIST handwritten digits data; and we demonstrate how to use our algorithms for fast determination of the number of affine subspaces. △ Less

Submitted 1 May, 2012; v1 submitted 17 October, 2010; originally announced October 2010.

Comments: This version adds some clarifications and numerical experiments as well as strengthens the previous theorem. For face experiments, we use here the Extended Yale Face Database B (cropped faces unlike previous version). This database points to a failure mode of our algorithms, but we suggest and successfully test a workaround

Journal ref: International Journal of Computer Vision Volume 100, Issue 3 (2012), Page 217-240

arXiv:1010.0422 [pdf, ps, other]

Convolutional Matching Pursuit and Dictionary Training

Authors: Arthur Szlam, Koray Kavukcuoglu, Yann LeCun

Abstract: Matching pursuit and K-SVD is demonstrated in the translation invariant setting Matching pursuit and K-SVD is demonstrated in the translation invariant setting △ Less

Submitted 3 October, 2010; originally announced October 2010.

arXiv:1005.0858 [pdf, ps, other]

doi 10.1109/CVPR.2010.5539866

Randomized hybrid linear modeling by local best-fit flats

Authors: Teng Zhang, Arthur Szlam, Yi Wang, Gilad Lerman

Abstract: The hybrid linear modeling problem is to identify a set of d-dimensional affine sets in a D-dimensional Euclidean space. It arises, for example, in object tracking and structure from motion. The hybrid linear model can be considered as the second simplest (behind linear) manifold model of data. In this paper we will present a very simple geometric method for hybrid linear modeling based on selecti… ▽ More The hybrid linear modeling problem is to identify a set of d-dimensional affine sets in a D-dimensional Euclidean space. It arises, for example, in object tracking and structure from motion. The hybrid linear model can be considered as the second simplest (behind linear) manifold model of data. In this paper we will present a very simple geometric method for hybrid linear modeling based on selecting a set of local best fit flats that minimize a global l1 error measure. The size of the local neighborhoods is determined automatically by the Jones' l2 beta numbers; it is proven under certain geometric conditions that good local neighborhoods exist and are found by our method. We also demonstrate how to use this algorithm for fast determination of the number of affine subspaces. We give extensive experimental evidence demonstrating the state of the art accuracy and speed of the algorithm on synthetic and real hybrid linear data. △ Less

Submitted 5 May, 2010; originally announced May 2010.

Comments: To appear in the proceedings of CVPR 2010

Journal ref: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (13-18 June 2010), pp. 1927-1934

arXiv:0909.3123 [pdf, ps, other]

doi 10.1109/ICCVW.2009.5457695

Median K-flats for hybrid linear modeling with many outliers

Authors: Teng Zhang, Arthur Szlam, Gilad Lerman

Abstract: We describe the Median K-Flats (MKF) algorithm, a simple online method for hybrid linear modeling, i.e., for approximating data by a mixture of flats. This algorithm simultaneously partitions the data into clusters while finding their corresponding best approximating l1 d-flats, so that the cumulative l1 error is minimized. The current implementation restricts d-flats to be d-dimensional linear… ▽ More We describe the Median K-Flats (MKF) algorithm, a simple online method for hybrid linear modeling, i.e., for approximating data by a mixture of flats. This algorithm simultaneously partitions the data into clusters while finding their corresponding best approximating l1 d-flats, so that the cumulative l1 error is minimized. The current implementation restricts d-flats to be d-dimensional linear subspaces. It requires a negligible amount of storage, and its complexity, when modeling data consisting of N points in D-dimensional Euclidean space with K d-dimensional linear subspaces, is of order O(n K d D+n d^2 D), where n is the number of iterations required for convergence (empirically on the order of 10^4). Since it is an online algorithm, data can be supplied to it incrementally and it can incrementally produce the corresponding output. The performance of the algorithm is carefully evaluated using synthetic and real data. △ Less

Submitted 16 September, 2009; originally announced September 2009.

Journal ref: Proc. of 2nd IEEE International Workshop on Subspace Methods (Subspace 2009), pp. 234-241 (2009)

arXiv:0809.2274 [pdf, ps, other]

A randomized algorithm for principal component analysis

Authors: Vladimir Rokhlin, Arthur Szlam, Mark Tygert

Abstract: Principal component analysis (PCA) requires the computation of a low-rank approximation to a matrix containing the data being analyzed. In many applications of PCA, the best possible accuracy of any rank-deficient approximation is at most a few digits (measured in the spectral norm, relative to the spectral norm of the matrix being approximated). In such circumstances, efficient algorithms have… ▽ More Principal component analysis (PCA) requires the computation of a low-rank approximation to a matrix containing the data being analyzed. In many applications of PCA, the best possible accuracy of any rank-deficient approximation is at most a few digits (measured in the spectral norm, relative to the spectral norm of the matrix being approximated). In such circumstances, efficient algorithms have not come with guarantees of good accuracy, unless one or both dimensions of the matrix being approximated are small. We describe an efficient algorithm for the low-rank approximation of matrices that produces accuracy very close to the best possible, for matrices of arbitrary sizes. We illustrate our theoretical results via several numerical examples. △ Less

Submitted 5 July, 2009; v1 submitted 12 September, 2008; originally announced September 2008.

Comments: 26 pages, 6 tables, 1 figure; to appear in the SIAM Journal on Matrix Analysis and Applications

Report number: UCLA Computational and Applied Math Technical Report 08-60

Journal ref: A randomized algorithm for principal component analysis, SIAM Journal on Matrix Analysis and Applications, 31 (3): 1100-1124, 2009

Showing 51–85 of 85 results for author: Szlam, A