Search | arXiv e-print repository

Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers

Authors: Clément Farabet, Camille Couprie, Laurent Najman, Yann LeCun

Abstract: Scene parsing, or semantic segmentation, consists in labeling each pixel in an image with the category of the object it belongs to. It is a challenging task that involves the simultaneous detection, segmentation and recognition of all the objects in the image. The scene parsing method proposed here starts by computing a tree of segments from a graph of pixel dissimilarities. Simultaneously, a se… ▽ More Scene parsing, or semantic segmentation, consists in labeling each pixel in an image with the category of the object it belongs to. It is a challenging task that involves the simultaneous detection, segmentation and recognition of all the objects in the image. The scene parsing method proposed here starts by computing a tree of segments from a graph of pixel dissimilarities. Simultaneously, a set of dense feature vectors is computed which encodes regions of multiple sizes centered on each pixel. The feature extractor is a multiscale convolutional network trained from raw pixels. The feature vectors associated with the segments covered by each node in the tree are aggregated and fed to a classifier which produces an estimate of the distribution of object categories contained in the segment. A subset of tree nodes that cover the image are then selected so as to maximize the average "purity" of the class distributions, hence maximizing the overall likelihood that each segment will contain a single object. The convolutional network feature extractor is trained end-to-end from raw pixels, alleviating the need for engineered features. After training, the system is parameter free. The system yields record accuracies on the Stanford Background Dataset (8 classes), the Sift Flow Dataset (33 classes) and the Barcelona Dataset (170 classes) while being an order of magnitude faster than competing approaches, producing a 320 \times 240 image labeling in less than 1 second. △ Less

Submitted 13 July, 2012; v1 submitted 9 February, 2012; originally announced February 2012.

Comments: 9 pages, 4 figures - Published in 29th International Conference on Machine Learning (ICML 2012), Jun 2012, Edinburgh, United Kingdom

arXiv:1108.1169 [pdf, ps, other]

Learning Representations by Maximizing Compression

Authors: Karol Gregor, Yann LeCun

Abstract: We give an algorithm that learns a representation of data through compression. The algorithm 1) predicts bits sequentially from those previously seen and 2) has a structure and a number of computations similar to an autoencoder. The likelihood under the model can be calculated exactly, and arithmetic coding can be used directly for compression. When training on digits the algorithm learns filters… ▽ More We give an algorithm that learns a representation of data through compression. The algorithm 1) predicts bits sequentially from those previously seen and 2) has a structure and a number of computations similar to an autoencoder. The likelihood under the model can be calculated exactly, and arithmetic coding can be used directly for compression. When training on digits the algorithm learns filters similar to those of restricted boltzman machines and denoising autoencoders. Independent samples can be drawn from the model by a single sweep through the pixels. The algorithm has a good compression performance when compared to other methods that work under random ordering of pixels. △ Less

Submitted 4 August, 2011; originally announced August 2011.

Comments: 8 pages, 3 figures

arXiv:1105.5307 [pdf, ps, other]

Efficient Learning of Sparse Invariant Representations

Authors: Karol Gregor, Yann LeCun

Abstract: We propose a simple and efficient algorithm for learning sparse invariant representations from unlabeled data with fast inference. When trained on short movies sequences, the learned features are selective to a range of orientations and spatial frequencies, but robust to a wide range of positions, similar to complex cells in the primary visual cortex. We give a hierarchical version of the algorith… ▽ More We propose a simple and efficient algorithm for learning sparse invariant representations from unlabeled data with fast inference. When trained on short movies sequences, the learned features are selective to a range of orientations and spatial frequencies, but robust to a wide range of positions, similar to complex cells in the primary visual cortex. We give a hierarchical version of the algorithm, and give guarantees of fast convergence under certain conditions. △ Less

Submitted 26 May, 2011; originally announced May 2011.

Comments: 9 pages + 6 supplement pages

arXiv:1010.3467 [pdf, ps, other]

Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition

Authors: Koray Kavukcuoglu, Marc'Aurelio Ranzato, Yann LeCun

Abstract: Adaptive sparse coding methods learn a possibly overcomplete set of basis functions, such that natural image patches can be reconstructed by linearly combining a small subset of these bases. The applicability of these methods to visual object recognition tasks has been limited because of the prohibitive cost of the optimization algorithms required to compute the sparse representation. In this work… ▽ More Adaptive sparse coding methods learn a possibly overcomplete set of basis functions, such that natural image patches can be reconstructed by linearly combining a small subset of these bases. The applicability of these methods to visual object recognition tasks has been limited because of the prohibitive cost of the optimization algorithms required to compute the sparse representation. In this work we propose a simple and efficient algorithm to learn basis functions. After training, this model also provides a fast and smooth approximator to the optimal representation, achieving even better accuracy than exact sparse coding algorithms on visual object recognition tasks. △ Less

Submitted 17 October, 2010; originally announced October 2010.

Report number: CBLL-TR-2008-12-01

arXiv:1010.0422 [pdf, ps, other]

Convolutional Matching Pursuit and Dictionary Training

Authors: Arthur Szlam, Koray Kavukcuoglu, Yann LeCun

Abstract: Matching pursuit and K-SVD is demonstrated in the translation invariant setting Matching pursuit and K-SVD is demonstrated in the translation invariant setting △ Less

Submitted 3 October, 2010; originally announced October 2010.

arXiv:1006.0448 [pdf, ps, other]

Emergence of Complex-Like Cells in a Temporal Product Network with Local Receptive Fields

Authors: Karo Gregor, Yann LeCun

Abstract: We introduce a new neural architecture and an unsupervised algorithm for learning invariant representations from temporal sequence of images. The system uses two groups of complex cells whose outputs are combined multiplicatively: one that represents the content of the image, constrained to be constant over several consecutive frames, and one that represents the precise location of features, which… ▽ More We introduce a new neural architecture and an unsupervised algorithm for learning invariant representations from temporal sequence of images. The system uses two groups of complex cells whose outputs are combined multiplicatively: one that represents the content of the image, constrained to be constant over several consecutive frames, and one that represents the precise location of features, which is allowed to vary over time but constrained to be sparse. The architecture uses an encoder to extract features, and a decoder to reconstruct the input from the features. The method was applied to patches extracted from consecutive movie frames and produces orientation and frequency selective units analogous to the complex cells in V1. An extension of the method is proposed to train a network composed of units with local receptive field spread over a large image of arbitrary size. A layer of complex cells, subject to sparsity constraints, pool feature units over overlap** local neighborhoods, which causes the feature units to organize themselves into pinwheel patterns of orientation-selective receptive fields, similar to those observed in the mammalian visual cortex. A feed-forward encoder efficiently computes the feature representation of full images. △ Less

Submitted 2 June, 2010; originally announced June 2010.

Showing 151–156 of 156 results for author: LeCun, Y