Search | arXiv e-print repository

arXiv:1404.2000 [pdf, ps, other]

Notes on Kullback-Leibler Divergence and Likelihood

Abstract: The Kullback-Leibler (KL) divergence is a fundamental equation of information theory that quantifies the proximity of two probability distributions. Although difficult to understand by examining the equation, an intuition and understanding of the KL divergence arises from its intimate relationship with likelihood theory. We discuss how KL divergence arises from likelihood theory in an attempt to provide some intuition and reserve a rigorous (but rather simple) derivation for the appendix. Finally, we comment on recent applications of KL divergence in the neural coding literature and highlight its natural application.

Submitted 7 April, 2014; originally announced April 2014.

arXiv:1404.1999 [pdf, other]

Notes on Generalized Linear Models of Neurons

Authors: Jonathon Shlens

Abstract: Experimental neuroscience increasingly requires tractable models for analyzing and predicting the behavior of neurons and networks. The generalized linear model (GLM) is an increasingly popular statistical framework for analyzing neural data that is flexible, exhibits rich dynamic behavior and is computationally tractable (Paninski, 2004; Pillow et al., 2008; Truccolo et al., 2005). What follows is a brief summary of the primary equations governing the application of GLMs to spike trains with a few sentences linking this work to the larger statistical literature. Later sections include extensions of a basic GLM to model spatio-temporal receptive fields as well as network activity in an arbitrary number of neurons.

Submitted 7 April, 2014; originally announced April 2014.

arXiv:1404.1998 [pdf, other]

A Light Discussion and Derivation of Entropy

Authors: Jonathon Shlens

Abstract: The expression for entropy sometimes appears mysterious - as it often is asserted without justification. This short manuscript contains a discussion of the underlying assumptions behind entropy as well as a simple derivation of this ubiquitous quantity.

Submitted 7 April, 2014; originally announced April 2014.

arXiv:1404.1100 [pdf, other]

A Tutorial on Principal Component Analysis

Authors: Jonathon Shlens

Abstract: Principal component analysis (PCA) is a mainstay of modern data analysis - a black box that is widely used but (sometimes) poorly understood. The goal of this paper is to dispel the magic behind this black box. This manuscript focuses on building a solid intuition for how and why principal component analysis works. This manuscript crystallizes this knowledge by deriving from simple intuitions, the mathematics behind PCA. This tutorial does not shy away from explaining the ideas informally, nor does it shy away from the mathematics. The hope is that by addressing both aspects, readers of all levels will be able to gain a better understanding of PCA as well as the when, the how and the why of applying this technique.

Submitted 3 April, 2014; originally announced April 2014.

arXiv:1312.5697 [pdf, other]

Using Web Co-occurrence Statistics for Improving Image Categorization

Authors: Samy Bengio, Jeff Dean, Dumitru Erhan, Eugene Ie, Quoc Le, Andrew Rabinovich, Jonathon Shlens, Yoram Singer

Abstract: Object recognition and localization are important tasks in computer vision. The focus of this work is the incorporation of contextual information in order to improve object recognition and localization. For instance, it is natural not to expect an elephant to appear in the middle of an ocean. We consider a simple approach to encapsulate such common-sense knowledge using co-occurrence statistics from web documents. By merely counting the number of times nouns (such as elephants, sharks, oceans, etc.) co-occur in web documents, we obtain a good estimate of expected co-occurrences in visual data. We then cast the problem of combining textual co-occurrence statistics with the predictions of image-based classifiers as an optimization problem. The resulting optimization problem serves as a surrogate for our inference procedure. Despite the simplicity of the resulting optimization problem, it is effective in improving both recognition and localization accuracy. Concretely, we observe significant improvements in recognition and localization rates for both ImageNet Detection 2012 and Sun 2012 datasets.

Submitted 20 December, 2013; v1 submitted 19 December, 2013; originally announced December 2013.

arXiv:1312.5650 [pdf, other]

Zero-Shot Learning by Convex Combination of Semantic Embeddings

Authors: Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg S. Corrado, Jeffrey Dean

Abstract: Several recent publications have proposed methods for mapping images into continuous semantic embedding spaces. In some cases the embedding space is trained jointly with the image transformation. In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage. Proponents of these image embedding systems have stressed their advantages over the traditional $n$-way classification framing of image understanding, particularly in terms of the promise for zero-shot learning -- the ability to correctly annotate images of previously unseen object categories. In this paper, we propose a simple method for constructing an image embedding system from any existing $n$-way image classifier and a semantic word embedding model, which contains the $n$ class labels in its vocabulary. Our method maps images into the semantic embedding space via convex combination of the class label embedding vectors, and requires no additional training. We show that this simple and direct method confers many of the advantages associated with more complex image embedding schemes, and indeed outperforms state-of-the-art methods on the ImageNet zero-shot learning task.

Submitted 21 March, 2014; v1 submitted 19 December, 2013; originally announced December 2013.

Showing 51–56 of 56 results for author: Shlens, J