Dense Sample Deep Learning

Authors: Stephen Josè Hanson, Vivek Yadav, Catherine Hanson

Abstract: Deep Learning (DL), a variant of the neural network algorithms originally proposed in the 1980s, has made surprising progress in Artificial Intelligence (AI), ranging from language translation, protein folding, and autonomous cars to, more recently, human-like language models (CHATbots), all of which seemed intractable until very recently. Despite the growing use of Deep Learning (DL) networks, little is actually understood about the learning mechanisms and representations that make these networks effective across such a diverse range of applications. Part of the answer must be the huge scale of the architecture and of course the large scale of the data, since not much has changed since 1987. But the nature of deep learned representations remains largely unknown. Unfortunately, training sets with millions or billions of tokens have unknown combinatorics, and networks with millions or billions of hidden units cannot easily be visualized, nor can their mechanisms be easily revealed. In this paper, we explore these questions with a large (1.24M weights; VGG) DL network in a novel high-density sample task (5 unique tokens with at minimum 500 exemplars per token), which allows us to more carefully follow the emergence of category structure and feature construction. We use various visualization methods for following the emergence of the classification and the development of the coupling of feature detectors and structures that provide a type of graphical bootstrapping. From these results we harvest some basic observations of the learning dynamics of DL and propose a new theory of complex feature construction based on our results.

Submitted 21 July, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

arXiv:1808.03578 [pdf, other]

Dropout is a special case of the stochastic delta rule: faster and more accurate deep learning

Authors: Noah Frazier-Logue, Stephen José Hanson

Abstract: Multi-layer neural networks have led to remarkable performance on many kinds of benchmark tasks in text, speech and image processing. Nonlinear parameter estimation in hierarchical models is known to be subject to overfitting and misspecification. One approach to these estimation and related problems (local minima, collinearity, feature discovery, etc.) is called Dropout (Hinton et al. 2012; Baldi et al. 2016). The Dropout algorithm removes hidden units according to a Bernoulli random variable with probability $p$ prior to each update, creating random "shocks" to the network that are averaged over updates. In this paper we will show that Dropout is a special case of a more general model published originally in 1990 called the Stochastic Delta Rule, or SDR (Hanson, 1990). SDR redefines each weight in the network as a random variable with mean $\mu_{w_{ij}}$ and standard deviation $\sigma_{w_{ij}}$. Each weight random variable is sampled on each forward activation, consequently creating an exponential number of potential networks with shared weights. Both parameters are updated according to prediction error, thus resulting in weight noise injections that reflect a local history of prediction error and local model averaging. SDR therefore implements a more sensitive local gradient-dependent simulated annealing per weight, converging in the limit to a Bayes optimal network. Tests on standard benchmarks (CIFAR) using a modified version of DenseNet show that SDR outperforms standard Dropout in test error by approx. $17\%$ with DenseNet-BC 250 on CIFAR-100 and approx. $12-14\%$ in smaller networks. We also show that SDR reaches the same accuracy that Dropout attains in 100 epochs in as few as 35 epochs.

Submitted 7 February, 2019; v1 submitted 10 August, 2018; originally announced August 2018.

Comments: 6 pages, 7 figures; submitted to ICML
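
The mechanism described in the SDR abstract above (each weight treated as a random variable that is resampled on every forward pass, with both its mean and its standard deviation adjusted from prediction error) can be illustrated with a minimal sketch for a single linear unit. The abstract does not state the update equations, so the rule in `sdr_update` is an assumed, illustrative form: the mean follows a gradient step, and the standard deviation grows with the gradient magnitude and is then shrunk by a decay factor. The function names, learning rates, and decay value are all hypothetical, and `dropout_mask` assumes that $p$ is the probability of removing a unit.

```python
import random


def sdr_forward(x, mu, sigma, rng):
    """Forward pass of one linear unit with per-weight sampling.

    Each weight is drawn afresh from Normal(mu_i, sigma_i), as the abstract
    describes. The sampled weights are returned alongside the output.
    """
    w = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
    y = sum(wi * xi for wi, xi in zip(w, x))
    return y, w


def sdr_update(x, target, y, mu, sigma, lr_mu=0.05, lr_sigma=0.02, decay=0.9):
    """ASSUMED update rule, for illustration only (not quoted from the paper).

    The mean takes a gradient step on squared error; the standard deviation
    grows with the gradient magnitude and is then multiplied by decay < 1.
    """
    err = y - target
    new_mu, new_sigma = [], []
    for m, s, xi in zip(mu, sigma, x):
        grad = err * xi  # d(0.5 * err**2) / d w_i for a linear unit
        new_mu.append(m - lr_mu * grad)
        new_sigma.append(decay * (s + lr_sigma * abs(grad)))
    return new_mu, new_sigma


def dropout_mask(n, p, rng):
    """Bernoulli mask over n hidden units; a unit is removed with probability p
    (assumed convention), giving 0.0 for removed units and 1.0 otherwise."""
    return [0.0 if rng.random() < p else 1.0 for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    mu, sigma = [0.0, 0.0], [0.5, 0.5]
    # Toy regression target y = 2*x0 - x1 (hypothetical data).
    for _ in range(500):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        target = 2.0 * x[0] - x[1]
        y, _ = sdr_forward(x, mu, sigma, rng)
        mu, sigma = sdr_update(x, target, y, mu, sigma)
    print("means:", mu)
    print("standard deviations:", sigma)
```

In this sketch the noise is attached to individual weights and its scale depends on recent error, whereas `dropout_mask` applies error-independent Bernoulli noise to whole units, which is the contrast the abstract draws between the two methods.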

arXiv:1311.3318 [pdf, other]

A Study of Actor and Action Semantic Retention in Video Supervoxel Segmentation

Authors: Chenliang Xu, Richard F. Doell, Stephen José Hanson, Catherine Hanson, Jason J. Corso

Abstract: Existing methods in the semantic computer vision community seem unable to deal with the explosion and richness of modern, open-source and social video content. Although sophisticated methods such as object detection or bag-of-words models have been well studied, they typically operate on low-level features and ultimately suffer from either scalability issues or a lack of semantic meaning. On the other hand, video supervoxel segmentation has recently been established and applied to large-scale data processing, and potentially serves as an intermediate representation for high-level video semantic extraction. The supervoxels are rich decompositions of the video content: they capture object shape and motion well. However, it is not yet known whether supervoxel segmentation retains the semantics of the underlying video content. In this paper, we conduct a systematic study of how well the actor and action semantics are retained in video supervoxel segmentation. In our study, human observers watch supervoxel segmentation videos and try to discriminate both actor (human or animal) and action (one of eight everyday actions). We gather and analyze a large set of 640 human perceptions over 96 videos in 3 different supervoxel scales. Furthermore, we conduct machine recognition experiments on a feature defined on supervoxel segmentation, called supervoxel shape context, which is inspired by the higher-order processes in human perception. Our ultimate findings suggest that a significant amount of semantics has been well retained in the video supervoxel segmentation and can be used for further video analysis.

Submitted 13 November, 2013; originally announced November 2013.

Comments: This article is in review at the International Journal of Semantic Computing

arXiv:1307.2150 [pdf]

Transmodal Analysis of Neural Signals

Authors: Yaroslav O. Halchenko, Michael Hanke, James V. Haxby, Stephen Jose Hanson, Christoph S. Herrmann

Abstract: Localizing neuronal activity in the brain, both in time and in space, is a central challenge in advancing the understanding of brain function. Because no single neuroimaging technique can cover all aspects at once, there is a growing interest in combining signals from multiple modalities in order to benefit from the advantages of each acquisition method. Due to the complexity and unknown parameterization of any suggested complete model of the BOLD response in functional magnetic resonance imaging (fMRI), the development of a reliable ultimate fusion approach remains difficult. But besides the primary goal of superior temporal and spatial resolution, conjoint analysis of data from multiple imaging modalities can alternatively be used to segregate neural information from physiological and acquisition noise. In this paper we suggest a novel methodology which relies on constructing a quantifiable mapping of data from one modality (electroencephalography; EEG) into another (fMRI), called transmodal analysis of neural signals (TRANSfusion). TRANSfusion attempts to map neural data embedded within the EEG signal into its reflection in fMRI data. Assessing the mapping performance on unseen data makes it possible to localize brain areas where a significant portion of the signal could be reliably reconstructed, and hence the areas whose neural activity is reflected in both EEG and fMRI data. Subsequent analysis of the learnt model makes it possible to localize areas associated with specific frequency bands of EEG, or areas functionally related (connected or coherent) to any given EEG sensor. We demonstrate the performance of TRANSfusion on artificial and real data from an auditory experiment. We further speculate on possible alternative uses: cross-modal data filtering and EEG-driven interpolation of fMRI signals to obtain arbitrarily high temporal sampling of BOLD.

Submitted 8 July, 2013; originally announced July 2013.

arXiv:1304.3432 [pdf]

Machine Learning, Clustering, and Polymorphy

Authors: Stephen Jose Hanson, Malcolm Bauer

Abstract: This paper describes a machine induction program (WITT) that attempts to model human categorization. Properties of categories to which human subjects are sensitive include best or prototypical members, relative contrasts between putative categories, and polymorphy (neither necessary nor sufficient features). This approach represents an alternative to the usual Artificial Intelligence approaches to generalization and conceptual clustering, which tend to focus on necessary and sufficient feature rules, equivalence classes, and simple search and match schemes. WITT is shown to be more consistent with human categorization while potentially including results produced by more traditional clustering schemes. Applications of this approach in the domains of expert systems and information retrieval are also discussed.

Submitted 27 March, 2013; originally announced April 2013.

Comments: Appears in Proceedings of the First Conference on Uncertainty in Artificial Intelligence (UAI1985)

Report number: UAI-P-1985-PG-117-128

Showing 1–5 of 5 results for author: Hanson, S J