Incremental Learning and Self-Attention Mechanisms Improve Neural System Identification

Authors: Isaac Lin, Tianye Wang, Shang Gao, Shiming Tang, Tai Sing Lee

Abstract: Convolutional neural networks (CNNs) have been shown to be the state-of-the-art approach for modeling the transfer functions of visual cortical neurons. Cortical neurons in the primary visual cortex are sensitive to contextual information mediated by extensive horizontal and feedback connections. Standard CNNs can integrate global spatial image information to model such contextual modulation via two mechanisms: successive rounds of convolutions and a fully connected readout layer. In this paper, we find that non-local networks or self-attention (SA) mechanisms, theoretically related to context-dependent flexible gating mechanisms observed in the primary visual cortex, improve neural response predictions over parameter-matched CNNs in two key metrics: tuning curve correlation and tuning peak. We factorize networks to determine the relative contribution of each context mechanism. This reveals that information in the local receptive field is most important for modeling the overall tuning curve, but surround information is critically necessary for characterizing the tuning peak. We find that self-attention can replace subsequent spatial-integration convolutions when learned in an incremental manner, and is further enhanced in the presence of a fully connected readout layer, suggesting that the two context mechanisms are complementary. Finally, we find that learning a receptive-field-centric model with self-attention, before incrementally learning a fully connected readout, yields a more biologically realistic model in terms of center-surround contributions.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Preprint NeurIPS 2024

arXiv:2310.18894 [pdf, other]

Emergence of Shape Bias in Convolutional Neural Networks through Activation Sparsity

Authors: Tianqin Li, Ziqi Wen, Yangfan Li, Tai Sing Lee

Abstract: Current deep-learning models for object recognition are known to be heavily biased toward texture. In contrast, human visual systems are known to be biased toward shape and structure. What could be the design principles in human visual systems that led to this difference? How could we introduce more shape bias into the deep learning models? In this paper, we report that sparse coding, a ubiquitous principle in the brain, can in itself introduce shape bias into the network. We found that enforcing the sparse coding constraint using a non-differentiable Top-K operation can lead to the emergence of structural encoding in neurons in convolutional neural networks, resulting in a smooth decomposition of objects into parts and subparts and endowing the networks with shape bias. We demonstrated this emergence of shape bias and its functional benefits for different network structures with various datasets. For object recognition convolutional neural networks, the shape bias leads to greater robustness against style and pattern change distraction. For the image synthesis generative adversarial networks, the emerged shape bias leads to more coherent and decomposable structures in the synthesized images. Ablation studies suggest that sparse codes tend to encode structures, whereas the more distributed codes tend to favor texture. Our code is hosted at the GitHub repository: https://github.com/Crazy-Jack/nips2023_shape_vs_texture

Submitted 29 October, 2023; originally announced October 2023.

Comments: Published as NeurIPS 2023 (Oral)
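
The Top-K sparsity constraint mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `top_k_sparsify` is hypothetical, and the abstract does not say along which dimension (channel, spatial, or both) the operation is applied, so the sketch simply works on one flat vector of activations.

```python
def top_k_sparsify(activations, k):
    """Keep the k largest values of a flat activation vector; zero the rest.

    Hypothetical helper for illustration only; the paper's actual layer
    may select along a different dimension or use a different tie rule.
    """
    if k <= 0:
        return [0.0] * len(activations)
    if k >= len(activations):
        return list(activations)
    # Indices sorted by descending activation; sorted() is stable, so ties
    # are resolved in favour of the earlier position.
    order = sorted(range(len(activations)), key=lambda i: -activations[i])
    keep = set(order[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]


# Example: only the two strongest responses survive.
sparse = top_k_sparsify([0.1, 0.9, 0.3, 0.7], k=2)
```

The hard selection has no useful gradient with respect to which units are kept, which is presumably why the abstract describes the operation as non-differentiable.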

arXiv:2310.07555 [pdf, other]

Does resistance to style-transfer equal Global Shape Bias? Measuring network sensitivity to global shape configuration

Authors: Ziqi Wen, Tianqin Li, Zhi Jing, Tai Sing Lee

Abstract: Deep learning models are known to exhibit a strong texture bias, while humans tend to rely heavily on global shape structure for object recognition. The current benchmark for evaluating a model's global shape bias is a set of style-transferred images with the assumption that resistance to the attack of style transfer is related to the development of global structure sensitivity in the model. In this work, we show that networks trained with style-transfer images indeed learn to ignore style, but their shape bias arises primarily from local detail. We provide a Disrupted Structure Testbench (DiST) as a direct measurement of global structure sensitivity. Our test includes 2400 original images from ImageNet-1K, each of which is accompanied by two images with the global shapes of the original image disrupted while preserving its texture via the texture synthesis program. We found that (1) models that performed well on the previous cue-conflict dataset do not fare well on the proposed DiST; (2) the supervised-trained Vision Transformer (ViT) loses its global spatial information from positional embedding, leading to no significant advantage over Convolutional Neural Networks (CNNs) on DiST, whereas self-supervised learning methods, especially the masked autoencoder, significantly improve the global structure sensitivity of ViT; and (3) improving global structure sensitivity is orthogonal to resistance to style transfer, indicating that the relationship between global shape structure and local texture detail is not an either/or relationship. Training with DiST images and training with style-transferred images are complementary, and the two can be combined to train a network to enhance its global shape sensitivity and the robustness of its local features. Our code will be hosted on GitHub: https://github.com/leelabcnbc/DiST

Submitted 29 February, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.00954 [pdf, other]

doi 10.1109/TALE56641.2023.10398393

How Helpful do Novice Programmers Find the Feedback of an Automated Repair Tool?

Authors: Oka Kurniawan, Christopher M. Poskitt, Ismam Al Hoque, Norman Tiong Seng Lee, Cyrille Jégourel, Nachamma Sockalingam

Abstract: Immediate feedback has been shown to improve student learning. In programming courses, immediate, automated feedback is typically provided in the form of pre-defined test cases run by a submission platform. While these are excellent for highlighting the presence of logical errors, they do not provide novice programmers enough scaffolding to help them identify where an error is or how to fix it. To address this, several tools have been developed that provide richer feedback in the form of program repairs. Studies of such tools, however, tend to focus more on whether correct repairs can be generated, rather than how novices are using them. In this paper, we describe our experience of using CLARA, an automated repair tool, to provide feedback to novices. First, we extended CLARA to support a larger subset of the Python language, before integrating it with the Jupyter Notebooks used for our programming exercises. Second, we devised a preliminary study in which students tackled programming problems with and without support of the tool using the 'think aloud' protocol. We found that novices often struggled to understand the proposed repairs, echoing the well-known challenge to understand compiler/interpreter messages. Furthermore, we found that students valued being told where a fix was needed - without necessarily the fix itself - suggesting that 'less may be more' from a pedagogical perspective.

Submitted 7 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: Experience report accepted by the International Conference on Teaching, Assessment, and Learning for Engineering (TALE'23)

Journal ref: Proc. TALE'23. IEEE, 2023

arXiv:2307.00932 [pdf]

A large calcium-imaging dataset reveals a systematic V4 organization for natural scenes

Authors: Tianye Wang, Haoxuan Yao, Tai Sing Lee, Jiayi Hong, Yang Li, Hongfei Jiang, Ian Max Andolina, Shiming Tang

Abstract: The visual system evolved to process natural scenes, yet most of our understanding of the topology and function of visual cortex derives from studies using artificial stimuli. To gain deeper insights into visual processing of natural scenes, we utilized widefield calcium-imaging of primate V4 in response to many natural images, generating a large dataset of columnar-scale responses. We used this dataset to build a digital twin of V4 via deep learning, generating a detailed topographical map of natural image preferences at each cortical position. The map revealed clustered functional domains for specific classes of natural image features. These ranged from surface-related attributes like color and texture to shape-related features such as edges, curvature, and facial features. We validated the model-predicted domains with additional widefield calcium-imaging and single-cell resolution two-photon imaging. Our study illuminates the detailed topological organization and neural codes in V4 that represent natural scenes.

Submitted 23 July, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: 39 pages, 14 figures

arXiv:2210.16587 [pdf, other]

Relating Human Perception of Musicality to Prediction in a Predictive Coding Model

Authors: Nikolas McNeal, Jennifer Huang, Aniekan Umoren, Shuqi Dai, Roger Dannenberg, Richard Randall, Tai Sing Lee

Abstract: We explore the use of a neural network inspired by predictive coding for modeling human music perception. This network was developed based on the computational neuroscience theory of recurrent interactions in the hierarchical visual cortex. When trained with video data using self-supervised learning, the model manifests behaviors consistent with human visual illusions. Here, we adapt this network to model the hierarchical auditory system and investigate whether it will make similar choices to humans regarding the musicality of a set of random pitch sequences. When the model is trained with a large corpus of instrumental classical music and popular melodies rendered as mel spectrograms, it exhibits greater prediction errors for random pitch sequences that are rated less musical by human subjects. We found that the prediction error depends on the amount of information regarding the subsequent note, the pitch interval, and the temporal context. Our findings suggest that predictability is correlated with human perception of musicality and that a predictive coding neural network trained on music can be used to characterize the features and motifs contributing to human perception of music.

Submitted 29 October, 2022; originally announced October 2022.

Comments: 5 pages, 5 figures, currently in peer review

arXiv:2201.02886 [pdf]

Identifying the differences between 3 dimensional shapes Using a Custom-built Smart Glove

Authors: Davis Le, Sairam Tangirala, Tae Song Lee

Abstract: Sensor-embedded glove systems have been reported to require careful, time-consuming and precise calibrations on a per-user basis in order to obtain consistent usable data. We have developed a low-cost, flex-sensor-based smart glove system that may be resilient to the common limitations of data gloves. This system utilizes an Arduino-based microcontroller as well as a single flex sensor on each finger. Feedback from the Arduino's analog-to-digital converter can be used to infer an object's dimensional properties, because the reaction of each individual finger differs with the size and shape of a grasped object. In this work, we report its use in statistically differentiating stationary objects of spherical and cylindrical shapes of varying radii regardless of the variations introduced by glove users. Using our sensor-embedded glove system, we explored the practicability of object classification based on the tactile sensor responses from each finger of the smart glove. An estimated standard error of the mean was calculated from the averaged flex sensor readings of each of the five fingers. Consistent with the literature, we found that there is a systematic dependence between an object's shape and dimension and the flex sensor readings. The sensor output from at least one finger indicated a non-overlapping confidence interval when comparing spherical and cylindrical objects of the same radius. When sensing spheres and cylinders of varying sizes, all five fingers had a categorically varying reaction to each shape. We believe that our findings could be used in machine learning models for real-time object identification.

Submitted 8 January, 2022; originally announced January 2022.

Comments: 12 pages, 12 figures
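
The comparison described in the abstract above (a standard error of the mean per finger, then a check for non-overlapping confidence intervals) can be sketched as follows. The readings are made-up numbers, the function names are hypothetical, and the 1.96 multiplier (a normal-approximation 95% interval) is an assumption; the abstract does not state which confidence level was used.

```python
import math
import statistics


def mean_confidence_interval(readings, z=1.96):
    """Return (low, high) for mean +/- z * SEM of one finger's readings.

    z=1.96 is an assumed normal-approximation 95% level, not a value
    taken from the paper.
    """
    mean = statistics.mean(readings)
    sem = statistics.stdev(readings) / math.sqrt(len(readings))
    return (mean - z * sem, mean + z * sem)


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical flex-sensor readings for one finger on two object shapes.
sphere_ci = mean_confidence_interval([10.0, 12.0, 14.0])
cylinder_ci = mean_confidence_interval([20.0, 22.0, 24.0])
shapes_distinguishable = not intervals_overlap(sphere_ci, cylinder_ci)
```

With so few readings per condition, a t-distribution multiplier would be the more careful choice than 1.96; the sketch keeps the normal approximation only for brevity.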

arXiv:2201.00112 [pdf, other]

SurfGen: Adversarial 3D Shape Synthesis with Explicit Surface Discriminators

Authors: Andrew Luo, Tianqin Li, Wen-Hao Zhang, Tai Sing Lee

Abstract: Recent advances in deep generative models have led to immense progress in 3D shape synthesis. While existing models are able to synthesize shapes represented as voxels, point-clouds, or implicit functions, these methods only indirectly enforce the plausibility of the final 3D shape surface. Here we present a 3D shape synthesis framework (SurfGen) that directly applies adversarial training to the object surface. Our approach uses a differentiable spherical projection layer to capture and represent the explicit zero isosurface of an implicit 3D generator as functions defined on the unit sphere. By processing the spherical representation of 3D object surfaces with a spherical CNN in an adversarial setting, our generator can better learn the statistics of natural shape surfaces. We evaluate our model on large-scale shape datasets, and demonstrate that the end-to-end trained model is capable of generating high fidelity 3D shapes with diverse topology.

Submitted 31 December, 2021; originally announced January 2022.

Comments: ICCV 2021. Project page: https://github.com/aluo-x/NeuralRaycaster

arXiv:2110.00825 [pdf, ps, other]

Recurrent networks improve neural response prediction and provide insights into underlying cortical circuits

Authors: Yimeng Zhang, Harold Rockwell, Sicheng Dai, Ge Huang, Stephen Tsou, Yuanyuan Wei, Tai Sing Lee

Abstract: Feedforward CNN models have proven themselves in recent years as state-of-the-art models for predicting single-neuron responses to natural images in early visual cortical neurons. In this paper, we extend these models with recurrent convolutional layers, reflecting the well-known massive recurrence in the cortex, and show robust increases in predictive performance over feedforward models across thousands of hyperparameter combinations in three datasets of macaque V1 and V2 single-neuron responses. We propose that the recurrent circuit can be conceptualized as a form of ensemble computing, with each iteration generating more effective feedforward paths of various path lengths to allow a combination of solutions in the final approximation. The statistics of the paths in the ensemble provide insights into the differential performance increases among our recurrent models. We also assess whether the recurrent circuits learned for neural response prediction can be related to cortical circuits. We find that the hidden units in the recurrent circuits of the appropriate models, when trained on long-duration wide-field image presentations, exhibit similar temporal response dynamics and classical contextual modulations as observed in V1 neurons. This work provides insights into the computational rationale of recurrent circuits and suggests that neural response prediction could be useful for characterizing the recurrent neural circuits in the visual cortex.

Submitted 13 November, 2022; v1 submitted 2 October, 2021; originally announced October 2021.

arXiv:2109.08896 [pdf, other]

doi 10.24251/HICSS.2022.121

Steps Before Syntax: Helping Novice Programmers Solve Problems using the PCDIT Framework

Authors: Oka Kurniawan, Cyrille Jégourel, Norman Tiong Seng Lee, Matthieu De Mari, Christopher M. Poskitt

Abstract: Novice programmers often struggle with problem solving due to the high cognitive loads they face. Furthermore, many introductory programming courses do not explicitly teach it, assuming that problem solving skills are acquired along the way. In this paper, we present 'PCDIT', a non-linear problem solving framework that provides scaffolding to guide novice programmers through the process of transforming a problem specification into an implemented and tested solution for an imperative programming language. A key distinction of PCDIT is its focus on developing concrete cases for the problem early without actually writing test code: students are instead encouraged to think about the abstract steps from inputs to outputs before mapping anything down to syntax. We reflect on our experience of teaching an introductory programming course using PCDIT, and report the results of a survey that suggests it helped students to break down challenging problems, organise their thoughts, and reach working solutions.

Submitted 18 September, 2021; originally announced September 2021.

Comments: Accepted by the 34th Conference on Software Engineering Education and Training (CSEE&T 2022): Special Track of the 55th Hawaii International Conference on System Sciences (HICSS 2022)

Journal ref: Proc. HICSS 2022, pages 982-991. ScholarSpace, 2022

arXiv:2001.03942 [pdf, other]

doi 10.1145/3328778.3366907

Securing Bring-Your-Own-Device (BYOD) Programming Exams

Authors: Oka Kurniawan, Norman Tiong Seng Lee, Christopher M. Poskitt

Abstract: Traditional pen and paper exams are inadequate for modern university programming courses as they are misaligned with pedagogies and learning objectives that target practical coding ability. Unfortunately, many institutions lack the resources or space to be able to run assessments in dedicated computer labs. This has motivated the development of bring-your-own-device (BYOD) exam formats, allowing students to program in a similar environment to how they learnt, but presenting instructors with significant additional challenges in preventing plagiarism and cheating. In this paper, we describe a BYOD exam solution based on lockdown browsers, software which temporarily turns students' laptops into secure workstations with limited system or internet access. We combine the use of this technology with a learning management system and cloud-based programming tool to facilitate conceptual and practical programming questions that can be tackled in an interactive but controlled environment. We reflect on our experience of implementing this solution for a major undergraduate programming course, highlighting our principal lesson that policies and support mechanisms are as important to consider as the technology itself.

Submitted 12 January, 2020; originally announced January 2020.

Comments: Accepted by SIGCSE 2020

Journal ref: In Proc. ACM Technical Symposium on Computer Science Education (SIGCSE 2020), pages 880-886. ACM, 2020

arXiv:1912.10489 [pdf, other]

Recurrent Feedback Improves Feedforward Representations in Deep Neural Networks

Authors: Siming Yan, Xuyang Fang, Bowen Xiao, Harold Rockwell, Yimeng Zhang, Tai Sing Lee

Abstract: The abundant recurrent horizontal and feedback connections in the primate visual cortex are thought to play an important role in bringing global and semantic contextual information to early visual areas during perceptual inference, helping to resolve local ambiguity and fill in missing details. In this study, we find that introducing feedback loops and horizontal recurrent connections to a deep convolutional neural network (VGG16) allows the network to become more robust against noise and occlusion during inference, even in the initial feedforward pass. This suggests that recurrent feedback and contextual modulation transform the feedforward representations of the network in a meaningful and interesting way. We study the population codes of neurons in the network, before and after learning with feedback, and find that learning with feedback yielded an increase in discriminability (measured by d-prime) between the different object classes in the population codes of the neurons in the feedforward path, even at the earliest layer that receives feedback. We find that recurrent feedback, by injecting top-down semantic meaning to the population activities, helps the network learn better feedforward paths to robustly map noisy image patches to the latent representations corresponding to important visual concepts of each object class, resulting in greater robustness of the network against noise and occlusion as well as better fine-grained recognition.

Submitted 22 December, 2019; originally announced December 2019.

Comments: 10 pages, 5 figures
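
The d-prime discriminability measure named in the abstract above is commonly defined as the gap between two class means divided by the root of their averaged variances. The sketch below uses that textbook form on one-dimensional responses; the abstract does not say how the authors computed d-prime over population codes, so the function `d_prime` and its inputs are illustrative assumptions.

```python
import math
import statistics


def d_prime(responses_a, responses_b):
    """Textbook d': |mean_a - mean_b| / sqrt((var_a + var_b) / 2).

    Uses sample variances on 1-D responses. How the paper reduces a
    population code to such responses is not stated in the abstract.
    """
    mean_gap = abs(statistics.mean(responses_a) - statistics.mean(responses_b))
    pooled_var = (statistics.variance(responses_a)
                  + statistics.variance(responses_b)) / 2
    if pooled_var == 0:
        raise ValueError("d-prime is undefined when both variances are zero")
    return mean_gap / math.sqrt(pooled_var)


# Hypothetical unit responses to two object classes.
separation = d_prime([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

A larger value means the two response distributions overlap less, which is the sense in which the abstract reports increased discriminability after learning with feedback.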

arXiv:1901.09002

A Neurally-Inspired Hierarchical Prediction Network for Spatiotemporal Sequence Learning and Prediction

Authors: Jielin Qiu, Ge Huang, Tai Sing Lee

Abstract: In this paper we developed a hierarchical network model, called Hierarchical Prediction Network (HPNet), to understand how spatiotemporal memories might be learned and encoded in the recurrent circuits in the visual cortical hierarchy for predicting future video frames. This neurally inspired model operates in the analysis-by-synthesis framework. It contains a feed-forward path that computes and encodes spatiotemporal features of successive complexity and a feedback path for the successive levels to project their interpretations to the level below. Within each level, the feed-forward path and the feedback path intersect in a recurrent gated circuit, instantiated in an LSTM module, to generate a prediction or explanation of the incoming signals. The network learns its internal model of the world by minimizing the errors of its prediction of the incoming signals at each level of the hierarchy. We found that hierarchical interaction in the network increases semantic clustering of global movement patterns in the population codes of the units along the hierarchy, even in the earliest module. This facilitates the learning of relationships among movement patterns, yielding state-of-the-art performance in long range video sequence predictions in the benchmark datasets. The network model automatically reproduces a variety of prediction suppression and familiarity suppression neurophysiological phenomena observed in the visual cortex, suggesting that hierarchical prediction might indeed be an important principle for representational learning in the visual cortex.

Submitted 1 October, 2021; v1 submitted 25 January, 2019; originally announced January 2019.

Comments: Some of the results are not replicable

arXiv:1803.02549 [pdf, other]

An iALM-ICA-based Anti-Jamming DS-CDMA Receiver for LMS Systems

Authors: Hyoyoung Jung, Jaewook Kang, Tae Seok Lee, Suil Kim, Kiseon Kim

Abstract: We consider a land mobile satellite communication system using spread spectrum techniques where the uplink is exposed to MT jamming attacks, and the downlink is corrupted by multi-path fading channels. We propose an anti-jamming receiver, which exploits inherent low-dimensionality of the received signal model, by formulating a robust principal component analysis (Robust PCA)-based recovery problem. Simulation results verify that the proposed receiver outperforms the conventional receiver for a reasonable rank of the jamming signal.

Submitted 7 March, 2018; originally announced March 2018.

Comments: IEEE Transactions on Aerospace and Electric Systems, "accepted"

arXiv:1705.07768 [pdf, other]

doi 10.1109/CRV.2017.52

Learning to Associate Words and Images Using a Large-scale Graph

Authors: Heqing Ya, Haonan Sun, Jeffrey Helt, Tai Sing Lee

Abstract: We develop an approach for unsupervised learning of associations between co-occurring perceptual events using a large graph. We applied this approach to successfully solve the image captcha of China's railroad system. The approach is based on the principle of suspicious coincidence. In this particular problem, a user is presented with a deformed picture of a Chinese phrase and eight low-resolution images. They must quickly select the relevant images in order to purchase their train tickets. This problem presents several challenges: (1) the teaching labels for both the Chinese phrases and the images were not available for supervised learning, (2) no pre-trained deep convolutional neural networks are available for recognizing these Chinese phrases or the presented images, and (3) each captcha must be solved within a few seconds. We collected 2.6 million captchas, with 2.6 million deformed Chinese phrases and over 21 million images. From these data, we constructed an association graph, composed of over 6 million vertices, and linked these vertices based on co-occurrence information and feature similarity between pairs of images. We then trained a deep convolutional neural network to learn a projection of the Chinese phrases onto a 230-dimensional latent space. Using label propagation, we computed the likelihood of each of the eight images conditioned on the latent space projection of the deformed phrase for each captcha. The resulting system solved captchas with 77% accuracy in 2 seconds on average. Our work, in answering this practical challenge, illustrates the power of this class of unsupervised association learning techniques, which may be related to the brain's general strategy for associating language stimuli with visual objects on the principle of suspicious coincidence.

Submitted 22 May, 2017; originally announced May 2017.

Comments: 8 pages, 7 figures, 14th Conference on Computer and Robot Vision 2017

arXiv:1705.07594 [pdf, other]

doi 10.1109/CRV.2017.42

Learning Robust Object Recognition Using Composed Scenes from Generative Models

Authors: Hao Wang, Xingyu Lin, Yimeng Zhang, Tai Sing Lee

Abstract: Recurrent feedback connections in the mammalian visual system have been hypothesized to play a role in synthesizing input in the theoretical framework of analysis by synthesis. The comparison of internally synthesized representation with that of the input provides a validation mechanism during perceptual inference and learning. Inspired by these ideas, we proposed that the synthesis machinery can compose new, unobserved images by imagination to train the network itself so as to increase the robustness of the system in novel scenarios. As a proof of concept, we investigated whether images composed by imagination could help an object recognition system to deal with occlusion, which is challenging for the current state-of-the-art deep convolutional neural networks. We fine-tuned a network on images containing objects in various occlusion scenarios that are imagined or self-generated through a deep generator network. Trained on imagined occluded scenarios under the object persistence constraint, our network discovered more subtle and localized image features that were neglected by the original network for object classification, obtaining better separability of different object classes in the feature space. This leads to a significant improvement in object recognition under occlusion for our network relative to the original network trained only on un-occluded images.
In addition to providing practical benefits in object recognition under occlusion, this work demonstrates that the use of self-generated composition of visual scenes through the synthesis loop, combined with the object persistence constraint, can provide opportunities for neural networks to discover new relevant patterns in the data and become more flexible in dealing with novel situations.

Submitted 22 May, 2017; originally announced May 2017.

Comments: Accepted by 14th Conference on Computer and Robot Vision

arXiv:1704.00033 [pdf, other]

Transfer of View-manifold Learning to Similarity Perception of Novel Objects

Authors: Xingyu Lin, Hao Wang, Zhihao Li, Yimeng Zhang, Alan Yuille, Tai Sing Lee

Abstract: We develop a model of perceptual similarity judgment based on re-training a deep convolution neural network (DCNN) that learns to associate different views of each 3D object to capture the notion of object persistence and continuity in our visual experience. The re-training process effectively performs distance metric learning under the object persistency constraints, to modify the view-manifold of object representations. It reduces the effective distance between the representations of different views of the same object without compromising the distance between those of the views of different objects, resulting in the untangling of the view-manifolds between individual objects within the same category and across categories. This untangling enables the model to discriminate and recognize objects within the same category, independent of viewpoints. We found that this ability is not limited to the trained objects, but transfers to novel objects in both trained and untrained categories, as well as to a variety of completely novel artificial synthetic objects. This transfer in learning suggests the modification of distance metrics in view-manifolds is more general and abstract, likely at the levels of parts, and independent of the specific objects or categories experienced during training.
Interestingly, the resulting transformation of feature representation in the deep networks is found to significantly better match human perceptual similarity judgment than AlexNet, suggesting that object persistence could be an important constraint in the development of perceptual similarity judgment in biological neural networks.

Submitted 31 March, 2017; originally announced April 2017.

Comments: Accepted to ICLR2017

arXiv:1411.3815 [pdf, other]

Predictive Encoding of Contextual Relationships for Perceptual Inference, Interpolation and Prediction

Authors: Mingmin Zhao, Chengxu Zhuang, Yizhou Wang, Tai Sing Lee

Abstract: We propose a new neurally-inspired model that can learn to encode the global relationship context of visual events across time and space and to use the contextual information to modulate the analysis by synthesis process in a predictive coding framework. The model learns latent contextual representations by maximizing the predictability of visual events based on local and global contextual information through both top-down and bottom-up processes. In contrast to standard predictive coding models, the prediction error in this model is used to update the contextual representation but does not alter the feedforward input for the next layer, and is thus more consistent with neurophysiological observations. We establish the computational feasibility of this model by demonstrating its ability in several aspects. We show that our model can outperform the state-of-the-art performance of gated Boltzmann machines (GBM) in the estimation of contextual information. Our model can also interpolate missing events or predict future events in image sequences while simultaneously estimating contextual information. We show it achieves state-of-the-art performance in terms of prediction accuracy in a variety of tasks and possesses the ability to interpolate missing frames, a function that is lacking in GBM.

Submitted 16 April, 2015; v1 submitted 14 November, 2014; originally announced November 2014.

arXiv:1204.2609 [pdf, ps, other]

Stochastic Feature Mapping for PAC-Bayes Classification

Authors: Xiong Li, Tai Sing Lee, Yuncai Liu

Abstract: Probabilistic generative modeling of data distributions can potentially exploit hidden information which is useful for discriminative classification. This observation has motivated the development of approaches that couple generative and discriminative models for classification. In this paper, we propose a new approach to couple generative and discriminative models in a unified framework based on PAC-Bayes risk theory. We first derive the model-parameter-independent stochastic feature mapping from a practical MAP classifier operating on generative models. Then we construct a linear stochastic classifier equipped with the feature mapping, and derive the explicit PAC-Bayes risk bounds for such a classifier for both supervised and semi-supervised learning. Minimizing the risk bound, using an EM-like iterative procedure, results in a new posterior over hidden variables (E-step) and the update rules of model parameters (M-step). The derivation of the posterior is always feasible due to the way of equipping feature mapping and the explicit form of bounding risk. The derived posterior allows the tuning of generative models and subsequently the feature mappings for better classification. The derived update rules of the model parameters are the same as those of the uncoupled models, as the feature mapping is model-parameter-independent. Our experiments show that the coupling between the data-modeling generative model and the discriminative classifier via a stochastic feature mapping in this framework leads to a general classification tool with state-of-the-art performance.

Submitted 15 April, 2012; v1 submitted 11 April, 2012; originally announced April 2012.

Comments: 6 pages, 3 figures

Showing 1–19 of 19 results for author: Lee, T S