-
RONELDv2: A faster, improved lane tracking method
Authors:
Zhe Ming Chng,
Joseph Mun Hung Lew,
Jimmy Addison Lee
Abstract:
Lane detection is an integral part of control systems in autonomous vehicles and lane departure warning systems, as lanes are a key component of the operating environment for road vehicles. A previous paper presented a robust neural network output enhancement for active lane detection (RONELD) method, which augments deep learning lane detection models to improve the accuracy of active (ego) lane detection. This paper extends that work by further investigating the lane tracking methods used, to increase the method's robustness to lane changes and to different lane dimensions (e.g. lane marking thickness), and proposes an improved, lighter-weight lane detection method, RONELDv2. RONELDv2 improves on RONELD by detecting the lane point variance, merging lanes to find a more accurate set of lane parameters, and using an exponential moving average method to calculate more robust lane weights. Experiments show a consistent increase in lane detection accuracy across different datasets and deep learning models, as well as lower computational complexity, reflected in an up to two-fold decrease in runtime, which enhances the method's suitability for real-time use in autonomous vehicles and lane departure warning systems.
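RONELDv2 is described as computing lane weights with an exponential moving average (EMA). The following is a minimal sketch of such an update, not the authors' code. It assumes each frame yields one scalar observation per lane (for example, a detection confidence) and uses a hypothetical smoothing factor `alpha`; the helper names `ema_update` and `track_lane_weights` are illustrative.

```python
def ema_update(prev_weight, observation, alpha=0.3):
    """One EMA step: alpha * observation + (1 - alpha) * prev_weight."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if prev_weight is None:
        # No history yet (first frame or newly detected lane): start from the observation.
        return observation
    return alpha * observation + (1.0 - alpha) * prev_weight


def track_lane_weights(observations, alpha=0.3):
    """Apply ema_update over a sequence of per-frame observations for one lane."""
    weights, w = [], None
    for obs in observations:
        w = ema_update(w, obs, alpha)
        weights.append(w)
    return weights
```

A larger `alpha` reacts faster to lane changes, while a smaller one suppresses single-frame noise; the value used in the paper is not stated here.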
Submitted 26 February, 2022;
originally announced February 2022.
-
Integrating Information Theory and Adversarial Learning for Cross-modal Retrieval
Authors:
Wei Chen,
Yu Liu,
Erwin M. Bakker,
Michael S. Lew
Abstract:
Accurately matching visual and textual data in cross-modal retrieval has been widely studied in the multimedia community. To address the challenges posed by the heterogeneity gap and the semantic gap, we propose integrating Shannon information theory and adversarial learning. For the heterogeneity gap, we integrate modality classification and information entropy maximization adversarially. For this purpose, a modality classifier (acting as a discriminator) is built to distinguish the text and image modalities according to their different statistical properties. This discriminator uses its output probabilities to compute the Shannon information entropy, which measures the uncertainty of its modality classification. Meanwhile, feature encoders (acting as a generator) project uni-modal features into a common shared space and attempt to fool the discriminator by maximizing the entropy of its output. Maximizing this entropy gradually reduces the distribution discrepancy between cross-modal features, eventually reaching a domain confusion state in which the discriminator cannot confidently classify the two modalities. To reduce the semantic gap, Kullback-Leibler (KL) divergence and a bi-directional triplet loss are used to associate the intra- and inter-modality similarity between features in the shared space. Furthermore, a regularization term based on KL divergence with temperature scaling is used to calibrate the label classifier, which is biased by data imbalance. Extensive experiments with four deep models on four benchmarks demonstrate the effectiveness of the proposed approach.
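The entropy term described above can be sketched as follows, assuming the modality classifier outputs a probability distribution over the two modalities (image, text). This is a plain-Python illustration, not the authors' implementation; `shannon_entropy` and `generator_entropy_loss` are hypothetical names. For two classes the entropy peaks at ln 2 ≈ 0.693 when the discriminator outputs (0.5, 0.5), which corresponds to the domain confusion state.

```python
import math


def shannon_entropy(probs):
    """H(p) = -sum_i p_i * ln(p_i) in nats; terms with p_i = 0 contribute 0."""
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p) for p in probs if p > 0)


def generator_entropy_loss(batch_probs):
    """Encoders maximize the mean entropy, i.e. minimize its negative."""
    return -sum(shannon_entropy(p) for p in batch_probs) / len(batch_probs)
```

In a training loop the discriminator would be updated to classify modalities correctly, while the encoders would minimize `generator_entropy_loss`, i.e. maximize the discriminator's uncertainty.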
Submitted 11 April, 2021;
originally announced April 2021.
-
Lifelong Person Re-Identification via Adaptive Knowledge Accumulation
Authors:
Nan Pu,
Wei Chen,
Yu Liu,
Erwin M. Bakker,
Michael S. Lew
Abstract:
Existing person ReID methods learn on a stationary domain that is fixed by the choice of a given dataset. In many contexts (e.g., lifelong learning), these methods are ineffective because the domain changes continually, so incremental learning over multiple domains may be required. In this work we explore a new and challenging ReID task, lifelong person re-identification (LReID), which aims to learn continuously across multiple domains and even to generalize to new, unseen domains. Inspired by the cognitive processes of the human brain, we design an Adaptive Knowledge Accumulation (AKA) framework endowed with two crucial abilities: knowledge representation and knowledge operation. Our method alleviates catastrophic forgetting on seen domains and demonstrates the ability to generalize to unseen domains. We also provide a new, large-scale benchmark for LReID. Extensive experiments demonstrate that our method outperforms competing methods by a margin of 5.8% mAP in the generalization evaluation.
Submitted 23 March, 2021;
originally announced March 2021.
-
Preprint: Norm Loss: An efficient yet effective regularization method for deep neural networks
Authors:
Theodoros Georgiou,
Sebastian Schmitt,
Thomas Bäck,
Wei Chen,
Michael Lew
Abstract:
Convolutional neural network training can suffer from diverse issues such as exploding or vanishing gradients, scaling-based weight space symmetry, and covariate shift. To address these issues, researchers have developed weight regularization methods and activation normalization methods. In this work we propose a weight soft-regularization method based on the Oblique manifold. The proposed method uses a loss function that pushes each weight vector to have a norm close to one, i.e. the weight matrix is smoothly steered toward the so-called Oblique manifold. We evaluate our method on the popular CIFAR-10, CIFAR-100 and ImageNet 2012 datasets using two state-of-the-art architectures, ResNet and wide-ResNet. Our method introduces negligible computational overhead, and the results show that it is competitive with the state of the art and in some cases superior to it. Additionally, the results are less sensitive to hyperparameter settings such as batch size and regularization factor.
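A minimal NumPy sketch of a norm-based soft regularizer follows. It assumes the penalty is the squared deviation of each weight vector's L2 norm from one, summed over vectors; the exact form used in the paper may differ, and `norm_loss` is a hypothetical name. Treating each output filter of a convolution as one weight vector is also an assumption.

```python
import numpy as np


def norm_loss(weight):
    """Sum over weight vectors of (1 - ||w_i||_2)^2.

    `weight` is (num_vectors, fan_in); higher-rank conv kernels such as
    (out_ch, in_ch, kh, kw) are flattened to (out_ch, in_ch * kh * kw),
    so each output filter is treated as one weight vector.
    """
    w = np.asarray(weight, dtype=float)
    w = w.reshape(w.shape[0], -1)
    norms = np.linalg.norm(w, axis=1)
    return float(np.sum((1.0 - norms) ** 2))
```

The penalty would be added to the task loss as `task_loss + lam * norm_loss(W)`, where the hypothetical `lam` plays the role of the regularization factor mentioned above.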
Submitted 11 March, 2021;
originally announced March 2021.
-
PREPRINT: Comparison of deep learning and hand crafted features for mining simulation data
Authors:
Theodoros Georgiou,
Sebastian Schmitt,
Thomas Bäck,
Nan Pu,
Wei Chen,
Michael Lew
Abstract:
Computational Fluid Dynamics (CFD) simulations are an important tool for many industrial applications, such as the aerodynamic optimization of engineering designs like car shapes and airplane parts. The output of such simulations, in particular the calculated flow fields, is usually very complex and hard to interpret for realistic three-dimensional applications, especially when time-dependent simulations are investigated. Automated data analysis methods are warranted, but the very large dimensionality of the data poses a non-trivial obstacle. A flow field typically consists of six measurement values (velocity vector values, turbulent kinetic energy, pressure and viscosity) for each point of the computational grid in 3D space and time. In this paper we address the task of extracting meaningful results in an automated manner from such high-dimensional data sets. We propose deep learning methods that are capable of processing such data and can be trained to solve relevant tasks on simulation data, i.e. predicting the drag and lift forces acting on an airfoil. We also propose an adaptation of classical hand-crafted features known from computer vision to address the same problem, and compare a large variety of descriptors and detectors. Finally, we compile a large dataset of 2D simulations of the flow field around airfoils, containing 16000 flow fields, on which we test and compare the approaches. Our results show that both the deep learning-based methods and the hand-crafted feature-based approaches are capable of accurately describing the content of the CFD simulation output on the proposed dataset.
Submitted 11 March, 2021;
originally announced March 2021.
-
Deep Learning for Instance Retrieval: A Survey
Authors:
Wei Chen,
Yu Liu,
Wei** Wang,
Erwin Bakker,
Theodoros Georgiou,
Paul Fieguth,
Li Liu,
Michael S. Lew
Abstract:
In recent years, a vast amount of visual content has been generated and shared in many fields, such as social media platforms, medical imaging, and robotics. This abundance of content creation and sharing has introduced new challenges, particularly that of searching databases for similar content, i.e. Content-Based Image Retrieval (CBIR), a long-established research area in which improved efficiency and accuracy are needed for real-time retrieval. Artificial intelligence has made progress in CBIR and has significantly facilitated the process of instance search. In this survey we review recent instance retrieval work based on deep learning algorithms and techniques, organizing the survey by deep network architecture types, deep features, feature embedding and aggregation methods, and network fine-tuning strategies. Our survey considers a wide variety of recent methods; we identify milestone work, reveal connections among various methods, present commonly used benchmarks, evaluation results and common challenges, and propose promising future directions.
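As a concrete illustration of the aggregation and search steps that such surveys cover, the sketch below pools a convolutional feature map into a global descriptor with global average pooling, L2-normalizes it, and ranks database items by cosine similarity. Global average pooling is only one of many aggregation methods and is chosen here for brevity; `aggregate` and `rank_database` are hypothetical names.

```python
import numpy as np


def aggregate(feature_map):
    """Global average pooling of a (C, H, W) feature map, then L2 normalization."""
    v = feature_map.reshape(feature_map.shape[0], -1).mean(axis=1)
    return v / (np.linalg.norm(v) + 1e-12)


def rank_database(query_vec, db_vecs):
    """Rank rows of db_vecs (N, C) by cosine similarity to query_vec (C,).

    Both inputs are assumed to be L2-normalized already, so the dot
    product equals the cosine similarity.
    """
    sims = db_vecs @ query_vec
    order = np.argsort(-sims)  # most similar first
    return order, sims
```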
Submitted 30 October, 2022; v1 submitted 27 January, 2021;
originally announced January 2021.
-
RONELD: Robust Neural Network Output Enhancement for Active Lane Detection
Authors:
Zhe Ming Chng,
Joseph Mun Hung Lew,
Jimmy Addison Lee
Abstract:
Accurate lane detection is critical for navigation in autonomous vehicles, particularly detection of the active lane, which demarcates the single road space on which the vehicle is currently traveling. Recent state-of-the-art lane detection algorithms use convolutional neural networks (CNNs) to train deep learning models on popular benchmarks such as TuSimple and CULane. While each of these models works particularly well on training and test inputs obtained from the same dataset, performance drops significantly on unseen datasets of different environments. In this paper, we present a real-time robust neural network output enhancement for active lane detection (RONELD) method to identify, track, and optimize active lanes from deep learning probability map outputs. We first adaptively extract lane points from the probability map outputs, then detect curved and straight lanes, and then use weighted least squares linear regression on straight lanes to fix broken lane edges resulting from fragmentation of edge maps in real images. Lastly, we hypothesize true active lanes by tracking preceding frames. Experimental results demonstrate an up to two-fold increase in accuracy using RONELD on cross-dataset validation tests.
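The weighted least squares step can be sketched in NumPy as below. The sketch assumes lane points are given as pixel coordinates with per-point weights (hypothetically, the probability map values at those points) and fits the column `x` as a linear function of the row `y`, since lanes are close to vertical in the image. The function name and this parameterization are assumptions, not taken from the RONELD implementation.

```python
import numpy as np


def weighted_line_fit(ys, xs, weights):
    """Fit x = m * y + c by minimizing sum_i w_i * (x_i - (m * y_i + c))**2."""
    ys = np.asarray(ys, dtype=float)
    xs = np.asarray(xs, dtype=float)
    sw = np.sqrt(np.asarray(weights, dtype=float))
    design = np.column_stack([ys, np.ones_like(ys)])
    # Scaling each row by sqrt(w_i) reduces weighted least squares to ordinary least squares.
    (m, c), *_ = np.linalg.lstsq(design * sw[:, None], xs * sw, rcond=None)
    return float(m), float(c)
```

Once `m` and `c` are known, a broken straight lane edge can be filled in by evaluating `x = m * y + c` at the missing rows; low-weight points (e.g. spurious detections) have little influence on the fit.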
Submitted 2 November, 2020; v1 submitted 19 October, 2020;
originally announced October 2020.
-
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Authors:
Wei Chen,
Wei** Wang,
Li Liu,
Michael S. Lew
Abstract:
The focus of this survey is on the analysis of two modalities of multimodal deep learning: image and text. Unlike classic reviews of deep learning, where monomodal image classifiers such as VGG, ResNet and Inception are central topics, this paper examines recent multimodal deep models and structures, including auto-encoders, generative adversarial nets and their variants. These models go beyond simple image classifiers in that they can perform uni-directional (e.g. image captioning, image generation) and bi-directional (e.g. cross-modal retrieval, visual question answering) multimodal tasks. In addition, we analyze two aspects of the challenge of better content understanding in deep multimodal applications. We then introduce current ideas and trends in deep multimodal feature learning, such as feature embedding approaches and objective function design, which are crucial in overcoming the aforementioned challenges. Finally, we outline several promising directions for future research.
Submitted 16 October, 2020;
originally announced October 2020.
-
On the Exploration of Incremental Learning for Fine-grained Image Retrieval
Authors:
Wei Chen,
Yu Liu,
Wei** Wang,
Tinne Tuytelaars,
Erwin M. Bakker,
Michael Lew
Abstract:
In this paper, we consider the problem of fine-grained image retrieval in an incremental setting, where new categories are added over time. On the one hand, repeatedly retraining the representation on the extended dataset is time-consuming. On the other hand, fine-tuning the learned representation only with the new classes leads to catastrophic forgetting. To this end, we propose an incremental learning method to mitigate the retrieval performance degradation caused by forgetting. Without accessing any samples of the original classes, the classifier of the original network provides soft "labels" to transfer knowledge to the adaptive network during training, so as to preserve its previous classification capability. More importantly, a regularization function based on Maximum Mean Discrepancy is devised to minimize the discrepancy between the features of the new classes produced by the original network and those produced by the adaptive network. Extensive experiments on two datasets show that our method effectively mitigates catastrophic forgetting on the original classes while achieving high performance on the new classes.
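Maximum Mean Discrepancy (MMD) compares two sets of samples through a kernel. Below is a sketch of the standard biased estimator of squared MMD with a Gaussian kernel, where `x` and `y` could hold features of the same new-class images from the original and the adaptive network. The kernel choice, the bandwidth `sigma`, and the function names are assumptions; the paper's exact regularizer may differ.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * sigma^2))."""
    sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2 * a @ b.T
    # Clamp tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma ** 2))


def mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between samples x (n, d) and y (m, d)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(gaussian_kernel(x, x, sigma).mean()
                 + gaussian_kernel(y, y, sigma).mean()
                 - 2 * gaussian_kernel(x, y, sigma).mean())
```

Used as a regularizer, `mmd2` would be added to the training loss so that the adaptive network's new-class features stay distributionally close to the original network's.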
Submitted 15 October, 2020;
originally announced October 2020.
-
Dual Gaussian-based Variational Subspace Disentanglement for Visible-Infrared Person Re-Identification
Authors:
Nan Pu,
Wei Chen,
Yu Liu,
Erwin M. Bakker,
Michael S. Lew
Abstract:
Visible-infrared person re-identification (VI-ReID) is a challenging and essential task in night-time intelligent surveillance systems. Besides the intra-modality variance that RGB-RGB person re-identification mainly addresses, VI-ReID suffers from additional inter-modality variance caused by the inherent heterogeneity gap between modalities. To solve this problem, we present a carefully designed dual Gaussian-based variational auto-encoder (DG-VAE), which disentangles an identity-discriminable and an identity-ambiguous cross-modality feature subspace, following a mixture-of-Gaussians (MoG) prior and a standard Gaussian distribution prior, respectively. Disentangling cross-modality identity-discriminable features leads to more robust retrieval for VI-ReID. To achieve efficient optimization as in a conventional VAE, we theoretically derive two variational inference terms for the MoG prior under the supervised setting, which not only restrict the identity-discriminable subspace so that the model explicitly handles the cross-modality intra-identity variance, but also enable the MoG distribution to avoid posterior collapse. Furthermore, we propose a triplet swap reconstruction (TSR) strategy to promote the above disentangling process. Extensive experiments demonstrate that our method outperforms state-of-the-art methods on two VI-ReID datasets.
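For the identity-ambiguous subspace with a standard Gaussian prior, the usual VAE KL term has a closed form, KL = ½ Σ (μ² + σ² − 1 − log σ²). A NumPy sketch follows, assuming a diagonal Gaussian posterior parameterized by a mean `mu` and a log-variance `logvar`. The MoG-prior terms that the authors derive are not reproduced here, and the function name is hypothetical.

```python
import numpy as np


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions.

    Closed form: 0.5 * sum(mu^2 + sigma^2 - 1 - log(sigma^2)).
    """
    mu = np.asarray(mu, dtype=float)
    logvar = np.asarray(logvar, dtype=float)
    return float(0.5 * np.sum(mu ** 2 + np.exp(logvar) - 1.0 - logvar))
```

The term is zero exactly when the posterior equals the prior (`mu = 0`, `logvar = 0`), which is what pushes the identity-ambiguous features toward an uninformative distribution.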
Submitted 6 August, 2020;
originally announced August 2020.
-
A Comparison of CNN and Classic Features for Image Retrieval
Authors:
Umut Özaydın,
Theodoros Georgiou,
Michael Lew
Abstract:
Feature detectors and descriptors have been successfully used for various computer vision tasks, such as video object tracking and content-based image retrieval. Many methods use image gradients in different stages of the detection-description pipeline to describe local image structures. Recently, some or all of these stages have been replaced by convolutional neural networks (CNNs) in order to increase their performance. Keypoint detection is a selection problem, which makes it more challenging to implement as a CNN. CNN-based detectors are therefore generally formulated as regressors that convert input images to score maps, from which keypoints can be selected with non-maximum suppression. This paper discusses and compares several recent methods that use CNNs for keypoint detection. Experiments are performed both on the CNN-based approaches and on a selection of conventional methods. In addition to qualitative measures defined on keypoints and descriptors, the bag-of-words (BoW) model is used to implement an image retrieval application, in order to determine how the methods perform in practice. The results show that each type of feature performs best in different contexts.
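Keypoint selection by non-maximum suppression on a score map can be sketched in NumPy as follows. A pixel is kept if it equals the maximum of its (2·radius+1)² neighborhood and exceeds a threshold; `radius`, `threshold`, and the function name are hypothetical parameters, and real detectors often add sub-pixel refinement or a top-k limit.

```python
import numpy as np


def nms_keypoints(score_map, radius=1, threshold=0.0):
    """Return (row, col) keypoints that are local maxima above threshold, best first."""
    s = np.asarray(score_map, dtype=float)
    h, w = s.shape
    # Pad with -inf so border pixels are compared only against real neighbors.
    padded = np.pad(s, radius, mode="constant", constant_values=-np.inf)
    local_max = np.full_like(s, -np.inf)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            local_max = np.maximum(local_max, padded[dy:dy + h, dx:dx + w])
    mask = (s == local_max) & (s > threshold)
    rows, cols = np.nonzero(mask)
    order = np.argsort(-s[rows, cols], kind="stable")  # highest score first
    return list(zip(rows[order].tolist(), cols[order].tolist()))
```

Equal-valued neighbors on a plateau are all kept in this simple version; a production detector would break such ties.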
Submitted 25 August, 2019;
originally announced August 2019.
-
On the Exploration of Convolutional Fusion Networks for Visual Recognition
Authors:
Yu Liu,
Yanming Guo,
Michael S. Lew
Abstract:
Despite recent advances, multi-scale deep representations remain limited by their expensive parameters and weak fusion modules. Hence, we propose an efficient approach to fuse multi-scale deep representations, called convolutional fusion networks (CFN). By using 1×1 convolution and global average pooling, CFN can efficiently generate side branches while adding few parameters. In addition, we present a locally-connected fusion module, which can learn adaptive weights for the side branches and form a discriminatively fused feature. CFN models trained on the CIFAR and ImageNet datasets demonstrate remarkable improvements over plain CNNs. Furthermore, we generalize CFN to three new tasks: scene recognition, fine-grained recognition and image retrieval. Our experiments show that it obtains consistent improvements on these transfer tasks.
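The side-branch and fusion idea can be sketched in NumPy as below, assuming each side branch has already been mapped by a 1×1 convolution to the same number of channels `C`. Global average pooling turns each branch into a `C`-dimensional vector, and the fusion step gives every channel its own weight per branch rather than one scalar per branch. This is one reading of a locally-connected fusion, not the authors' code; the names are hypothetical, and in training the weights would be learned.

```python
import numpy as np


def global_average_pool(feature_map):
    """(C, H, W) -> (C,) by averaging over spatial positions."""
    return feature_map.mean(axis=(1, 2))


def locally_connected_fusion(branch_vectors, weights, bias=None):
    """Fuse M branch vectors (M, C) with per-channel weights (M, C).

    Each channel c has its own weights weights[:, c], so
    fused[c] = sum_m weights[m, c] * v[m, c] (+ bias[c]).
    """
    v = np.asarray(branch_vectors, dtype=float)
    w = np.asarray(weights, dtype=float)
    fused = np.sum(w * v, axis=0)
    return fused if bias is None else fused + bias
```

With `M` branches and `C` channels, this fusion adds only `M * C` weights, which is consistent with the goal of adding few parameters.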
Submitted 16 November, 2016;
originally announced November 2016.
-
Across Browsers SVG Implementation
Authors:
Liang Wang,
Nies Huijsmans,
Michael S. Lew,
Dan Tsymbala
Abstract:
In this work, SVG is translated into VML or HTML using JavaScript based on the Backbase Client Framework. The goal of this project is to make SVG viewable in Internet Explorer without any plug-in and to let it work together with other Backbase Client Framework languages. The result of this project will be added as an extension to the current Backbase Client Framework.
Submitted 31 December, 2010;
originally announced January 2011.
-
Binary and nonbinary description of hypointensity in human brain MR images
Authors:
Xiao**g Chen,
Michael S. Lew
Abstract:
Accumulating evidence has shown that iron is involved in the mechanism underlying many neurodegenerative diseases, such as Alzheimer's disease, Parkinson's disease and Huntington's disease. Abnormal (higher) iron accumulation has been detected in the brains of most neurodegenerative patients, especially in the basal ganglia region. The presence of iron changes the MR signal in both magnitude and phase. Accordingly, tissues with high iron concentration appear hypointense (darker than usual) in MR contrasts. In this report, we propose an improved binary hypointensity description and a novel nonbinary hypointensity description based on principal component analysis. Moreover, Kendall's rank correlation coefficient is used to compare the complementary and redundant information provided by the two methods, in order to better understand the individual descriptions of iron accumulation in the brain.
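Kendall's rank correlation between two descriptions (e.g. hypothetical per-subject scores from the binary and nonbinary methods) can be computed as below. This sketch implements tau-a, in which tied pairs count as neither concordant nor discordant and the denominator is always n(n−1)/2; `scipy.stats.kendalltau` computes the tie-corrected tau-b by default and would be the usual library choice. The function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n choose 2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # both rankings order the pair the same way
        elif s < 0:
            discordant += 1  # the rankings disagree on this pair
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs
```

A value near 1 would suggest the two descriptions carry largely redundant ordering information, while a value near 0 would suggest they are complementary.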
Submitted 31 December, 2010;
originally announced January 2011.
-
A Framework for Real-Time Face and Facial Feature Tracking using Optical Flow Pre-estimation and Template Tracking
Authors:
E. R. Gast,
Michael S. Lew
Abstract:
This work presents a framework for tracking head movements and capturing the movements of the mouth and both eyebrows in real time. We present a head tracker that combines an optical flow based tracker and a template based tracker. The estimate from the optical flow head tracker is used as the starting point for the template tracker, which fine-tunes the head estimate. This approach, together with re-updating the optical flow points, prevents the head tracker from drifting. This combination, together with our switching scheme, makes the tracker very robust against fast movement and motion blur. We also propose a way to reduce the influence of partial occlusion of the head: in both the optical flow and the template based tracker, we identify and exclude occluded points.
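The coarse-to-fine step, where the optical flow estimate seeds the template tracker, can be sketched as a local sum-of-squared-differences (SSD) search around the predicted position. The optional `mask` zeroes out template pixels marked as occluded, loosely mirroring the exclusion of occluded points. SSD matching, the search radius, and all names are assumptions for illustration; the paper's template tracker may use a different matching criterion.

```python
import numpy as np


def refine_with_template(frame, template, start, search_radius=3, mask=None):
    """Refine a coarse top-left (row, col) estimate by local SSD template search.

    `start` is the position predicted by the optical-flow stage; only
    offsets within +/- search_radius of it are tested. `mask` (same shape
    as `template`) is 1 for visible pixels and 0 for occluded ones.
    """
    frame = np.asarray(frame, dtype=float)
    template = np.asarray(template, dtype=float)
    mask = np.ones_like(template) if mask is None else np.asarray(mask, dtype=float)
    th, tw = template.shape
    r0, c0 = start
    best_cost, best_pos = np.inf, start
    for r in range(r0 - search_radius, r0 + search_radius + 1):
        for c in range(c0 - search_radius, c0 + search_radius + 1):
            if r < 0 or c < 0 or r + th > frame.shape[0] or c + tw > frame.shape[1]:
                continue  # candidate window would leave the frame
            diff = frame[r:r + th, c:c + tw] - template
            cost = float(np.sum(mask * diff ** 2))  # occluded pixels contribute 0
            if cost < best_cost:
                best_cost, best_pos = cost, (r, c)
    return best_pos, best_cost
```

Restricting the search to a small window around the flow estimate keeps the template stage cheap enough for real-time use, while the flow stage handles large motions.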
Submitted 31 December, 2010;
originally announced January 2011.
-
Analysis of Using Browser-native Technology to Build Rich Internet Applications for Image Manipulation
Authors:
Thomas Steenbergen,
Michael S. Lew
Abstract:
In this work we investigate whether browser-native technologies can be used to perform photo manipulation tasks, e.g. cropping, resizing or rotating an image, within current mainstream browsers. Through a case study, we analyze problems that occurred during the implementation of a prototype web application that uses browser-native web technology to create an online version of a real-world photo scrapbook. Implementing the prototype allows us to analyze the strengths and weaknesses of current web technology for browser-based image manipulation. Furthermore, we explore the possibilities of Ajax in combination with Canvas, SVG and VML to provide a more interactive graphical user interface for performing image manipulation tasks on the web.
Submitted 31 December, 2010;
originally announced January 2011.
-
Dynamic Feature Description in Human Action Recognition
Authors:
Ruoyun Gao,
Michael S. Lew,
Ling Shao
Abstract:
This work aims to present novel description methods for human action recognition. Generally, a video sequence can be represented as a collection of spatio-temporal words by detecting space-time interest points and describing the unique features around the detected points (Bag of Words representation). Interest points, as well as the cuboids around them, are considered informative for feature description, in terms of both the structural distribution of the interest points and the information content inside the cuboids. Our proposed description approaches build on this idea to make the feature descriptors more discriminative.
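The Bag of Words representation mentioned above can be sketched as hard assignment of each local descriptor to its nearest codeword, followed by a normalized histogram. The codebook (typically learned offline, e.g. with k-means) and the function name are assumptions; the paper's own descriptors are not reproduced.

```python
import numpy as np


def bag_of_words(descriptors, codebook):
    """L1-normalized histogram of nearest-codeword assignments.

    descriptors: (N, D) local features, e.g. computed on cuboids around interest points
    codebook:    (K, D) visual words, e.g. k-means centroids learned offline
    """
    d = np.asarray(descriptors, dtype=float)
    cb = np.asarray(codebook, dtype=float)
    if d.ndim != 2 or d.shape[0] == 0:
        raise ValueError("descriptors must be a non-empty (N, D) array")
    # Squared Euclidean distance from every descriptor to every codeword: (N, K).
    dists = ((d[:, None, :] - cb[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=cb.shape[0]).astype(float)
    return hist / hist.sum()
```

The resulting fixed-length histogram can be fed to any standard classifier regardless of how many interest points a video contains.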
Submitted 31 December, 2010;
originally announced January 2011.