Search | arXiv e-print repository

RONELDv2: A faster, improved lane tracking method

Authors: Zhe Ming Chng, Joseph Mun Hung Lew, Jimmy Addison Lee

Abstract: Lane detection is an integral part of control systems in autonomous vehicles and lane departure warning systems as lanes are a key component of the operating environment for road vehicles. In a previous paper, a robust neural network output enhancement for active lane detection (RONELD) method augmenting deep learning lane detection models to improve active, or ego, lane accuracy performance was presented. This paper extends the work by further investigating the lane tracking methods used to increase robustness of the method to lane changes and different lane dimensions (e.g. lane marking thickness) and proposes an improved, lighter weight lane detection method, RONELDv2. It improves on the previous RONELD method by detecting the lane point variance, merging lanes to find a more accurate set of lane parameters, and using an exponential moving average method to calculate more robust lane weights. Experiments using the proposed improvements show a consistent increase in lane detection accuracy results across different datasets and deep learning models, as well as a decrease in computational complexity observed via an up to two-fold decrease in runtime, which enhances its suitability for real-time use on autonomous vehicles and lane departure warning systems.

Submitted 26 February, 2022; originally announced February 2022.

Comments: 9 pages, 8 figures, 6 tables
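
The abstract above mentions an exponential moving average for computing lane weights but does not state the update rule. As a generic illustration only, and not the RONELDv2 implementation, a minimal sketch of an EMA update might look like the following. The function names `ema_update` and `ema_series`, the smoothing factor `alpha`, and the sample values are all assumptions made for this example.

```python
def ema_update(previous, observation, alpha=0.5):
    """Blend a new observation into a running exponential moving average.

    alpha is a hypothetical smoothing factor in (0, 1]; larger values give
    the newest observation more influence, and older observations decay
    geometrically.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    return alpha * observation + (1.0 - alpha) * previous


def ema_series(observations, alpha=0.5):
    """Return the running EMA of a sequence, seeded with its first value."""
    averages = []
    current = None
    for value in observations:
        if current is None:
            current = value  # the first observation seeds the average
        else:
            current = ema_update(current, value, alpha)
        averages.append(current)
    return averages


if __name__ == "__main__":
    # A per-frame score (illustrative numbers) that drops after frame one.
    print(ema_series([1.0, 0.0, 0.0], alpha=0.5))
```

With a smaller `alpha` the average reacts more slowly to a single noisy frame, which is the usual reason to prefer an EMA over the raw per-frame value.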

arXiv:2104.04991 [pdf, other]

Integrating Information Theory and Adversarial Learning for Cross-modal Retrieval

Authors: Wei Chen, Yu Liu, Erwin M. Bakker, Michael S. Lew

Abstract: Accurately matching visual and textual data in cross-modal retrieval has been widely studied in the multimedia community. To address the challenges posed by the heterogeneity gap and the semantic gap, we propose integrating Shannon information theory and adversarial learning. In terms of the heterogeneity gap, we integrate modality classification and information entropy maximization adversarially. For this purpose, a modality classifier (as a discriminator) is built to distinguish the text and image modalities according to their different statistical properties. This discriminator uses its output probabilities to compute Shannon information entropy, which measures the uncertainty of the modality classification it performs. Moreover, feature encoders (as a generator) project uni-modal features into a commonly shared space and attempt to fool the discriminator by maximizing its output information entropy. Thus, maximizing information entropy gradually reduces the distribution discrepancy of cross-modal features, thereby achieving a domain confusion state where the discriminator cannot classify two modalities confidently. To reduce the semantic gap, Kullback-Leibler (KL) divergence and bi-directional triplet loss are used to associate the intra- and inter-modality similarity between features in the shared space. Furthermore, a regularization term based on KL-divergence with temperature scaling is used to calibrate the biased label classifier caused by the data imbalance issue. Extensive experiments with four deep models on four benchmarks are conducted to demonstrate the effectiveness of the proposed approach.

Submitted 11 April, 2021; originally announced April 2021.

Comments: Accepted by Pattern Recognition
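
The abstract above describes a discriminator whose output probabilities are used to compute Shannon entropy as a measure of classification uncertainty. The following is a minimal sketch of that quantity only, not the authors' code; the function name `shannon_entropy`, the use of base-2 logarithms, and the two-class example values are assumptions.

```python
import math


def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution: H = -sum(p * log2(p)).

    Terms with p == 0 contribute nothing, following the usual convention
    that 0 * log(0) is taken to be 0.
    """
    if any(p < 0.0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)


if __name__ == "__main__":
    # Hypothetical modality classifier outputs over (image, text).
    print(shannon_entropy([1.0, 0.0]))  # fully confident prediction
    print(shannon_entropy([0.5, 0.5]))  # maximally uncertain prediction
```

For two classes the entropy is largest when both probabilities are 0.5, which corresponds to the state the abstract calls domain confusion, where the discriminator cannot tell the modalities apart.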

arXiv:2103.12462 [pdf, other]

Lifelong Person Re-Identification via Adaptive Knowledge Accumulation

Authors: Nan Pu, Wei Chen, Yu Liu, Erwin M. Bakker, Michael S. Lew

Abstract: Person ReID methods always learn through a stationary domain that is fixed by the choice of a given dataset. In many contexts (e.g., lifelong learning), those methods are ineffective because the domain is continually changing, in which case incremental learning over multiple domains is potentially required. In this work we explore a new and challenging ReID task, namely lifelong person re-identification (LReID), which enables continuous learning across multiple domains and even generalisation to new and unseen domains. Following the cognitive processes in the human brain, we design an Adaptive Knowledge Accumulation (AKA) framework that is endowed with two crucial abilities: knowledge representation and knowledge operation. Our method alleviates catastrophic forgetting on seen domains and demonstrates the ability to generalize to unseen domains. Correspondingly, we also provide a new and large-scale benchmark for LReID. Extensive experiments demonstrate our method outperforms other competitors by a margin of 5.8% mAP in generalising evaluation.

Submitted 23 March, 2021; originally announced March 2021.

Comments: 10 pages, 5 figures, Accepted by CVPR2021

arXiv:2103.06583 [pdf, other]

Preprint: Norm Loss: An efficient yet effective regularization method for deep neural networks

Authors: Theodoros Georgiou, Sebastian Schmitt, Thomas Bäck, Wei Chen, Michael Lew

Abstract: Convolutional neural network training can suffer from diverse issues like exploding or vanishing gradients, scaling-based weight space symmetry and covariate shift. In order to address these issues, researchers develop weight regularization methods and activation normalization methods. In this work we propose a weight soft-regularization method based on the Oblique manifold. The proposed method uses a loss function which pushes each weight vector to have a norm close to one, i.e. the weight matrix is smoothly steered toward the so-called Oblique manifold. We evaluate our method on the very popular CIFAR-10, CIFAR-100 and ImageNet 2012 datasets using two state-of-the-art architectures, namely the ResNet and wide-ResNet. Our method introduces negligible computational overhead and the results show that it is competitive with the state-of-the-art and in some cases superior to it. Additionally, the results are less sensitive to hyperparameter settings such as batch size and regularization factor.

Submitted 11 March, 2021; originally announced March 2021.

Journal ref: Proceedings of the International Conference on Pattern Recognition (ICPR) 2020
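
The abstract above describes a loss term that pushes each weight vector toward unit norm, without giving its exact form. One plausible form of such a penalty, offered as an assumption rather than the paper's definition, is the sum of squared deviations of each vector's Euclidean norm from one. The function name `unit_norm_penalty` and the example weights are hypothetical.

```python
import math


def unit_norm_penalty(weight_vectors):
    """Sum over weight vectors of (1 - ||w||_2)^2.

    The penalty is zero exactly when every vector has Euclidean norm one,
    and grows as any norm moves away from one in either direction.
    """
    penalty = 0.0
    for vector in weight_vectors:
        norm = math.sqrt(sum(w * w for w in vector))
        penalty += (1.0 - norm) ** 2
    return penalty


if __name__ == "__main__":
    # Two unit-norm vectors incur no penalty.
    print(unit_norm_penalty([[1.0, 0.0], [0.0, 1.0]]))
    # A vector of norm 5 is penalised by (1 - 5)^2.
    print(unit_norm_penalty([[3.0, 4.0]]))
```

In training, a term of this kind would typically be scaled by a regularization factor and added to the task loss, so the constraint is encouraged softly rather than enforced by projection.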

arXiv:2103.06552 [pdf, other]

PREPRINT: Comparison of deep learning and hand crafted features for mining simulation data

Authors: Theodoros Georgiou, Sebastian Schmitt, Thomas Bäck, Nan Pu, Wei Chen, Michael Lew

Abstract: Computational Fluid Dynamics (CFD) simulations are a very important tool for many industrial applications, such as aerodynamic optimization of engineering designs like car shapes, airplane parts, etc. The output of such simulations, in particular the calculated flow fields, is usually very complex and hard to interpret for realistic three-dimensional real-world applications, especially if time-dependent simulations are investigated. Automated data analysis methods are warranted but a non-trivial obstacle is given by the very large dimensionality of the data. A flow field typically consists of six measurement values for each point of the computational grid in 3D space and time (velocity vector values, turbulent kinetic energy, pressure and viscosity). In this paper we address the task of extracting meaningful results in an automated manner from such high dimensional data sets. We propose deep learning methods which are capable of processing such data and which can be trained to solve relevant tasks on simulation data, i.e. predicting drag and lift forces applied on an airfoil. We also propose an adaptation of the classical hand crafted features known from computer vision to address the same problem and compare a large variety of descriptors and detectors. Finally, we compile a large dataset of 2D simulations of the flow field around airfoils which contains 16000 flow fields with which we tested and compared approaches. Our results show that the deep learning-based methods, as well as hand crafted feature based approaches, are well capable of accurately describing the content of the CFD simulation output on the proposed dataset.

Submitted 11 March, 2021; originally announced March 2021.

Journal ref: Proceedings of the International Conference on Pattern Recognition (ICPR) 2020

arXiv:2101.11282 [pdf, other]

Deep Learning for Instance Retrieval: A Survey

Authors: Wei Chen, Yu Liu, Wei** Wang, Erwin Bakker, Theodoros Georgiou, Paul Fieguth, Li Liu, Michael S. Lew

Abstract: In recent years a vast amount of visual content has been generated and shared from many fields, such as social media platforms, medical imaging, and robotics. This abundance of content creation and sharing has introduced new challenges, particularly that of searching databases for similar content, known as Content Based Image Retrieval (CBIR), a long-established research area in which improved efficiency and accuracy are needed for real-time retrieval. Artificial intelligence has made progress in CBIR and has significantly facilitated the process of instance search. In this survey we review recent instance retrieval works that are developed based on deep learning algorithms and techniques, with the survey organized by deep network architecture types, deep features, feature embedding and aggregation methods, and network fine-tuning strategies. Our survey considers a wide variety of recent methods, whereby we identify milestone work, reveal connections among various methods, present the commonly used benchmarks, evaluation results and common challenges, and propose promising future directions.

Submitted 30 October, 2022; v1 submitted 27 January, 2021; originally announced January 2021.

Comments: IEEE Transactions on Pattern Analysis and Machine Intelligence

arXiv:2010.09548 [pdf, other]

RONELD: Robust Neural Network Output Enhancement for Active Lane Detection

Authors: Zhe Ming Chng, Joseph Mun Hung Lew, Jimmy Addison Lee

Abstract: Accurate lane detection is critical for navigation in autonomous vehicles, particularly the active lane which demarcates the single road space that the vehicle is currently traveling on. Recent state-of-the-art lane detection algorithms utilize convolutional neural networks (CNNs) to train deep learning models on popular benchmarks such as TuSimple and CULane. While each of these models works particularly well on train and test inputs obtained from the same dataset, the performance drops significantly on unseen datasets of different environments. In this paper, we present a real-time robust neural network output enhancement for active lane detection (RONELD) method to identify, track, and optimize active lanes from deep learning probability map outputs. We first adaptively extract lane points from the probability map outputs, followed by detecting curved and straight lanes before using weighted least squares linear regression on straight lanes to fix broken lane edges resulting from fragmentation of edge maps in real images. Lastly, we hypothesize true active lanes through tracking preceding frames. Experimental results demonstrate an up to two-fold increase in accuracy using RONELD on cross-dataset validation tests.

Submitted 2 November, 2020; v1 submitted 19 October, 2020; originally announced October 2020.

Comments: Fixed typos; Accepted at ICPR 2020, 8 pages, 6 figures, code to be published at http://github.com/czming/RONELD-Lane-Detection

arXiv:2010.08189 [pdf, other]

New Ideas and Trends in Deep Multimodal Content Understanding: A Review

Authors: Wei Chen, Wei** Wang, Li Liu, Michael S. Lew

Abstract: The focus of this survey is on the analysis of two modalities of multimodal deep learning: image and text. Unlike classic reviews of deep learning where monomodal image classifiers such as VGG, ResNet and Inception module are central topics, this paper will examine recent multimodal deep models and structures, including auto-encoders, generative adversarial nets and their variants. These models go beyond simple image classifiers in that they can perform uni-directional (e.g. image captioning, image generation) and bi-directional (e.g. cross-modal retrieval, visual question answering) multimodal tasks. In addition, we analyze two aspects of the challenge in terms of better content understanding in deep multimodal applications. We then introduce current ideas and trends in deep multimodal feature learning, such as feature embedding approaches and objective function design, which are crucial in overcoming the aforementioned challenges. Finally, we include several promising directions for future research.

Submitted 16 October, 2020; originally announced October 2020.

Comments: Accepted by Neurocomputing

arXiv:2010.08020 [pdf, other]

On the Exploration of Incremental Learning for Fine-grained Image Retrieval

Authors: Wei Chen, Yu Liu, Wei** Wang, Tinne Tuytelaars, Erwin M. Bakker, Michael Lew

Abstract: In this paper, we consider the problem of fine-grained image retrieval in an incremental setting, when new categories are added over time. On the one hand, repeatedly training the representation on the extended dataset is time-consuming. On the other hand, fine-tuning the learned representation only with the new classes leads to catastrophic forgetting. To this end, we propose an incremental learning method to mitigate retrieval performance degradation caused by the forgetting issue. Without accessing any samples of the original classes, the classifier of the original network provides soft "labels" to transfer knowledge to train the adaptive network, so as to preserve the previous capability for classification. More importantly, a regularization function based on Maximum Mean Discrepancy is devised to minimize the discrepancy of new classes features from the original network and the adaptive network, respectively. Extensive experiments on two datasets show that our method effectively mitigates the catastrophic forgetting on the original classes while achieving high performance on the new classes.

Submitted 15 October, 2020; originally announced October 2020.

Comments: BMVC2020

arXiv:2008.02520 [pdf, other]

Dual Gaussian-based Variational Subspace Disentanglement for Visible-Infrared Person Re-Identification

Authors: Nan Pu, Wei Chen, Yu Liu, Erwin M. Bakker, Michael S. Lew

Abstract: Visible-infrared person re-identification (VI-ReID) is a challenging and essential task in night-time intelligent surveillance systems. In addition to the intra-modality variance that RGB-RGB person re-identification mainly overcomes, VI-ReID suffers from inter-modality variance caused by the inherent heterogeneous gap. To solve the problem, we present a carefully designed dual Gaussian-based variational auto-encoder (DG-VAE), which disentangles an identity-discriminable and an identity-ambiguous cross-modality feature subspace, following a mixture-of-Gaussians (MoG) prior and a standard Gaussian distribution prior, respectively. Disentangling cross-modality identity-discriminable features leads to more robust retrieval for VI-ReID. To achieve efficient optimization like conventional VAE, we theoretically derive two variational inference terms for the MoG prior under the supervised setting, which not only restrict the identity-discriminable subspace so that the model explicitly handles the cross-modality intra-identity variance, but also enable the MoG distribution to avoid posterior collapse. Furthermore, we propose a triplet swap reconstruction (TSR) strategy to promote the above disentangling process. Extensive experiments demonstrate that our method outperforms state-of-the-art methods on two VI-ReID datasets.

Submitted 6 August, 2020; originally announced August 2020.

Comments: Accepted by ACM MM 2020 poster. 12 pages, 10 appendixes

arXiv:1908.09300 [pdf, other]

doi 10.1109/CBMI.2019.8877470

A Comparison of CNN and Classic Features for Image Retrieval

Authors: Umut Özaydın, Theodoros Georgiou, Michael Lew

Abstract: Feature detectors and descriptors have been successfully used for various computer vision tasks, such as video object tracking and content-based image retrieval. Many methods use image gradients in different stages of the detection-description pipeline to describe local image structures. Recently, some, or all, of these stages have been replaced by convolutional neural networks (CNNs), in order to increase their performance. A detector is defined as a selection problem, which makes it more challenging to implement as a CNN. Detectors are therefore generally defined as regressors, converting input images to score maps from which keypoints can be selected with non-maximum suppression. This paper discusses and compares several recent methods that use CNNs for keypoint detection. Experiments are performed both on the CNN based approaches, as well as a selection of conventional methods. In addition to qualitative measures defined on keypoints and descriptors, the bag-of-words (BoW) model is used to implement an image retrieval application, in order to determine how the methods perform in practice. The results show that each type of feature is best in different contexts.

Submitted 25 August, 2019; originally announced August 2019.

Comments: 5 pages, 3 figures, 3 tables, CBMI 2019

arXiv:1611.05503 [pdf, other]

On the Exploration of Convolutional Fusion Networks for Visual Recognition

Authors: Yu Liu, Yanming Guo, Michael S. Lew

Abstract: Despite recent advances in multi-scale deep representations, their limitations are attributed to expensive parameters and weak fusion modules. Hence, we propose an efficient approach to fuse multi-scale deep representations, called convolutional fusion networks (CFN). By using 1$\times$1 convolution and global average pooling, CFN can efficiently generate the side branches while adding few parameters. In addition, we present a locally-connected fusion module, which can learn adaptive weights for the side branches and form a discriminatively fused feature. CFN models trained on the CIFAR and ImageNet datasets demonstrate remarkable improvements over the plain CNNs. Furthermore, we generalize CFN to three new tasks, including scene recognition, fine-grained recognition and image retrieval. Our experiments show that it can obtain consistent improvements on the transfer tasks.

Submitted 16 November, 2016; originally announced November 2016.

Comments: 23rd International Conference on MultiMedia Modeling (MMM 2017)

arXiv:1101.0243 [pdf]

Across Browsers SVG Implementation

Authors: Liang Wang, Nies Huijsmans, Michael S. Lew, Dan Tsymbala

Abstract: In this work SVG will be translated into VML or HTML by using JavaScript based on Backbase Client Framework. The target of this project is to implement SVG to be viewed in Internet Explorer without any plug-in and work together with other Backbase Client Framework languages. The result of this project will be added as an extension to the current Backbase Client Framework.