Search | arXiv e-print repository

Multivariate Prototype Representation for Domain-Generalized Incremental Learning

Authors: Can Peng, Piotr Koniusz, Kaiyu Guo, Brian C. Lovell, Peyman Moghadam

Abstract: Deep learning models suffer from catastrophic forgetting when being fine-tuned with samples of new classes. This issue becomes even more pronounced when faced with the domain shift between training and testing data. In this paper, we study the critical and less explored Domain-Generalized Class-Incremental Learning (DGCIL). We design a DGCIL approach that remembers old classes, adapts to new class… ▽ More Deep learning models suffer from catastrophic forgetting when being fine-tuned with samples of new classes. This issue becomes even more pronounced when faced with the domain shift between training and testing data. In this paper, we study the critical and less explored Domain-Generalized Class-Incremental Learning (DGCIL). We design a DGCIL approach that remembers old classes, adapts to new classes, and can classify reliably objects from unseen domains. Specifically, our loss formulation maintains classification boundaries and suppresses the domain-specific information of each class. With no old exemplars stored, we use knowledge distillation and estimate old class prototype drift as incremental training advances. Our prototype representations are based on multivariate Normal distributions whose means and covariances are constantly adapted to changing model features to represent old classes well by adapting to the feature space drift. For old classes, we sample pseudo-features from the adapted Normal distributions with the help of Cholesky decomposition. In contrast to previous pseudo-feature sampling strategies that rely solely on average mean prototypes, our method excels at capturing varying semantic information. Experiments on several benchmarks validate our claims. △ Less

Submitted 24 September, 2023; originally announced September 2023.

arXiv:2303.01233 [pdf, other]

Domain-aware Triplet loss in Domain Generalization

Authors: Kaiyu Guo, Brian Lovell

Abstract: Despite much progress being made in the field of object recognition with the advances of deep learning, there are still several factors negatively affecting the performance of deep learning models. Domain shift is one of these factors and is caused by discrepancies in the distributions of the testing and training data. In this paper, we focus on the problem of compact feature clustering in domain… ▽ More Despite much progress being made in the field of object recognition with the advances of deep learning, there are still several factors negatively affecting the performance of deep learning models. Domain shift is one of these factors and is caused by discrepancies in the distributions of the testing and training data. In this paper, we focus on the problem of compact feature clustering in domain generalization to help optimize the embedding space from multi-domain data. We design a domainaware triplet loss for domain generalization to help the model to not only cluster similar semantic features, but also to disperse features arising from the domain. Unlike previous methods focusing on distribution alignment, our algorithm is designed to disperse domain information in the embedding space. The basic idea is motivated based on the assumption that embedding features can be clustered based on domain information, which is mathematically and empirically supported in this paper. In addition, during our exploration of feature clustering in domain generalization, we note that factors affecting the convergence of metric learning loss in domain generalization are more important than the pre-defined domains. To solve this issue, we utilize two methods to normalize the embedding space, reducing the internal covariate shift of the embedding features. The ablation study demonstrates the effectiveness of our algorithm. Moreover, the experiments on the benchmark datasets, including PACS, VLCS and Office-Home, show that our method outperforms related methods focusing on domain discrepancy. In particular, our results on RegnetY-16 are significantly better than state-of-the-art methods on the benchmark datasets. Our code will be released at https://github.com/workerbcd/DCT △ Less

Submitted 1 March, 2023; originally announced March 2023.

arXiv:2212.10086 [pdf, other]

End to End Generative Meta Curriculum Learning For Medical Data Augmentation

Authors: Meng Li, Brian Lovell

Abstract: Current medical image synthetic augmentation techniques rely on intensive use of generative adversarial networks (GANs). However, the nature of GAN architecture leads to heavy computational resources to produce synthetic images and the augmentation process requires multiple stages to complete. To address these challenges, we introduce a novel generative meta curriculum learning method that trains… ▽ More Current medical image synthetic augmentation techniques rely on intensive use of generative adversarial networks (GANs). However, the nature of GAN architecture leads to heavy computational resources to produce synthetic images and the augmentation process requires multiple stages to complete. To address these challenges, we introduce a novel generative meta curriculum learning method that trains the task-specific model (student) end-to-end with only one additional teacher model. The teacher learns to generate curriculum to feed into the student model for data augmentation and guides the student to improve performance in a meta-learning style. In contrast to the generator and discriminator in GAN, which compete with each other, the teacher and student collaborate to improve the student's performance on the target tasks. Extensive experiments on the histopathology datasets show that leveraging our framework results in significant and consistent improvements in classification performance. △ Less

Submitted 20 December, 2022; originally announced December 2022.

arXiv:2212.09977 [pdf, other]

Conditioned Generative Transformers for Histopathology Image Synthetic Augmentation

Authors: Meng Li, Chaoyi Li, Can Peng, Brian Lovell

Abstract: Deep learning networks have demonstrated state-of-the-art performance on medical image analysis tasks. However, the majority of the works rely heavily on abundantly labeled data, which necessitates extensive involvement of domain experts. Vision transformer (ViT) based generative adversarial networks (GANs) recently demonstrated superior potential in general image synthesis, yet are less explored… ▽ More Deep learning networks have demonstrated state-of-the-art performance on medical image analysis tasks. However, the majority of the works rely heavily on abundantly labeled data, which necessitates extensive involvement of domain experts. Vision transformer (ViT) based generative adversarial networks (GANs) recently demonstrated superior potential in general image synthesis, yet are less explored for histopathology images. In this paper, we address these challenges by proposing a pure ViT-based conditional GAN model for histopathology image synthetic augmentation. To alleviate training instability and improve generation robustness, we first introduce a conditioned class projection method to facilitate class separation. We then implement a multi-loss weighing function to dynamically balance the losses between classification tasks. We further propose a selective augmentation mechanism to actively choose the appropriate generated images and bring additional performance improvements. Extensive experiments on the histopathology datasets show that leveraging our synthetic augmentation framework results in significant and consistent improvements in classification performance. △ Less

Submitted 19 December, 2022; originally announced December 2022.

arXiv:2208.00147 [pdf, other]

Few-Shot Class-Incremental Learning from an Open-Set Perspective

Authors: Can Peng, Kun Zhao, Tianren Wang, Meng Li, Brian C. Lovell

Abstract: The continual appearance of new objects in the visual world poses considerable challenges for current deep learning methods in real-world deployments. The challenge of new task learning is often exacerbated by the scarcity of data for the new categories due to rarity or cost. Here we explore the important task of Few-Shot Class-Incremental Learning (FSCIL) and its extreme data scarcity condition o… ▽ More The continual appearance of new objects in the visual world poses considerable challenges for current deep learning methods in real-world deployments. The challenge of new task learning is often exacerbated by the scarcity of data for the new categories due to rarity or cost. Here we explore the important task of Few-Shot Class-Incremental Learning (FSCIL) and its extreme data scarcity condition of one-shot. An ideal FSCIL model needs to perform well on all classes, regardless of their presentation order or paucity of data. It also needs to be robust to open-set real-world conditions and be easily adapted to the new tasks that always arise in the field. In this paper, we first reevaluate the current task setting and propose a more comprehensive and practical setting for the FSCIL task. Then, inspired by the similarity of the goals for FSCIL and modern face recognition systems, we propose our method -- Augmented Angular Loss Incremental Classification or ALICE. In ALICE, instead of the commonly used cross-entropy loss, we propose to use the angular penalty loss to obtain well-clustered features. As the obtained features not only need to be compactly clustered but also diverse enough to maintain generalization for future incremental classes, we further discuss how class augmentation, data augmentation, and data balancing affect classification performance. Experiments on benchmark datasets, including CIFAR100, miniImageNet, and CUB200, demonstrate the improved performance of ALICE over the state-of-the-art FSCIL methods. △ Less

Submitted 30 July, 2022; originally announced August 2022.

Comments: Accepted to ECCV 2022

arXiv:2109.03492 [pdf, other]

FaceCook: Face Generation Based on Linear Scaling Factors

Authors: Tianren Wang, Can Peng, Teng Zhang, Brian Lovell

Abstract: With the excellent disentanglement properties of state-of-the-art generative models, image editing has been the dominant approach to control the attributes of synthesised face images. However, these edited results often suffer from artifacts or incorrect feature rendering, especially when there is a large discrepancy between the image to be edited and the desired feature set. Therefore, we propose… ▽ More With the excellent disentanglement properties of state-of-the-art generative models, image editing has been the dominant approach to control the attributes of synthesised face images. However, these edited results often suffer from artifacts or incorrect feature rendering, especially when there is a large discrepancy between the image to be edited and the desired feature set. Therefore, we propose a new approach to map** the latent vectors of the generative model to the scaling factors through solving a set of multivariate linear equations. The coefficients of the equations are the eigenvectors of the weight parameters of the pre-trained model, which form the basis of a hyper coordinate system. The qualitative and quantitative results both show that the proposed method outperforms the baseline in terms of image diversity. In addition, the method is much more time-efficient because you can obtain synthesised images with desirable features directly from the latent vectors, rather than the former process of editing randomly generated images requiring many processing steps. △ Less

Submitted 8 September, 2021; originally announced September 2021.

arXiv:2108.05627 [pdf, other]

DIODE: Dilatable Incremental Object Detection

Authors: Can Peng, Kun Zhao, Sam Maksoud, Tianren Wang, Brian C. Lovell

Abstract: To accommodate rapid changes in the real world, the cognition system of humans is capable of continually learning concepts. On the contrary, conventional deep learning models lack this capability of preserving previously learned knowledge. When a neural network is fine-tuned to learn new tasks, its performance on previously trained tasks will significantly deteriorate. Many recent works on increme… ▽ More To accommodate rapid changes in the real world, the cognition system of humans is capable of continually learning concepts. On the contrary, conventional deep learning models lack this capability of preserving previously learned knowledge. When a neural network is fine-tuned to learn new tasks, its performance on previously trained tasks will significantly deteriorate. Many recent works on incremental object detection tackle this problem by introducing advanced regularization. Although these methods have shown promising results, the benefits are often short-lived after the first incremental step. Under multi-step incremental learning, the trade-off between old knowledge preserving and new task learning becomes progressively more severe. Thus, the performance of regularization-based incremental object detectors gradually decays for subsequent learning steps. In this paper, we aim to alleviate this performance decay on multi-step incremental detection tasks by proposing a dilatable incremental object detector (DIODE). For the task-shared parameters, our method adaptively penalizes the changes of important weights for previous tasks. At the same time, the structure of the model is dilated or expanded by a limited number of task-specific parameters to promote new task learning. Extensive experiments on PASCAL VOC and COCO datasets demonstrate substantial improvements over the state-of-the-art methods. Notably, compared with the state-of-the-art methods, our method achieves up to 6.0% performance improvement by increasing the number of parameters by just 1.2% for each newly learned task. △ Less

Submitted 12 August, 2021; originally announced August 2021.

arXiv:2104.09005 [pdf, other]

Scalable Bayesian Deep Learning with Kernel Seed Networks

Authors: Sam Maksoud, Kun Zhao, Can Peng, Brian C. Lovell

Abstract: This paper addresses the scalability problem of Bayesian deep neural networks. The performance of deep neural networks is undermined by the fact that these algorithms have poorly calibrated measures of uncertainty. This restricts their application in high risk domains such as computer aided diagnosis and autonomous vehicle navigation. Bayesian Deep Learning (BDL) offers a promising method for repr… ▽ More This paper addresses the scalability problem of Bayesian deep neural networks. The performance of deep neural networks is undermined by the fact that these algorithms have poorly calibrated measures of uncertainty. This restricts their application in high risk domains such as computer aided diagnosis and autonomous vehicle navigation. Bayesian Deep Learning (BDL) offers a promising method for representing uncertainty in neural network. However, BDL requires a separate set of parameters to store the mean and standard deviation of model weights to learn a distribution. This results in a prohibitive 2-fold increase in the number of model parameters. To address this problem we present a method for performing BDL, namely Kernel Seed Networks (KSN), which does not require a 2-fold increase in the number of parameters. KSNs use 1x1 Convolution operations to learn a compressed latent space representation of the parameter distribution. In this paper we show how this allows KSNs to outperform conventional BDL methods while reducing the number of required parameters by up to a factor of 6.6. △ Less

Submitted 18 April, 2021; originally announced April 2021.

Comments: Under review

arXiv:2012.15439 [pdf, other]

SID: Incremental Learning for Anchor-Free Object Detection via Selective and Inter-Related Distillation

Authors: Can Peng, Kun Zhao, Sam Maksoud, Meng Li, Brian C. Lovell

Abstract: Incremental learning requires a model to continually learn new tasks from streaming data. However, traditional fine-tuning of a well-trained deep neural network on a new task will dramatically degrade performance on the old task -- a problem known as catastrophic forgetting. In this paper, we address this issue in the context of anchor-free object detection, which is a new trend in computer vision… ▽ More Incremental learning requires a model to continually learn new tasks from streaming data. However, traditional fine-tuning of a well-trained deep neural network on a new task will dramatically degrade performance on the old task -- a problem known as catastrophic forgetting. In this paper, we address this issue in the context of anchor-free object detection, which is a new trend in computer vision as it is simple, fast, and flexible. Simply adapting current incremental learning strategies fails on these anchor-free detectors due to lack of consideration of their specific model structures. To deal with the challenges of incremental learning on anchor-free object detectors, we propose a novel incremental learning paradigm called Selective and Inter-related Distillation (SID). In addition, a novel evaluation metric is proposed to better assess the performance of detectors under incremental learning conditions. By selective distilling at the proper locations and further transferring additional instance relation knowledge, our method demonstrates significant advantages on the benchmark datasets PASCAL VOC and COCO. △ Less

Submitted 30 December, 2020; originally announced December 2020.

arXiv:2006.07606 [pdf, other]

Faces à la Carte: Text-to-Face Generation via Attribute Disentanglement

Authors: Tianren Wang, Teng Zhang, Brian Lovell

Abstract: Text-to-Face (TTF) synthesis is a challenging task with great potential for diverse computer vision applications. Compared to Text-to-Image (TTI) synthesis tasks, the textual description of faces can be much more complicated and detailed due to the variety of facial attributes and the parsing of high dimensional abstract natural language. In this paper, we propose a Text-to-Face model that not onl… ▽ More Text-to-Face (TTF) synthesis is a challenging task with great potential for diverse computer vision applications. Compared to Text-to-Image (TTI) synthesis tasks, the textual description of faces can be much more complicated and detailed due to the variety of facial attributes and the parsing of high dimensional abstract natural language. In this paper, we propose a Text-to-Face model that not only produces images in high resolution (1024x1024) with text-to-image consistency, but also outputs multiple diverse faces to cover a wide range of unspecified facial features in a natural way. By fine-tuning the multi-label classifier and image encoder, our model obtains the vectors and image embeddings which are used to transform the input noise vector sampled from the normal distribution. Afterwards, the transformed noise vector is fed into a pre-trained high-resolution image generator to produce a set of faces with the desired facial attributes. We refer to our model as TTF-HD. Experimental results show that TTF-HD generates high-quality faces with state-of-the-art performance. △ Less

Submitted 18 September, 2020; v1 submitted 13 June, 2020; originally announced June 2020.

Comments: 8 pages, 4 figures

arXiv:2003.05080 [pdf, other]

doi 10.1109/CVPR42600.2020.00392

SOS: Selective Objective Switch for Rapid Immunofluorescence Whole Slide Image Classification

Authors: Sam Maksoud, Kun Zhao, Peter Hobson, Anthony Jennings, Brian Lovell

Abstract: The difficulty of processing gigapixel whole slide images (WSIs) in clinical microscopy has been a long-standing barrier to implementing computer aided diagnostic systems. Since modern computing resources are unable to perform computations at this extremely large scale, current state of the art methods utilize patch-based processing to preserve the resolution of WSIs. However, these methods are of… ▽ More The difficulty of processing gigapixel whole slide images (WSIs) in clinical microscopy has been a long-standing barrier to implementing computer aided diagnostic systems. Since modern computing resources are unable to perform computations at this extremely large scale, current state of the art methods utilize patch-based processing to preserve the resolution of WSIs. However, these methods are often resource intensive and make significant compromises on processing time. In this paper, we demonstrate that conventional patch-based processing is redundant for certain WSI classification tasks where high resolution is only required in a minority of cases. This reflects what is observed in clinical practice; where a pathologist may screen slides using a low power objective and only switch to a high power in cases where they are uncertain about their findings. To eliminate these redundancies, we propose a method for the selective use of high resolution processing based on the confidence of predictions on downscaled WSIs --- we call this the Selective Objective Switch (SOS). Our method is validated on a novel dataset of 684 Liver-Kidney-Stomach immunofluorescence WSIs routinely used in the investigation of autoimmune liver disease. By limiting high resolution processing to cases which cannot be classified confidently at low resolution, we maintain the accuracy of patch-level analysis whilst reducing the inference time by a factor of 7.74. △ Less

Submitted 10 March, 2020; originally announced March 2020.

Comments: Accepted for publication at CVPR2020

arXiv:2003.03901 [pdf, other]

Faster ILOD: Incremental Learning for Object Detectors based on Faster RCNN

Authors: Can Peng, Kun Zhao, Brian C. Lovell

Abstract: The human vision and perception system is inherently incremental where new knowledge is continually learned over time whilst existing knowledge is retained. On the other hand, deep learning networks are ill-equipped for incremental learning. When a well-trained network is adapted to new categories, its performance on the old categories will dramatically degrade. To address this problem, incrementa… ▽ More The human vision and perception system is inherently incremental where new knowledge is continually learned over time whilst existing knowledge is retained. On the other hand, deep learning networks are ill-equipped for incremental learning. When a well-trained network is adapted to new categories, its performance on the old categories will dramatically degrade. To address this problem, incremental learning methods have been explored which preserve the old knowledge of deep learning models. However, the state-of-the-art incremental object detector employs an external fixed region proposal method that increases overall computation time and reduces accuracy comparing to Region Proposal Network (RPN) based object detectors such as Faster RCNN. The purpose of this paper is to design an efficient end-to-end incremental object detector using knowledge distillation. We first evaluate and analyze the performance of the RPN-based detector with classic distillation on incremental detection tasks. Then, we introduce multi-network adaptive distillation that properly retains knowledge from the old categories when fine-tuning the model for new task. Experiments on the benchmark datasets, PASCAL VOC and COCO, demonstrate that the proposed incremental detector based on Faster RCNN is more accurate as well as being 13 times faster than the baseline detector. △ Less

Submitted 6 October, 2020; v1 submitted 8 March, 2020; originally announced March 2020.

Comments: Accepted in Pattern Recognition Letters 2020

arXiv:2002.00575 [pdf, other]

Unsupervised Domain Adaptive Object Detection using Forward-Backward Cyclic Adaptation

Authors: Siqi Yang, Lin Wu, Arnold Wiliem, Brian C. Lovell

Abstract: We present a novel approach to perform the unsupervised domain adaptation for object detection through forward-backward cyclic (FBC) training. Recent adversarial training based domain adaptation methods have shown their effectiveness on minimizing domain discrepancy via marginal feature distributions alignment. However, aligning the marginal feature distributions does not guarantee the alignment o… ▽ More We present a novel approach to perform the unsupervised domain adaptation for object detection through forward-backward cyclic (FBC) training. Recent adversarial training based domain adaptation methods have shown their effectiveness on minimizing domain discrepancy via marginal feature distributions alignment. However, aligning the marginal feature distributions does not guarantee the alignment of class conditional distributions. This limitation is more evident when adapting object detectors as the domain discrepancy is larger compared to the image classification task, e.g. various number of objects exist in one image and the majority of content in an image is the background. This motivates us to learn domain invariance for category level semantics via gradient alignment. Intuitively, if the gradients of two domains point in similar directions, then the learning of one domain can improve that of another domain. To achieve gradient alignment, we propose Forward-Backward Cyclic Adaptation, which iteratively computes adaptation from source to target via backward hop** and from target to source via forward passing. In addition, we align low-level features for adapting holistic color/texture via adversarial training. However, the detector performs well on both domains is not ideal for target domain. As such, in each cycle, domain diversity is enforced by maximum entropy regularization on the source domain to penalize confident source-specific learning and minimum entropy regularization on target domain to intrigue target-specific learning. Theoretical analysis of the training process is provided, and extensive experiments on challenging cross-domain object detection datasets have shown the superiority of our approach over the state-of-the-art. △ Less

Submitted 3 February, 2020; originally announced February 2020.

arXiv:1909.09945 [pdf, other]

To What Extent Does Downsampling, Compression, and Data Scarcity Impact Renal Image Analysis?

Authors: Can Peng, Kun Zhao, Arnold Wiliem, Teng Zhang, Peter Hobson, Anthony Jennings, Brian C. Lovell

Abstract: The condition of the Glomeruli, or filter sacks, in renal Direct Immunofluorescence (DIF) specimens is a critical indicator for diagnosing kidney diseases. A digital pathology system which digitizes a glass histology slide into a Whole Slide Image (WSI) and then automatically detects and zooms in on the glomeruli with a higher magnification objective will be extremely helpful for pathologists. In… ▽ More The condition of the Glomeruli, or filter sacks, in renal Direct Immunofluorescence (DIF) specimens is a critical indicator for diagnosing kidney diseases. A digital pathology system which digitizes a glass histology slide into a Whole Slide Image (WSI) and then automatically detects and zooms in on the glomeruli with a higher magnification objective will be extremely helpful for pathologists. In this paper, using glomerulus detection as the study case, we provide analysis and observations on several important issues to help with the development of Computer Aided Diagnostic (CAD) systems to process WSIs. Large image resolution, large file size, and data scarcity are always challenging to deal with. To this end, we first examine image downsampling rates in terms of their effect on detection accuracy. Second, we examine the impact of image compression. Third, we examine the relationship between the size of the training set and detection accuracy. To understand the above issues, experiments are performed on the state-of-the-art detectors: Faster R-CNN, R-FCN, Mask R-CNN and SSD. Critical findings are observed: (1) The best balance between detection accuracy, detection speed and file size is achieved at 8 times downsampling captured with a $40\times$ objective; (2) compression which reduces the file size dramatically, does not necessarily have an adverse effect on overall accuracy; (3) reducing the amount of training data to some extents causes a drop in precision but has a negligible impact on the recall; (4) in most cases, Faster R-CNN achieves the best accuracy in the glomerulus detection task. We show that the image file size of $40\times$ WSI images can be reduced by a factor of over 6000 with negligible loss of glomerulus detection accuracy. △ Less

Submitted 22 September, 2019; originally announced September 2019.

arXiv:1907.06844 [pdf, other]

Deep inspection: an electrical distribution pole parts study via deep neural networks

Authors: Liangchen Liu, Teng Zhang, Kun Zhao, Arnold Wiliem, Kieren Astin-Walmsley, Brian Lovell

Abstract: Electrical distribution poles are important assets in electricity supply. These poles need to be maintained in good condition to ensure they protect community safety, maintain reliability of supply, and meet legislative obligations. However, maintaining such a large volumes of assets is an expensive and challenging task. To address this, recent approaches utilise imagery data captured from helicop… ▽ More Electrical distribution poles are important assets in electricity supply. These poles need to be maintained in good condition to ensure they protect community safety, maintain reliability of supply, and meet legislative obligations. However, maintaining such a large volumes of assets is an expensive and challenging task. To address this, recent approaches utilise imagery data captured from helicopter and/or drone inspections. Whilst reducing the cost for manual inspection, manual analysis on each image is still required. As such, several image-based automated inspection systems have been proposed. In this paper, we target two major challenges: tiny object detection and extremely imbalanced datasets, which currently hinder the wide deployment of the automatic inspection. We propose a novel two-stage zoom-in detection method to gradually focus on the object of interest. To address the imbalanced dataset problem, we propose the resampling as well as reweighting schemes to iteratively adapt the model to the large intra-class variation of major class and balance the contributions to the loss from each class. Finally, we integrate these components together and devise a novel automatic inspection framework. Extensive experiments demonstrate that our proposed approaches are effective and can boost the performance compared to the baseline methods. △ Less

Submitted 16 July, 2019; originally announced July 2019.

Comments: electrical distribution pole inspection, integrated inspection system, object detection, imbalanced data classification, To appear in Proceeding of ICIP 2019

arXiv:1906.09681 [pdf, ps, other]

Deep Instance-Level Hard Negative Mining Model for Histopathology Images

Authors: Meng Li, Lin Wu, Arnold Wiliem, Kun Zhao, Teng Zhang, Brian C. Lovell

Abstract: Histopathology image analysis can be considered as a Multiple instance learning (MIL) problem, where the whole slide histopathology image (WSI) is regarded as a bag of instances (i.e, patches) and the task is to predict a single class label to the WSI. However, in many real-life applications such as computational pathology, discovering the key instances that trigger the bag label is of great inter… ▽ More Histopathology image analysis can be considered as a Multiple instance learning (MIL) problem, where the whole slide histopathology image (WSI) is regarded as a bag of instances (i.e, patches) and the task is to predict a single class label to the WSI. However, in many real-life applications such as computational pathology, discovering the key instances that trigger the bag label is of great interest because it provides reasons for the decision made by the system. In this paper, we propose a deep convolutional neural network (CNN) model that addresses the primary task of a bag classification on a WSI and also learns to identify the response of each instance to provide interpretable results to the final prediction. We incorporate the attention mechanism into the proposed model to operate the transformation of instances and learn attention weights to allow us to find key patches. To perform a balanced training, we introduce adaptive weighing in each training bag to explicitly adjust the weight distribution in order to concentrate more on the contribution of hard samples. Based on the learned attention weights, we further develop a solution to boost the classification performance by generating the bags with hard negative instances. We conduct extensive experiments on colon and breast cancer histopathology data and show that our framework achieves state-of-the-art performance. △ Less

Submitted 26 June, 2019; v1 submitted 23 June, 2019; originally announced June 2019.

Comments: Accepted by MICCAI 2019

arXiv:1906.09676 [pdf, other]

doi 10.1007/978-3-030-32239-7_48

CORAL8: Concurrent Object Regression for Area Localization in Medical Image Panels

Authors: Sam Maksoud, Arnold Wiliem, Kun Zhao, Teng Zhang, Lin Wu, Brian C. Lovell

Abstract: This work tackles the problem of generating a medical report for multi-image panels. We apply our solution to the Renal Direct Immunofluorescence (RDIF) assay which requires a pathologist to generate a report based on observations across the eight different WSI in concert with existing clinical features. To this end, we propose a novel attention-based multi-modal generative recurrent neural networ… ▽ More This work tackles the problem of generating a medical report for multi-image panels. We apply our solution to the Renal Direct Immunofluorescence (RDIF) assay which requires a pathologist to generate a report based on observations across the eight different WSI in concert with existing clinical features. To this end, we propose a novel attention-based multi-modal generative recurrent neural network (RNN) architecture capable of dynamically sampling image data concurrently across the RDIF panel. The proposed methodology incorporates text from the clinical notes of the requesting physician to regulate the output of the network to align with the overall clinical context. In addition, we found the importance of regularizing the attention weights for word generation processes. This is because the system can ignore the attention mechanism by assigning equal weights for all members. Thus, we propose two regularizations which force the system to utilize the attention mechanism. Experiments on our novel collection of RDIF WSIs provided by a large clinical laboratory demonstrate that our framework offers significant improvements over existing methods. △ Less

Submitted 23 June, 2019; originally announced June 2019.

Comments: Accepted for MICCAI 2019

arXiv:1806.05343 [pdf, other]

Convex Class Model on Symmetric Positive Definite Manifolds

Authors: Kun Zhao, Arnold Wiliem, Shaokang Chen, Brian C. Lovell

Abstract: The effectiveness of Symmetric Positive Definite (SPD) manifold features has been proven in various computer vision tasks. However, due to the non-Euclidean geometry of these features, existing Euclidean machineries cannot be directly used. In this paper, we tackle the classification tasks with limited training data on SPD manifolds. Our proposed framework, named Manifold Convex Class Model, repre… ▽ More The effectiveness of Symmetric Positive Definite (SPD) manifold features has been proven in various computer vision tasks. However, due to the non-Euclidean geometry of these features, existing Euclidean machineries cannot be directly used. In this paper, we tackle the classification tasks with limited training data on SPD manifolds. Our proposed framework, named Manifold Convex Class Model, represents each class on SPD manifolds using a convex model, and classification can be performed by computing distances to the convex models. We provide three methods based on different metrics to address the optimization problem of the smallest distance of a point to the convex model on SPD manifold. The efficacy of our proposed framework is demonstrated both on synthetic data and several computer vision tasks including object recognition, texture classification, person re-identification and traffic scene classification. △ Less

Submitted 29 May, 2019; v1 submitted 13 June, 2018; originally announced June 2018.

arXiv:1803.07240 [pdf, other]

SlideNet: Fast and Accurate Slide Quality Assessment Based on Deep Neural Networks

Authors: Teng Zhang, Johanna Carvajal, Daniel F. Smith, Kun Zhao, Arnold Wiliem, Peter Hobson, Anthony Jennings, Brian C. Lovell

Abstract: This work tackles the automatic fine-grained slide quality assessment problem for digitized direct smears test using the Gram staining protocol. Automatic quality assessment can provide useful information for the pathologists and the whole digital pathology workflow. For instance, if the system found a slide to have a low staining quality, it could send a request to the automatic slide preparation… ▽ More This work tackles the automatic fine-grained slide quality assessment problem for digitized direct smears test using the Gram staining protocol. Automatic quality assessment can provide useful information for the pathologists and the whole digital pathology workflow. For instance, if the system found a slide to have a low staining quality, it could send a request to the automatic slide preparation system to remake the slide. If the system detects severe damage in the slides, it could notify the experts that manual microscope reading may be required. In order to address the quality assessment problem, we propose a deep neural network based framework to automatically assess the slide quality in a semantic way. Specifically, the first step of our framework is to perform dense fine-grained region classification on the whole slide and calculate the region distribution histogram. Next, our framework will generate assessments of the slide quality from various perspectives: staining quality, information density, damage level and which regions are more valuable for subsequent high-magnification analysis. To make the information more accessible, we present our results in the form of a heat map and text summaries. Additionally, in order to stimulate research in this direction, we propose a novel dataset for slide quality assessment. Experiments show that the proposed framework outperforms recent related works. △ Less

Submitted 19 March, 2018; originally announced March 2018.

arXiv:1712.08263 [pdf, other]

Using LIP to Gloss Over Faces in Single-Stage Face Detection Networks

Authors: Siqi Yang, Arnold Wiliem, Shaokang Chen, Brian C. Lovell

Abstract: This work shows that it is possible to fool/attack recent state-of-the-art face detectors which are based on the single-stage networks. Successfully attacking face detectors could be a serious malware vulnerability when deploying a smart surveillance system utilizing face detectors. We show that existing adversarial perturbation methods are not effective to perform such an attack, especially when… ▽ More This work shows that it is possible to fool/attack recent state-of-the-art face detectors which are based on the single-stage networks. Successfully attacking face detectors could be a serious malware vulnerability when deploying a smart surveillance system utilizing face detectors. We show that existing adversarial perturbation methods are not effective to perform such an attack, especially when there are multiple faces in the input image. This is because the adversarial perturbation specifically generated for one face may disrupt the adversarial perturbation for another face. In this paper, we call this problem the Instance Perturbation Interference (IPI) problem. This IPI problem is addressed by studying the relationship between the deep neural network receptive field and the adversarial perturbation. As such, we propose the Localized Instance Perturbation (LIP) that uses adversarial perturbation constrained to the Effective Receptive Field (ERF) of a target to perform the attack. Experiment results show the LIP method massively outperforms existing adversarial perturbation generation methods -- often by a factor of 2 to 10. △ Less

Submitted 4 July, 2018; v1 submitted 21 December, 2017; originally announced December 2017.

Comments: to appear ECCV 2018 (accepted version)

arXiv:1712.02514 [pdf, other]

TV-GAN: Generative Adversarial Network Based Thermal to Visible Face Recognition

Authors: Teng Zhang, Arnold Wiliem, Siqi Yang, Brian C. Lovell

Abstract: This work tackles the face recognition task on images captured using thermal camera sensors which can operate in the non-light environment. While it can greatly increase the scope and benefits of the current security surveillance systems, performing such a task using thermal images is a challenging problem compared to face recognition task in the Visible Light Domain (VLD). This is partly due to t… ▽ More This work tackles the face recognition task on images captured using thermal camera sensors which can operate in the non-light environment. While it can greatly increase the scope and benefits of the current security surveillance systems, performing such a task using thermal images is a challenging problem compared to face recognition task in the Visible Light Domain (VLD). This is partly due to the much smaller amount number of thermal imagery data collected compared to the VLD data. Unfortunately, direct application of the existing very strong face recognition models trained using VLD data into the thermal imagery data will not produce a satisfactory performance. This is due to the existence of the domain gap between the thermal and VLD images. To this end, we propose a Thermal-to-Visible Generative Adversarial Network (TV-GAN) that is able to transform thermal face images into their corresponding VLD images whilst maintaining identity information which is sufficient enough for the existing VLD face recognition models to perform recognition. Some examples are presented in Figure 1. Unlike the previous methods, our proposed TV-GAN uses an explicit closed-set face recognition loss to regularize the discriminator network training. This information will then be conveyed into the generator network in the forms of gradient loss. In the experiment, we show that by using this additional explicit regularization for the discriminator network, the TV-GAN is able to preserve more identity information when translating a thermal image of a person which is not seen before by the TV-GAN. △ Less

Submitted 7 December, 2017; originally announced December 2017.

arXiv:1610.04957 [pdf, other]

What is the Best Way for Extracting Meaningful Attributes from Pictures?

Authors: Liangchen Liu, Arnold Wiliem, Shaokang Chen, Brian C. Lovell

Abstract: Automatic attribute discovery methods have gained in popularity to extract sets of visual attributes from images or videos for various tasks. Despite their good performance in some classification tasks, it is difficult to evaluate whether the attributes discovered by these methods are meaningful and which methods are the most appropriate to discover attributes for visual descriptions. In its simpl… ▽ More Automatic attribute discovery methods have gained in popularity to extract sets of visual attributes from images or videos for various tasks. Despite their good performance in some classification tasks, it is difficult to evaluate whether the attributes discovered by these methods are meaningful and which methods are the most appropriate to discover attributes for visual descriptions. In its simplest form, such an evaluation can be performed by manually verifying whether there is any consistent identifiable visual concept distinguishing between positive and negative exemplars labelled by an attribute. This manual checking is tedious, expensive and labour intensive. In addition, comparisons between different methods could also be problematic as it is not clear how one could quantitatively decide which attribute is more meaningful than the others. In this paper, we propose a novel attribute meaningfulness metric to address this challenging problem. With this metric, automatic quantitative evaluation can be performed on the attribute sets; thus, reducing the enormous effort to perform manual evaluation. The proposed metric is applied to some recent automatic attribute discovery and hashing methods on four attribute-labelled datasets. To further validate the efficacy of the proposed method, we conducted a user study. In addition, we also compared our metric with a semi-supervised attribute discover method using the mixture of probabilistic PCA. In our evaluation, we gleaned several insights that could be beneficial in develo** new automatic attribute discovery methods. △ Less

Submitted 16 October, 2016; originally announced October 2016.

Comments: Submission to Pattern Recognition

arXiv:1604.07547 [pdf, other]

doi 10.1109/ICPR.2016.7899781

Towards Miss Universe Automatic Prediction: The Evening Gown Competition

Authors: Johanna Carvajal, Arnold Wiliem, Conrad Sanderson, Brian Lovell

Abstract: Can we predict the winner of Miss Universe after watching how they stride down the catwalk during the evening gown competition? Fashion gurus say they can! In our work, we study this question from the perspective of computer vision. In particular, we want to understand whether existing computer vision approaches can be used to automatically extract the qualities exhibited by the Miss Universe winn… ▽ More Can we predict the winner of Miss Universe after watching how they stride down the catwalk during the evening gown competition? Fashion gurus say they can! In our work, we study this question from the perspective of computer vision. In particular, we want to understand whether existing computer vision approaches can be used to automatically extract the qualities exhibited by the Miss Universe winners during their catwalk. This study can pave the way towards new vision-based applications for the fashion industry. To this end, we propose a novel video dataset, called the Miss Universe dataset, comprising 10 years of the evening gown competition selected between 1996-2010. We further propose two ranking-related problems: (1) Miss Universe Listwise Ranking and (2) Miss Universe Pairwise Ranking. In addition, we also develop an approach that simultaneously addresses the two proposed problems. To describe the videos we employ the recently proposed Stacked Fisher Vectors in conjunction with robust local spatio-temporal features. From our evaluation we found that although the addressed problems are extremely challenging, the proposed system is able to rank the winner in the top 3 best predicted scores for 5 out of 10 Miss Universe competitions. △ Less

Submitted 11 September, 2016; v1 submitted 26 April, 2016; originally announced April 2016.

MSC Class: 68T45 ACM Class: I.4; I.4.8; I.4.9; I.5.4

Journal ref: International Conference on Pattern Recognition, 2016

arXiv:1602.06539 [pdf, other]

Determining the best attributes for surveillance video keywords generation

Authors: Liangchen Liu, Arnold Wiliem, Shaokang Chen, Kun Zhao, Brian C. Lovell

Abstract: Automatic video keyword generation is one of the key ingredients in reducing the burden of security officers in analyzing surveillance videos. Keywords or attributes are generally chosen manually based on expert knowledge of surveillance. Most existing works primarily aim at either supervised learning approaches relying on extensive manual labelling or hierarchical probabilistic models that assume… ▽ More Automatic video keyword generation is one of the key ingredients in reducing the burden of security officers in analyzing surveillance videos. Keywords or attributes are generally chosen manually based on expert knowledge of surveillance. Most existing works primarily aim at either supervised learning approaches relying on extensive manual labelling or hierarchical probabilistic models that assume the features are extracted using the bag-of-words approach; thus limiting the utilization of the other features. To address this, we turn our attention to automatic attribute discovery approaches. However, it is not clear which automatic discovery approach can discover the most meaningful attributes. Furthermore, little research has been done on how to compare and choose the best automatic attribute discovery methods. In this paper, we propose a novel approach, based on the shared structure exhibited amongst meaningful attributes, that enables us to compare between different automatic attribute discovery approaches.We then validate our approach by comparing various attribute discovery methods such as PiCoDeS on two attribute datasets. The evaluation shows that our approach is able to select the automatic discovery approach that discovers the most meaningful attributes. We then employ the best discovery approach to generate keywords for videos recorded from a surveillance system. This work shows it is possible to massively reduce the amount of manual work in generating video keywords without limiting ourselves to a particular video feature descriptor. △ Less

Submitted 21 February, 2016; originally announced February 2016.

Comments: 7 pages, ISBA 2016. arXiv admin note: text overlap with arXiv:1602.01940

arXiv:1602.01940 [pdf, other]

Automatic and Quantitative evaluation of attribute discovery methods

Authors: Liangchen Liu, Arnold Wiliem, Shaokang Chen, Brian C. Lovell

Abstract: Many automatic attribute discovery methods have been developed to extract a set of visual attributes from images for various tasks. However, despite good performance in some image classification tasks, it is difficult to evaluate whether these methods discover meaningful attributes and which one is the best to find the attributes for image descriptions. An intuitive way to evaluate this is to manu… ▽ More Many automatic attribute discovery methods have been developed to extract a set of visual attributes from images for various tasks. However, despite good performance in some image classification tasks, it is difficult to evaluate whether these methods discover meaningful attributes and which one is the best to find the attributes for image descriptions. An intuitive way to evaluate this is to manually verify whether consistent identifiable visual concepts exist to distinguish between positive and negative images of an attribute. This manual checking is tedious, labor intensive and expensive and it is very hard to get quantitative comparisons between different methods. In this work, we tackle this problem by proposing an attribute meaningfulness metric, that can perform automatic evaluation on the meaningfulness of attribute sets as well as achieving quantitative comparisons. We apply our proposed metric to recent automatic attribute discovery methods and popular hashing methods on three attribute datasets. A user study is also conducted to validate the effectiveness of the metric. In our evaluation, we gleaned some insights that could be beneficial in develo** automatic attribute discovery methods to generate meaningful attributes. To the best of our knowledge, this is the first work to quantitatively measure the semantic content of automatically discovered attributes. △ Less

Submitted 5 February, 2016; originally announced February 2016.

Comments: 9 pages, WACV 2016

arXiv:1602.01601 [pdf, other]

doi 10.1007/978-3-319-42996-0_10

Joint Recognition and Segmentation of Actions via Probabilistic Integration of Spatio-Temporal Fisher Vectors

Authors: Johanna Carvajal, Chris McCool, Brian Lovell, Conrad Sanderson

Abstract: We propose a hierarchical approach to multi-action recognition that performs joint classification and segmentation. A given video (containing several consecutive actions) is processed via a sequence of overlap** temporal windows. Each frame in a temporal window is represented through selective low-level spatio-temporal features which efficiently capture relevant local dynamics. Features from eac… ▽ More We propose a hierarchical approach to multi-action recognition that performs joint classification and segmentation. A given video (containing several consecutive actions) is processed via a sequence of overlap** temporal windows. Each frame in a temporal window is represented through selective low-level spatio-temporal features which efficiently capture relevant local dynamics. Features from each window are represented as a Fisher vector, which captures first and second order statistics. Instead of directly classifying each Fisher vector, it is converted into a vector of class probabilities. The final classification decision for each frame is then obtained by integrating the class probabilities at the frame level, which exploits the overlap** of the temporal windows. Experiments were performed on two datasets: s-KTH (a stitched version of the KTH dataset to simulate multi-actions), and the challenging CMU-MMAC dataset. On s-KTH, the proposed approach achieves an accuracy of 85.0%, significantly outperforming two recent approaches based on GMMs and HMMs which obtained 78.3% and 71.2%, respectively. On CMU-MMAC, the proposed approach achieves an accuracy of 40.9%, outperforming the GMM and HMM approaches which obtained 33.7% and 38.4%, respectively. Furthermore, the proposed system is on average 40 times faster than the GMM based approach. △ Less

Submitted 4 October, 2016; v1 submitted 4 February, 2016; originally announced February 2016.

ACM Class: I.2.10; I.4; I.5; I.5.4

Journal ref: Lecture Notes in Computer Science (LNCS), Vol. 9794, pp. 115-127, 2016

arXiv:1602.01599 [pdf, other]

doi 10.1007/978-3-319-42996-0_8

Comparative Evaluation of Action Recognition Methods via Riemannian Manifolds, Fisher Vectors and GMMs: Ideal and Challenging Conditions

Authors: Johanna Carvajal, Arnold Wiliem, Chris McCool, Brian Lovell, Conrad Sanderson

Abstract: We present a comparative evaluation of various techniques for action recognition while kee** as many variables as possible controlled. We employ two categories of Riemannian manifolds: symmetric positive definite matrices and linear subspaces. For both categories we use their corresponding nearest neighbour classifiers, kernels, and recent kernelised sparse representations. We compare against tr… ▽ More We present a comparative evaluation of various techniques for action recognition while kee** as many variables as possible controlled. We employ two categories of Riemannian manifolds: symmetric positive definite matrices and linear subspaces. For both categories we use their corresponding nearest neighbour classifiers, kernels, and recent kernelised sparse representations. We compare against traditional action recognition techniques based on Gaussian mixture models and Fisher vectors (FVs). We evaluate these action recognition techniques under ideal conditions, as well as their sensitivity in more challenging conditions (variations in scale and translation). Despite recent advancements for handling manifolds, manifold based techniques obtain the lowest performance and their kernel representations are more unstable in the presence of challenging conditions. The FV approach obtains the highest accuracy under ideal conditions. Moreover, FV best deals with moderate scale and translation changes. △ Less

Submitted 4 October, 2016; v1 submitted 4 February, 2016; originally announced February 2016.

ACM Class: I.4; I.5; I.5.4

Journal ref: Lecture Notes in Computer Science (LNCS), Vol. 9794, pp. 88-100, 2016

arXiv:1509.05536 [pdf, other]

Efficient Clustering on Riemannian Manifolds: A Kernelised Random Projection Approach

Authors: Kun Zhao, Azadeh Alavi, Arnold Wiliem, Brian C. Lovell

Abstract: Reformulating computer vision problems over Riemannian manifolds has demonstrated superior performance in various computer vision applications. This is because visual data often forms a special structure lying on a lower dimensional space embedded in a higher dimensional space. However, since these manifolds belong to non-Euclidean topological spaces, exploiting their structures is computationally… ▽ More Reformulating computer vision problems over Riemannian manifolds has demonstrated superior performance in various computer vision applications. This is because visual data often forms a special structure lying on a lower dimensional space embedded in a higher dimensional space. However, since these manifolds belong to non-Euclidean topological spaces, exploiting their structures is computationally expensive, especially when one considers the clustering analysis of massive amounts of data. To this end, we propose an efficient framework to address the clustering problem on Riemannian manifolds. This framework implements random projections for manifold points via kernel space, which can preserve the geometric structure of the original space, but is computationally efficient. Here, we introduce three methods that follow our framework. We then validate our framework on several computer vision applications by comparing against popular clustering methods on Riemannian manifolds. Experimental results demonstrate that our framework maintains the performance of the clustering whilst massively reducing computational complexity by over two orders of magnitude in some cases. △ Less

Submitted 18 September, 2015; originally announced September 2015.

arXiv:1502.01782 [pdf, other]

doi 10.1145/2689746.2689748

Multi-Action Recognition via Stochastic Modelling of Optical Flow and Gradients

Authors: Johanna Carvajal, Conrad Sanderson, Chris McCool, Brian C. Lovell

Abstract: In this paper we propose a novel approach to multi-action recognition that performs joint segmentation and classification. This approach models each action using a Gaussian mixture using robust low-dimensional action features. Segmentation is achieved by performing classification on overlap** temporal windows, which are then merged to produce the final result. This approach is considerably less… ▽ More In this paper we propose a novel approach to multi-action recognition that performs joint segmentation and classification. This approach models each action using a Gaussian mixture using robust low-dimensional action features. Segmentation is achieved by performing classification on overlap** temporal windows, which are then merged to produce the final result. This approach is considerably less complicated than previous methods which use dynamic programming or computationally expensive hidden Markov models (HMMs). Initial experiments on a stitched version of the KTH dataset show that the proposed approach achieves an accuracy of 78.3%, outperforming a recent HMM-based approach which obtained 71.2%. △ Less

Submitted 5 February, 2015; originally announced February 2015.

ACM Class: I.4.6; I.4.7; I.4.8; I.5.1; I.5.4

Journal ref: Workshop on Machine Learning for Sensory Data Analysis (MLSDA), pp. 19-24, 2014

arXiv:1409.0083 [pdf, other]

Sparse Coding on Symmetric Positive Definite Manifolds using Bregman Divergences

Authors: Mehrtash Harandi, Richard Hartley, Brian Lovell, Conrad Sanderson

Abstract: This paper introduces sparse coding and dictionary learning for Symmetric Positive Definite (SPD) matrices, which are often used in machine learning, computer vision and related areas. Unlike traditional sparse coding schemes that work in vector spaces, in this paper we discuss how SPD matrices can be described by sparse combination of dictionary atoms, where the atoms are also SPD matrices. We pr… ▽ More This paper introduces sparse coding and dictionary learning for Symmetric Positive Definite (SPD) matrices, which are often used in machine learning, computer vision and related areas. Unlike traditional sparse coding schemes that work in vector spaces, in this paper we discuss how SPD matrices can be described by sparse combination of dictionary atoms, where the atoms are also SPD matrices. We propose to seek sparse coding by embedding the space of SPD matrices into Hilbert spaces through two types of Bregman matrix divergences. This not only leads to an efficient way of performing sparse coding, but also an online and iterative scheme for dictionary learning. We apply the proposed methods to several computer vision tasks where images are represented by region covariance matrices. Our proposed algorithms outperform state-of-the-art methods on a wide range of classification tasks, including face recognition, action recognition, material classification and texture categorization. △ Less

Submitted 30 August, 2014; originally announced September 2014.

arXiv:1407.7330 [pdf, other]

doi 10.1109/WACV.2014.6836071

Discovering Discriminative Cell Attributes for HEp-2 Specimen Image Classification

Authors: Arnold Wiliem, Peter Hobson, Brian C. Lovell

Abstract: Recently, there has been a growing interest in develo** Computer Aided Diagnostic (CAD) systems for improving the reliability and consistency of pathology test results. This paper describes a novel CAD system for the Anti-Nuclear Antibody (ANA) test via Indirect Immunofluorescence protocol on Human Epithelial Type 2 (HEp-2) cells. While prior works have primarily focused on classifying cell imag… ▽ More Recently, there has been a growing interest in develo** Computer Aided Diagnostic (CAD) systems for improving the reliability and consistency of pathology test results. This paper describes a novel CAD system for the Anti-Nuclear Antibody (ANA) test via Indirect Immunofluorescence protocol on Human Epithelial Type 2 (HEp-2) cells. While prior works have primarily focused on classifying cell images extracted from ANA specimen images, this work takes a further step by focussing on the specimen image classification problem itself. Our system is able to efficiently classify specimen images as well as producing meaningful descriptions of ANA pattern class which helps physicians to understand the differences between various ANA patterns. We achieve this goal by designing a specimen-level image descriptor that: (1) is highly discriminative; (2) has small descriptor length and (3) is semantically meaningful at the cell level. In our work, a specimen image descriptor is represented by its overall cell attribute descriptors. As such, we propose two max-margin based learning schemes to discover cell attributes whilst still maintaining the discrimination of the specimen image descriptor. Our learning schemes differ from the existing discriminative attribute learning approaches as they primarily focus on discovering image-level attributes. Comparative evaluations were undertaken to contrast the proposed approach to various state-of-the-art approaches on a novel HEp-2 cell dataset which was specifically proposed for the specimen-level classification. Finally, we showcase the ability of the proposed approach to provide textual descriptions to explain ANA patterns. △ Less

Submitted 28 July, 2014; originally announced July 2014.

Comments: WACV 2014: IEEE Winter Conference on Applications of Computer Vision

arXiv:1406.5095 [pdf, other]

doi 10.1007/978-3-642-19318-7_43

MRF-based Background Initialisation for Improved Foreground Detection in Cluttered Surveillance Videos

Authors: Vikas Reddy, Conrad Sanderson, Andres Sanin, Brian C. Lovell

Abstract: Robust foreground object segmentation via background modelling is a difficult problem in cluttered environments, where obtaining a clear view of the background to model is almost impossible. In this paper, we propose a method capable of robustly estimating the background and detecting regions of interest in such environments. In particular, we propose to extend the background initialisation compon… ▽ More Robust foreground object segmentation via background modelling is a difficult problem in cluttered environments, where obtaining a clear view of the background to model is almost impossible. In this paper, we propose a method capable of robustly estimating the background and detecting regions of interest in such environments. In particular, we propose to extend the background initialisation component of a recent patch-based foreground detection algorithm with an elaborate technique based on Markov Random Fields, where the optimal labelling solution is computed using iterated conditional modes. Rather than relying purely on local temporal statistics, the proposed technique takes into account the spatial continuity of the entire background. Experiments with several tracking algorithms on the CAVIAR dataset indicate that the proposed method leads to considerable improvements in object tracking accuracy, when compared to methods based on Gaussian mixture models and feature histograms. △ Less

Submitted 19 June, 2014; originally announced June 2014.

Comments: arXiv admin note: substantial text overlap with arXiv:1303.2465

ACM Class: I.5.4; I.4.5; I.4.6

arXiv:1403.3780 [pdf, other]

doi 10.1016/j.patcog.2013.10.014

Automatic Classification of Human Epithelial Type 2 Cell Indirect Immunofluorescence Images using Cell Pyramid Matching

Authors: Arnold Wiliem, Conrad Sanderson, Yongkang Wong, Peter Hobson, Rodney F. Minchin, Brian C. Lovell

Abstract: This paper describes a novel system for automatic classification of images obtained from Anti-Nuclear Antibody (ANA) pathology tests on Human Epithelial type 2 (HEp-2) cells using the Indirect Immunofluorescence (IIF) protocol. The IIF protocol on HEp-2 cells has been the hallmark method to identify the presence of ANAs, due to its high sensitivity and the large range of antigens that can be detec… ▽ More This paper describes a novel system for automatic classification of images obtained from Anti-Nuclear Antibody (ANA) pathology tests on Human Epithelial type 2 (HEp-2) cells using the Indirect Immunofluorescence (IIF) protocol. The IIF protocol on HEp-2 cells has been the hallmark method to identify the presence of ANAs, due to its high sensitivity and the large range of antigens that can be detected. However, it suffers from numerous shortcomings, such as being subjective as well as time and labour intensive. Computer Aided Diagnostic (CAD) systems have been developed to address these problems, which automatically classify a HEp-2 cell image into one of its known patterns (eg. speckled, homogeneous). Most of the existing CAD systems use handpicked features to represent a HEp-2 cell image, which may only work in limited scenarios. We propose a novel automatic cell image classification method termed Cell Pyramid Matching (CPM), which is comprised of regional histograms of visual words coupled with the Multiple Kernel Learning framework. We present a study of several variations of generating histograms and show the efficacy of the system on two publicly available datasets: the ICPR HEp-2 cell classification contest dataset and the SNPHEp-2 dataset. △ Less

Submitted 15 March, 2014; originally announced March 2014.

Comments: arXiv admin note: substantial text overlap with arXiv:1304.1262

ACM Class: J.3; I.4.7; I.4.9; I.5.1; I.5.4; G.3

Journal ref: Pattern Recognition, Vol. 47, No. 7, pp. 2315-2324, 2014

arXiv:1403.1056 [pdf, other]

doi 10.1109/ICIP.2012.6466899

K-Tangent Spaces on Riemannian Manifolds for Improved Pedestrian Detection

Authors: Andres Sanin, Conrad Sanderson, Mehrtash T. Harandi, Brian C. Lovell

Abstract: For covariance-based image descriptors, taking into account the curvature of the corresponding feature space has been shown to improve discrimination performance. This is often done through representing the descriptors as points on Riemannian manifolds, with the discrimination accomplished on a tangent space. However, such treatment is restrictive as distances between arbitrary points on the tange… ▽ More For covariance-based image descriptors, taking into account the curvature of the corresponding feature space has been shown to improve discrimination performance. This is often done through representing the descriptors as points on Riemannian manifolds, with the discrimination accomplished on a tangent space. However, such treatment is restrictive as distances between arbitrary points on the tangent space do not represent true geodesic distances, and hence do not represent the manifold structure accurately. In this paper we propose a general discriminative model based on the combination of several tangent spaces, in order to preserve more details of the structure. The model can be used as a weak learner in a boosting-based pedestrian detection framework. Experiments on the challenging INRIA and DaimlerChrysler datasets show that the proposed model leads to considerably higher performance than methods based on histograms of oriented gradients as well as previous Riemannian-based techniques. △ Less

Submitted 5 March, 2014; originally announced March 2014.

Comments: IEEE International Conference on Image Processing (ICIP), 2012

ACM Class: I.4.7; I.4.10; I.5.1; I.5.4

arXiv:1403.0700 [pdf, other]

doi 10.1109/WACV.2014.6836085

Random Projections on Manifolds of Symmetric Positive Definite Matrices for Image Classification

Authors: Azadeh Alavi, Arnold Wiliem, Kun Zhao, Brian C. Lovell, Conrad Sanderson

Abstract: Recent advances suggest that encoding images through Symmetric Positive Definite (SPD) matrices and then interpreting such matrices as points on Riemannian manifolds can lead to increased classification performance. Taking into account manifold geometry is typically done via (1) embedding the manifolds in tangent spaces, or (2) embedding into Reproducing Kernel Hilbert Spaces (RKHS). While embeddi… ▽ More Recent advances suggest that encoding images through Symmetric Positive Definite (SPD) matrices and then interpreting such matrices as points on Riemannian manifolds can lead to increased classification performance. Taking into account manifold geometry is typically done via (1) embedding the manifolds in tangent spaces, or (2) embedding into Reproducing Kernel Hilbert Spaces (RKHS). While embedding into tangent spaces allows the use of existing Euclidean-based learning algorithms, manifold shape is only approximated which can cause loss of discriminatory information. The RKHS approach retains more of the manifold structure, but may require non-trivial effort to kernelise Euclidean-based learning algorithms. In contrast to the above approaches, in this paper we offer a novel solution that allows SPD matrices to be used with unmodified Euclidean-based learning algorithms, with the true manifold shape well-preserved. Specifically, we propose to project SPD matrices using a set of random projection hyperplanes over RKHS into a random projection space, which leads to representing each matrix as a vector of projection coefficients. Experiments on face recognition, person re-identification and texture classification show that the proposed approach outperforms several recent methods, such as Tensor Sparse Coding, Histogram Plus Epitome, Riemannian Locality Preserving Projection and Relational Divergence Classification. △ Less

Submitted 4 March, 2014; originally announced March 2014.

Comments: IEEE Winter Conference on Applications of Computer Vision (WACV), 2014

ACM Class: I.4.7; I.4.10; I.5.1; I.5.4

arXiv:1403.0320 [pdf, other]

doi 10.1109/WACV.2014.6835985

Matching Image Sets via Adaptive Multi Convex Hull

Authors: Shaokang Chen, Arnold Wiliem, Conrad Sanderson, Brian C. Lovell

Abstract: Traditional nearest points methods use all the samples in an image set to construct a single convex or affine hull model for classification. However, strong artificial features and noisy data may be generated from combinations of training samples when significant intra-class variations and/or noise occur in the image set. Existing multi-model approaches extract local models by clustering each imag… ▽ More Traditional nearest points methods use all the samples in an image set to construct a single convex or affine hull model for classification. However, strong artificial features and noisy data may be generated from combinations of training samples when significant intra-class variations and/or noise occur in the image set. Existing multi-model approaches extract local models by clustering each image set individually only once, with fixed clusters used for matching with various image sets. This may not be optimal for discrimination, as undesirable environmental conditions (eg. illumination and pose variations) may result in the two closest clusters representing different characteristics of an object (eg. frontal face being compared to non-frontal face). To address the above problem, we propose a novel approach to enhance nearest points based methods by integrating affine/convex hull classification with an adapted multi-model approach. We first extract multiple local convex hulls from a query image set via maximum margin clustering to diminish the artificial variations and constrain the noise in local convex hulls. We then propose adaptive reference clustering (ARC) to constrain the clustering of each gallery image set by forcing the clusters to have resemblance to the clusters in the query image set. By applying ARC, noisy clusters in the query set can be discarded. Experiments on Honda, MoBo and ETH-80 datasets show that the proposed method outperforms single model approaches and other recent techniques, such as Sparse Approximated Nearest Points, Mutual Subspace Method and Manifold Discriminant Analysis. △ Less

Submitted 3 March, 2014; originally announced March 2014.

Comments: IEEE Winter Conference on Applications of Computer Vision (WACV), 2014

ACM Class: I.5; I.5.1; I.5.4; G.3

arXiv:1403.0309 [pdf, other]

doi 10.1109/WACV.2014.6836008

Object Tracking via Non-Euclidean Geometry: A Grassmann Approach

Authors: Sareh Shirazi, Mehrtash T. Harandi, Brian C. Lovell, Conrad Sanderson

Abstract: A robust visual tracking system requires an object appearance model that is able to handle occlusion, pose, and illumination variations in the video stream. This can be difficult to accomplish when the model is trained using only a single image. In this paper, we first propose a tracking approach based on affine subspaces (constructed from several images) which are able to accommodate the abovemen… ▽ More A robust visual tracking system requires an object appearance model that is able to handle occlusion, pose, and illumination variations in the video stream. This can be difficult to accomplish when the model is trained using only a single image. In this paper, we first propose a tracking approach based on affine subspaces (constructed from several images) which are able to accommodate the abovementioned variations. We use affine subspaces not only to represent the object, but also the candidate areas that the object may occupy. We furthermore propose a novel approach to measure affine subspace-to-subspace distance via the use of non-Euclidean geometry of Grassmann manifolds. The tracking problem is then considered as an inference task in a Markov Chain Monte Carlo framework via particle filtering. Quantitative evaluation on challenging video sequences indicates that the proposed approach obtains considerably better performance than several recent state-of-the-art methods such as Tracking-Learning-Detection and MILtrack. △ Less

Submitted 2 March, 2014; originally announced March 2014.

Comments: IEEE Winter Conference on Applications of Computer Vision (WACV), 2014

ACM Class: I.2.10; I.4.6; I.4.7; I.4.8; I.5.1; I.5.4; G.3

arXiv:1401.8126 [pdf, other]

Extrinsic Methods for Coding and Dictionary Learning on Grassmann Manifolds

Authors: Mehrtash Harandi, Richard Hartley, Chunhua Shen, Brian Lovell, Conrad Sanderson

Abstract: Sparsity-based representations have recently led to notable results in various visual recognition tasks. In a separate line of research, Riemannian manifolds have been shown useful for dealing with features and models that do not lie in Euclidean spaces. With the aim of building a bridge between the two realms, we address the problem of sparse coding and dictionary learning over the space of linea… ▽ More Sparsity-based representations have recently led to notable results in various visual recognition tasks. In a separate line of research, Riemannian manifolds have been shown useful for dealing with features and models that do not lie in Euclidean spaces. With the aim of building a bridge between the two realms, we address the problem of sparse coding and dictionary learning over the space of linear subspaces, which form Riemannian structures known as Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric map**. This in turn enables us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we propose closed-form solutions for learning a Grassmann dictionary, atom by atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann sparse coding and dictionary learning algorithms through embedding into Hilbert spaces. Experiments on several classification tasks (gender recognition, gesture classification, scene analysis, face recognition, action recognition and dynamic texture classification) show that the proposed approaches achieve considerable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as kernelized Affine Hull Method and graph-embedding Grassmann discriminant analysis. △ Less

Submitted 19 May, 2015; v1 submitted 31 January, 2014; originally announced January 2014.

Comments: Appearing in International Journal of Computer Vision

arXiv:1310.4891 [pdf, other]

Dictionary Learning and Sparse Coding on Grassmann Manifolds: An Extrinsic Solution

Authors: Mehrtash Harandi, Conrad Sanderson, Chunhua Shen, Brian C. Lovell

Abstract: Recent advances in computer vision and machine learning suggest that a wide range of problems can be addressed more appropriately by considering non-Euclidean geometry. In this paper we explore sparse dictionary learning over the space of linear subspaces, which form Riemannian structures known as Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into the space of symmetric… ▽ More Recent advances in computer vision and machine learning suggest that a wide range of problems can be addressed more appropriately by considering non-Euclidean geometry. In this paper we explore sparse dictionary learning over the space of linear subspaces, which form Riemannian structures known as Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric map**, which enables us to devise a closed-form solution for updating a Grassmann dictionary, atom by atom. Furthermore, to handle non-linearity in data, we propose a kernelised version of the dictionary learning algorithm. Experiments on several classification tasks (face recognition, action recognition, dynamic texture classification) show that the proposed approach achieves considerable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as kernelised Affine Hull Method and graph-embedding Grassmann discriminant analysis. △ Less

Submitted 17 October, 2013; originally announced October 2013.

Comments: 9 pages. Appearing in Int. Conf. Computer Vision, 2013, Australia

arXiv:1304.4344 [pdf, other]

doi 10.1007/978-3-642-33709-3_16

Sparse Coding and Dictionary Learning for Symmetric Positive Definite Matrices: A Kernel Approach

Authors: Mehrtash T. Harandi, Conrad Sanderson, Richard Hartley, Brian C. Lovell

Abstract: Recent advances suggest that a wide range of computer vision problems can be addressed more appropriately by considering non-Euclidean geometry. This paper tackles the problem of sparse coding and dictionary learning in the space of symmetric positive definite matrices, which form a Riemannian manifold. With the aid of the recently introduced Stein kernel (related to a symmetric version of Bregman… ▽ More Recent advances suggest that a wide range of computer vision problems can be addressed more appropriately by considering non-Euclidean geometry. This paper tackles the problem of sparse coding and dictionary learning in the space of symmetric positive definite matrices, which form a Riemannian manifold. With the aid of the recently introduced Stein kernel (related to a symmetric version of Bregman matrix divergence), we propose to perform sparse coding by embedding Riemannian manifolds into reproducing kernel Hilbert spaces. This leads to a convex and kernel version of the Lasso problem, which can be solved efficiently. We furthermore propose an algorithm for learning a Riemannian dictionary (used for sparse coding), closely tied to the Stein kernel. Experiments on several classification tasks (face recognition, texture classification, person re-identification) show that the proposed sparse coding approach achieves notable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as tensor sparse coding, Riemannian locality preserving projection, and symmetry-driven accumulation of local features. △ Less

Submitted 16 April, 2013; originally announced April 2013.

ACM Class: I.2.6; I.5.1; I.5.4; I.2.10

Journal ref: European Conference on Computer Vision, Lecture Notes in Computer Science (LNCS), Vol. 7573, pp. 216-229, 2012

arXiv:1304.2133 [pdf, other]

doi 10.1109/ICPR.2010.299

Dynamic Amelioration of Resolution Mismatches for Local Feature Based Identity Inference

Authors: Yongkang Wong, Conrad Sanderson, Sandra Mau, Brian C. Lovell

Abstract: While existing face recognition systems based on local features are robust to issues such as misalignment, they can exhibit accuracy degradation when comparing images of differing resolutions. This is common in surveillance environments where a gallery of high resolution mugshots is compared to low resolution CCTV probe images, or where the size of a given image is not a reliable indicator of the… ▽ More While existing face recognition systems based on local features are robust to issues such as misalignment, they can exhibit accuracy degradation when comparing images of differing resolutions. This is common in surveillance environments where a gallery of high resolution mugshots is compared to low resolution CCTV probe images, or where the size of a given image is not a reliable indicator of the underlying resolution (eg. poor optics). To alleviate this degradation, we propose a compensation framework which dynamically chooses the most appropriate face recognition system for a given pair of image resolutions. This framework applies a novel resolution detection method which does not rely on the size of the input images, but instead exploits the sensitivity of local features to resolution using a probabilistic multi-region histogram approach. Experiments on a resolution-modified version of the "Labeled Faces in the Wild" dataset show that the proposed resolution detector frontend obtains a 99% average accuracy in selecting the most appropriate face recognition system, resulting in higher overall face discrimination accuracy (across several resolutions) compared to the individual baseline face recognition systems. △ Less

Submitted 8 April, 2013; originally announced April 2013.

ACM Class: I.5.4; I.4

Journal ref: International Conference on Pattern Recognition (ICPR), pp. 1200-1203, 2010

arXiv:1304.1262 [pdf, other]

doi 10.1109/WACV.2013.6475005

Classification of Human Epithelial Type 2 Cell Indirect Immunofluoresence Images via Codebook Based Descriptors

Authors: Arnold Wiliem, Yongkang Wong, Conrad Sanderson, Peter Hobson, Shaokang Chen, Brian C. Lovell

Abstract: The Anti-Nuclear Antibody (ANA) clinical pathology test is commonly used to identify the existence of various diseases. A hallmark method for identifying the presence of ANAs is the Indirect Immunofluorescence method on Human Epithelial (HEp-2) cells, due to its high sensitivity and the large range of antigens that can be detected. However, the method suffers from numerous shortcomings, such as be… ▽ More The Anti-Nuclear Antibody (ANA) clinical pathology test is commonly used to identify the existence of various diseases. A hallmark method for identifying the presence of ANAs is the Indirect Immunofluorescence method on Human Epithelial (HEp-2) cells, due to its high sensitivity and the large range of antigens that can be detected. However, the method suffers from numerous shortcomings, such as being subjective as well as time and labour intensive. Computer Aided Diagnostic (CAD) systems have been developed to address these problems, which automatically classify a HEp-2 cell image into one of its known patterns (eg., speckled, homogeneous). Most of the existing CAD systems use handpicked features to represent a HEp-2 cell image, which may only work in limited scenarios. In this paper, we propose a cell classification system comprised of a dual-region codebook-based descriptor, combined with the Nearest Convex Hull Classifier. We evaluate the performance of several variants of the descriptor on two publicly available datasets: ICPR HEp-2 cell classification contest dataset and the new SNPHEp-2 dataset. To our knowledge, this is the first time codebook-based descriptors are applied and studied in this domain. Experiments show that the proposed system has consistent high performance and is more robust than two recent CAD systems. △ Less

Submitted 4 April, 2013; originally announced April 2013.

ACM Class: I.2.10; I.4.6; I.4.7; I.4.10; I.5.4; G.3

Journal ref: IEEE Workshop on Applications of Computer Vision (WACV), pp. 95-102, 2013

arXiv:1304.1233 [pdf, other]

doi 10.1016/j.patcog.2011.10.001

Shadow Detection: A Survey and Comparative Evaluation of Recent Methods

Authors: Andres Sanin, Conrad Sanderson, Brian C. Lovell

Abstract: This paper presents a survey and a comparative evaluation of recent techniques for moving cast shadow detection. We identify shadow removal as a critical step for improving object detection and tracking. The survey covers methods published during the last decade, and places them in a feature-based taxonomy comprised of four categories: chromacity, physical, geometry and textures. A selection of pr… ▽ More This paper presents a survey and a comparative evaluation of recent techniques for moving cast shadow detection. We identify shadow removal as a critical step for improving object detection and tracking. The survey covers methods published during the last decade, and places them in a feature-based taxonomy comprised of four categories: chromacity, physical, geometry and textures. A selection of prominent methods across the categories is compared in terms of quantitative performance measures (shadow detection and discrimination rates, colour desaturation) as well as qualitative observations. Furthermore, we propose the use of tracking performance as an unbiased approach for determining the practical usefulness of shadow detection methods. The evaluation indicates that all shadow detection approaches make different contributions and all have individual strength and weaknesses. Out of the selected methods, the geometry-based technique has strict assumptions and is not generalisable to various environments, but it is a straightforward choice when the objects of interest are easy to model and their shadows have different orientation. The chromacity based method is the fastest to implement and run, but it is sensitive to noise and less effective in low saturated scenes. The physical method improves upon the accuracy of the chromacity method by adapting to local shadow models, but fails when the spectral properties of the objects are similar to that of the background. The small-region texture based method is especially robust for pixels whose neighbourhood is textured, but may take longer to implement and is the most computationally expensive. The large-region texture based method produces the most accurate results, but has a significant computational load due to its multiple processing steps. △ Less

Submitted 3 April, 2013; originally announced April 2013.

ACM Class: I.4.6; I.4.8; I.5.4; I.2.10

Journal ref: Pattern Recognition, Vol. 45, No. 4, pp. 1684-1695, 2012

arXiv:1304.0886 [pdf, other]

doi 10.1109/CVPRW.2011.5981799

Improved Anomaly Detection in Crowded Scenes via Cell-based Analysis of Foreground Speed, Size and Texture

Authors: Vikas Reddy, Conrad Sanderson, Brian C. Lovell

Abstract: A robust and efficient anomaly detection technique is proposed, capable of dealing with crowded scenes where traditional tracking based approaches tend to fail. Initial foreground segmentation of the input frames confines the analysis to foreground objects and effectively ignores irrelevant background dynamics. Input frames are split into non-overlap** cells, followed by extracting features base… ▽ More A robust and efficient anomaly detection technique is proposed, capable of dealing with crowded scenes where traditional tracking based approaches tend to fail. Initial foreground segmentation of the input frames confines the analysis to foreground objects and effectively ignores irrelevant background dynamics. Input frames are split into non-overlap** cells, followed by extracting features based on motion, size and texture from each cell. Each feature type is independently analysed for the presence of an anomaly. Unlike most methods, a refined estimate of object motion is achieved by computing the optical flow of only the foreground pixels. The motion and size features are modelled by an approximated version of kernel density estimation, which is computationally efficient even for large training datasets. Texture features are modelled by an adaptively grown codebook, with the number of entries in the codebook selected in an online fashion. Experiments on the recently published UCSD Anomaly Detection dataset show that the proposed method obtains considerably better results than three recent approaches: MPPCA, social force, and mixture of dynamic textures (MDT). The proposed method is also several orders of magnitude faster than MDT, the next best performing method. △ Less

Submitted 3 April, 2013; originally announced April 2013.

ACM Class: I.2.10; I.4.6; I.4.8; I.5.4

Journal ref: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 55-61, 2011

arXiv:1304.0869 [pdf, other]

doi 10.1109/CVPRW.2011.5981881

Patch-based Probabilistic Image Quality Assessment for Face Selection and Improved Video-based Face Recognition

Authors: Yongkang Wong, Shaokang Chen, Sandra Mau, Conrad Sanderson, Brian C. Lovell

Abstract: In video based face recognition, face images are typically captured over multiple frames in uncontrolled conditions, where head pose, illumination, shadowing, motion blur and focus change over the sequence. Additionally, inaccuracies in face localisation can also introduce scale and alignment variations. Using all face images, including images of poor quality, can actually degrade face recognition… ▽ More In video based face recognition, face images are typically captured over multiple frames in uncontrolled conditions, where head pose, illumination, shadowing, motion blur and focus change over the sequence. Additionally, inaccuracies in face localisation can also introduce scale and alignment variations. Using all face images, including images of poor quality, can actually degrade face recognition performance. While one solution it to use only the "best" subset of images, current face selection techniques are incapable of simultaneously handling all of the abovementioned issues. We propose an efficient patch-based face image quality assessment algorithm which quantifies the similarity of a face image to a probabilistic face model, representing an "ideal" face. Image characteristics that affect recognition are taken into account, including variations in geometric alignment (shift, rotation and scale), sharpness, head pose and cast shadows. Experiments on FERET and PIE datasets show that the proposed algorithm is able to identify images which are simultaneously the most frontal, aligned, sharp and well illuminated. Further experiments on a new video surveillance dataset (termed ChokePoint) show that the proposed method provides better face subsets than existing face selection techniques, leading to significant improvements in recognition accuracy. △ Less

Submitted 14 March, 2014; v1 submitted 3 April, 2013; originally announced April 2013.

ACM Class: I.4.1; I.4.6; I.4.7; I.4.9; I.4.10; I.5.1; I.5.4; G.3

Journal ref: IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 74-81, 2011

arXiv:1303.6361 [pdf, other]

doi 10.1109/IVCNZ.2010.6148860

Video Face Matching using Subset Selection and Clustering of Probabilistic Multi-Region Histograms

Authors: Sandra Mau, Shaokang Chen, Conrad Sanderson, Brian C. Lovell

Abstract: Balancing computational efficiency with recognition accuracy is one of the major challenges in real-world video-based face recognition. A significant design decision for any such system is whether to process and use all possible faces detected over the video frames, or whether to select only a few "best" faces. This paper presents a video face recognition system based on probabilistic Multi-Region… ▽ More Balancing computational efficiency with recognition accuracy is one of the major challenges in real-world video-based face recognition. A significant design decision for any such system is whether to process and use all possible faces detected over the video frames, or whether to select only a few "best" faces. This paper presents a video face recognition system based on probabilistic Multi-Region Histograms to characterise performance trade-offs in: (i) selecting a subset of faces compared to using all faces, and (ii) combining information from all faces via clustering. Three face selection metrics are evaluated for choosing a subset: face detection confidence, random subset, and sequential selection. Experiments on the recently introduced MOBIO dataset indicate that the usage of all faces through clustering always outperformed selecting only a subset of faces. The experiments also show that the face selection metric based on face detection confidence generally provides better recognition performance than random or sequential sampling. Moreover, the optimal number of faces varies drastically across selection metric and subsets of MOBIO. Given the trade-offs between computational effort, recognition accuracy and robustness, it is recommended that face feature clustering would be most advantageous in batch processing (particularly for video-based watchlists), whereas face selection methods should be limited to applications with significant computational restrictions. △ Less

Submitted 25 March, 2013; originally announced March 2013.

ACM Class: I.5.4; I.4.7; I.5.3; G.3

Journal ref: International Conference of Image and Vision Computing New Zealand (IVCNZ), 2010

arXiv:1303.6021 [pdf, other]

doi 10.1109/WACV.2013.6475006

Spatio-Temporal Covariance Descriptors for Action and Gesture Recognition

Authors: Andres Sanin, Conrad Sanderson, Mehrtash T. Harandi, Brian C. Lovell

Abstract: We propose a new action and gesture recognition method based on spatio-temporal covariance descriptors and a weighted Riemannian locality preserving projection approach that takes into account the curved space formed by the descriptors. The weighted projection is then exploited during boosting to create a final multiclass classification algorithm that employs the most useful spatio-temporal region… ▽ More We propose a new action and gesture recognition method based on spatio-temporal covariance descriptors and a weighted Riemannian locality preserving projection approach that takes into account the curved space formed by the descriptors. The weighted projection is then exploited during boosting to create a final multiclass classification algorithm that employs the most useful spatio-temporal regions. We also show how the descriptors can be computed quickly through the use of integral video representations. Experiments on the UCF sport, CK+ facial expression and Cambridge hand gesture datasets indicate superior performance of the proposed method compared to several recent state-of-the-art techniques. The proposed method is robust and does not require additional processing of the videos, such as foreground detection, interest-point detection or tracking. △ Less

Submitted 24 March, 2013; originally announced March 2013.

ACM Class: I.5.1; I.5.4; I.4.7; I.4.8; I.2.10

Journal ref: IEEE Workshop on Applications of Computer Vision, pp. 103-110, 2013

arXiv:1303.4160 [pdf, other]

doi 10.1109/TCSVT.2012.2203199

Improved Foreground Detection via Block-based Classifier Cascade with Probabilistic Decision Integration

Authors: Vikas Reddy, Conrad Sanderson, Brian C. Lovell

Abstract: Background subtraction is a fundamental low-level processing task in numerous computer vision applications. The vast majority of algorithms process images on a pixel-by-pixel basis, where an independent decision is made for each pixel. A general limitation of such processing is that rich contextual information is not taken into account. We propose a block-based method capable of dealing with noise… ▽ More Background subtraction is a fundamental low-level processing task in numerous computer vision applications. The vast majority of algorithms process images on a pixel-by-pixel basis, where an independent decision is made for each pixel. A general limitation of such processing is that rich contextual information is not taken into account. We propose a block-based method capable of dealing with noise, illumination variations and dynamic backgrounds, while still obtaining smooth contours of foreground objects. Specifically, image sequences are analysed on an overlap** block-by-block basis. A low-dimensional texture descriptor obtained from each block is passed through an adaptive classifier cascade, where each stage handles a distinct problem. A probabilistic foreground mask generation approach then exploits block overlaps to integrate interim block-level decisions into final pixel-level foreground segmentation. Unlike many pixel-based methods, ad-hoc post-processing of foreground masks is not required. Experiments on the difficult Wallflower and I2R datasets show that the proposed approach obtains on average better results (both qualitatively and quantitatively) than several prominent methods. We furthermore propose the use of tracking performance as an unbiased approach for assessing the practical usefulness of foreground segmentation methods, and show that the proposed approach leads to considerable improvements in tracking accuracy on the CAVIAR dataset. △ Less

Submitted 18 March, 2013; originally announced March 2013.

ACM Class: I.4.6; I.4.8; G.3; I.5.1; I.5.4

Journal ref: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 23, No. 1, pp. 83-93, 2013

arXiv:1303.2783 [pdf, other]

doi 10.1109/AVSS.2012.23

Combined Learning of Salient Local Descriptors and Distance Metrics for Image Set Face Verification

Authors: Conrad Sanderson, Mehrtash T. Harandi, Yongkang Wong, Brian C. Lovell

Abstract: In contrast to comparing faces via single exemplars, matching sets of face images increases robustness and discrimination performance. Recent image set matching approaches typically measure similarities between subspaces or manifolds, while representing faces in a rigid and holistic manner. Such representations are easily affected by variations in terms of alignment, illumination, pose and express… ▽ More In contrast to comparing faces via single exemplars, matching sets of face images increases robustness and discrimination performance. Recent image set matching approaches typically measure similarities between subspaces or manifolds, while representing faces in a rigid and holistic manner. Such representations are easily affected by variations in terms of alignment, illumination, pose and expression. While local feature based representations are considerably more robust to such variations, they have received little attention within the image set matching area. We propose a novel image set matching technique, comprised of three aspects: (i) robust descriptors of face regions based on local features, partly inspired by the hierarchy in the human visual system, (ii) use of several subspace and exemplar metrics to compare corresponding face regions, (iii) jointly learning which regions are the most discriminative while finding the optimal mixing weights for combining metrics. Face recognition experiments on LFW, PIE and MOBIO face datasets show that the proposed algorithm obtains considerably better performance than several recent state-of-the-art techniques, such as Local Principal Angle and the Kernel Affine Hull Method. △ Less

Submitted 12 March, 2013; originally announced March 2013.

ACM Class: I.5.1; I.5.4; I.2.10

Journal ref: IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS), pp, 294-299, 2012

arXiv:1303.2465 [pdf, other]

doi 10.1155/2011/164956

A Low-Complexity Algorithm for Static Background Estimation from Cluttered Image Sequences in Surveillance Contexts

Authors: Vikas Reddy, Conrad Sanderson, Brian C. Lovell

Abstract: For the purposes of foreground estimation, the true background model is unavailable in many practical circumstances and needs to be estimated from cluttered image sequences. We propose a sequential technique for static background estimation in such conditions, with low computational and memory requirements. Image sequences are analysed on a block-by-block basis. For each block location a represent… ▽ More For the purposes of foreground estimation, the true background model is unavailable in many practical circumstances and needs to be estimated from cluttered image sequences. We propose a sequential technique for static background estimation in such conditions, with low computational and memory requirements. Image sequences are analysed on a block-by-block basis. For each block location a representative set is maintained which contains distinct blocks obtained along its temporal line. The background estimation is carried out in a Markov Random Field framework, where the optimal labelling solution is computed using iterated conditional modes. The clique potentials are computed based on the combined frequency response of the candidate block and its neighbourhood. It is assumed that the most appropriate block results in the smoothest response, indirectly enforcing the spatial continuity of structures within a scene. Experiments on real-life surveillance videos demonstrate that the proposed method obtains considerably better background estimates (both qualitatively and quantitatively) than median filtering and the recently proposed "intervals of stable intensity" method. Further experiments on the Wallflower dataset suggest that the combination of the proposed method with a foreground segmentation algorithm results in improved foreground segmentation. △ Less

Submitted 11 March, 2013; originally announced March 2013.

ACM Class: I.4.5; I.4.8; G.3

Journal ref: EURASIP Journal on Image and Video Processing, 2011

Showing 1–50 of 50 results for author: Lovell, B