
Evaluation of trackers for Pan-Tilt-Zoom Scenarios

Authors: Yucao Tang, Guillaume-Alexandre Bilodeau

Abstract: Tracking with a Pan-Tilt-Zoom (PTZ) camera has been a research topic in computer vision for many years. Compared to tracking with a still camera, the images captured with a PTZ camera are highly dynamic in nature because the camera can perform large motions, resulting in quickly changing capture conditions. Furthermore, tracking with a PTZ camera involves camera control to position the camera on the target. For successful tracking and camera control, the tracker must be fast enough, or must be able to accurately predict the next position of the target. Therefore, standard benchmarks do not allow a proper assessment of the quality of a tracker for the PTZ scenario. In this work, we use a virtual PTZ framework to evaluate different tracking algorithms and compare their performance. We also extend the framework to add target position prediction for the next frame, accounting for camera motion and processing delays. By doing this, we can assess whether prediction can make long-term tracking more robust, as it may help slower algorithms keep the target in the field of view of the camera. Results confirm that both speed and robustness are required for tracking under the PTZ scenario.

Submitted 12 November, 2017; originally announced November 2017.

Comments: 6 pages, 2 figures, International Conference on Pattern Recognition and Artificial Intelligence 2018

Journal ref: ICPRAI (2018) 3-9

arXiv:1706.09806 [pdf, other]

Robust Face Tracking using Multiple Appearance Models and Graph Relational Learning

Authors: Tanushri Chakravorty, Guillaume-Alexandre Bilodeau, Eric Granger

Abstract: This paper addresses the problem of appearance matching across different challenges during visual face tracking in real-world scenarios. FaceTrack is proposed, a tracker that utilizes multiple appearance models with long-term and short-term appearance memory for efficient face tracking. It demonstrates robustness to deformation, in-plane and out-of-plane rotation, scale, distractors and background clutter. It capitalizes on the advantages of tracking-by-detection by using a face detector that tackles drastic scale appearance change of a face. The detector also helps to reinitialize FaceTrack during drift. A weighted score-level fusion strategy is proposed to obtain the face tracking output having the highest fusion score by generating candidates around possible face locations. The tracker showcases impressive performance when initiated automatically, outperforming many state-of-the-art trackers, except Struck, which it trails by a very small margin: 0.001 in precision and 0.017 in success, respectively.

Submitted 30 August, 2018; v1 submitted 29 June, 2017; originally announced June 2017.

Comments: Paper is under consideration at CVIU

arXiv:1706.00505 [pdf, other]

doi 10.1016/j.jocm.2017.11.003

Discriminative conditional restricted Boltzmann machine for discrete choice and latent variable modelling

Authors: Melvin Wong, Bilal Farooq, Guillaume-Alexandre Bilodeau

Abstract: Conventional methods of estimating latent behaviour generally use attitudinal questions, which are subjective, and these survey questions may not always be available. We hypothesize that an alternative approach can be used for latent variable estimation through undirected graphical models, for instance non-parametric artificial neural networks. In this study, we explore the use of generative non-parametric modelling methods to estimate latent variables from prior choice distribution without the conventional use of measurement indicators. A restricted Boltzmann machine is used to represent latent behaviour factors by analyzing the relationship information between the observed choices and explanatory variables. The algorithm is adapted for latent behaviour analysis in a discrete choice scenario and we use a graphical approach to evaluate and understand the semantic meaning from estimated parameter vector values. We illustrate our methodology on a financial instrument choice dataset and perform statistical analysis on parameter sensitivity and stability. Our findings show that through non-parametric statistical tests, we can extract useful latent information on the behaviour of latent constructs through machine learning methods and present strong and significant influence on the choice process. Furthermore, our modelling framework shows robustness in input variability through sampling and validation.

Submitted 1 June, 2017; originally announced June 2017.

arXiv:1702.02012 [pdf, other]

Tracking using Numerous Anchor points

Authors: Tanushri Chakravorty, Guillaume-Alexandre Bilodeau, Eric Granger

Abstract: In this paper, an online adaptive model-free tracker is proposed to track single objects in video sequences and to deal with real-world tracking challenges like low resolution, object deformation, occlusion and motion blur. The novelty lies in the construction of a strong appearance model that captures features from the initialized bounding box, which are then assembled into anchor-point features. These features memorize the global pattern of the object and have an internal star graph-like structure. These features are unique and flexible and help track generic and deformable objects with no limitation on specific objects. In addition, the relevance of each feature is evaluated online using short-term consistency and long-term consistency. These parameters are adapted to retain consistent features that vote for the object location and that deal with outliers for long-term tracking scenarios. Additionally, voting in a Gaussian manner helps in tackling inherent noise of the tracking system and in accurate object localization. Furthermore, the proposed tracker uses a pairwise distance measure to cope with scale variations and combines pixel-level binary features and global weighted color features for model update. Finally, experimental results on a visual tracking benchmark dataset are presented to demonstrate the effectiveness and competitiveness of the proposed tracker.

Submitted 10 December, 2017; v1 submitted 7 February, 2017; originally announced February 2017.

Comments: Revised text version. Accepted for publication in Journal of Machine Vision and Applications, December, 2017

arXiv:1611.02364 [pdf, ps, other]

Multiple Object Tracking with Kernelized Correlation Filters in Urban Mixed Traffic

Authors: Yuebin Yang, Guillaume-Alexandre Bilodeau

Abstract: Recently, the Kernelized Correlation Filters tracker (KCF) achieved competitive performance and robustness in visual object tracking. On the other hand, visual trackers are not typically used in multiple object tracking. In this paper, we investigate how a robust visual tracker like KCF can improve multiple object tracking. Since KCF is a fast tracker, many can be used in parallel and still result in fast tracking. We build a multiple object tracking system based on KCF and background subtraction. Background subtraction is applied to extract moving objects and get their scale and size in combination with KCF outputs, while KCF is used for data association and to handle fragmentation and occlusion problems. As a result, KCF and background subtraction help each other to make a tracking decision at every frame. Sometimes KCF outputs are the most trustworthy (e.g. during occlusion), while in other cases the background subtraction outputs are. To validate the effectiveness of our system, the algorithm is demonstrated on four urban video recordings from a standard dataset. Results show that our method is competitive with state-of-the-art trackers even though we use a much simpler data association step.

Submitted 27 March, 2017; v1 submitted 7 November, 2016; originally announced November 2016.

Comments: Accepted for CRV 2017

arXiv:1610.07238 [pdf, other]

SPiKeS: Superpixel-Keypoints Structure for Robust Visual Tracking

Authors: François-Xavier Derue, Guillaume-Alexandre Bilodeau, Robert Bergevin

Abstract: In visual tracking, part-based trackers are attractive since they are robust against occlusion and deformation. However, a part represented by a rectangular patch does not account for the shape of the target, while a superpixel does thanks to its boundary evidence. Nevertheless, tracking superpixels is difficult due to their lack of discriminative power. Therefore, to enable superpixels to be tracked discriminatively as object parts, we propose to enhance them with keypoints. By combining properties of these two features, we build a novel element designated as a Superpixel-Keypoints structure (SPiKeS). Being discriminative, these new object parts can be located efficiently by a simple nearest neighbor matching process. Then, in a tracking process, each match votes for the target's center to give its location. In addition, the interesting properties of our new feature allow the development of an efficient model update for more robust tracking. According to experimental results, our SPiKeS-based tracker proves to be robust in many challenging scenarios by performing favorably against the state-of-the-art.

Submitted 23 October, 2016; originally announced October 2016.

arXiv:1505.04502 [pdf, other]

Reproducible Evaluation of Pan-Tilt-Zoom Tracking

Authors: Gengjie Chen, Pierre-Luc St-Charles, Wassim Bouachir, Thomas Joeisseint, Guillaume-Alexandre Bilodeau, Robert Bergevin

Abstract: Tracking with a Pan-Tilt-Zoom (PTZ) camera has been a research topic in computer vision for many years. However, it is very difficult to assess the progress that has been made on this topic because there is no standard evaluation methodology. The difficulty in evaluating PTZ tracking algorithms arises from their dynamic nature. In contrast to other forms of tracking, PTZ tracking involves both locating the target in the image and controlling the motors of the camera to aim it so that the target stays in its field of view. This type of tracking can only be performed online. In this paper, we propose a new evaluation framework based on a virtual PTZ camera. With this framework, tracking scenarios do not change for each experiment and we are able to replicate online PTZ camera control and behavior including camera positioning delays, tracker processing delays, and numerical zoom. We tested our evaluation framework with the Camshift tracker to show its viability and to establish baseline results.

Submitted 17 May, 2015; originally announced May 2015.

Comments: This is an extended version of the 2015 ICIP paper "Reproducible Evaluation of Pan-Tilt-Zoom Tracking"

arXiv:1403.4232 [pdf, other]

Automatic Image Registration in Infrared-Visible Videos using Polygon Vertices

Authors: Tanushri Chakravorty, Guillaume-Alexandre Bilodeau, Eric Granger

Abstract: In this paper, an automatic method is proposed to perform image registration in pairs of visible and infrared video sequences for multiple targets. In multimodal image analysis like image fusion systems, color and IR sensors are placed close to each other and capture the same scene simultaneously, but the videos are not properly aligned by default because of different fields of view, image capturing information, working principle and other camera specifications. Because the scenes are usually not planar, alignment needs to be performed continuously by extracting relevant common information. In this paper, we approximate the shape of the targets by polygons and use an affine transformation for aligning the two video sequences. After background subtraction, keypoints on the contour of the foreground blobs are detected using the Discrete Curve Evolution (DCE) technique. These keypoints are then described by the local shape at each point of the obtained polygon. The keypoints are matched based on the convexity of the polygon's vertices and the Euclidean distance between them. Only good matches for each local shape polygon in a frame are kept. To achieve a global affine transformation that maximises the overlapping of infrared and visible foreground pixels, the matched keypoints of each local shape polygon are stored temporally in a buffer for a few frames. The matrix is evaluated at each frame using the temporal buffer and the best matrix is selected, based on an overlapping ratio criterion. Our experimental results demonstrate that this method can provide highly accurate registered images and that we outperform a previous related method.

Submitted 17 March, 2014; originally announced March 2014.

arXiv:1305.3189 [pdf, other]

A Bag of Words Approach for Semantic Segmentation of Monitored Scenes

Authors: Wassim Bouachir, Atousa Torabi, Guillaume-Alexandre Bilodeau, Pascal Blais

Abstract: This paper proposes a semantic segmentation method for outdoor scenes captured by a surveillance camera. Our algorithm classifies each perceptually homogenous region as one of the predefined classes learned from a collection of manually labelled images. The proposed approach combines two different types of information. First, color segmentation is performed to divide the scene into perceptually similar regions. Then, the second step is based on SIFT keypoints and uses the bag of words representation of the regions for the classification. The prediction is done using a Naïve Bayesian Network as a generative classifier. Compared to existing techniques, our method provides more compact representations of scene contents and the segmentation result is more consistent with human perception due to the combination of the color information with the image keypoints. The experiments conducted on a publicly available data set demonstrate the validity of the proposed method.

Submitted 14 May, 2013; originally announced May 2013.

Comments: École Polytechnique de Montréal, iWatchLife Inc

arXiv:1204.6326 [pdf, other]

Background subtraction based on Local Shape

Authors: Jean-Philippe Jodoin, Guillaume-Alexandre Bilodeau, Nicolas Saunier

Abstract: We present a novel approach to background subtraction that is based on the local shape of small image regions. In our approach, an image region centered on a pixel is modeled using the local self-similarity descriptor. We aim at obtaining a reliable change detection based on local shape change in an image when foreground objects are moving. The method first builds a background model and compares the local self-similarities between the background model and the subsequent frames to distinguish background and foreground objects. Post-processing is then used to refine the boundaries of moving objects. Results show that this approach is promising as the foregrounds obtained are complete, although they often include shadows.

Submitted 17 May, 2012; v1 submitted 27 April, 2012; originally announced April 2012.

Comments: 4 pages, 5 figures, 3 tables

Showing 51–60 of 60 results for author: Bilodeau, G