BoQ: A Place is Worth a Bag of Learnable Queries

Authors: Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère

Abstract: In visual place recognition, accurately identifying and matching images of locations under varying environmental conditions and viewpoints remains a significant challenge. In this paper, we introduce a new technique, called Bag-of-Queries (BoQ), which learns a set of global queries designed to capture universal place-specific attributes. Unlike existing methods that employ self-attention and generate the queries directly from the input features, BoQ employs distinct learnable global queries, which probe the input features via cross-attention, ensuring consistent information aggregation. In addition, our technique provides an interpretable attention mechanism and integrates with both CNN and Vision Transformer backbones. The performance of BoQ is demonstrated through extensive experiments on 14 large-scale benchmarks. It consistently outperforms current state-of-the-art techniques including NetVLAD, MixVPR and EigenPlaces. Moreover, as a global retrieval technique (one-stage), BoQ surpasses two-stage retrieval methods, such as Patch-NetVLAD, TransVPR and R2Former, all while being orders of magnitude faster and more efficient. The code and model weights are publicly available at https://github.com/amaralibey/Bag-of-Queries.

Submitted 12 May, 2024; originally announced May 2024.

Comments: Accepted at CVPR 2024
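
The BoQ abstract describes a fixed set of learnable queries that probe the input features via cross-attention. The following is a minimal illustrative sketch of that idea in plain Python, not the authors' implementation: it uses a single attention head, omits the learned projections and any other components the actual architecture may use, and replaces trained parameters with fixed random stand-ins. All names and sizes (`NUM_QUERIES`, `DIM`, `NUM_TOKENS`, `cross_attention`) are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, features):
    """Let each query attend over all feature vectors.

    Single head, no learned projections (a simplification).
    Returns one aggregated vector per query plus the attention weights.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for q in queries:
        # Scaled dot-product score between this query and every feature.
        scores = [scale * sum(qi * fi for qi, fi in zip(q, f)) for f in features]
        w = softmax(scores)
        # Weighted sum of the features under the attention weights.
        out = [sum(w[j] * features[j][d] for j in range(len(features)))
               for d in range(dim)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


random.seed(0)
NUM_QUERIES, DIM, NUM_TOKENS = 4, 8, 16  # hypothetical sizes

# Stand-ins: in a trained model the queries would be learned parameters
# shared across all images, and the features would come from a backbone.
queries = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_QUERIES)]
features = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]

aggregated, attn = cross_attention(queries, features)

# Concatenate the per-query outputs into one global descriptor.
descriptor = [v for row in aggregated for v in row]
```

Because the queries do not depend on the input image, the same query always plays the same role across images; the returned attention weights can be inspected per query, which is one way to read the interpretability claim in the abstract.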

arXiv:2405.00199 [pdf, other]

Field Report on a Wearable and Versatile Solution for Field Acquisition and Exploration

Authors: Olivier Gamache, Jean-Michel Fortin, Matěj Boxan, François Pomerleau, Philippe Giguère

Abstract: This report presents a wearable plug-and-play platform for data acquisition in the field. The platform, extending a waterproof Pelican Case into a 20 kg backpack, offers 5.5 hours of power autonomy, while recording data with two cameras, a lidar, an Inertial Measurement Unit (IMU), and a Global Navigation Satellite System (GNSS) receiver. The system only requires a single operator and is readily controlled with a built-in screen and buttons. Due to its small footprint, it offers greater flexibility than large vehicles typically deployed in off-trail environments. We describe the platform's design, detailing the mechanical parts, electrical components, and software stack. We explain the system's limitations, drawing from its extensive deployment spanning over 20 kilometers of trajectories across various seasons, environments, and weather conditions. We derive valuable lessons learned from these deployments and present several possible applications for the system. The possible use cases consider not only academic research but also insights from consultations with our industrial partners. The mechanical design including all CAD files, as well as the software stack, are publicly available at https://github.com/norlab-ulaval/backpack_workspace.

Submitted 30 April, 2024; originally announced May 2024.

Comments: 5 pages, 6 figures, Accepted for the Workshop on Field Robotics at ICRA2024

arXiv:2403.16877 [pdf, other]

Proprioception Is All You Need: Terrain Classification for Boreal Forests

Authors: Damien LaRocque, William Guimont-Martin, David-Alexandre Duclos, Philippe Giguère, François Pomerleau

Abstract: Recent works in field robotics highlighted the importance of resiliency against different types of terrains. Boreal forests, in particular, are home to many mobility-impeding terrains that should be considered for off-road autonomous navigation. Also, being one of the largest land biomes on Earth, boreal forests are an area where autonomous vehicles are expected to become increasingly common. In this paper, we address this issue by introducing BorealTC, a publicly available dataset for proprioceptive-based terrain classification (TC). Recorded with a Husky A200, our dataset contains 116 min of Inertial Measurement Unit (IMU), motor current, and wheel odometry data, focusing on typical boreal forest terrains, notably snow, ice, and silty loam. Combining our dataset with another dataset from the state-of-the-art, we evaluate both a Convolutional Neural Network (CNN) and the novel state space model (SSM)-based Mamba architecture on a TC task. Interestingly, we show that while CNN outperforms Mamba on each separate dataset, Mamba achieves greater accuracy when trained on a combination of both. In addition, we demonstrate that Mamba's learning capacity is greater than a CNN for increasing amounts of data. We show that the combination of two TC datasets yields a latent space that can be interpreted with the properties of the terrains. We also discuss the implications of merging datasets on classification. Our source code and dataset are publicly available online: https://github.com/norlab-ulaval/BorealTC.

Submitted 25 March, 2024; originally announced March 2024.

Comments: Submitted to the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

arXiv:2310.07844 [pdf, other]

Saturation-Aware Angular Velocity Estimation: Extending the Robustness of SLAM to Aggressive Motions

Authors: Simon-Pierre Deschênes, Dominic Baril, Matěj Boxan, Johann Laconte, Philippe Giguère, François Pomerleau

Abstract: We propose a novel angular velocity estimation method to increase the robustness of Simultaneous Localization And Mapping (SLAM) algorithms against gyroscope saturations induced by aggressive motions. Field robotics expose robots to various hazards, including steep terrains, landslides, and staircases, where substantial accelerations and angular velocities can occur if the robot loses stability and tumbles. These extreme motions can saturate sensor measurements, especially gyroscopes, which are the first sensors to become inoperative. While the structural integrity of the robot is at risk, the resilience of the SLAM framework is oftentimes given little consideration. Consequently, even if the robot is physically capable of continuing the mission, its operation will be compromised due to a corrupted representation of the world. Regarding this problem, we propose a way to estimate the angular velocity using accelerometers during extreme rotations caused by tumbling. We show that our method reduces the median localization error by 71.5 % in translation and 65.5 % in rotation and reduces the number of SLAM failures by 73.3 % on the collected data. We also propose the Tumbling-Induced Gyroscope Saturation (TIGS) dataset, which consists of outdoor experiments recording the motion of a lidar subject to angular velocities four times higher than other available datasets. The dataset is available online at https://github.com/norlab-ulaval/Norlab_wiki/wiki/TIGS-Dataset.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 7 pages, 7 figures, submitted to the 2024 IEEE International Conference on Robotics and Automation (ICRA2024), Yokohama, Japan

arXiv:2309.13139 [pdf, other]

Exposing the Unseen: Exposure Time Emulation for Offline Benchmarking of Vision Algorithms

Authors: Olivier Gamache, Jean-Michel Fortin, Matěj Boxan, Maxime Vaidis, François Pomerleau, Philippe Giguère

Abstract: Visual Odometry (VO) is one of the fundamental tasks in computer vision for robotics. However, its performance is deeply affected by High Dynamic Range (HDR) scenes, omnipresent outdoors. While new Automatic-Exposure (AE) approaches to mitigate this have appeared, their comparison in a reproducible manner is problematic. This stems from the fact that the behavior of AE depends on the environment, and it affects the image acquisition process. Consequently, AE has traditionally only been benchmarked in an online manner, making the experiments non-reproducible. To solve this, we propose a new methodology based on an emulator that can generate images at any exposure time. It leverages BorealHDR, a unique multi-exposure stereo dataset collected over 10 km, on 55 trajectories with challenging illumination conditions. Moreover, it includes lidar-inertial-based global maps with pose estimation for each image frame as well as Global Navigation Satellite System (GNSS) data, for comparison. We show that using these images acquired at different exposure times, we can emulate realistic images, keeping a Root-Mean-Square Error (RMSE) below 1.78 % compared to ground truth images. To demonstrate the practicality of our approach for offline benchmarking, we compared three state-of-the-art AE algorithms on key elements of a Visual Simultaneous Localization And Mapping (VSLAM) pipeline, against four baselines. Consequently, reproducible evaluation of AE is now possible, speeding up the development of future approaches. Our code and dataset are available online at this link: https://github.com/norlab-ulaval/BorealHDR

Submitted 20 March, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: 8 pages, 6 figures, submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

arXiv:2309.11935 [pdf, other]

RTS-GT: Robotic Total Stations Ground Truthing dataset

Authors: Maxime Vaidis, Mohsen Hassanzadeh Shahraji, Effie Daum, William Dubois, Philippe Giguère, François Pomerleau

Abstract: Numerous datasets and benchmarks exist to assess and compare Simultaneous Localization and Mapping (SLAM) algorithms. Nevertheless, their precision must follow the rate at which SLAM algorithms improved in recent years. Moreover, current datasets fall short of comprehensive data-collection protocol for reproducibility and the evaluation of the precision or accuracy of the recorded trajectories. With this objective in mind, we propose the Robotic Total Stations Ground Truthing (RTS-GT) dataset to support localization research with the generation of six-Degrees Of Freedom (DOF) ground truth trajectories. This novel dataset includes six-DOF ground truth trajectories generated using a system of three Robotic Total Stations (RTSs) tracking moving robotic platforms. Furthermore, we compare the performance of the RTS-based system to a Global Navigation Satellite System (GNSS)-based setup. The dataset comprises around sixty experiments conducted in various conditions over a period of 17 months, and encompasses over 49 kilometers of trajectories, making it the most extensive dataset of RTS-based measurements to date. Additionally, we provide the precision of all poses for each experiment, a feature not found in the current state-of-the-art datasets. Our results demonstrate that RTSs provide measurements that are 22 times more stable than GNSS in various environmental settings, making them a valuable resource for SLAM benchmark development.

Submitted 12 March, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: 7 pages; Accepted to ICRA 2024

arXiv:2309.10718 [pdf, other]

DRIVE: Data-driven Robot Input Vector Exploration

Authors: Dominic Baril, Simon-Pierre Deschênes, Luc Coupal, Cyril Goffin, Julien Lépine, Philippe Giguère, François Pomerleau

Abstract: An accurate motion model is a fundamental component of most autonomous navigation systems. While much work has been done on improving model formulation, no standard protocol exists for gathering empirical data required to train models. In this work, we address this issue by proposing Data-driven Robot Input Vector Exploration (DRIVE), a protocol that enables characterizing uncrewed ground vehicles (UGVs) input limits and gathering empirical model training data. We also propose a novel learned slip approach outperforming similar acceleration learning approaches. Our contributions are validated through an extensive experimental evaluation, cumulating over 7 km and 1.8 h of driving data over three distinct UGVs and four terrain types. We show that our protocol offers increased predictive performance over common human-driven data-gathering protocols. Furthermore, our protocol converges with 46 s of training data, almost four times less than the shortest human dataset gathering protocol. We show that the operational limit for our model is reached in extreme slip conditions encountered on surfaced ice. DRIVE is an efficient way of characterizing UGV motion in its operational conditions. Our code and dataset are both available online at this link: https://github.com/norlab-ulaval/DRIVE.

Submitted 27 March, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: 8 pages, 7 figures, 1 table, accepted for publication at the 2024 IEEE International Conference on Robotics and Automation (ICRA2024), Yokohama, Japan

arXiv:2307.01864 [pdf, other]

doi 10.1109/IROS55552.2023.10342294

MaskBEV: Joint Object Detection and Footprint Completion for Bird's-eye View 3D Point Clouds

Authors: William Guimont-Martin, Jean-Michel Fortin, François Pomerleau, Philippe Giguère

Abstract: Recent works in object detection in LiDAR point clouds mostly focus on predicting bounding boxes around objects. This prediction is commonly achieved using anchor-based or anchor-free detectors that predict bounding boxes, requiring significant explicit prior knowledge about the objects to work properly. To remedy these limitations, we propose MaskBEV, a bird's-eye view (BEV) mask-based object detector neural architecture. MaskBEV predicts a set of BEV instance masks that represent the footprints of detected objects. Moreover, our approach allows object detection and footprint completion in a single pass. MaskBEV also reformulates the detection problem purely in terms of classification, doing away with regression usually done to predict bounding boxes. We evaluate the performance of MaskBEV on both SemanticKITTI and KITTI datasets while analyzing the architecture advantages and limitations.

Submitted 31 July, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

Comments: © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

arXiv:2303.02190 [pdf, other]

MixVPR: Feature Mixing for Visual Place Recognition

Authors: Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère

Abstract: Visual Place Recognition (VPR) is a crucial part of mobile robotics and autonomous driving as well as other computer vision tasks. It refers to the process of identifying a place depicted in a query image using only computer vision. At large scale, repetitive structures, weather and illumination changes pose a real challenge, as appearances can drastically change over time. Along with tackling these challenges, an efficient VPR technique must also be practical in real-world scenarios where latency matters. To address this, we introduce MixVPR, a new holistic feature aggregation technique that takes feature maps from pre-trained backbones as a set of global features. Then, it incorporates a global relationship between elements in each feature map in a cascade of feature mixing, eliminating the need for local or pyramidal aggregation as done in NetVLAD or TransVPR. We demonstrate the effectiveness of our technique through extensive experiments on multiple large-scale benchmarks. Our method outperforms all existing techniques by a large margin while having less than half the number of parameters compared to CosPlace and NetVLAD. We achieve a new all-time high recall@1 score of 94.6% on Pitts250k-test, 88.0% on MapillarySLS, and more importantly, 58.4% on Nordland. Finally, our method outperforms two-stage retrieval techniques such as Patch-NetVLAD, TransVPR and SuperGLUE all while being orders of magnitude faster. Our code and trained models are available at https://github.com/amaralibey/MixVPR.

Submitted 3 March, 2023; originally announced March 2023.

Comments: Accepted at WACV 2023

arXiv:2302.14217 [pdf, other]

Global Proxy-based Hard Mining for Visual Place Recognition

Authors: Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère

Abstract: Learning deep representations for visual place recognition is commonly performed using pairwise or triplet loss functions that highly depend on the hardness of the examples sampled at each training iteration. Existing techniques address this by using computationally and memory expensive offline hard mining, which consists of identifying, at each iteration, the hardest samples from the training set. In this paper we introduce a new technique that performs global hard mini-batch sampling based on proxies. To do so, we add a new end-to-end trainable branch to the network, which generates efficient place descriptors (one proxy for each place). These proxy representations are thus used to construct a global index that encompasses the similarities between all places in the dataset, allowing for highly informative mini-batch sampling at each training iteration. Our method can be used in combination with all existing pairwise and triplet loss functions with negligible additional memory and computation cost. We run extensive ablation studies and show that our technique brings new state-of-the-art performance on multiple large-scale benchmarks such as Pittsburgh, Mapillary-SLS and SPED. In particular, our method provides more than 100% relative improvement on the challenging Nordland dataset. Our code is available at https://github.com/amaralibey/GPM

Submitted 27 February, 2023; originally announced February 2023.

Comments: Accepted at BMVC 2022
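
The abstract above describes building a global index over per-place proxies and using it to sample informative mini-batches. As a rough illustration only, and not the authors' implementation, the sketch below picks a seed place and fills the batch with the places whose proxies are most similar to it. The abstract does not specify the similarity measure or the sampling rule; cosine similarity, nearest-neighbour selection, and the names `cosine` and `hard_batch` are assumptions made for this example, and the proxies are hand-written toy vectors rather than learned descriptors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def hard_batch(proxies, seed_place, batch_places):
    """Return the seed place plus the places most similar to it.

    Places with similar proxies are likely to be confused with one another,
    so grouping them in one mini-batch should yield harder examples.
    """
    sims = [(cosine(proxies[seed_place], p), idx)
            for idx, p in enumerate(proxies) if idx != seed_place]
    sims.sort(reverse=True)  # most similar first
    return [seed_place] + [idx for _, idx in sims[:batch_places - 1]]


# Toy proxies, one per place (hypothetical values for illustration).
proxies = [
    [1.0, 0.0],   # place 0
    [0.9, 0.1],   # place 1: very close to place 0
    [0.0, 1.0],   # place 2: orthogonal to place 0
    [0.8, 0.3],   # place 3: fairly close to place 0
    [-1.0, 0.0],  # place 4: opposite of place 0
]

batch = hard_batch(proxies, seed_place=0, batch_places=3)
print(batch)  # [0, 1, 3]
```

Since there is only one proxy per place, the index stays small compared with mining over every training image, which is consistent with the abstract's claim of negligible additional memory and computation cost.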

arXiv:2210.17424 [pdf, other]

doi 10.1093/forestry/cpac043

Tree Detection and Diameter Estimation Based on Deep Learning

Authors: Vincent Grondin, Jean-Michel Fortin, François Pomerleau, Philippe Giguère

Abstract: Tree perception is an essential building block toward autonomous forestry operations. Current developments generally consider input data from lidar sensors to solve forest navigation, tree detection and diameter estimation problems, whereas cameras paired with deep learning algorithms usually address species classification or forest anomaly detection. In either of these cases, data unavailability and forest diversity restrain deep learning developments for autonomous systems. So, we propose two densely annotated image datasets - 43k synthetic, 100 real - for bounding box, segmentation mask and keypoint detections to assess the potential of vision-based methods. Deep neural network models trained on our datasets achieve a precision of 90.4% for tree detection, 87.2% for tree segmentation, and centimeter accurate keypoint estimations. We measure our models' generalizability when testing them on other forest datasets, and their scalability with different dataset sizes and architectural improvements. Overall, the experimental results offer promising avenues toward autonomous tree felling operations and other applied forestry problems. The datasets and pre-trained models in this article are publicly available on GitHub (https://github.com/norlab-ulaval/PercepTreeV1).

Submitted 31 October, 2022; originally announced October 2022.

arXiv:2210.10239 [pdf, other]

doi 10.1016/j.neucom.2022.09.127

GSV-Cities: Toward Appropriate Supervised Visual Place Recognition

Authors: Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère

Abstract: This paper aims to investigate representation learning for large scale visual place recognition, which consists of determining the location depicted in a query image by referring to a database of reference images. This is a challenging task due to the large-scale environmental changes that can occur over time (i.e., weather, illumination, season, traffic, occlusion). Progress is currently challenged by the lack of large databases with accurate ground truth. To address this challenge, we introduce GSV-Cities, a new image dataset providing the widest geographic coverage to date with highly accurate ground truth, covering more than 40 cities across all continents over a 14-year period. We subsequently explore the full potential of recent advances in deep metric learning to train networks specifically for place recognition, and evaluate how different loss functions influence performance. In addition, we show that performance of existing methods substantially improves when trained on GSV-Cities. Finally, we introduce a new fully convolutional aggregation layer that outperforms existing techniques, including GeM, NetVLAD and CosPlace, and establish a new state-of-the-art on large-scale benchmarks, such as Pittsburgh, Mapillary-SLS, SPED and Nordland. The dataset and code are available for research purposes at https://github.com/amaralibey/gsv-cities.

Submitted 18 October, 2022; originally announced October 2022.

Comments: Neurocomputing 2022

arXiv:2210.04104 [pdf, other]

Training Deep Learning Algorithms on Synthetic Forest Images for Tree Detection

Authors: Vincent Grondin, François Pomerleau, Philippe Giguère

Abstract: Vision-based segmentation in forested environments is a key functionality for autonomous forestry operations such as tree felling and forwarding. Deep learning algorithms demonstrate promising results to perform visual tasks such as object detection. However, the supervised learning process of these algorithms requires annotations from a large diversity of images. In this work, we propose to use simulated forest environments to automatically generate 43 k realistic synthetic images with pixel-level annotations, and use them to train deep learning algorithms for tree detection. This allows us to address the following questions: i) what kind of performance should we expect from deep learning in harsh synthetic forest environments, ii) which annotations are the most important for training, and iii) what modality should be used between RGB and depth. We also report the promising transfer learning capability of features learned on our synthetic dataset by directly predicting bounding box, segmentation masks and keypoints on real images. Code available on GitHub (https://github.com/norlab-ulaval/PercepTreeV1).

Submitted 8 October, 2022; originally announced October 2022.

Comments: Work presented at ICRA 2022 Workshop in Innovation in Forestry Robotics: Research and Industry Adoption

arXiv:2203.01902 [pdf, other]

Instance Segmentation for Autonomous Log Grasping in Forestry Operations

Authors: Jean-Michel Fortin, Olivier Gamache, Vincent Grondin, François Pomerleau, Philippe Giguère

Abstract: Wood logs picking is a challenging task to automate. Indeed, logs usually come in cluttered configurations, randomly orientated and overlapping. Recent work on log picking automation usually assumes that the logs' pose is known, with little consideration given to the actual perception problem. In this paper, we squarely address the latter, using a data-driven approach. First, we introduce a novel dataset, named TimberSeg 1.0, that is densely annotated, i.e., that includes both bounding boxes and pixel-level mask annotations for logs. This dataset comprises 220 images with 2500 individually segmented logs. Using our dataset, we then compare three neural network architectures on the task of individual logs detection and segmentation; two region-based methods and one attention-based method. Unsurprisingly, our results show that axis-aligned proposals, failing to take into account the directional nature of logs, underperform with 19.03 mAP. A rotation-aware proposal method significantly improves results to 31.83 mAP. More interestingly, a Transformer-based approach, without any inductive bias on rotations, outperformed the two others, achieving a mAP of 57.53 on our dataset. Our use case demonstrates the limitations of region-based approaches for cluttered, elongated objects. It also highlights the potential of attention-based methods on this specific task, as they work directly at the pixel-level. These encouraging results indicate that such a perception system could be used to assist the operators in the short term, or to fully automate log picking operations in the future.

Submitted 18 October, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

Comments: 8 pages, 6 figures, accepted at IROS 2022

arXiv:2111.13981 [pdf]

doi 10.55417/fr.2022050

Kilometer-scale autonomous navigation in subarctic forests: challenges and lessons learned

Authors: Dominic Baril, Simon-Pierre Deschênes, Olivier Gamache, Maxime Vaidis, Damien LaRocque, Johann Laconte, Vladimír Kubelka, Philippe Giguère, François Pomerleau

Abstract: Challenges inherent to autonomous wintertime navigation in forests include lack of a reliable Global Navigation Satellite System (GNSS) signal, low feature contrast, high illumination variations and changing environment. This type of off-road environment is an extreme case of situations autonomous cars could encounter in northern regions. Thus, it is important to understand the impact of this harsh environment on autonomous navigation systems. To this end, we present a field report analyzing teach-and-repeat navigation in a subarctic forest while subject to fluctuating weather, including light and heavy snow, rain and drizzle. First, we describe the system, which relies on point cloud registration to localize a mobile robot through a boreal forest, while simultaneously building a map. We experimentally evaluate this system in over 18.8 km of autonomous navigation in the teach-and-repeat mode. Over 14 repeat runs, only four manual interventions were required, three of which were due to localization failure and another one caused by battery power outage. We show that dense vegetation perturbs the GNSS signal, rendering it unsuitable for navigation in forest trails. Furthermore, we highlight the increased uncertainty related to localizing using point cloud registration in forest trails. We demonstrate that it is not snow precipitation, but snow accumulation, that affects our system's ability to localize within the environment. Finally, we expose some challenges and lessons learned from our field campaign to support better experimental work in winter conditions. Our dataset is available online.

Submitted 26 July, 2022; v1 submitted 27 November, 2021; originally announced November 2021.

Comments: Published in Field Robotics Volume 2. Paper number 50

arXiv:2105.11320 [pdf, other]

doi 10.1109/IROS40897.2019.8967704

SuMa++: Efficient LiDAR-based Semantic SLAM

Authors: Xieyuanli Chen, Andres Milioto, Emanuele Palazzolo, Philippe Giguère, Jens Behley, Cyrill Stachniss

Abstract: Reliable and accurate localization and mapping are key components of most autonomous systems. Besides geometric information about the mapped environment, the semantics plays an important role to enable intelligent navigation behaviors. In most realistic environments, this task is particularly complicated due to dynamics caused by moving objects, which can corrupt the mapping step or derail localization. In this paper, we propose an extension of a recently published surfel-based mapping approach exploiting three-dimensional laser range scans by integrating semantic information to facilitate the mapping process. The semantic information is efficiently extracted by a fully convolutional neural network and rendered on a spherical projection of the laser range data. This computed semantic segmentation results in point-wise labels for the whole scan, allowing us to build a semantically-enriched map with labeled surfels. This semantic map enables us to reliably filter moving objects, but also improve the projective scan matching via semantic constraints. Our experimental evaluation on challenging highway sequences from the KITTI dataset with very few static structures and a large amount of moving cars shows the advantage of our semantic SLAM approach in comparison to a purely geometric, state-of-the-art approach.

Submitted 24 May, 2021; originally announced May 2021.

Comments: Accepted by IROS 2019. Code: https://github.com/PRBonn/semantic_suma

arXiv:2105.01215 [pdf, other]

doi 10.1109/CRV52889.2021.00014

Lidar Scan Registration Robust to Extreme Motions

Authors: Simon-Pierre Deschênes, Dominic Baril, Vladimír Kubelka, Philippe Giguère, François Pomerleau

Abstract: Registration algorithms, such as Iterative Closest Point (ICP), have proven effective in mobile robot localization algorithms over the last decades. However, they are susceptible to failure when a robot sustains extreme velocities and accelerations. For example, this kind of motion can happen after a collision, causing a point cloud to be heavily skewed. While point cloud de-skewing methods have been explored in the past to increase localization and mapping accuracy, these methods still rely on highly accurate odometry systems or ideal navigation conditions. In this paper, we present a method taking into account the remaining motion uncertainties of the trajectory used to de-skew a point cloud along with the environment geometry to increase the robustness of current registration algorithms. We compare our method to three other solutions in a test bench producing 3D maps with peak accelerations of 200 m/s^2 and 800 rad/s^2. In these extreme scenarios, we demonstrate that our method decreases the error by 9.26 % in translation and by 21.84 % in rotation. The proposed method is generic enough to be integrated into many variants of weighted ICP without adaptation and supports localization robustness in harsher terrains.

Submitted 3 May, 2021; originally announced May 2021.

Comments: 8 pages, 8 figures, published in 2021 18th Conference on Robots and Vision (CRV), Burnaby, Canada

Journal ref: 2021 18th Conference on Robots and Vision (CRV), 2021, pp. 17-24

arXiv:2104.14396 [pdf, other]

Accurate outdoor ground truth based on total stations

Authors: Maxime Vaidis, Philippe Giguère, François Pomerleau, Vladimír Kubelka

Abstract: In robotics, accurate ground-truth position fostered the development of mapping and localization algorithms through the creation of cornerstone datasets. In outdoor environments and over long distances, total stations are the most accurate and precise measurement instruments for this purpose. Most total station-based systems in the literature are limited to three Degrees Of Freedom (DOFs), due to the use of a single-prism tracking approach. In this paper, we present preliminary work on measuring a full pose of a vehicle, bringing the referencing system to six DOFs. Three total stations are used to track in real time three prisms attached to a target platform. We describe the structure of the referencing system and the protocol for acquiring the ground truth with this system. We evaluated its precision in a variety of different outdoor environments, ranging from open-sky to forest trails, and compare this system with another popular source of reference position, the Real Time Kinematics (RTK) positioning solution. Results show that our approach is the most precise, reaching an average positional error of 10 mm and 0.6 deg. This difference in performance was particularly stark in environments where Global Navigation Satellite System (GNSS) signals can be weaker due to overreaching vegetation.

Submitted 29 April, 2021; originally announced April 2021.

Comments: Final version submitted and accepted in the 18th Conference on Robots and Vision (CRV) in May 2021

arXiv:2004.05131 [pdf, other]

Evaluation of Skid-Steering Kinematic Models for Subarctic Environments

Authors: Dominic Baril, Vincent Grondin, Simon-Pierre Deschênes, Johann Laconte, Maxime Vaidis, Vladimír Kubelka, André Gallant, Philippe Giguère, François Pomerleau

Abstract: In subarctic and arctic areas, large and heavy skid-steered robots are preferred for their robustness and ability to operate on difficult terrain. State estimation, motion control and path planning for these robots rely on accurate odometry models based on wheel velocities. However, the state-of-the-art odometry models for skid-steer mobile robots (SSMRs) have usually been tested on relatively lightweight platforms. In this paper, we focus on how these models perform when deployed on a large and heavy (590 kg) SSMR. We collected more than 2 km of data on both snow and concrete. We compare the ideal differential-drive, extended differential-drive, radius-of-curvature-based, and full linear kinematic models commonly deployed for SSMRs. Each of the models is fine-tuned by searching for its optimal parameters on both snow and concrete. We then discuss the relationship between the parameters, the model tuning, and the final accuracy of the models.

Submitted 10 April, 2020; originally announced April 2020.

Comments: 8 pages, 8 figures, published in 2020 17th Conference on Computer and Robot Vision (CRV), Ottawa, Canada

arXiv:2001.10657 [pdf, other]

The Indian Chefs Process

Authors: Patrick Dallaire, Luca Ambrogioni, Ludovic Trottier, Umut Güçlü, Max Hinne, Philippe Giguère, Brahim Chaib-Draa, Marcel van Gerven, Francois Laviolette

Abstract: This paper introduces the Indian Chefs Process (ICP), a Bayesian nonparametric prior on the joint space of infinite directed acyclic graphs (DAGs) and orders that generalizes Indian Buffet Processes. As our construction shows, the proposed distribution relies on a latent Beta Process controlling both the orders and outgoing connection probabilities of the nodes, and yields a probability distribution on sparse infinite graphs. The main advantage of the ICP over previously proposed Bayesian nonparametric priors for DAG structures is its greater flexibility. To the best of our knowledge, the ICP is the first Bayesian nonparametric model supporting every possible DAG. We demonstrate the usefulness of the ICP on learning the structure of deep generative sigmoid networks as well as convolutional neural networks.

Submitted 28 January, 2020; originally announced January 2020.

arXiv:1912.03221 [pdf, other]

Tree bark re-identification using a deep-learning feature descriptor

Authors: Martin Robert, Patrick Dallaire, Philippe Giguère

Abstract: The ability to visually re-identify objects is a fundamental capability in vision systems. Oftentimes, it relies on collections of visual signatures based on descriptors, such as SIFT or SURF. However, these traditional descriptors were designed for a certain domain of surface appearances and geometries (limited relief). Consequently, highly-textured surfaces such as tree bark pose a challenge to them. In turn, this makes it more difficult to use trees as identifiable landmarks for navigational purposes (robotics) or to track felled lumber along a supply chain (logistics). We thus propose to use data-driven descriptors trained on bark images for tree surface re-identification. To this effect, we collected a large dataset containing 2,400 bark images with strong illumination changes, annotated by surface and with the ability to pixel-align them. We used this dataset to sample from more than 2 million 64x64 pixel patches to train our novel local descriptors DeepBark and SqueezeBark. Our DeepBark method has shown a clear advantage against the hand-crafted descriptors SIFT and SURF. For instance, we demonstrated that DeepBark can reach a mAP of 87.2% when retrieving 11 relevant bark images, i.e. corresponding to the same physical surface, to a bark query against 7,900 images. Our work thus suggests that re-identifying tree surfaces in a challenging illumination context is possible. We also make public our dataset, which can be used to benchmark surface re-identification techniques.

Submitted 1 April, 2020; v1 submitted 6 December, 2019; originally announced December 2019.

arXiv:1911.11822 [pdf, other]

Deep Template-based Object Instance Detection

Authors: Jean-Philippe Mercier, Mathieu Garon, Philippe Giguère, Jean-François Lalonde

Abstract: Much of the focus in the object detection literature has been on the problem of identifying the bounding box of a particular class of object in an image. Yet, in contexts such as robotics and augmented reality, it is often necessary to find a specific object instance (a unique toy or a custom industrial part, for example) rather than a generic object class. Here, applications can require a rapid shift from one object instance to another, thus requiring fast turnaround which affords little-to-no training time. What is more, gathering a dataset and training a model for every new object instance to be detected can be an expensive and time-consuming process. In this context, we propose a generic 2D object instance detection approach that uses example viewpoints of the target object at test time to retrieve its 2D location in RGB images, without requiring any additional training (i.e. fine-tuning) step. To this end, we present an end-to-end architecture that extracts global and local information of the object from its viewpoints. The global information is used to tune early filters in the backbone while local viewpoints are correlated with the input image. Our method offers an improvement of almost 30 mAP over the previous template matching methods on the challenging Occluded Linemod dataset (overall mAP of 50.7). Our experiments also show that our single generic model (not trained on any of the test objects) yields detection results that are on par with approaches that are trained specifically on the target objects.

Submitted 14 November, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

arXiv:1910.11968 [pdf, other]

Driving Datasets Literature Review

Authors: Charles-Éric Noël Laflamme, François Pomerleau, Philippe Giguère

Abstract: This report is a survey of the different autonomous driving datasets which have been published to date. The first section introduces the many sensor types used in autonomous driving datasets. The second section investigates the calibration and synchronization procedure required to generate accurate data. The third section describes the diverse driving tasks explored by the datasets. Finally, the fourth section provides comprehensive lists of datasets, mainly in the form of tables.

Submitted 25 October, 2019; originally announced October 2019.

arXiv:1905.02082 [pdf, other]

ReFusion: 3D Reconstruction in Dynamic Environments for RGB-D Cameras Exploiting Residuals

Authors: Emanuele Palazzolo, Jens Behley, Philipp Lottes, Philippe Giguère, Cyrill Stachniss

Abstract: Mapping and localization are essential capabilities of robotic systems. Although the majority of mapping systems focus on static environments, the deployment in real-world situations requires them to handle dynamic objects. In this paper, we propose an approach for an RGB-D sensor that is able to consistently map scenes containing multiple dynamic elements. For localization and mapping, we employ an efficient direct tracking on the truncated signed distance function (TSDF) and leverage color information encoded in the TSDF to estimate the pose of the sensor. The TSDF is efficiently represented using voxel hashing, with most computations parallelized on a GPU. For detecting dynamics, we exploit the residuals obtained after an initial registration, together with the explicit modeling of free space in the model. We evaluate our approach on existing datasets, and provide a new dataset showing highly dynamic scenes. These experiments show that our approach often surpasses other state-of-the-art dense SLAM methods. We make available our dataset with the ground truth for both the trajectory of the RGB-D sensor obtained by a motion capture system and the model of the static environment using a high-precision terrestrial laser scanner. Finally, we release our approach as open source code.

Submitted 28 August, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

Comments: Accepted at the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS) 2019. See teaser video at https://www.youtube.com/watch?v=1P9ZfIS5-p4. See open source code at https://github.com/PRBonn/refusion

arXiv:1904.07837 [pdf, other]

Predicting GNSS satellite visibility from dense point clouds

Authors: Philippe Dandurand, Philippe Babin, Vladimír Kubelka, Philippe Giguère, François Pomerleau

Abstract: To help future mobile agents plan their movement in harsh environments, a predictive model has been designed to determine what areas would be favorable for Global Navigation Satellite System (GNSS) positioning. The model is able to predict the number of viable satellites for a GNSS receiver, based on a 3D point cloud map and a satellite constellation. Both occlusion and absorption effects of the environment are considered. A rugged mobile platform was designed to collect data in order to generate the point cloud maps. It was deployed during the Canadian winter known for large amounts of snow and extremely low temperatures. The test environments include a highly dense boreal forest and a university campus with high buildings. The experiment results indicate that the model performs well in both structured and unstructured environments.

Submitted 1 May, 2019; v1 submitted 16 April, 2019; originally announced April 2019.

arXiv:1904.07814 [pdf, other]

Large-scale 3D Mapping of Subarctic Forests

Authors: Philippe Babin, Philippe Dandurand, Vladimír Kubelka, Philippe Giguère, François Pomerleau

Abstract: The ability to map challenging subarctic environments opens new horizons for robotic deployments in industries such as forestry, surveillance, and open-pit mining. In this paper, we explore possibilities of large-scale lidar mapping in a boreal forest. Computational and sensory requirements with regards to contemporary hardware are considered as well. The lidar mapping is often based on the SLAM technique relying on pose graph optimization, which fuses the Iterative Closest Point (ICP) algorithm, Global Navigation Satellite System (GNSS) positioning, and Inertial Measurement Unit (IMU) measurements. To handle those sensors directly within the ICP minimization process, we propose an alternative approach of embedding external constraints. Furthermore, a novel formulation of a cost function is presented and cast into the problem of handling uncertainties from GNSS and lidar points. To test our approach, we acquired a large-scale dataset in the Foret Montmorency research forest. We report on the technical problems faced during our winter deployments aiming at building 3D maps using our new cost function. Those maps demonstrate both global and local consistency over 4.1 km.

Submitted 13 September, 2019; v1 submitted 16 April, 2019; originally announced April 2019.

Comments: Final version published in Field and Service Robotics (FSR) 2019

arXiv:1904.05281 [pdf, other]

Automatic 3D Mapping for Tree Diameter Measurements in Inventory Operations

Authors: Jean-François Tremblay, Martin Béland, François Pomerleau, Richard Gagnon, Philippe Giguère

Abstract: Forestry is a major industry in many parts of the world. It relies on forest inventory, which consists of measuring tree attributes. We propose to use 3D mapping, based on the iterative closest point algorithm, to automatically measure tree diameters in forests from mobile robot observations. While previous studies showed the potential for such technology, they lacked a rigorous analysis of diameter estimation methods in challenging forest environments. Here, we validated multiple diameter estimation methods, including two novel ones, in a new varied dataset of four different forest sites, 11 trajectories, totaling 1458 tree observations and 1.4 hectares. We provide recommendations for the deployment of mobile robots in a forestry context. We conclude that our mapping method is usable in the context of automated forest inventory, with our best method yielding a root mean square error of 3.45 cm for our whole dataset, and 2.04 cm in ideal conditions consisting of mature forest with well spaced trees.

Submitted 11 July, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

arXiv:1903.02489 [pdf, other]

GQ-STN: Optimizing One-Shot Grasp Detection based on Robustness Classifier

Authors: Alexandre Gariépy, Jean-Christophe Ruel, Brahim Chaib-draa, Philippe Giguère

Abstract: Grasping is a fundamental robotic task needed for the deployment of household robots or furthering warehouse automation. However, few approaches are able to perform grasp detection in real time (frame rate). To this effect, we present Grasp Quality Spatial Transformer Network (GQ-STN), a one-shot grasp detection network. Being based on the Spatial Transformer Network (STN), it produces not only a grasp configuration, but also directly outputs a depth image centered at this configuration. By connecting our architecture to an externally-trained grasp robustness evaluation network, we can train efficiently to satisfy a robustness metric via the backpropagation of the gradient emanating from the evaluation network. This removes the difficulty of training detection networks on sparsely annotated databases, a common issue in grasping. We further propose to use this robustness classifier to compare approaches, being more reliable than the traditional rectangle metric. Our GQ-STN is able to detect robust grasps on the depth images of the Dex-Net 2.0 dataset with 92.4 % accuracy in a single pass of the network. We finally demonstrate in a physical benchmark that our method can propose robust grasps more often than previous sampling-based methods, while being more than 60 times faster.

Submitted 31 July, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

arXiv:1810.01474 [pdf, other]

Analysis of Robust Functions for Registration Algorithms

Authors: Philippe Babin, Philippe Giguère, François Pomerleau

Abstract: Registration accuracy is influenced by the presence of outliers and numerous robust solutions have been developed over the years to mitigate their effect. However, without a large scale comparison of solutions to filter outliers, it is becoming tedious to select an appropriate algorithm for a given application. This paper presents a comprehensive analysis of the effects of outlier filters on the ICP algorithm aimed at mobile robotic applications. Fourteen of the most common outlier filters (such as M-estimators) have been tested in different types of environments, for a total of more than two million registrations. Furthermore, the influence of tuning parameters has been thoroughly explored. The experimental results show that most outlier filters have similar performance if they are correctly tuned. Nonetheless, filters such as Var. Trim., Cauchy, and Cauchy MAD are more stable against different environment types. Interestingly, the simple norm L1 produces comparable accuracy, while being parameterless.

Submitted 2 October, 2018; originally announced October 2018.

arXiv:1810.01470 [pdf, other]

CELLO-3D: Estimating the Covariance of ICP in the Real World

Authors: David Landry, François Pomerleau, Philippe Giguère

Abstract: The fusion of Iterative Closest Point (ICP) registrations in existing state estimation frameworks relies on an accurate estimation of their uncertainty. In this paper, we study the estimation of this uncertainty in the form of a covariance. First, we scrutinize the limitations of existing closed-form covariance estimation algorithms over 3D datasets. Then, we set out to estimate the covariance of ICP registrations through a data-driven approach, with over 5 100 000 registrations on 1020 pairs from real 3D point clouds. We assess our solution upon a wide spectrum of environments, ranging from structured to unstructured and indoor to outdoor. The capacity of our algorithm to predict covariances is accurately assessed, as well as the usefulness of these estimations for uncertainty estimation over trajectories. The proposed method estimates covariances better than existing closed-form solutions, and makes predictions that are consistent with observed trajectories.

Submitted 2 October, 2018; originally announced October 2018.

arXiv:1806.06888 [pdf, other]

Learning Object Localization and 6D Pose Estimation from Simulation and Weakly Labeled Real Images

Authors: Jean-Philippe Mercier, Chaitanya Mitash, Philippe Giguère, Abdeslam Boularias

Abstract: This work proposes a process for efficiently training a point-wise object detector that enables localizing objects and computing their 6D poses in cluttered and occluded scenes. Accurate pose estimation is typically a requirement for robust robotic grasping and manipulation of objects placed in cluttered, tight environments, such as a shelf with multiple objects. To minimize the human labor required for annotation, the proposed object detector is first trained in simulation by using automatically annotated synthetic images. We then show that the performance of the detector can be substantially improved by using a small set of weakly annotated real images, where a human provides only a list of objects present in each image without indicating the location of the objects. To close the gap between real and synthetic images, we adopt a domain adaptation approach through adversarial training. The detector resulting from this training process can be used to localize objects by using its per-object activation maps. In this work, we use the activation maps to guide the search of 6D poses of objects. Our proposed approach is evaluated on several publicly available datasets for pose estimation. We also evaluated our model on classification and localization in unsupervised and semi-supervised settings. The results clearly indicate that this approach could provide an efficient way toward fully automating the training process of computer vision models used in robotics.

Submitted 20 February, 2019; v1 submitted 18 June, 2018; originally announced June 2018.

arXiv:1803.00949 [pdf, other]

Tree Species Identification from Bark Images Using Convolutional Neural Networks

Authors: Mathieu Carpentier, Philippe Giguère, Jonathan Gaudreault

Abstract: Tree species identification using bark images is a challenging problem that could prove useful for many forestry related tasks. However, while the recent progress in deep learning showed impressive results on standard vision problems, a lack of datasets prevented its use on tree bark species classification. In this work, we present, and make publicly available, a novel dataset called BarkNet 1.0 containing more than 23,000 high-resolution bark images from 23 different tree species over a wide range of tree diameters. With it, we demonstrate the feasibility of species recognition through bark images, using deep learning. More specifically, we obtain an accuracy of 93.88% on single crop, and an accuracy of 97.81% using a majority voting approach on all of the images of a tree. We also empirically demonstrate that, for a fixed number of images, it is better to maximize the number of tree individuals in the training database, thus directing future data collection efforts.

Submitted 31 July, 2018; v1 submitted 2 March, 2018; originally announced March 2018.

Comments: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:1711.00111 [pdf, other]

Multi-Task Learning by Deep Collaboration and Application in Facial Landmark Detection

Authors: Ludovic Trottier, Philippe Giguère, Brahim Chaib-draa

Abstract: Convolutional neural networks (CNNs) have become the most successful approach in many vision-related domains. However, they are limited to domains where data is abundant. Recent works have looked at multi-task learning (MTL) to mitigate data scarcity by leveraging domain-specific information from related tasks. In this paper, we present a novel soft-parameter sharing mechanism for CNNs in an MTL setting, which we refer to as Deep Collaboration. We propose taking into account the notion that task relevance depends on depth by using lateral transformation blocks with skip connections. This allows extracting task-specific features at various depths without sacrificing features relevant to all tasks. We show that CNNs connected with our Deep Collaboration obtain better accuracy on facial landmark detection with related tasks. We finally verify that our approach effectively allows knowledge sharing by showing depth-specific influence of tasks that we know are related.

Submitted 15 March, 2018; v1 submitted 27 October, 2017; originally announced November 2017.

Comments: Under review at the 15th European Conference on Computer Vision (ECCV) (2018)

arXiv:1606.00538 [pdf, other]

Dictionary Learning for Robotic Grasp Recognition and Detection

Authors: Ludovic Trottier, Philippe Giguère, Brahim Chaib-draa

Abstract: The ability to grasp ordinary and potentially never-seen objects is an important feature in both domestic and industrial robotics. For a system to accomplish this, it must autonomously identify grasping locations by using information from various sensors, such as a Microsoft Kinect 3D camera. Despite considerable progress, significant work still remains to be done in this field. To this effect, we propose a dictionary learning and sparse representation (DLSR) framework for representing RGBD images from 3D sensors in the context of determining such good grasping locations. In contrast to previously proposed approaches that relied on sophisticated regularization or very large datasets, the derived perception system has a fast training phase and can work with small datasets. It is also theoretically founded for dealing with masked-out entries, which are common with 3D sensors. We contribute by presenting a comparative study of several DLSR approach combinations for recognizing and detecting grasp candidates on the standard Cornell dataset. Importantly, experimental results show a performance improvement of 1.69% in detection and 3.16% in recognition over the current state-of-the-art convolutional neural network (CNN). Even though CNNs are currently the most popular vision-based approach, this suggests that DLSR is also a viable alternative with interesting advantages that CNNs lack.

Submitted 2 June, 2016; originally announced June 2016.

Comments: Submitted at the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016)

arXiv:1605.09332 [pdf, ps, other]

Parametric Exponential Linear Unit for Deep Convolutional Neural Networks

Authors: Ludovic Trottier, Philippe Giguère, Brahim Chaib-draa

Abstract: Object recognition is an important task for improving the ability of visual systems to perform complex scene understanding. Recently, the Exponential Linear Unit (ELU) has been proposed as a key component for managing bias shift in Convolutional Neural Networks (CNNs), but it defines a parameter that must be set by hand. In this paper, we propose learning a parameterization of ELU in order to learn the proper activation shape at each layer in the CNNs. Our results on the MNIST, CIFAR-10/100 and ImageNet datasets using the NiN, Overfeat, All-CNN and ResNet networks indicate that our proposed Parametric ELU (PELU) performs better than the non-parametric ELU. We have observed as much as a 7.28% relative error improvement on ImageNet with the NiN network, with only 0.0003% parameter increase. Our visual examination of the non-linear behaviors adopted by VGG using PELU shows that the network took advantage of the added flexibility by learning different activations at different layers.

Submitted 10 January, 2018; v1 submitted 30 May, 2016; originally announced May 2016.

Comments: 16th IEEE International Conference On Machine Learning And Applications, 2017

arXiv:1503.05830 [pdf, other]

doi 10.1109/CRV.2014.20

Sign Language Fingerspelling Classification from Depth and Color Images using a Deep Belief Network

Authors: Lucas Rioux-Maldague, Philippe Giguère

Abstract: Automatic sign language recognition is an open problem that has received a lot of attention recently, not only because of its usefulness to signers, but also due to the numerous applications a sign classifier can have. In this article, we present a new feature extraction technique for hand pose recognition using depth and intensity images captured from a Microsoft Kinect sensor. We applied our technique to American Sign Language fingerspelling classification using a Deep Belief Network, for which our feature extraction technique is tailored. We evaluated our results on a multi-user data set with two scenarios: one with all known users and one with an unseen user. We achieved 99% recall and precision on the first, and 77% recall and 79% precision on the second. Our method is also capable of real-time sign classification and is adaptive to any environment or lighting intensity.

Submitted 19 March, 2015; originally announced March 2015.

Comments: Published in 2014 Canadian Conference on Computer and Robot Vision

Showing 1–36 of 36 results for author: Giguère, P