Search | arXiv e-print repository

arXiv:2308.11983 [pdf, other]

doi 10.1109/LRA.2023.3295254

Multi-Modal Multi-Task (3MT) Road Segmentation

Authors: Erkan Milli, Özgür Erkent, Asım Egemen Yılmaz

Abstract: Multi-modal systems have the capacity of producing more reliable results than systems with a single modality in road detection due to perceiving different aspects of the scene. We focus on using raw sensor inputs instead of, as it is typically done in many SOTA works, leveraging architectures that require high pre-processing costs such as surface normals or dense depth predictions. By using raw se… ▽ More Multi-modal systems have the capacity of producing more reliable results than systems with a single modality in road detection due to perceiving different aspects of the scene. We focus on using raw sensor inputs instead of, as it is typically done in many SOTA works, leveraging architectures that require high pre-processing costs such as surface normals or dense depth predictions. By using raw sensor inputs, we aim to utilize a low-cost model thatminimizes both the pre-processing andmodel computation costs. This study presents a cost-effective and highly accurate solution for road segmentation by integrating data from multiple sensorswithin a multi-task learning architecture.Afusion architecture is proposed in which RGB and LiDAR depth images constitute the inputs of the network. Another contribution of this study is to use IMU/GNSS (inertial measurement unit/global navigation satellite system) inertial navigation system whose data is collected synchronously and calibrated with a LiDAR-camera to compute aggregated dense LiDAR depth images. It has been demonstrated by experiments on the KITTI dataset that the proposed method offers fast and high-performance solutions. We have also shown the performance of our method on Cityscapes where raw LiDAR data is not available. The segmentation results obtained for both full and half resolution images are competitive with existing methods. Therefore, we conclude that our method is not dependent only on raw LiDAR data; rather, it can be used with different sensor modalities. The inference times obtained in all experiments are very promising for real-time experiments. △ Less

Submitted 23 August, 2023; originally announced August 2023.

Journal ref: in IEEE Robotics and Automation Letters, vol. 8, no. 9, pp. 5408-5415, Sept. 2023

arXiv:2302.06414 [pdf, other]

LAPTNet-FPN: Multi-scale LiDAR-aided Projective Transform Network for Real Time Semantic Grid Prediction

Authors: Manuel Alejandro Diaz-Zapata, David Sierra González, Özgür Erkent, Jilles Dibangoye, Christian Laugier

Abstract: Semantic grids can be useful representations of the scene around an autonomous system. By having information about the layout of the space around itself, a robot can leverage this type of representation for crucial tasks such as navigation or tracking. By fusing information from multiple sensors, robustness can be increased and the computational load for the task can be lowered, achieving real tim… ▽ More Semantic grids can be useful representations of the scene around an autonomous system. By having information about the layout of the space around itself, a robot can leverage this type of representation for crucial tasks such as navigation or tracking. By fusing information from multiple sensors, robustness can be increased and the computational load for the task can be lowered, achieving real time performance. Our multi-scale LiDAR-Aided Perspective Transform network uses information available in point clouds to guide the projection of image features to a top-view representation, resulting in a relative improvement in the state of the art for semantic grid generation for human (+8.67%) and movable object (+49.07%) classes in the nuScenes dataset, as well as achieving results close to the state of the art for the vehicle, drivable area and walkway classes, while performing inference at 25 FPS. △ Less

Submitted 10 February, 2023; originally announced February 2023.

Comments: 2023 IEEE International Conference on Robotics and Automation (ICRA), IEEE Robotics and Automation Society, May 2023, London, United Kingdom

arXiv:2211.14445 [pdf, other]

LAPTNet: LiDAR-Aided Perspective Transform Network

Authors: Manuel Alejandro Diaz-Zapata, Özgür Erkent, Christian Laugier, Jilles Dibangoye, David Sierra González

Abstract: Semantic grids are a useful representation of the environment around a robot. They can be used in autonomous vehicles to concisely represent the scene around the car, capturing vital information for downstream tasks like navigation or collision assessment. Information from different sensors can be used to generate these grids. Some methods rely only on RGB images, whereas others choose to incorpor… ▽ More Semantic grids are a useful representation of the environment around a robot. They can be used in autonomous vehicles to concisely represent the scene around the car, capturing vital information for downstream tasks like navigation or collision assessment. Information from different sensors can be used to generate these grids. Some methods rely only on RGB images, whereas others choose to incorporate information from other sensors, such as radar or LiDAR. In this paper, we present an architecture that fuses LiDAR and camera information to generate semantic grids. By using the 3D information from a LiDAR point cloud, the LiDAR-Aided Perspective Transform Network (LAPTNet) is able to associate features in the camera plane to the bird's eye view without having to predict any depth information about the scene. Compared to state-of-theart camera-only methods, LAPTNet achieves an improvement of up to 8.8 points (or 38.13%) over state-of-art competing approaches for the classes proposed in the NuScenes dataset validation split. △ Less

Submitted 14 November, 2022; originally announced November 2022.

Comments: ICARCV 2022 - 17th International Conference on Control, Automation, Robotics and Vision, Dec 2022, Singapore, Singapore

Showing 1–3 of 3 results for author: Erkent, Ö