-
Lidar Annotation Is All You Need
Authors:
Dinar Sharafutdinov,
Stanislav Kuskov,
Saian Protasov,
Alexey Voropaev
Abstract:
In recent years, computer vision has transformed fields such as medical imaging, object recognition, and geospatial analytics. One of the fundamental tasks in computer vision is semantic image segmentation, which is vital for precise object delineation. Autonomous driving represents one of the key areas where computer vision algorithms are applied. The task of road surface segmentation is crucial…
▽ More
In recent years, computer vision has transformed fields such as medical imaging, object recognition, and geospatial analytics. One of the fundamental tasks in computer vision is semantic image segmentation, which is vital for precise object delineation. Autonomous driving represents one of the key areas where computer vision algorithms are applied. The task of road surface segmentation is crucial in self-driving systems, but it requires a labor-intensive annotation process in several data domains. The work described in this paper aims to improve the efficiency of image segmentation using a convolutional neural network in a multi-sensor setup. This approach leverages lidar (Light Detection and Ranging) annotations to directly train image segmentation models on RGB images. Lidar supplements the images by emitting laser pulses and measuring reflections to provide depth information. However, lidar's sparse point clouds often create difficulties for accurate object segmentation. Segmentation of point clouds requires time-consuming preliminary data preparation and a large amount of computational resources. The key innovation of our approach is the masked loss, addressing sparse ground-truth masks from point clouds. By calculating loss exclusively where lidar points exist, the model learns road segmentation on images by using lidar points as ground truth. This approach allows for blending of different ground-truth data types during model training. Experimental validation of the approach on benchmark datasets shows comparable performance to a high-quality image segmentation model. Incorporating lidar reduces the load on annotations and enables training of image-segmentation models without loss of segmentation quality. The methodology is tested on diverse datasets, both publicly available and proprietary. The strengths and weaknesses of the proposed method are also discussed in the paper.
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
Case study on quantum convolutional neural network scalability
Authors:
Marina O. Lisnichenko,
Stanislav I. Protasov
Abstract:
One of the crucial tasks in computer science is the processing time reduction of various data types, i.e., images, which is important for different fields -- from medicine and logistics to virtual shop**. Compared to classical computers, quantum computers are capable of parallel data processing, which reduces the data processing time. This quality of quantum computers inspired intensive research…
▽ More
One of the crucial tasks in computer science is the processing time reduction of various data types, i.e., images, which is important for different fields -- from medicine and logistics to virtual shop**. Compared to classical computers, quantum computers are capable of parallel data processing, which reduces the data processing time. This quality of quantum computers inspired intensive research of the potential of quantum technologies applicability to real-life tasks. Some progress has already revealed on a smaller volumes of the input data. In this research effort, I aimed to increase the amount of input data (I used images from 2 x 2 to 8 x 8), while reducing the processing time, by way of skip** intermediate measurement steps. The hypothesis was that, for increased input data, the omitting of intermediate measurement steps after each quantum convolution layer will improve output metric results and accelerate data processing. To test the hypothesis, I performed experiments to chose the best activation function and its derivative in each network. The hypothesis was partly confirmed in terms of output mean squared error (MSE) -- it dropped from 0.25 in the result of classical convolutional neural network (CNN) training to 0.23 in the result of quantum convolutional neural network (QCNN) training. In terms of the training time, however, which was 1.5 minutes for CNN and 4 hours 37 minutes in the least lengthy training iteration, the hypothesis was rejected.
△ Less
Submitted 14 July, 2022;
originally announced July 2022.
-
CNN-based Omnidirectional Object Detection for HermesBot Autonomous Delivery Robot with Preliminary Frame Classification
Authors:
Saian Protasov,
Pavel Karpyshev,
Ivan Kalinov,
Pavel Kopanev,
Nikita Mikhailovskiy,
Alexander Sedunin,
Dzmitry Tsetserukou
Abstract:
Mobile autonomous robots include numerous sensors for environment perception. Cameras are an essential tool for robot's localization, navigation, and obstacle avoidance. To process a large flow of data from the sensors, it is necessary to optimize algorithms, or to utilize substantial computational power. In our work, we propose an algorithm for optimizing a neural network for object detection usi…
▽ More
Mobile autonomous robots include numerous sensors for environment perception. Cameras are an essential tool for robot's localization, navigation, and obstacle avoidance. To process a large flow of data from the sensors, it is necessary to optimize algorithms, or to utilize substantial computational power. In our work, we propose an algorithm for optimizing a neural network for object detection using preliminary binary frame classification. An autonomous outdoor mobile robot with 6 rolling-shutter cameras on the perimeter providing a 360-degree field of view was used as the experimental setup. The obtained experimental results revealed that the proposed optimization accelerates the inference time of the neural network in the cases with up to 5 out of 6 cameras containing target objects.
△ Less
Submitted 22 October, 2021;
originally announced October 2021.
-
Multi-label Class-imbalanced Action Recognition in Hockey Videos via 3D Convolutional Neural Networks
Authors:
Konstantin Sozykin,
Stanislav Protasov,
Adil Khan,
Rasheed Hussain,
Jooyoung Lee
Abstract:
Automatic analysis of the video is one of most complex problems in the fields of computer vision and machine learning. A significant part of this research deals with (human) activity recognition (HAR) since humans, and the activities that they perform, generate most of the video semantics. Video-based HAR has applications in various domains, but one of the most important and challenging is HAR in…
▽ More
Automatic analysis of the video is one of most complex problems in the fields of computer vision and machine learning. A significant part of this research deals with (human) activity recognition (HAR) since humans, and the activities that they perform, generate most of the video semantics. Video-based HAR has applications in various domains, but one of the most important and challenging is HAR in sports videos. Some of the major issues include high inter- and intra-class variations, large class imbalance, the presence of both group actions and single player actions, and recognizing simultaneous actions, i.e., the multi-label learning problem. Kee** in mind these challenges and the recent success of CNNs in solving various computer vision problems, in this work, we implement a 3D CNN based multi-label deep HAR system for multi-label class-imbalanced action recognition in hockey videos. We test our system for two different scenarios: an ensemble of $k$ binary networks vs. a single $k$-output network, on a publicly available dataset. We also compare our results with the system that was originally designed for the chosen dataset. Experimental results show that the proposed approach performs better than the existing solution.
△ Less
Submitted 3 May, 2018; v1 submitted 5 September, 2017;
originally announced September 2017.