Challenging the Universal Representation of Deep Models for 3D Point Cloud Registration

Authors: David Bojanić, Kristijan Bartol, Josep Forest, Stefan Gumhold, Tomislav Petković, Tomislav Pribanić

Abstract: Learning universal representations across different application domains is an open research problem. In fact, finding a universal architecture within the same application but across different types of datasets is still an unsolved problem, especially in applications involving processing 3D point clouds. In this work we experimentally test several state-of-the-art learning-based methods for 3D point cloud registration against the proposed non-learning baseline registration method. The proposed method either outperforms or achieves comparable results w.r.t. learning-based methods. In addition, we propose a dataset on which learning-based methods have a hard time generalizing. Our proposed method and dataset, along with the provided experiments, can be used in further research in studying effective solutions for universal representations. Our source code is available at: github.com/DavidBoja/greedy-grid-search.

Submitted 29 November, 2022; originally announced November 2022.

Comments: Accepted at the BMVC 2022 workshop: Universal Representations for Computer Vision (URCV) (https://bmvc2022.mpi-inf.mpg.de/workshops/)

arXiv:2110.00280 [pdf, other]

Generalizable Human Pose Triangulation

Authors: Kristijan Bartol, David Bojanić, Tomislav Petković, Tomislav Pribanić

Abstract: We address the problem of generalizability for multi-view 3D human pose estimation. The standard approach is to first detect 2D keypoints in images and then apply triangulation from multiple views. Even though the existing methods achieve remarkably accurate 3D pose estimation on public benchmarks, most of them are limited to a single spatial camera arrangement and number of cameras. Several methods address this limitation but demonstrate significantly degraded performance on novel views. We propose a stochastic framework for human pose triangulation and demonstrate superior generalization across different camera arrangements on two public datasets. In addition, we apply the same approach to the fundamental matrix estimation problem, showing that the proposed method can be successfully applied to other computer vision problems. The stochastic framework achieves more than 8.8% improvement on the 3D pose estimation task, compared to the state-of-the-art, and more than 30% improvement for fundamental matrix estimation, compared to a standard algorithm.

Submitted 20 April, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

arXiv:2109.11872 [pdf, other]

Catadioptric Stereo on a Smartphone

Authors: Kristijan Bartol, David Bojanić, Tomislav Petković, Tomislav Pribanić

Abstract: We present a 3D printed adapter with planar mirrors for stereo reconstruction using the front and back smartphone cameras. The adapter presents a practical and low-cost solution for enabling any smartphone to be used as a stereo camera, which is currently only possible using high-end phones with expensive 3D sensors. Using the prototype version of the adapter, we experiment with parameters like the angles between cameras and mirrors and the distance to each camera (the stereo baseline). We find the most convenient configuration and calibrate the stereo pair. Based on the presented preliminary analysis, we identify possible improvements in the current design. To demonstrate the working prototype, we reconstruct a 3D human pose using 2D keypoint detections from the stereo pair and evaluate extracted body lengths. The result shows that the adapter can be used for anthropometric measurement of several body segments.

Submitted 24 September, 2021; originally announced September 2021.

arXiv:2101.05645 [pdf, other]

Ensemble of LSTMs and feature selection for human action prediction

Authors: Tomislav Petković, Luka Petrović, Ivan Marković, Ivan Petrović

Abstract: As robots are becoming more and more ubiquitous in human environments, it will be necessary for robotic systems to better understand and predict human actions. However, this is not an easy task, at times not even for us humans, but based on a relatively structured set of possible actions, appropriate cues, and the right model, this problem can be computationally tackled. In this paper, we propose to use an ensemble of long short-term memory (LSTM) networks for human action prediction. To train and evaluate models, we used the MoGaze dataset - currently the most comprehensive dataset capturing poses of human joints and the human gaze. We have thoroughly analyzed the MoGaze dataset and selected a reduced set of cues for this task. Our model can predict (i) which of the labeled objects the human is going to grasp, and (ii) which of the macro locations the human is going to visit (such as table or shelf). We have exhaustively evaluated the proposed method and compared it to individual cue baselines. The results suggest that our LSTM model slightly outperforms the gaze baseline in single object picking accuracy, but achieves better accuracy in macro object prediction. Furthermore, we have also analyzed the prediction accuracy when the gaze is not used, and in this case, the LSTM model considerably outperformed the best single cue baseline.

Submitted 14 January, 2021; originally announced January 2021.

arXiv:2011.03104

Can Human Sex Be Learned Using Only 2D Body Keypoint Estimations?

Authors: Kristijan Bartol, Tomislav Pribanic, David Bojanic, Tomislav Petkovic

Abstract: In this paper, we analyze the problem of human male and female sex recognition and present a fully automated classification system using only 2D keypoints. The keypoints represent human joints. A keypoint set consists of 15 joints and the keypoint estimations are obtained using an OpenPose 2D keypoint detector. We learn a deep learning model to distinguish males and females using the keypoints as input and binary labels as output. We use two public datasets in the experimental section - 3DPeople and PETA. On the PETA dataset, we report a 77% accuracy. We provide model performance details on both PETA and 3DPeople. To measure the effect of noisy 2D keypoint detections on the performance, we run separate experiments on 3DPeople ground truth and noisy keypoint data. Finally, we extract a set of factors that affect the classification accuracy and propose future work. The advantage of the approach is that the input is small and the architecture is simple, which enables us to run many experiments and keep the real-time performance in inference. The source code, with the experiments and data preparation scripts, is available on GitHub (https://github.com/kristijanbartol/human-sex-classifier).

Submitted 20 April, 2022; v1 submitted 5 November, 2020; originally announced November 2020.

Comments: There was an error in the implementation of the base experiment (#1), i.e., the data preparation step. More specifically, the labels used both for training and evaluation were wrong, thus making the conclusions invalid

arXiv:2011.03102 [pdf, other]

Smart Time-Multiplexing of Quads Solves the Multicamera Interference Problem

Authors: Tomislav Pribanic, Tomislav Petkovic, David Bojanic, Kristijan Bartol

Abstract: Time-of-flight (ToF) cameras are becoming increasingly popular for 3D imaging. Their optimal usage has been studied from several aspects. One of the open research problems is the possibility of a multicamera interference problem when two or more ToF cameras are operating simultaneously. In this work we present an efficient method to synchronize multiple operating ToF cameras. Our method is based on time-division multiplexing, but unlike traditional time multiplexing, it does not decrease the effective camera frame rate. Additionally, for unsynchronized cameras, we provide a robust method to extract, from their corresponding video streams, frames which are not subject to the multicamera interference problem. We demonstrate our approach through a series of experiments and with different levels of support available for triggering, ranging from hardware triggering to purely random software triggering.

Submitted 5 November, 2020; originally announced November 2020.

arXiv:2011.03091 [pdf, other]

doi 10.5220/0009190005830589

Towards Keypoint Guided Self-Supervised Depth Estimation

Authors: Kristijan Bartol, David Bojanic, Tomislav Petkovic, Tomislav Pribanic, Yago Diez Donoso

Abstract: This paper proposes to use keypoints as a self-supervision clue for learning depth map estimation from a collection of input images. As ground truth depth from real images is difficult to obtain, many unsupervised and self-supervised approaches to depth estimation have been proposed. Most of these unsupervised approaches use depth map and ego-motion estimations to reproject the pixels from the current image into the adjacent image from the image collection. Depth and ego-motion estimations are evaluated based on pixel intensity differences between the corresponding original and reprojected pixels. Instead of reprojecting the individual pixels, we propose to first select image keypoints in both images and then reproject and compare the corresponding keypoints of the two images. The keypoints should describe the distinctive image features well. By learning a deep model with and without the keypoint extraction technique, we show that using the keypoints improves the depth estimation learning. We also propose some future directions for keypoint-guided learning of structure-from-motion problems.

Submitted 5 November, 2020; originally announced November 2020.

Journal ref: 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2019

arXiv:2007.10000 [pdf, other]

doi 10.1109/ISPA.2019.8868792

On the Comparison of Classic and Deep Keypoint Detector and Descriptor Methods

Authors: Kristijan Bartol, David Bojanić, Tomislav Pribanić, Tomislav Petković, Yago Diez Donoso, Joaquim Salvi Mas

Abstract: The purpose of this study is to give a performance comparison between several classic hand-crafted and deep keypoint detector and descriptor methods. In particular, we consider the following classical algorithms: SIFT, SURF, ORB, FAST, BRISK, MSER, HARRIS, KAZE, AKAZE, AGAST, GFTT, FREAK, BRIEF and RootSIFT, where a subset of all combinations is paired into detector-descriptor pipelines. Additionally, we analyze the performance of two recent and promising deep detector-descriptor models, LF-Net and SuperPoint. Our benchmark relies on the HPSequences dataset that provides real and diverse images under various geometric and illumination changes. We analyze the performance on three evaluation tasks: keypoint verification, image matching and keypoint retrieval. The results show that certain classic and deep approaches are still comparable, with some classic detector-descriptor combinations outperforming pretrained deep models. In terms of the execution times of tested implementations, the SuperPoint model is the fastest, followed by ORB. The source code is published at https://github.com/kristijanbartol/keypoint-algorithms-benchmark.

Submitted 29 July, 2020; v1 submitted 20 July, 2020; originally announced July 2020.

Journal ref: Proceedings of the 2019 11th International Symposium on Image and Signal Processing and Analysis (ISPA), Page(s): 64-69

arXiv:2005.11202 [pdf, other]

Human Intention Recognition for Human Aware Planning in Integrated Warehouse Systems

Authors: Tomislav Petković, Jakub Hvězda, Tomáš Rybecký, Ivan Marković, Miroslav Kulich, Libor Přeučil, Ivan Petrović

Abstract: With the substantial growth of logistics businesses the need for larger and more automated warehouses increases, thus giving rise to fully robotized shop-floors with mobile robots in charge of transporting and distributing goods. However, even in fully automatized warehouse systems the need for human intervention frequently arises, whether because of maintenance or because of fulfilling specific orders, thus bringing mobile robots and humans ever closer in an integrated warehouse environment. In order to ensure smooth and efficient operation of such a warehouse, paths of both robots and humans need to be carefully planned; however, due to the possibility of humans deviating from the assigned path, this becomes an even more challenging task. Given that, the supervising system should be able to recognize human intentions and their alternative paths in real-time. In this paper, we propose a framework for human deviation detection and intention recognition which outputs the most probable paths of the human workers, and a planner that acts accordingly by replanning for robots to move out of the human's path. Experimental results demonstrate that the proposed framework increases the total number of deliveries, especially human deliveries, and reduces human-robot encounters.

Submitted 22 May, 2020; originally announced May 2020.

Comments: To be presented at the 28th Mediterranean Conference on Control and Automation (MED'2020)

arXiv:1811.08269 [pdf, other]

doi 10.1016/j.rcim.2018.11.004

Human Intention Estimation based on Hidden Markov Model Motion Validation for Safe Flexible Robotized Warehouses

Authors: Tomislav Petković, David Puljiz, Ivan Marković, Björn Hein

Abstract: With the substantial growth of logistics businesses the need for larger warehouses and their automation arises, thus using robots as assistants to human workers is becoming a priority. In order to operate efficiently and safely, robot assistants or the supervising system should recognize human intentions in real-time. Theory of mind (ToM) is an intuitive human conception of other humans' mental state, i.e., beliefs and desires, and how they cause behavior. In this paper we propose a ToM-based human intention estimation algorithm for flexible robotized warehouses. We observe the human's, i.e., the worker's, motion and validate it with respect to the goal locations using generalized Voronoi diagram based path planning. These observations are then processed by the proposed hidden Markov model framework which estimates worker intentions in an online manner, capable of handling changing environments. To test the proposed intention estimation we ran experiments in a real-world laboratory warehouse with a worker wearing Microsoft Hololens augmented reality glasses. Furthermore, in order to demonstrate the scalability of the approach to larger warehouses, we propose to use virtual reality digital warehouse twins in order to realistically simulate worker behavior. We conducted intention estimation experiments in the larger warehouse digital twin with up to 24 running robots. We demonstrate that the proposed framework estimates warehouse worker intentions precisely and in the end we discuss the experimental results.

Submitted 20 November, 2018; originally announced November 2018.

Journal ref: Robotics and Computer-Integrated Manufacturing 57 (2019): 182-196

arXiv:1804.01774 [pdf, other]

Human Intention Recognition in Flexible Robotized Warehouses based on Markov Decision Processes

Authors: Tomislav Petković, Ivan Marković, Ivan Petrović

Abstract: The rapid growth of e-commerce increases the need for larger warehouses and their automation, thus using robots as assistants to human workers becomes a priority. In order to operate efficiently and safely, robot assistants or the supervising system should recognize human intentions. Theory of mind (ToM) is an intuitive conception of other agents' mental state, i.e., beliefs and desires, and how they cause behavior. In this paper we present a ToM-based algorithm for human intention recognition in flexible robotized warehouses. We have placed the warehouse worker in a simulated 2D environment with three potential goals. We observe the agent's actions and validate them with respect to the goal locations using a Markov decision process framework. Those observations are then processed by the proposed hidden Markov model framework which estimates the agent's desires. We demonstrate that the proposed framework predicts the human warehouse worker's desires in an intuitive manner and in the end we discuss the simulation results.

Submitted 5 April, 2018; originally announced April 2018.

arXiv:1510.04863 [pdf]

An Extension to Hough Transform Based on Gradient Orientation

Authors: Tomislav Petković, Sven Lončarić

Abstract: The Hough transform is one of the most common methods for line detection. In this paper we propose a novel extension of the regular Hough transform. The proposed extension combines the extension of the accumulator space and the local gradient orientation resulting in clutter reduction and yielding more prominent peaks, thus enabling better line identification. We demonstrate benefits in applications such as visual quality inspection and rectangle detection.

Submitted 16 October, 2015; originally announced October 2015.

Comments: Part of the Proceedings of the Croatian Computer Vision Workshop, CCVW 2015, Year 3

Report number: UniZg-CRV-CCVW/2015/0012

arXiv:1310.0306 [pdf]

Flexible Visual Quality Inspection in Discrete Manufacturing

Authors: Tomislav Petković, Darko Jurić, Sven Lončarić

Abstract: Most visual quality inspections in discrete manufacturing are composed of length, surface, angle or intensity measurements. Those are implemented as end-user configurable inspection tools that should not require an image processing expert to set up. Currently available software solutions providing such capability use a flowchart-based programming environment, but do not fully address inspection flowchart robustness and can require a redefinition of the flowchart if a small variation is introduced. In this paper we propose an acquire-register-analyze image processing pattern designed for discrete manufacturing that aims to increase the robustness of the inspection flowchart by consistently addressing variations in product position, orientation and size. The proposed pattern is transparent to the end-user and simplifies the flowchart. We describe a developed software solution that is a practical implementation of the proposed pattern. We give an example of its real-life use in industrial production of electric components.

Submitted 1 October, 2013; originally announced October 2013.

Comments: Part of the Proceedings of the Croatian Computer Vision Workshop, CCVW 2013, Year 1

Report number: UniZg-CRV-CCVW/2013/0020

Showing 1–13 of 13 results for author: Petković, T