
Challenging the Universal Representation of Deep Models for 3D Point Cloud Registration

Authors: David Bojanić, Kristijan Bartol, Josep Forest, Stefan Gumhold, Tomislav Petković, Tomislav Pribanić

Abstract: Learning universal representations across different application domains is an open research problem. In fact, finding a universal architecture within the same application but across different types of datasets is also still an unsolved problem, especially in applications that process 3D point clouds. In this work we experimentally test several state-of-the-art learning-based methods for 3D point cloud registration against the proposed non-learning baseline registration method. The proposed method either outperforms the learning-based methods or achieves comparable results. In addition, we propose a dataset on which learning-based methods struggle to generalize. Our proposed method and dataset, along with the provided experiments, can be used in further research on effective solutions for universal representations. Our source code is available at: github.com/DavidBoja/greedy-grid-search.

Submitted 29 November, 2022; originally announced November 2022.

Comments: Accepted at the BMVC 2022 workshop: Universal Representations for Computer Vision (URCV) (https://bmvc2022.mpi-inf.mpg.de/workshops/)

arXiv:2110.00280

Generalizable Human Pose Triangulation

Authors: Kristijan Bartol, David Bojanić, Tomislav Petković, Tomislav Pribanić

Abstract: We address the problem of generalizability for multi-view 3D human pose estimation. The standard approach is to first detect 2D keypoints in images and then apply triangulation from multiple views. Even though the existing methods achieve remarkably accurate 3D pose estimation on public benchmarks, most of them are limited to a single spatial camera arrangement and a single number of cameras. Several methods address this limitation but show significantly degraded performance on novel views. We propose a stochastic framework for human pose triangulation and demonstrate superior generalization across different camera arrangements on two public datasets. In addition, we apply the same approach to the fundamental matrix estimation problem, showing that the proposed method can be successfully applied to other computer vision problems. The stochastic framework achieves more than 8.8% improvement on the 3D pose estimation task compared to the state of the art, and more than 30% improvement for fundamental matrix estimation compared to a standard algorithm.

Submitted 20 April, 2022; v1 submitted 1 October, 2021; originally announced October 2021.
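The abstract refers to the standard pipeline of detecting 2D keypoints and then triangulating them from multiple views. As a rough illustration of that triangulation step only, the Python sketch below implements classic linear (DLT) triangulation of a single point from N calibrated views, assuming known 3×4 projection matrices. It is a generic textbook baseline, not the stochastic framework proposed in the paper, and the function name `triangulate_dlt` is hypothetical.

```python
import numpy as np


def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from N >= 2 views.

    projections: list of 3x4 camera projection matrices P_i (assumed known)
    points_2d:   list of (u, v) pixel observations, one per view
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view gives two linear constraints on the homogeneous point X:
        # u * (P[2] @ X) - P[0] @ X = 0  and  v * (P[2] @ X) - P[1] @ X = 0
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # Least-squares solution of A X = 0 with ||X|| = 1: the right singular
    # vector belonging to the smallest singular value
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

Every extra view simply adds two rows to `A`, so the same function covers any number or arrangement of cameras; with noisy detections it returns the algebraic least-squares point, which treats all views as equally reliable.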

arXiv:2109.11872

Catadioptric Stereo on a Smartphone

Authors: Kristijan Bartol, David Bojanić, Tomislav Petković, Tomislav Pribanić

Abstract: We present a 3D-printed adapter with planar mirrors for stereo reconstruction using the front and back cameras of a smartphone. The adapter is a practical, low-cost solution that enables any smartphone to be used as a stereo camera, which is currently only possible with high-end phones that have expensive 3D sensors. Using the prototype version of the adapter, we experiment with parameters such as the angles between the cameras and mirrors and the distance to each camera (the stereo baseline). We find the most convenient configuration and calibrate the stereo pair. Based on the presented preliminary analysis, we identify possible improvements to the current design. To demonstrate the working prototype, we reconstruct a 3D human pose using 2D keypoint detections from the stereo pair and evaluate the extracted body lengths. The results show that the adapter can be used for anthropometric measurement of several body segments.

Submitted 24 September, 2021; originally announced September 2021.
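The abstract tunes the stereo baseline. For a rectified pinhole stereo pair, depth follows Z = f·B/d (focal length f in pixels, baseline B, disparity d in pixels), and a one-pixel disparity error changes depth by roughly Z²/(f·B), so a wider baseline gives finer depth resolution. The sketch below assumes the two mirror views have already been calibrated and rectified into such a pair; it does not model the catadioptric geometry itself, and the function names and example numbers are hypothetical.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth in metres of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline_m / disparity_px


def depth_error_per_pixel(depth_m: float, focal_px: float, baseline_m: float) -> float:
    """First-order depth change caused by a 1 px disparity error: dZ ~ Z^2 / (f * B)."""
    return depth_m ** 2 / (focal_px * baseline_m)


# Hypothetical setup: 1000 px focal length, subject at 2 m.
for baseline in (0.10, 0.20):
    err = depth_error_per_pixel(2.0, 1000.0, baseline)
    print(f"baseline {baseline:.2f} m -> about {err:.3f} m depth change per pixel of disparity error")
```

Doubling the baseline halves the depth error per pixel at a fixed distance, which is the trade-off being weighed against adapter size and the overlap of the two views.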

arXiv:2011.03104

Can Human Sex Be Learned Using Only 2D Body Keypoint Estimations?

Authors: Kristijan Bartol, Tomislav Pribanic, David Bojanic, Tomislav Petkovic

Abstract: In this paper, we analyze the problem of recognizing human sex (male or female) and present a fully automated classification system that uses only 2D keypoints. The keypoints represent human joints. A keypoint set consists of 15 joints, and the keypoint estimations are obtained with the OpenPose 2D keypoint detector. We train a deep learning model to distinguish males and females, using the keypoints as input and binary labels as output. We use two public datasets in the experimental section: 3DPeople and PETA. On the PETA dataset, we report 77% accuracy. We provide model performance details on both PETA and 3DPeople. To measure the effect of noisy 2D keypoint detections on performance, we run separate experiments on 3DPeople ground-truth and noisy keypoint data. Finally, we extract a set of factors that affect classification accuracy and propose future work. The advantage of the approach is that the input is small and the architecture is simple, which allows us to run many experiments while keeping real-time performance at inference. The source code, with the experiments and data preparation scripts, is available on GitHub (https://github.com/kristijanbartol/human-sex-classifier).

Submitted 20 April, 2022; v1 submitted 5 November, 2020; originally announced November 2020.

Comments: There was an error in the implementation of the base experiment (#1), namely in the data preparation step: the labels used for both training and evaluation were wrong, which makes the conclusions invalid.

arXiv:2011.03102

Smart Time-Multiplexing of Quads Solves the Multicamera Interference Problem

Authors: Tomislav Pribanic, Tomislav Petkovic, David Bojanic, Kristijan Bartol

Abstract: Time-of-flight (ToF) cameras are becoming increasingly popular for 3D imaging. Their optimal usage has been studied from several aspects. One open research problem is multicamera interference, which can occur when two or more ToF cameras operate simultaneously. In this work we present an efficient method to synchronize multiple operating ToF cameras. Our method is based on time-division multiplexing, but unlike traditional time multiplexing, it does not decrease the effective camera frame rate. Additionally, for unsynchronized cameras, we provide a robust method to extract, from their corresponding video streams, the frames that are not affected by multicamera interference. We demonstrate our approach through a series of experiments with different levels of triggering support, ranging from hardware triggering to purely random software triggering.

Submitted 5 November, 2020; originally announced November 2020.

arXiv:2011.03091

doi 10.5220/0009190005830589

Towards Keypoint Guided Self-Supervised Depth Estimation

Authors: Kristijan Bartol, David Bojanic, Tomislav Petkovic, Tomislav Pribanic, Yago Diez Donoso

Abstract: This paper proposes to use keypoints as a self-supervision cue for learning depth map estimation from a collection of input images. Because ground-truth depth for real images is difficult to obtain, many unsupervised and self-supervised approaches to depth estimation have been proposed. Most of these unsupervised approaches use depth map and ego-motion estimates to reproject the pixels of the current image into an adjacent image from the collection. The depth and ego-motion estimates are then evaluated based on pixel intensity differences between the corresponding original and reprojected pixels. Instead of reprojecting individual pixels, we propose to first select keypoints in both images and then reproject and compare the corresponding keypoints of the two images. The keypoints should describe distinctive image features well. By training a deep model with and without the keypoint extraction technique, we show that using keypoints improves depth estimation learning. We also propose some future directions for keypoint-guided learning of structure-from-motion problems.

Submitted 5 November, 2020; originally announced November 2020.

Journal ref: 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, 2019
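The reprojection step described in the abstract can be sketched with a standard pinhole model: back-project each keypoint with its predicted depth, move it with the estimated ego-motion, and project it into the adjacent image. The Python sketch below assumes shared intrinsics K for both images and an ego-motion (R, t) that maps points from the current camera frame into the adjacent one; it is a generic geometric warp, not the paper's training code, and `reproject_keypoints` is a hypothetical name.

```python
import numpy as np


def reproject_keypoints(kps, depth, K, R, t):
    """Warp pixel keypoints from the current image into the adjacent image.

    kps:   (N, 2) array of (u, v) keypoint coordinates in the current image
    depth: (N,) predicted depth at each keypoint
    K:     (3, 3) camera intrinsics, assumed shared by both images
    R, t:  (3, 3) rotation and (3,) translation from current to adjacent camera frame
    """
    ones = np.ones((kps.shape[0], 1))
    pix_h = np.hstack([kps, ones])            # homogeneous pixel coordinates, (N, 3)
    rays = np.linalg.inv(K) @ pix_h.T         # back-projected viewing rays, (3, N)
    points = rays * depth                     # 3D points in the current camera frame
    points_adj = R @ points + t[:, None]      # same points in the adjacent camera frame
    proj = K @ points_adj                     # project with the intrinsics
    return (proj[:2] / proj[2]).T             # perspective divide -> (N, 2) pixels
```

Restricting the warp to keypoints means the comparison with the adjacent image happens only at distinctive locations rather than at every pixel; how the warped keypoints are then compared (intensities or positions of matched keypoints) is a design choice this sketch leaves open.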

arXiv:2007.10000

doi 10.1109/ISPA.2019.8868792

On the Comparison of Classic and Deep Keypoint Detector and Descriptor Methods

Authors: Kristijan Bartol, David Bojanić, Tomislav Pribanić, Tomislav Petković, Yago Diez Donoso, Joaquim Salvi Mas

Abstract: The purpose of this study is to compare the performance of several classic hand-crafted and deep keypoint detector and descriptor methods. In particular, we consider the following classic algorithms: SIFT, SURF, ORB, FAST, BRISK, MSER, HARRIS, KAZE, AKAZE, AGAST, GFTT, FREAK, BRIEF and RootSIFT, where a subset of all combinations is paired into detector-descriptor pipelines. Additionally, we analyze the performance of two recent and promising deep detector-descriptor models, LF-Net and SuperPoint. Our benchmark relies on the HPSequences dataset, which provides real and diverse images under various geometric and illumination changes. We analyze performance on three evaluation tasks: keypoint verification, image matching and keypoint retrieval. The results show that certain classic and deep approaches are still comparable, with some classic detector-descriptor combinations outperforming pretrained deep models. In terms of the execution times of the tested implementations, the SuperPoint model is the fastest, followed by ORB. The source code is published at https://github.com/kristijanbartol/keypoint-algorithms-benchmark.

Submitted 29 July, 2020; v1 submitted 20 July, 2020; originally announced July 2020.

Journal ref: Proceedings of the 2019 11th International Symposium on Image and Signal Processing and Analysis (ISPA), Page(s): 64-69
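Image matching in benchmarks like this one is commonly done by nearest-neighbour descriptor matching, often with a mutual (cross-check) constraint. The abstract does not state the exact matching protocol, so the numpy sketch below is an assumption: it keeps only pairs that are each other's nearest neighbour under Euclidean distance, which suits float descriptors such as SIFT or SuperPoint; binary descriptors (ORB, BRISK, FREAK, BRIEF) would normally use Hamming distance instead. The function name is hypothetical.

```python
import numpy as np


def mutual_nn_match(desc_a, desc_b):
    """Return (i, j) index pairs where desc_a[i] and desc_b[j] are mutual nearest neighbours.

    desc_a: (Na, D) float descriptors from image A
    desc_b: (Nb, D) float descriptors from image B
    """
    # Full pairwise Euclidean distance matrix, shape (Na, Nb)
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    nn_ab = dists.argmin(axis=1)   # best match in B for every descriptor in A
    nn_ba = dists.argmin(axis=0)   # best match in A for every descriptor in B
    idx_a = np.arange(len(desc_a))
    keep = nn_ba[nn_ab] == idx_a   # cross-check: A -> B -> A must return to the start
    return np.stack([idx_a[keep], nn_ab[keep]], axis=1)
```

OpenCV's brute-force matcher with cross-checking enabled applies the same mutual constraint; the explicit version makes the O(Na·Nb) distance matrix visible, which is one reason matching cost grows quickly with the number of detected keypoints.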

arXiv:1310.0302

Surface Registration Using Genetic Algorithm in Reduced Search Space

Authors: Vedran Hrgetić, Tomislav Pribanić

Abstract: Surface registration is a technique used in various areas such as object recognition and 3D model reconstruction. The problem of surface registration can be analyzed as an optimization problem of seeking a rigid motion between two different views. Genetic algorithms can be used to solve this optimization problem, both for obtaining a robust parameter estimate and for fine-tuning it. The main drawback of genetic algorithms is that they are time-consuming, which makes them unsuitable for online applications. Modern acquisition systems make it possible to implement solutions that immediately provide the rotational angles between the different views, thus reducing the dimension of the optimization problem. The paper analyzes the genetic algorithm under the condition that the rotation matrix is known and compares these results with results obtained when this information is not available.

Submitted 1 October, 2013; originally announced October 2013.

Comments: Part of the Proceedings of the Croatian Computer Vision Workshop, CCVW 2013, Year 1

Report number: UniZg-CRV-CCVW/2013/0018
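The abstract's key point is that a known rotation leaves only the three translation parameters to search. The sketch below illustrates that reduced search with a simple real-valued genetic algorithm (truncation selection, elitism, per-axis blend crossover and shrinking Gaussian mutation), using the mean nearest-neighbour distance between the moved source cloud and the target cloud as fitness. All operators, parameter values and names here are assumptions for illustration, not the configuration used in the paper.

```python
import numpy as np


def nn_fitness(t, src_rot, dst):
    """Mean distance from each translated (already rotated) source point to its nearest target point."""
    moved = src_rot + t
    dists = np.linalg.norm(moved[:, None, :] - dst[None, :, :], axis=-1)
    return dists.min(axis=1).mean()


def ga_translation(src, dst, R, bound=1.0, pop=60, gens=120, elite=6, seed=0):
    """Search t in [-bound, bound]^3 so that src @ R.T + t best overlaps dst; R is known."""
    rng = np.random.default_rng(seed)
    src_rot = src @ R.T                        # the known rotation is applied once, up front
    population = rng.uniform(-bound, bound, size=(pop, 3))
    sigma = 0.3 * bound                        # mutation scale, shrunk every generation
    for _ in range(gens):
        scores = np.array([nn_fitness(t, src_rot, dst) for t in population])
        order = np.argsort(scores)             # lower fitness = better overlap
        parents = population[order[: pop // 2]]       # truncation selection
        children = list(population[order[:elite]])    # elitism: best individuals survive unchanged
        while len(children) < pop:
            a, b = parents[rng.integers(len(parents), size=2)]
            w = rng.uniform(size=3)                   # per-axis blend crossover weights
            child = w * a + (1.0 - w) * b + rng.normal(0.0, sigma, size=3)
            children.append(np.clip(child, -bound, bound))
        population = np.array(children)
        sigma *= 0.97
    scores = np.array([nn_fitness(t, src_rot, dst) for t in population])
    return population[np.argmin(scores)]
```

With the rotation fixed, each individual is a 3-vector instead of a 6-vector, so the same evaluation budget covers the search space far more densely; this is the dimensionality reduction the abstract relies on to make a genetic algorithm more practical.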

Showing 1–8 of 8 results for author: Pribanić, T