3D Surface Reconstruction in the Wild by Deforming Shape Priors from Synthetic Data

Authors: Nicolai Häni, Jun-Jee Chao, Volkan Isler

Abstract: Reconstructing the underlying 3D surface of an object from a single image is a challenging problem that has received extensive attention from the computer vision community. Many learning-based approaches tackle this problem by learning a 3D shape prior from either ground truth 3D data or multi-view observations. To achieve state-of-the-art results, these methods assume that the objects are specified with respect to a fixed canonical coordinate frame, where instances of the same category are perfectly aligned. In this work, we present a new method for joint category-specific 3D reconstruction and object pose estimation from a single image. We show that one can leverage shape priors learned on purely synthetic 3D data together with a point cloud pose canonicalization method to achieve high-quality 3D reconstruction in the wild. Given a single depth image at test time, we first transform this partial point cloud into a learned canonical frame. Then, we use a neural deformation field to reconstruct the 3D surface of the object. Finally, we jointly optimize object pose and 3D shape to fit the partial depth observation. Our approach achieves state-of-the-art reconstruction performance across several real-world datasets, even when trained only on synthetic data. We further show that our method generalizes to different input modalities, from dense depth images to sparse and noisy LIDAR scans.

Submitted 24 February, 2023; originally announced February 2023.

arXiv:2209.14419

Category-Level Global Camera Pose Estimation with Multi-Hypothesis Point Cloud Correspondences

Authors: Jun-Jee Chao, Selim Engin, Nicolai Häni, Volkan Isler

Abstract: Correspondence search is an essential step in rigid point cloud registration algorithms. Most methods maintain a single correspondence at each step and gradually remove wrong correspondences. However, building one-to-one correspondence with hard assignments is extremely difficult, especially when matching two point clouds with many locally similar features. This paper proposes an optimization method that retains all possible correspondences for each keypoint when matching a partial point cloud to a complete point cloud. These uncertain correspondences are then gradually updated with the estimated rigid transformation by considering the matching cost. Moreover, we propose a new point feature descriptor that measures the similarity between local point cloud regions. Extensive experiments show that our method outperforms the state-of-the-art (SoTA) methods even when matching different objects within the same category. Notably, our method outperforms the SoTA methods by up to 20% when registering real-world noisy depth images to a template shape.

Submitted 28 September, 2022; originally announced September 2022.

Comments: 8 pages

arXiv:2208.11566

doi 10.1109/IROS.2018.8594304

Apple Counting using Convolutional Neural Networks

Authors: Nicolai Häni, Pravakar Roy, Volkan Isler

Abstract: Estimating accurate and reliable fruit and vegetable counts from images in real-world settings, such as orchards, is a challenging problem that has received significant recent attention. Estimating fruit counts before harvest provides useful information for logistics planning. While considerable progress has been made toward fruit detection, estimating the actual counts remains challenging. In practice, fruits are often clustered together. Therefore, methods that only detect fruits fail to offer general solutions to estimate accurate fruit counts. Furthermore, in horticultural studies, rather than a single yield estimate, finer information such as the distribution of the number of apples per cluster is desirable. In this work, we formulate fruit counting from images as a multi-class classification problem and solve it by training a Convolutional Neural Network. We first evaluate the per-image accuracy of our method and compare it with a state-of-the-art method based on Gaussian Mixture Models over four test datasets. Even though the parameters of the Gaussian Mixture Model-based method are specifically tuned for each dataset, our network outperforms it in three out of four datasets with a maximum of 94% accuracy. Next, we use the method to estimate the yield for two datasets for which we have ground truth. Our method achieved 96-97% accuracies. For additional details please see our video here: https://www.youtube.com/watch?v=Le0mb5P-SYc.

Submitted 24 August, 2022; originally announced August 2022.

Journal ref: 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2208.11538

doi 10.1109/IROS.2016.7759456

Visual Servoing in Orchard Settings

Authors: Nicolai Häni, Volkan Isler

Abstract: We present a general framework for accurate positioning of sensors and end effectors in farm settings using a camera mounted on a robotic manipulator. Our main contribution is a visual servoing approach based on a new and robust feature tracking algorithm. Results from field experiments performed at an apple orchard demonstrate that our approach converges to a given termination criterion even under environmental influences such as strong winds, varying illumination conditions and partial occlusion of the target object. Further, we show experimentally that the system converges to the desired view for a wide range of initial conditions. This approach opens possibilities for new applications such as automated fruit inspection, fruit picking or precise pesticide application.

Submitted 24 August, 2022; originally announced August 2022.

Journal ref: In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 2946-2953)

arXiv:2011.08319

Multi-Step Recurrent Q-Learning for Robotic Velcro Peeling

Authors: Jiacheng Yuan, Nicolai Häni, Volkan Isler

Abstract: Learning object manipulation is a critical skill for robots to interact with their environment. Even though there has been significant progress in robotic manipulation of rigid objects, interacting with non-rigid objects remains challenging for robots. In this work, we introduce velcro peeling as a representative application for robotic manipulation of non-rigid objects in complex environments. We present a method of learning force-based manipulation from noisy and incomplete sensor inputs in partially observable environments by modeling long term dependencies between measurements with a multi-step deep recurrent network. We present experiments on a real robot to show the necessity of modeling these long term dependencies and validate our approach in simulation and robot experiments. Our results show that using tactile input enables the robot to overcome geometric uncertainties present in the environment with high fidelity in ~90% of all cases, outperforming the baselines by a large margin.

Submitted 22 February, 2022; v1 submitted 16 November, 2020; originally announced November 2020.

arXiv:2007.15627

Continuous Object Representation Networks: Novel View Synthesis without Target View Supervision

Authors: Nicolai Häni, Selim Engin, Jun-Jee Chao, Volkan Isler

Abstract: Novel View Synthesis (NVS) is concerned with synthesizing views under camera viewpoint transformations from one or multiple input images. NVS requires explicit reasoning about 3D object structure and unseen parts of the scene to synthesize convincing results. As a result, current approaches typically rely on supervised training with either ground truth 3D models or multiple target images. We propose Continuous Object Representation Networks (CORN), a conditional architecture that encodes an input image's geometry and appearance and maps them to a 3D-consistent scene representation. We can train CORN with only two source images per object by combining our model with a neural renderer. A key feature of CORN is that it requires no ground truth 3D models or target view supervision. Regardless, CORN performs well on challenging tasks such as novel view synthesis and single-view 3D reconstruction and achieves performance comparable to state-of-the-art approaches that use direct supervision. For up-to-date information, data, and code, please see our project page: https://nicolaihaeni.github.io/corn/.

Submitted 23 October, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

Comments: To appear at Advances in Neural Information Processing Systems 33 (NeurIPS 2020)

arXiv:1909.06441

doi 10.1109/LRA.2020.2965061

MinneApple: A Benchmark Dataset for Apple Detection and Segmentation

Authors: Nicolai Häni, Pravakar Roy, Volkan Isler

Abstract: In this work, we present a new dataset to advance the state-of-the-art in fruit detection, segmentation, and counting in orchard environments. While there has been significant recent interest in solving these problems, the lack of a unified dataset has made it difficult to compare results. We hope to enable direct comparisons by providing a large variety of high-resolution images acquired in orchards, together with human annotations of the fruit on trees. The fruits are labeled using polygonal masks for each object instance to aid in precise object detection, localization, and segmentation. Additionally, we provide data for patch-based counting of clustered fruits. Our dataset contains over 41,000 annotated object instances in 1000 images. We present a detailed overview of the dataset together with baseline performance analysis for bounding box detection, segmentation, and fruit counting as well as representative results for yield estimation. We make this dataset publicly available and host a CodaLab challenge to encourage comparison of results on a common dataset. To download the data and learn more about MinneApple please see the project website: http://rsn.cs.umn.edu/index.php/MinneApple. Up-to-date information is available online.

Submitted 3 January, 2020; v1 submitted 13 September, 2019; originally announced September 2019.

arXiv:1904.02203

Semantics-Aware Image to Image Translation and Domain Transfer

Authors: Pravakar Roy, Nicolai Häni, Jun-Jee Chao, Volkan Isler

Abstract: Image to image translation is the problem of transferring an image from a source domain to a different (but related) target domain. We present a new unsupervised image to image translation technique that leverages the underlying semantic information for object transfiguration and domain transfer tasks. Specifically, we present a generative adversarial learning approach that jointly translates images and labels from a source domain to a target domain. Our main technical contribution is an encoder-decoder based network architecture that jointly encodes the image and its underlying semantics and translates both individually to the target domain. Additionally, we propose object transfiguration and cross-domain semantic consistency losses that preserve semantic labels. Through extensive experimental evaluation, we demonstrate the effectiveness of our approach as compared to the state-of-the-art methods on unsupervised image-to-image translation, domain adaptation, and object transfiguration.

Submitted 1 March, 2021; v1 submitted 3 April, 2019; originally announced April 2019.

arXiv:1810.09499

doi 10.1002/rob.21902

A Comparative Study of Fruit Detection and Counting Methods for Yield Mapping in Apple Orchards

Authors: Nicolai Häni, Pravakar Roy, Volkan Isler

Abstract: We present new methods for apple detection and counting based on recent deep learning approaches and compare them with state-of-the-art results based on classical methods. Our goal is to quantify performance improvements by neural network-based methods compared to methods based on classical approaches. Additionally, we introduce a complete system for counting apples in an entire row. This task is challenging as it requires tracking fruits in images from both sides of the row. We evaluate the performances of three fruit detection methods and two fruit counting methods on six datasets. Results indicate that the classical detection approach still outperforms the deep learning based methods in the majority of the datasets. For fruit counting though, the deep learning based approach performs better for all of the datasets. Combining the classical detection method together with the neural network based counting approach, we achieve remarkable yield accuracies ranging from 95.56% to 97.83%.

Submitted 6 March, 2019; v1 submitted 22 October, 2018; originally announced October 2018.

Comments: 28 pages

arXiv:1705.08374

Classification of Aerial Photogrammetric 3D Point Clouds

Authors: Carlos Becker, Nicolai Häni, Elena Rosinskaya, Emmanuel d'Angelo, Christoph Strecha

Abstract: We present a powerful method to extract per-point semantic class labels from aerial photogrammetry data. Labeling this kind of data is important for tasks such as environmental modelling, object classification and scene understanding. Unlike previous point cloud classification methods that rely exclusively on geometric features, we show that incorporating color information yields a significant increase in accuracy in detecting semantic classes. We test our classification method on three real-world photogrammetry datasets that were generated with Pix4Dmapper Pro, and with varying point densities. We show that off-the-shelf machine learning techniques coupled with our new features allow us to train highly accurate classifiers that generalize well to unseen data, processing point clouds containing 10 million points in less than 3 minutes on a desktop computer.

Submitted 23 May, 2017; originally announced May 2017.

Comments: ISPRS 2017

Showing 1–10 of 10 results for author: Häni, N