-
ViFu: Multiple 360$^\circ$ Objects Reconstruction with Clean Background via Visible Part Fusion
Authors:
Tianhan Xu,
Takuya Ikeda,
Koichi Nishiwaki
Abstract:
In this paper, we propose a method to segment and recover a static, clean background and multiple 360$^\circ$ objects from observations of scenes at different timestamps. Recent works have used neural radiance fields to model 3D scenes and improved the quality of novel view synthesis, while few studies have focused on modeling the invisible or occluded parts of the training images. These under-reconstructed parts constrain both scene editing and rendering view selection, thereby limiting their utility for synthetic data generation for downstream tasks. Our basic idea is that, by observing the same set of objects in various arrangements, parts that are invisible in one scene may become visible in others. By fusing the visible parts from each scene, occlusion-free rendering of both background and foreground objects can be achieved.
We decompose the multi-scene fusion task into two main components: (1) object/background segmentation and alignment, where we leverage point cloud-based methods tailored to our novel problem formulation; (2) radiance field fusion, where we introduce a visibility field to quantify the visible information of radiance fields, and propose visibility-aware rendering for the fusion of a series of scenes, ultimately obtaining a clean background and 360$^\circ$ object rendering. Comprehensive experiments were conducted on synthetic and real datasets, and the results demonstrate the effectiveness of our method.
Submitted 14 April, 2024;
originally announced April 2024.
-
AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval
Authors:
Kento Tatsuno,
Daisuke Miyashita,
Taiga Ikeda,
Kiyoshi Ishiyama,
Kazunari Sumiyoshi,
Jun Deguchi
Abstract:
In approximate nearest neighbor search (ANNS) methods based on approximate proximity graphs, DiskANN achieves a good recall-speed balance for large-scale datasets using both RAM and storage. Although it claims to save memory by loading vectors compressed with product quantization (PQ), its memory usage increases in proportion to the scale of the dataset. In this paper, we propose All-in-Storage ANNS with Product Quantization (AiSAQ), which offloads the compressed vectors to storage. Our method achieves $\sim$10 MB memory usage in query search even with billion-scale datasets, with minor performance degradation. AiSAQ also reduces the index load time before query search, which enables index switching between multiple billion-scale datasets and significantly enhances the flexibility of retrieval-augmented generation (RAG). This method is applicable to all graph-based ANNS algorithms and can be combined with higher-spec ANNS methods in the future.
Submitted 9 April, 2024;
originally announced April 2024.
-
DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation
Authors:
Takuya Ikeda,
Sergey Zakharov,
Tianyi Ko,
Muhammad Zubair Irshad,
Robert Lee,
Katherine Liu,
Rares Ambrus,
Koichi Nishiwaki
Abstract:
This paper addresses the challenging problem of category-level pose estimation. Current state-of-the-art methods for this task face challenges when dealing with symmetric objects and when attempting to generalize to new environments solely through synthetic data training. In this work, we address these challenges by proposing a probabilistic model that relies on diffusion to estimate dense canonical maps crucial for recovering partial object shapes as well as establishing correspondences essential for pose estimation. Furthermore, we introduce critical components to enhance performance by leveraging the strength of the diffusion models with multi-modal input representations. We demonstrate the effectiveness of our method by testing it on a range of real datasets. Despite being trained solely on our generated synthetic data, our approach achieves state-of-the-art performance and unprecedented generalization qualities, outperforming baselines, even those specifically trained on the target domain.
Submitted 5 March, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Gravity-aware Grasp Generation with Implicit Grasp Mode Selection for Underactuated Hands
Authors:
Tianyi Ko,
Takuya Ikeda,
Thomas Stewart,
Robert Lee,
Koichi Nishiwaki
Abstract:
Learning-based grasp detectors typically assume a precision grasp, where each finger has only one contact point, and estimate the grasp probability. In this work, we propose a data generation and learning pipeline that can leverage power grasping, which has more contact points with an enveloping configuration and is robust against both positioning error and force disturbance. To train a grasp detector to prioritize power grasping while still keeping precision grasping as the secondary choice, we propose to train the network against the magnitude of disturbance in the gravity direction that a grasp can resist (gravity-rejection score) rather than the binary classification of success. We also provide an efficient data generation pipeline for a dataset with gravity-rejection score annotation. In addition to thorough ablation studies, quantitative evaluation both in simulation and on a real robot shows the significant improvement of our approach, especially when the objects are heavy.
Submitted 28 February, 2024; v1 submitted 18 December, 2023;
originally announced December 2023.
-
GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence
Authors:
Pengyuan Wang,
Takuya Ikeda,
Robert Lee,
Koichi Nishiwaki
Abstract:
Category-level pose estimation is a challenging task with many potential applications in computer vision and robotics. Recently, deep-learning-based approaches have made great progress, but are typically hindered by the need for large datasets of either pose-labelled real images or carefully tuned photorealistic simulators. This can be avoided by using only geometry inputs such as depth images to reduce the domain gap, but these approaches suffer from a lack of semantic information, which can be vital in the pose estimation problem. To resolve this conflict, we propose to utilize both geometric and semantic features obtained from a pre-trained foundation model. Our approach projects 2D features from this foundation model into 3D for a single object model per category, and then uses a trained matching network to match new single-view observations of unseen object instances against it. This requires significantly less data to train than prior methods since the semantic features are robust to object texture and appearance. We demonstrate this with a rich evaluation, showing improved performance over prior methods with a fraction of the data required.
Submitted 22 November, 2023;
originally announced November 2023.
-
A Probabilistic Rotation Representation for Symmetric Shapes With an Efficiently Computable Bingham Loss Function
Authors:
Hiroya Sato,
Takuya Ikeda,
Koichi Nishiwaki
Abstract:
In recent years, deep learning frameworks have been widely used for object pose estimation. While the quaternion is a common choice for rotation representation, it cannot represent the ambiguity of the observation. To handle this ambiguity, the Bingham distribution is one promising solution. However, computing its negative log-likelihood (NLL) loss requires complicated calculation. An alternative easy-to-implement loss function has been proposed to avoid complex computations, but it has difficulty expressing symmetric distributions. In this paper, we introduce a fast-computable and easy-to-implement NLL loss function for the Bingham distribution. We also create the inference network and show that our loss function can capture the symmetric property of target objects from their point clouds.
Submitted 30 May, 2023;
originally announced May 2023.
-
Probabilistic Rotation Representation With an Efficiently Computable Bingham Loss Function and Its Application to Pose Estimation
Authors:
Hiroya Sato,
Takuya Ikeda,
Koichi Nishiwaki
Abstract:
In recent years, deep learning frameworks have been widely used for object pose estimation. While the quaternion is a common choice for the rotation representation of a 6D pose, it cannot represent the uncertainty of the observation. To handle this uncertainty, the Bingham distribution is one promising solution because it has suitable features, such as a smooth representation over SO(3), in addition to the ambiguity representation. However, it requires the complex computation of the normalizing constants. This is the bottleneck of loss computation in training neural networks based on the Bingham representation. As such, we propose a fast-computable and easy-to-implement loss function for the Bingham distribution. We not only examine the parametrization of the Bingham distribution but also show an application based on our loss function.
Submitted 8 March, 2022;
originally announced March 2022.
-
Sim2Real Instance-Level Style Transfer for 6D Pose Estimation
Authors:
Takuya Ikeda,
Suomi Tanishige,
Ayako Amma,
Michael Sudano,
Hervé Audren,
Koichi Nishiwaki
Abstract:
In recent years, synthetic data has been widely used in the training of 6D pose estimation networks, in part because it automatically provides perfect annotation at low cost. However, there are still non-trivial domain gaps, such as differences in textures/materials, between synthetic and real data. These gaps have a measurable impact on performance. To solve this problem, we introduce a simulation to reality (sim2real) instance-level style transfer for 6D pose estimation network training. Our approach transfers the style of target objects individually, from synthetic to real, without human intervention. This improves the quality of synthetic data for training pose estimation networks. We also propose a complete pipeline from data collection to the training of a pose estimation network and conduct extensive evaluation on a real-world robotic platform. Our evaluation shows significant improvement achieved by our method in both pose estimation performance and the realism of images adapted by the style transfer.
Submitted 3 March, 2022;
originally announced March 2022.
-
Soft-Bubble grippers for robust and perceptive manipulation
Authors:
Naveen Kuppuswamy,
Alex Alspach,
Avinash Uttamchandani,
Sam Creasey,
Takuya Ikeda,
Russ Tedrake
Abstract:
Manipulation in cluttered environments like homes requires stable grasps, precise placement and robustness against external contact. We present the Soft-Bubble gripper system with a highly compliant gripping surface and dense-geometry visuotactile sensing, capable of multiple kinds of tactile perception. We first present various mechanical design advances and a fabrication technique to deposit custom patterns on the internal surface of the sensor that enable tracking of shear-induced displacement of the manipuland. The depth maps output by the internal imaging sensor are used in an in-hand proximity pose estimation framework -- the method better captures distances to corners or edges on the manipuland geometry. We also extend our previous work on tactile classification and integrate the system within a robust manipulation pipeline for cluttered home environments. The capabilities of the proposed system are demonstrated through robust execution of multiple real-world manipulation tasks. A video of the system in action can be found here: [https://youtu.be/G_wBsbQyBfc].
Submitted 28 April, 2020; v1 submitted 7 April, 2020;
originally announced April 2020.
-
Inverted-File k-Means Clustering: Performance Analysis
Authors:
Kazuo Aoyama,
Kazumi Saito,
Tetsuo Ikeda
Abstract:
This paper presents an inverted-file k-means clustering algorithm (IVF) suitable for a large-scale sparse data set with potentially numerous classes. Given such a data set, IVF works efficiently at high speed and with low memory consumption, while producing the same solution as the standard Lloyd's algorithm. The high performance arises from two distinct data representations. One is a sparse expression for both the object and mean feature vectors. The other is an inverted-file data structure for a set of the mean feature vectors. To confirm the effect of these representations, we design three algorithms using distinct data structures and expressions for comparison. We experimentally demonstrate that IVF achieves better performance than the designed algorithms when they are applied to large-scale real document data sets in a modern computer system equipped with superscalar out-of-order processors and a deep hierarchical memory system. We also introduce a simple yet practical clock-cycle per instruction (CPI) model for speed-performance analysis. Analytical results reveal that IVF suppresses three performance degradation factors: the numbers of cache misses, branch mispredictions, and completed instructions.
Submitted 20 February, 2020;
originally announced February 2020.
-
Deep-learning-based identification of odontogenic keratocysts in hematoxylin- and eosin-stained jaw cyst specimens
Authors:
Kei Sakamoto,
Kei-ichi Morita,
Tohru Ikeda,
Kou Kayamori
Abstract:
The aim of this study was to develop a digital histopathology system for identifying odontogenic keratocysts in hematoxylin- and eosin-stained tissue specimens of jaw cysts. Approximately 5000 microscopy images with 400$\times$ magnification were obtained from 199 odontogenic keratocysts, 208 dentigerous cysts, and 55 radicular cysts. A proportion of these images were used to make training patches, which were annotated as belonging to one of the following three classes: keratocysts, non-keratocysts, and stroma. The patches for the cysts contained the complete lining epithelium, with the cyst cavity being present on the upper side. The convolutional neural network (CNN) VGG16 was fine-tuned on this dataset. The trained CNN could recognize the basal cell palisading pattern, which is the definitive criterion for diagnosing keratocysts. Some of the remaining images were scanned and analyzed by the trained CNN, whose output was then used to train another CNN for binary classification (keratocyst or not). The area under the receiver operating characteristic curve for the entire algorithm was 0.997 for the test dataset. Thus, the proposed patch classification strategy is usable for automated keratocyst diagnosis. However, further optimization must be performed to make it suitable for practical use.
Submitted 12 January, 2019;
originally announced January 2019.