
arXiv:2310.19798

doi 10.1145/3623263.3623364

Gradient-Based Dovetail Joint Shape Optimization for Stiffness

Authors: Xingyuan Sun, Chenyue Cai, Ryan P. Adams, Szymon Rusinkiewicz

Abstract: It is common to manufacture an object by decomposing it into parts that can be assembled. This decomposition is often required by size limits of the machine, the complex structure of the shape, etc. To make it possible to easily assemble the final object, it is often desirable to design geometry that enables robust connections between the subcomponents. In this project, we study the task of dovetail-joint shape optimization for stiffness using gradient-based optimization. This optimization requires a differentiable simulator that is capable of modeling the contact between the two parts of a joint, making it possible to reason about the gradient of the stiffness with respect to shape parameters. Our simulation approach uses a penalty method that alternates between optimizing each side of the joint, using the adjoint method to compute gradients. We test our method by optimizing the joint shapes in three different joint shape spaces, and evaluate optimized joint shapes in both simulation and real-world tests. The experiments show that optimized joint shapes achieve higher stiffness, both synthetically and in real-world tests.

Submitted 30 October, 2023; originally announced October 2023.

Comments: ACM SCF 2023: Proceedings of the 8th Annual ACM Symposium on Computational Fabrication

arXiv:2306.12652

UltraGlove: Hand Pose Estimation with Mems-Ultrasonic Sensors

Authors: Qiang Zhang, Yuanqiao Lin, Yubin Lin, Szymon Rusinkiewicz

Abstract: Hand tracking is an important aspect of human-computer interaction and has a wide range of applications in extended reality devices. However, current hand motion capture methods suffer from various limitations. For instance, visual-based hand pose estimation is susceptible to self-occlusion and changes in lighting conditions, while IMU-based tracking gloves experience significant drift and are not resistant to external magnetic field interference. To address these issues, we propose a novel and low-cost hand-tracking glove that utilizes several MEMS-ultrasonic sensors attached to the fingers, to measure the distance matrix among the sensors. Our lightweight deep network then reconstructs the hand pose from the distance matrix. Our experimental results demonstrate that this approach is both accurate, size-agnostic, and robust to external interference. We also show the design logic for the sensor selection, sensor configurations, circuit diagram, as well as model architecture.

Submitted 14 September, 2023; v1 submitted 21 June, 2023; originally announced June 2023.

arXiv:2306.07449

doi 10.1145/3588432.3591526

Constructing Printable Surfaces with View-Dependent Appearance

Authors: Maxine Perroni-Scharf, Szymon Rusinkiewicz

Abstract: We present a method for the digital fabrication of surfaces whose appearance varies based on viewing direction. The surfaces are constructed from a mesh of bars arranged in a self-occluding colored heightfield that creates the desired view-dependent effects. At the heart of our method is a novel and simple differentiable rendering algorithm specifically designed to render colored 3D heightfields and enable efficient calculation of the gradient of appearance with respect to heights and colors. This algorithm forms the basis of a coarse-to-fine ML-based optimization process that adjusts the heights and colors of the strips to minimize the loss between the desired and real surface appearance from each viewpoint, deriving meshes that can then be fabricated using a 3D printer. Using our method, we demonstrate both synthetic and real-world fabricated results with view-dependent appearance.

Submitted 12 June, 2023; originally announced June 2023.

Comments: 10 pages, 16 figures

arXiv:2305.05658

doi 10.1007/s10514-023-10139-z

TidyBot: Personalized Robot Assistance with Large Language Models

Authors: Jimmy Wu, Rika Antonova, Adam Kan, Marion Lepert, Andy Zeng, Shuran Song, Jeannette Bohg, Szymon Rusinkiewicz, Thomas Funkhouser

Abstract: For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios. In this work, we investigate personalization of household cleanup with robots that can tidy up rooms by picking up objects and putting them away. A key challenge is determining the proper place to put each object, as people's preferences can vary greatly depending on personal taste or cultural background. For instance, one person may prefer storing shirts in the drawer, while another may prefer them on the shelf. We aim to build systems that can learn such preferences from just a handful of examples via prior interactions with a particular person. We show that robots can combine language-based planning and perception with the few-shot summarization capabilities of large language models (LLMs) to infer generalized user preferences that are broadly applicable to future interactions. This approach enables fast adaptation and achieves 91.2% accuracy on unseen objects in our benchmark dataset. We also demonstrate our approach on a real-world mobile manipulator called TidyBot, which successfully puts away 85.0% of objects in real-world test scenarios.

Submitted 11 October, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

Comments: Accepted to Autonomous Robots (AuRo) - Special Issue: Large Language Models in Robotics, 2023 and IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023. Project page: https://tidybot.cs.princeton.edu

arXiv:2304.03763

Clutter Detection and Removal in 3D Scenes with View-Consistent Inpainting

Authors: Fangyin Wei, Thomas Funkhouser, Szymon Rusinkiewicz

Abstract: Removing clutter from scenes is essential in many applications, ranging from privacy-concerned content filtering to data augmentation. In this work, we present an automatic system that removes clutter from 3D scenes and inpaints with coherent geometry and texture. We propose techniques for its two key components: 3D segmentation from shared properties and 3D inpainting, both of which are important problems. The definition of 3D scene clutter (frequently-moving objects) is not well captured by commonly-studied object categories in computer vision. To tackle the lack of well-defined clutter annotations, we group noisy fine-grained labels, leverage virtual rendering, and impose an instance-level area-sensitive loss. Once clutter is removed, we inpaint geometry and texture in the resulting holes by merging inpainted RGB-D images. This requires novel voting and pruning strategies that guarantee multi-view consistency across individually inpainted images for mesh reconstruction. Experiments on ScanNet and Matterport dataset show that our method outperforms baselines for clutter segmentation and 3D inpainting, both visually and quantitatively.

Submitted 1 September, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

Comments: 18 pages. ICCV 2023. Project page: https://weify627.github.io/clutter/

arXiv:2205.16008

doi 10.1145/3623263.3623356

More Stiffness with Less Fiber: End-to-End Fiber Path Optimization for 3D-Printed Composites

Authors: Xingyuan Sun, Geoffrey Roeder, Tianju Xue, Ryan P. Adams, Szymon Rusinkiewicz

Abstract: In 3D printing, stiff fibers (e.g., carbon fiber) can reinforce thermoplastic polymers with limited stiffness. However, existing commercial digital manufacturing software only provides a few simple fiber layout algorithms, which solely use the geometry of the shape. In this work, we build an automated fiber path planning algorithm that maximizes the stiffness of a 3D print given specified external loads. We formalize this as an optimization problem: an objective function is designed to measure the stiffness of the object while regularizing certain properties of fiber paths (e.g., smoothness). To initialize each fiber path, we use finite element analysis to calculate the stress field on the object and greedily "walk" in the direction of the stress field. We then apply a gradient-based optimization algorithm that uses the adjoint method to calculate the gradient of stiffness with respect to fiber layout. We compare our approach, in both simulation and real-world experiments, to three baselines: (1) concentric fiber rings generated by Eiger, a leading digital manufacturing software package developed by Markforged, (2) greedy extraction on the simulated stress field (i.e., our method without optimization), and (3) the greedy algorithm on a fiber orientation field calculated by smoothing the simulated stress fields. The results show that objects with fiber paths generated by our algorithm achieve greater stiffness while using less fiber than the baselines--our algorithm improves the Pareto frontier of object stiffness as a function of fiber usage. Ablation studies show that the smoothing regularizer is needed for feasible fiber paths and stability of optimization, and multi-resolution optimization helps reduce the running time compared to single-resolution optimization.

Submitted 29 October, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

Comments: ACM SCF 2023: Proceedings of the 8th Annual ACM Symposium on Computational Fabrication

arXiv:2205.14330

Differentiable Point-Based Radiance Fields for Efficient View Synthesis

Authors: Qiang Zhang, Seung-Hwan Baek, Szymon Rusinkiewicz, Felix Heide

Abstract: We propose a differentiable rendering algorithm for efficient novel view synthesis. By departing from volume-based representations in favor of a learned point representation, we improve on existing methods more than an order of magnitude in memory and runtime, both in training and inference. The method begins with a uniformly-sampled random point cloud and learns per-point position and view-dependent appearance, using a differentiable splat-based renderer to evolve the model to match a set of input images. Our method is up to 300x faster than NeRF in both training and inference, with only a marginal sacrifice in quality, while using less than 10 MB of memory for a static scene. For dynamic scenes, our method trains two orders of magnitude faster than STNeRF and renders at near interactive rate, while maintaining high image quality and temporal coherence even without imposing any temporal-coherency regularizers.

Submitted 5 July, 2023; v1 submitted 28 May, 2022; originally announced May 2022.

arXiv:2205.08525

Self-supervised Neural Articulated Shape and Appearance Models

Authors: Fangyin Wei, Rohan Chabra, Lingni Ma, Christoph Lassner, Michael Zollhöfer, Szymon Rusinkiewicz, Chris Sweeney, Richard Newcombe, Mira Slavcheva

Abstract: Learning geometry, motion, and appearance priors of object classes is important for the solution of a large variety of computer vision problems. While the majority of approaches has focused on static objects, dynamic objects, especially with controllable articulation, are less explored. We propose a novel approach for learning a representation of the geometry, appearance, and motion of a class of articulated objects given only a set of color images as input. In a self-supervised manner, our novel representation learns shape, appearance, and articulation codes that enable independent control of these semantic dimensions. Our model is trained end-to-end without requiring any articulation annotations. Experiments show that our approach performs well for different joint types, such as revolute and prismatic joints, as well as different combinations of these joints. Compared to state of the art that uses direct 3D supervision and does not output appearance, we recover more faithful geometry and appearance from 2D observations only. In addition, our representation enables a large variety of applications, such as few-shot reconstruction, the generation of novel articulations, and novel view-synthesis.

Submitted 17 May, 2022; originally announced May 2022.

Comments: 15 pages. CVPR 2022. Project page available at https://weify627.github.io/nasam/

arXiv:2204.02390

doi 10.1109/LRA.2022.3187833

Learning Pneumatic Non-Prehensile Manipulation with a Mobile Blower

Authors: Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Szymon Rusinkiewicz, Thomas Funkhouser

Abstract: We investigate pneumatic non-prehensile manipulation (i.e., blowing) as a means of efficiently moving scattered objects into a target receptacle. Due to the chaotic nature of aerodynamic forces, a blowing controller must (i) continually adapt to unexpected changes from its actions, (ii) maintain fine-grained control, since the slightest misstep can result in large unintended consequences (e.g., scatter objects already in a pile), and (iii) infer long-range plans (e.g., move the robot to strategic blowing locations). We tackle these challenges in the context of deep reinforcement learning, introducing a multi-frequency version of the spatial action maps framework. This allows for efficient learning of vision-based policies that effectively combine high-level planning and low-level closed-loop control for dynamic mobile manipulation. Experiments show that our system learns efficient behaviors for the task, demonstrating in particular that blowing achieves better downstream performance than pushing, and that our policies improve performance over baselines. Moreover, we show that our system naturally encourages emergent specialization between the different subpolicies spanning low-level fine-grained control and high-level planning. On a real mobile robot equipped with a miniature air blower, we show that our simulation-trained policies transfer well to a real environment and can generalize to novel objects.

Submitted 30 June, 2022; v1 submitted 5 April, 2022; originally announced April 2022.

Comments: Accepted to IEEE Robotics and Automation Letters (RA-L), 2022 and IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2022. Project page: https://learning-dynamic-manipulation.cs.princeton.edu

arXiv:2201.11819

doi 10.1145/3528223.3530144

Closed-Loop Control of Direct Ink Writing via Reinforcement Learning

Authors: Michal Piovarci, Michael Foshey, Jie Xu, Timothy Erps, Vahid Babaei, Piotr Didyk, Szymon Rusinkiewicz, Wojciech Matusik, Bernd Bickel

Abstract: Enabling additive manufacturing to employ a wide range of novel, functional materials can be a major boost to this technology. However, making such materials printable requires painstaking trial-and-error by an expert operator, as they typically tend to exhibit peculiar rheological or hysteresis properties. Even in the case of successfully finding the process parameters, there is no guarantee of print-to-print consistency due to material differences between batches. These challenges make closed-loop feedback an attractive option where the process parameters are adjusted on-the-fly. There are several challenges for designing an efficient controller: the deposition parameters are complex and highly coupled, artifacts occur after long time horizons, simulating the deposition is computationally costly, and learning on hardware is intractable. In this work, we demonstrate the feasibility of learning a closed-loop control policy for additive manufacturing using reinforcement learning. We show that approximate, but efficient, numerical simulation is sufficient as long as it allows learning the behavioral patterns of deposition that translate to real-world experiences. In combination with reinforcement learning, our model can be used to discover control policies that outperform baseline controllers. Furthermore, the recovered policies have a minimal sim-to-real gap. We showcase this by applying our control policy in-vivo on a single-layer, direct ink writing printer.

Submitted 12 September, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

Journal ref: ACM Transactions on Graphics 41(4) (2022)

arXiv:2106.09019

Amortized Synthesis of Constrained Configurations Using a Differentiable Surrogate

Authors: Xingyuan Sun, Tianju Xue, Szymon Rusinkiewicz, Ryan P. Adams

Abstract: In design, fabrication, and control problems, we are often faced with the task of synthesis, in which we must generate an object or configuration that satisfies a set of constraints while maximizing one or more objective functions. The synthesis problem is typically characterized by a physical process in which many different realizations may achieve the goal. This many-to-one map presents challenges to the supervised learning of feed-forward synthesis, as the set of viable designs may have a complex structure. In addition, the non-differentiable nature of many physical simulations prevents efficient direct optimization. We address both of these problems with a two-stage neural network architecture that we may consider to be an autoencoder. We first learn the decoder: a differentiable surrogate that approximates the many-to-one physical realization process. We then learn the encoder, which maps from goal to design, while using the fixed decoder to evaluate the quality of the realization. We evaluate the approach on two case studies: extruder path planning in additive manufacturing and constrained soft robot inverse kinematics. We compare our approach to direct optimization of the design using the learned surrogate, and to supervised learning of the synthesis problem. We find that our approach produces higher quality solutions than supervised learning, while being competitive in quality with direct optimization, at a greatly reduced computational cost.

Submitted 5 November, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

Comments: NeurIPS 2021, Spotlight. Source code: https://github.com/xingyuansun/amorsyn

arXiv:2103.12710

doi 10.1109/ICRA48506.2021.9561359

Spatial Intention Maps for Multi-Agent Mobile Manipulation

Authors: Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Szymon Rusinkiewicz, Thomas Funkhouser

Abstract: The ability to communicate intention enables decentralized multi-agent robots to collaborate while performing physical tasks. In this work, we present spatial intention maps, a new intention representation for multi-agent vision-based deep reinforcement learning that improves coordination between decentralized mobile manipulators. In this representation, each agent's intention is provided to other agents, and rendered into an overhead 2D map aligned with visual observations. This synergizes with the recently proposed spatial action maps framework, in which state and action representations are spatially aligned, providing inductive biases that encourage emergent cooperative behaviors requiring spatial coordination, such as passing objects to each other or avoiding collisions. Experiments across a variety of multi-agent environments, including heterogeneous robot teams with different abilities (lifting, pushing, or throwing), show that incorporating spatial intention maps improves performance for different mobile manipulation tasks while significantly enhancing cooperative behaviors.

Submitted 23 March, 2021; originally announced March 2021.

Comments: To appear at IEEE International Conference on Robotics and Automation (ICRA), 2021. Project page: https://spatial-intention-maps.cs.princeton.edu/

arXiv:2011.04755

Learning to Infer Semantic Parameters for 3D Shape Editing

Authors: Fangyin Wei, Elena Sizikova, Avneesh Sud, Szymon Rusinkiewicz, Thomas Funkhouser

Abstract: Many applications in 3D shape design and augmentation require the ability to make specific edits to an object's semantic parameters (e.g., the pose of a person's arm or the length of an airplane's wing) while preserving as much existing details as possible. We propose to learn a deep network that infers the semantic parameters of an input shape and then allows the user to manipulate those parameters. The network is trained jointly on shapes from an auxiliary synthetic template and unlabeled realistic models, ensuring robustness to shape variability while relieving the need to label realistic exemplars. At testing time, edits within the parameter space drive deformations to be applied to the original shape, which provides semantically-meaningful manipulation while preserving the details. This is in contrast to prior methods that either use autoencoders with a limited latent-space dimensionality, failing to preserve arbitrary detail, or drive deformations with purely-geometric controls, such as cages, losing the ability to update local part regions. Experiments with datasets of chairs, airplanes, and human bodies demonstrate that our method produces more natural edits than prior work.

Submitted 9 November, 2020; originally announced November 2020.

Comments: 22 pages and 19 figures including supplementary material; to be published in the proceedings of 3DV 2020

arXiv:2008.00485

SymmetryNet: Learning to Predict Reflectional and Rotational Symmetries of 3D Shapes from Single-View RGB-D Images

Authors: Yifei Shi, Junwen Huang, Hongjia Zhang, Xin Xu, Szymon Rusinkiewicz, Kai Xu

Abstract: We study the problem of symmetry detection of 3D shapes from single-view RGB-D images, where severely missing data renders geometric detection approach infeasible. We propose an end-to-end deep neural network which is able to predict both reflectional and rotational symmetries of 3D objects present in the input RGB-D image. Directly training a deep model for symmetry prediction, however, can quickly run into the issue of overfitting. We adopt a multi-task learning approach. Aside from symmetry axis prediction, our network is also trained to predict symmetry correspondences. In particular, given the 3D points present in the RGB-D image, our network outputs for each 3D point its symmetric counterpart corresponding to a specific predicted symmetry. In addition, our network is able to detect for a given shape multiple symmetries of different types. We also contribute a benchmark of 3D symmetry detection based on single-view RGB-D images. Extensive evaluation on the benchmark demonstrates the strong generalization ability of our method, in terms of high accuracy of both symmetry axis prediction and counterpart estimation. In particular, our method is robust in handling unseen object instances with large variation in shape, multi-symmetry composition, as well as novel object categories.

Submitted 30 August, 2020; v1 submitted 2 August, 2020; originally announced August 2020.

Comments: 15 pages

Journal ref: ACM Transactions on Graphics (Proceeding of SIGGRAPH Asia), 2020

arXiv:2006.13188

Efficient Spatially Adaptive Convolution and Correlation

Authors: Thomas W. Mitchel, Benedict Brown, David Koller, Tim Weyrich, Szymon Rusinkiewicz, Michael Kazhdan

Abstract: Fast methods for convolution and correlation underlie a variety of applications in computer vision and graphics, including efficient filtering, analysis, and simulation. However, standard convolution and correlation are inherently limited to fixed filters: spatial adaptation is impossible without sacrificing efficient computation. In early work, Freeman and Adelson have shown how steerable filters can address this limitation, providing a way for rotating the filter as it is passed over the signal. In this work, we provide a general, representation-theoretic, framework that allows for spatially varying linear transformations to be applied to the filter. This framework allows for efficient implementation of extended convolution and correlation for transformation groups such as rotation (in 2D and 3D) and scale, and provides a new interpretation for previous methods including steerable filters and the generalized Hough transform. We present applications to pattern matching, image feature description, vector field visualization, and adaptive image filtering.

Submitted 28 July, 2020; v1 submitted 23 June, 2020; originally announced June 2020.

arXiv:2004.09141

doi 10.15607/RSS.2020.XVI.035

Spatial Action Maps for Mobile Manipulation

Authors: Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Johnny Lee, Szymon Rusinkiewicz, Thomas Funkhouser

Abstract: Typical end-to-end formulations for learning robotic navigation involve predicting a small set of steering command actions (e.g., step forward, turn left, turn right, etc.) from images of the current state (e.g., a bird's-eye view of a SLAM reconstruction). Instead, we show that it can be advantageous to learn with dense action representations defined in the same domain as the state. In this work, we present "spatial action maps," in which the set of possible actions is represented by a pixel map (aligned with the input image of the current state), where each pixel represents a local navigational endpoint at the corresponding scene location. Using ConvNets to infer spatial action maps from state images, action predictions are thereby spatially anchored on local visual features in the scene, enabling significantly faster learning of complex behaviors for mobile manipulation tasks with reinforcement learning. In our experiments, we task a robot with pushing objects to a goal location, and find that policies learned with spatial action maps achieve much better performance than traditional alternatives.

Submitted 4 June, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

Comments: To appear at Robotics: Science and Systems (RSS), 2020. Project page: https://spatial-action-maps.cs.princeton.edu

arXiv:1906.11367 [pdf, other]

Accelerating Large-Kernel Convolution Using Summed-Area Tables

Authors: Linguang Zhang, Maciej Halber, Szymon Rusinkiewicz

Abstract: Expanding the receptive field to capture large-scale context is key to obtaining good performance in dense prediction tasks, such as human pose estimation. While many state-of-the-art fully-convolutional architectures enlarge the receptive field by reducing resolution using strided convolution or pooling layers, the most straightforward strategy is adopting large filters. This, however, is costly because of the quadratic increase in the number of parameters and multiply-add operations. In this work, we explore using learnable box filters to allow for convolution with arbitrarily large kernel size, while keeping the number of parameters per filter constant. In addition, we use precomputed summed-area tables to make the computational cost of convolution independent of the filter size. We adapt and incorporate the box filter as a differentiable module in a fully-convolutional neural network, and demonstrate its competitive performance on popular benchmarks for the task of human pose estimation.

Submitted 26 June, 2019; originally announced June 2019.
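
The summed-area-table idea mentioned in the abstract above can be illustrated with a minimal sketch. This is only the classical fixed-box lookup, not the paper's learnable, differentiable box filter; the function names `summed_area_table` and `box_sum` are hypothetical and chosen for illustration. It shows why the cost of a box sum does not depend on the box size: once the table is built, any rectangle sum takes four lookups.

```python
def summed_area_table(img):
    """Build a zero-padded table where sat[i][j] is the sum of the
    first i rows and first j columns of img."""
    rows, cols = len(img), len(img[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        row_sum = 0  # running sum of img[i][0..j]
        for j in range(cols):
            row_sum += img[i][j]
            sat[i + 1][j + 1] = sat[i][j + 1] + row_sum
    return sat


def box_sum(sat, top, left, bottom, right):
    """Sum of img over rows [top, bottom) and columns [left, right),
    computed with four table lookups regardless of box size."""
    return (sat[bottom][right] - sat[top][right]
            - sat[bottom][left] + sat[top][left])


# Small illustrative image (values are arbitrary).
img = [[0, 1, 2, 3],
       [4, 5, 6, 7],
       [8, 9, 10, 11]]
sat = summed_area_table(img)
print(box_sum(sat, 1, 1, 3, 3))  # sum of 5, 6, 9, 10
```

Dividing the box sum by the box area would give a box-filter (mean) response at that location; how the paper parameterizes and learns the box extents is not specified here and is not reproduced by this sketch.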

arXiv:1803.08407 [pdf, other]

PlaneMatch: Patch Coplanarity Prediction for Robust RGB-D Reconstruction

Authors: Yifei Shi, Kai Xu, Matthias Niessner, Szymon Rusinkiewicz, Thomas Funkhouser

Abstract: We introduce a novel RGB-D patch descriptor designed for detecting coplanar surfaces in SLAM reconstruction. The core of our method is a deep convolutional neural net that takes in RGB, depth, and normal information of a planar patch in an image and outputs a descriptor that can be used to find coplanar patches from other images. We train the network on 10 million triplets of coplanar and non-coplanar patches, and evaluate on a new coplanarity benchmark created from commodity RGB-D scans. Experiments show that our learned descriptor outperforms alternatives extended for this new task by a significant margin. In addition, we demonstrate the benefits of coplanarity matching in a robust RGBD reconstruction formulation. We find that coplanarity constraints detected with our method are sufficient to get reconstruction results comparable to state-of-the-art frameworks on most scenes, but outperform other methods on standard benchmarks when combined with a simple keypoint method.

Submitted 27 July, 2018; v1 submitted 22 March, 2018; originally announced March 2018.

Comments: ECCV 2018 oral paper; Supplemental material included

Journal ref: ECCV 2018

arXiv:1710.10687 [pdf, other]

High-Precision Localization Using Ground Texture

Authors: Linguang Zhang, Adam Finkelstein, Szymon Rusinkiewicz

Abstract: Location-aware applications play an increasingly critical role in everyday life. However, satellite-based localization (e.g., GPS) has limited accuracy and can be unusable in dense urban areas and indoors. We introduce an image-based global localization system that is accurate to a few millimeters and performs reliable localization both indoors and outside. The key idea is to capture and index distinctive local keypoints in ground textures. This is based on the observation that ground textures including wood, carpet, tile, concrete, and asphalt may look random and homogeneous, but all contain cracks, scratches, or unique arrangements of fibers. These imperfections are persistent, and can serve as local features. Our system incorporates a downward-facing camera to capture the fine texture of the ground, together with an image processing pipeline that locates the captured texture patch in a compact database constructed offline. We demonstrate the capability of our system to robustly, accurately, and quickly locate test images on various types of outdoor and indoor ground surfaces.

Submitted 26 June, 2019; v1 submitted 29 October, 2017; originally announced October 2017.

Showing 1–19 of 19 results for author: Rusinkiewicz, S