FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection

Authors: Hongsuk Choi, Isaac Kasahara, Selim Engin, Moritz Graule, Nikhil Chavan-Dafle, Volkan Isler

Abstract: Recently introduced ControlNet has the ability to steer the text-driven image generation process with geometric input such as human 2D pose, or edge features. While ControlNet provides control over the geometric form of the instances in the generated image, it lacks the capability to dictate the visual appearance of each instance. We present FineControlNet to provide fine control over each instance's appearance while maintaining the precise pose control capability. Specifically, we develop and demonstrate FineControlNet with geometric control via human pose images and appearance control via instance-level text prompts. The spatial alignment of instance-specific text prompts and 2D poses in latent space enables the fine control capabilities of FineControlNet. We evaluate the performance of FineControlNet with rigorous comparison against state-of-the-art pose-conditioned text-to-image diffusion models. FineControlNet achieves superior performance in generating images that follow the user-provided instance-specific text prompts and poses compared with existing methods. Project webpage: https://samsunglabs.github.io/FineControlNet-project-page

Submitted 14 December, 2023; originally announced December 2023.

Comments: Hongsuk Choi and Isaac Kasahara have equal contributions. 19 pages, 15 figures, 3 tables

arXiv:2311.04783

VioLA: Aligning Videos to 2D LiDAR Scans

Authors: Jun-Jee Chao, Selim Engin, Nikhil Chavan-Dafle, Bhoram Lee, Volkan Isler

Abstract: We study the problem of aligning a video that captures a local portion of an environment to the 2D LiDAR scan of the entire environment. We introduce a method (VioLA) that starts with building a semantic map of the local scene from the image sequence, then extracts points at a fixed height for registering to the LiDAR map. Due to reconstruction errors or partial coverage of the camera scan, the reconstructed semantic map may not contain sufficient information for registration. To address this problem, VioLA makes use of a pre-trained text-to-image inpainting model paired with a depth completion model for filling in the missing scene content in a geometrically consistent fashion to support pose registration. We evaluate VioLA on two real-world RGB-D benchmarks, as well as a self-captured dataset of a large office scene. Notably, our proposed scene completion module improves the pose registration performance by up to 20%.

Submitted 8 November, 2023; originally announced November 2023.

Comments: 8 pages

arXiv:2309.07891

HandNeRF: Learning to Reconstruct Hand-Object Interaction Scene from a Single RGB Image

Authors: Hongsuk Choi, Nikhil Chavan-Dafle, Jiacheng Yuan, Volkan Isler, Hyunsoo Park

Abstract: This paper presents a method to learn hand-object interaction prior for reconstructing a 3D hand-object scene from a single RGB image. The inference as well as training-data generation for 3D hand-object scene reconstruction is challenging due to the depth ambiguity of a single image and occlusions by the hand and object. We turn this challenge into an opportunity by utilizing the hand shape to constrain the possible relative configuration of the hand and object geometry. We design a generalizable implicit function, HandNeRF, that explicitly encodes the correlation of the 3D hand shape features and 2D object features to predict the hand and object scene geometry. With experiments on real-world datasets, we show that HandNeRF is able to reconstruct hand-object scenes of novel grasp configurations more accurately than comparable methods. Moreover, we demonstrate that object reconstruction from HandNeRF ensures more accurate execution of downstream tasks, such as grasping and motion planning for robotic hand-over and manipulation. The code is released here: https://github.com/SamsungLabs/HandNeRF

Submitted 11 February, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: In ICRA 2024; 13 pages including the supplementary material, 8 tables, 12 figures

arXiv:2307.13133

simPLE: a visuotactile method learned in simulation to precisely pick, localize, regrasp, and place objects

Authors: Maria Bauza, Antonia Bronars, Yifan Hou, Ian Taylor, Nikhil Chavan-Dafle, Alberto Rodriguez

Abstract: Existing robotic systems have a clear tension between generality and precision. Deployed solutions for robotic manipulation tend to fall into the paradigm of one robot solving a single task, lacking precise generalization, i.e., the ability to solve many tasks without compromising on precision. This paper explores solutions for precise and general pick-and-place. In precise pick-and-place, i.e. kitting, the robot transforms an unstructured arrangement of objects into an organized arrangement, which can facilitate further manipulation. We propose simPLE (simulation to Pick Localize and PLacE) as a solution to precise pick-and-place. simPLE learns to pick, regrasp and place objects precisely, given only the object CAD model and no prior experience. We develop three main components: task-aware grasping, visuotactile perception, and regrasp planning. Task-aware grasping computes affordances of grasps that are stable, observable, and favorable to placing. The visuotactile perception model relies on matching real observations against a set of simulated ones through supervised learning. Finally, we compute the desired robot motion by solving a shortest path problem on a graph of hand-to-hand regrasps. On a dual-arm robot equipped with visuotactile sensing, we demonstrate pick-and-place of 15 diverse objects with simPLE. The objects span a wide range of shapes and simPLE achieves successful placements into structured arrangements with 1mm clearance over 90% of the time for 6 objects, and over 80% of the time for 11 objects. Videos are available at http://mcube.mit.edu/research/simPLE.html .

Submitted 24 July, 2023; originally announced July 2023.

Comments: 33 pages, 6 figures, 2 tables, submitted to Science Robotics

arXiv:2307.11932

RIC: Rotate-Inpaint-Complete for Generalizable Scene Reconstruction

Authors: Isaac Kasahara, Shubham Agrawal, Selim Engin, Nikhil Chavan-Dafle, Shuran Song, Volkan Isler

Abstract: General scene reconstruction refers to the task of estimating the full 3D geometry and texture of a scene containing previously unseen objects. In many practical applications such as AR/VR, autonomous navigation, and robotics, only a single view of the scene may be available, making the scene reconstruction task challenging. In this paper, we present a method for scene reconstruction by structurally breaking the problem into two steps: rendering novel views via inpainting and 2D to 3D scene lifting. Specifically, we leverage the generalization capability of large visual language models (Dalle-2) to inpaint the missing areas of scene color images rendered from different views. Next, we lift these inpainted images to 3D by predicting normals of the inpainted image and solving for the missing depth values. By predicting for normals instead of depth directly, our method allows for robustness to changes in depth distributions and scale. With rigorous quantitative evaluation, we show that our method outperforms multiple baselines while providing generalization to novel objects and scenes.

Submitted 4 October, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

arXiv:2305.09510

Real-time Simultaneous Multi-Object 3D Shape Reconstruction, 6DoF Pose Estimation and Dense Grasp Prediction

Authors: Shubham Agrawal, Nikhil Chavan-Dafle, Isaac Kasahara, Selim Engin, Jinwook Huh, Volkan Isler

Abstract: Robotic manipulation systems operating in complex environments rely on perception systems that provide information about the geometry (pose and 3D shape) of the objects in the scene along with other semantic information such as object labels. This information is then used for choosing the feasible grasps on relevant objects. In this paper, we present a novel method to provide this geometric and semantic information of all objects in the scene as well as feasible grasps on those objects simultaneously. The main advantage of our method is its speed as it avoids sequential perception and grasp planning steps. With detailed quantitative analysis, we show that our method delivers competitive performance compared to the state-of-the-art dedicated methods for object shape, pose, and grasp predictions while providing fast inference at 30 frames per second speed.

Submitted 16 May, 2023; originally announced May 2023.

ACM Class: I.4.5; I.4.8; I.4.10; I.2.9; I.2.10; I.6.3

arXiv:2304.04100

Pick2Place: Task-aware 6DoF Grasp Estimation via Object-Centric Perspective Affordance

Authors: Zhanpeng He, Nikhil Chavan-Dafle, Jinwook Huh, Shuran Song, Volkan Isler

Abstract: The choice of a grasp plays a critical role in the success of downstream manipulation tasks. Consider a task of placing an object in a cluttered scene; the majority of possible grasps may not be suitable for the desired placement. In this paper, we study the synergy between the picking and placing of an object in a cluttered scene to develop an algorithm for task-aware grasp estimation. We present an object-centric action space that encodes the relationship between the geometry of the placement scene and the object to be placed in order to provide placement affordance maps directly from perspective views of the placement scene. This action space enables the computation of a one-to-one mapping between the placement and picking actions allowing the robot to generate a diverse set of pick-and-place proposals and to optimize for a grasp under other task constraints such as robot kinematics and collision avoidance. With experiments both in simulation and on a real robot we demonstrate that with our method, the robot is able to successfully complete the task of placement-aware grasping with over 89% accuracy in such a way that generalizes to novel objects and scenes.

Submitted 8 April, 2023; originally announced April 2023.

Comments: IEEE International Conference on Robotics and Automation 2023

arXiv:2109.06837

Simultaneous Object Reconstruction and Grasp Prediction using a Camera-centric Object Shell Representation

Authors: Nikhil Chavan-Dafle, Sergiy Popovych, Shubham Agrawal, Daniel D. Lee, Volkan Isler

Abstract: Being able to grasp objects is a fundamental component of most robotic manipulation systems. In this paper, we present a new approach to simultaneously reconstruct a mesh and a dense grasp quality map of an object from a depth image. At the core of our approach is a novel camera-centric object representation called the "object shell" which is composed of an observed "entry image" and a predicted "exit image". We present an image-to-image residual ConvNet architecture in which the object shell and a grasp-quality map are predicted as separate output channels. The main advantage of the shell representation and the corresponding neural network architecture, ShellGrasp-Net, is that the input-output pixel correspondences in the shell representation are explicitly represented in the architecture. We show that this coupling yields superior generalization capabilities for object reconstruction and accurate grasp quality estimation implicitly considering the object geometry. Our approach yields an efficient dense grasp quality map and an object geometry estimate in a single forward pass. Both of these outputs can be used in a wide range of robotic manipulation applications. With rigorous experimental validation, both in simulation and on a real setup, we show that our shell-based method can be used to generate precise grasps and the associated grasp quality with over 90% accuracy. Diverse grasps computed on shell reconstructions allow the robot to select and execute grasps in cluttered scenes with more than 93% success rate.

Submitted 19 December, 2022; v1 submitted 14 September, 2021; originally announced September 2021.

Comments: 18 pages, 12 figures, 8 tables

Journal ref: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)

arXiv:1810.00219

In-Hand Manipulation via Motion Cones

Authors: Nikhil Chavan-Dafle, Rachel Holladay, Alberto Rodriguez

Abstract: In this paper, we present the mechanics and algorithms to compute the set of feasible motions of an object pushed in a plane. This set is known as the motion cone and was previously described for non-prehensile manipulation tasks in the horizontal plane. We generalize its geometric construction to a broader set of planar tasks, where external forces such as gravity influence the dynamics of pushing, and prehensile tasks, where there are complex interactions between the gripper, object, and pusher. We show that the motion cone is defined by a set of low-curvature surfaces and provide a polyhedral cone approximation to it. We verify its validity with 2000 pushing experiments recorded with motion tracking system. Motion cones abstract the algebra involved in simulating frictional pushing by providing bounds on the set of feasible motions and by characterizing which pushes will stick or slip. We demonstrate their use for the dynamic propagation step in a sampling-based planning algorithm for in-hand manipulation. The planner generates trajectories that involve sequences of continuous pushes with 5-1000x speed improvements to equivalent algorithms. Video Summary -- https://youtu.be/tVDO8QMuYhc

Submitted 23 February, 2019; v1 submitted 29 September, 2018; originally announced October 2018.

Comments: Robotics : Science and Systems, 2018 (Best Student Paper Award Winner)

arXiv:1809.08522

Regrasping by Fixtureless Fixturing

Authors: Nikhil Chavan-Dafle, Alberto Rodriguez

Abstract: This paper presents a fixturing strategy for regrasping that does not require a physical fixture. To regrasp an object in a gripper, a robot pushes the object against external contact/s in the environment such that the external contact keeps the object stationary while the fingers slide over the object. We call this manipulation technique fixtureless fixturing. Exploiting the mechanics of pushing, we characterize a convex polyhedral set of pushes that results in fixtureless fixturing. These pushes are robust against uncertainty in the object inertia, grasping force, and the friction at the contacts. We propose a sampling-based planner that uses the sets of robust pushes to rapidly build a tree of reachable grasps. A path in this tree is a pushing strategy, possibly involving pushes from different sides, to regrasp the object. We demonstrate the experimental validity and robustness of the proposed manipulation technique with different regrasp examples on a manipulation platform. Such a fast and flexible regrasp planner facilitates versatile and flexible automation solutions.

Submitted 25 September, 2018; v1 submitted 22 September, 2018; originally announced September 2018.

Comments: IEEE International Conference on Automation Science and Engineering (CASE) 2018

arXiv:1809.08420

Pneumatic Shape-shifting Fingers to Reorient and Grasp

Authors: Nikhil Chavan-Dafle, Kyubin Lee, Alberto Rodriguez

Abstract: We present pneumatic shape-shifting fingers to enable a simple parallel-jaw gripper for different manipulation modalities. By changing the finger geometry, the gripper effectively changes the contact type between the fingers and an object to facilitate distinct manipulation primitives. In this paper, we demonstrate the development and application of shape-shifting fingers to reorient and grasp cylindrical objects. The shape of the fingers changes based on the air pressure inside them and attains two distinct geometric forms at high and low pressure values. In our implementation, the finger shape switches between a wedge-shaped geometry and V-shaped geometry at high and low pressure, respectively. Using the wedge-shaped geometry, the fingers provide a point contact on a cylindrical object to pivot it to a vertical pose under the effect of gravity. By changing to V-shaped geometry, the fingers localize the object in the vertical pose and securely hold it. Experimental results show that the smooth transition between the two contact types allows a robot with a simple gripper to reorient a cylindrical object lying horizontally on a ground and to grasp it in a vertical pose.

Submitted 22 September, 2018; originally announced September 2018.

Comments: IEEE International Conference on Automation Science and Engineering (CASE) 2018

arXiv:1710.11097

Stable Prehensile Pushing: In-Hand Manipulation with Alternating Sticking Contacts

Authors: Nikhil Chavan-Dafle, Alberto Rodriguez

Abstract: This paper presents an approach to in-hand manipulation planning that exploits the mechanics of alternating sticking contact. Particularly, we consider the problem of manipulating a grasped object using external pushes for which the pusher sticks to the object. Given the physical properties of the object, frictional coefficients at contacts and a desired regrasp on the object, we propose a sampling-based planning framework that builds a pushing strategy concatenating different feasible stable pushes to achieve the desired regrasp. An efficient dynamics formulation allows us to plan in-hand manipulations 100-1000 times faster than our previous work which builds upon a complementarity formulation. Experimental observations for the generated plans show that the object precisely moves in the grasp as expected by the planner. Video Summary -- youtu.be/qOTKRJMx6Ho

Submitted 4 March, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

Comments: IEEE International Conference on Robotics and Automation 2018

arXiv:1707.00318

Sampling-based Planning of In-Hand Manipulation with External Pushes

Authors: Nikhil Chavan-Dafle, Alberto Rodriguez

Abstract: This paper presents a sampling-based planning algorithm for in-hand manipulation of a grasped object using a series of external pushes. A high-level sampling-based planning framework, in tandem with a low-level inverse contact dynamics solver, effectively explores the space of continuous pushes with discrete pusher contact switch-overs. We model the frictional interaction between gripper, grasped object, and pusher, by discretizing complex surface/line contacts into arrays of hard frictional point contacts. The inverse dynamics problem of finding an instantaneous pusher motion that yields a desired instantaneous object motion takes the form of a mixed nonlinear complementarity problem. Building upon this dynamics solver, our planner generates a sequence of pushes that steers the object to a goal grasp. We evaluate the performance of the planner for the case of a parallel-jaw gripper manipulating different objects, both in simulation and with real experiments. Through these examples, we highlight the important properties of the planner: respecting and exploiting the hybrid dynamics of contact sticking/sliding/rolling and a sense of efficiency with respect to discrete contact switch-overs.

Submitted 1 November, 2017; v1 submitted 2 July, 2017; originally announced July 2017.

Comments: International Symposium on Robotics Research 2017, Puerto Varas, Chile

arXiv:1702.07252

Experimental Validation of Contact Dynamics for In-Hand Manipulation

Authors: Roman Kolbert, Nikhil Chavan-Dafle, Alberto Rodriguez

Abstract: This paper evaluates state-of-the-art contact models at predicting the motions and forces involved in simple in-hand robotic manipulations. In particular it focuses on three primitive actions --linear sliding, pivoting, and rolling-- that involve contacts between a gripper, a rigid object, and their environment. The evaluation is done through thousands of controlled experiments designed to capture the motion of object and gripper, and all contact forces and torques at 250Hz. We demonstrate that a contact modeling approach based on Coulomb's friction law and maximum energy principle is effective at reasoning about interaction to first order, but limited for making accurate predictions. We attribute the major limitations to 1) the non-uniqueness of force resolution inherent to grasps with multiple hard contacts of complex geometries, 2) unmodeled dynamics due to contact compliance, and 3) unmodeled geometries due to manufacturing defects.

Submitted 31 October, 2017; v1 submitted 6 February, 2017; originally announced February 2017.

Comments: International Symposium on Experimental Robotics, ISER 2016, Tokyo, Japan

arXiv:1604.03639

A Summary of Team MIT's Approach to the Amazon Picking Challenge 2015

Authors: Kuan-Ting Yu, Nima Fazeli, Nikhil Chavan-Dafle, Orion Taylor, Elliott Donlon, Guillermo Diaz Lankenau, Alberto Rodriguez

Abstract: The Amazon Picking Challenge (APC), held alongside the International Conference on Robotics and Automation in May 2015 in Seattle, challenged roboticists from academia and industry to demonstrate fully automated solutions to the problem of picking objects from shelves in a warehouse fulfillment scenario. Packing density, object variability, speed, and reliability are the main complexities of the task. The picking challenge serves both as a motivation and an instrument to focus research efforts on a specific manipulation problem. In this document, we describe Team MIT's approach to the competition, including design considerations, contributions, and performance, and we compile the lessons learned. We also describe what we think are the main remaining challenges.

Submitted 12 April, 2016; originally announced April 2016.

Comments: 8 pages, 8 figures

Showing 1–15 of 15 results for author: Chavan-Dafle, N