Search | arXiv e-print repository

PressureVision++: Estimating Fingertip Pressure from Diverse RGB Images

Authors: Patrick Grady, Jeremy A. Collins, Chengcheng Tang, Christopher D. Twigg, Kunal Aneja, James Hays, Charles C. Kemp

Abstract: Touch plays a fundamental role in manipulation for humans; however, machine perception of contact and pressure typically requires invasive sensors. Recent research has shown that deep models can estimate hand pressure based on a single RGB image. However, evaluations have been limited to controlled settings since collecting diverse data with ground-truth pressure measurements is difficult. We pres… ▽ More Touch plays a fundamental role in manipulation for humans; however, machine perception of contact and pressure typically requires invasive sensors. Recent research has shown that deep models can estimate hand pressure based on a single RGB image. However, evaluations have been limited to controlled settings since collecting diverse data with ground-truth pressure measurements is difficult. We present a novel approach that enables diverse data to be captured with only an RGB camera and a cooperative participant. Our key insight is that people can be prompted to apply pressure in a certain way, and this prompt can serve as a weak label to supervise models to perform well under varied conditions. We collect a novel dataset with 51 participants making fingertip contact with diverse objects. Our network, PressureVision++, outperforms human annotators and prior work. We also demonstrate an application of PressureVision++ to mixed reality where pressure estimation allows everyday surfaces to be used as arbitrary touch-sensitive interfaces. Code, data, and models are available online. △ Less

Submitted 3 January, 2024; v1 submitted 5 January, 2023; originally announced January 2023.

Comments: WACV 2024

arXiv:2210.09887 [pdf, other]

MotionDeltaCNN: Sparse CNN Inference of Frame Differences in Moving Camera Videos

Authors: Mathias Parger, Chengcheng Tang, Thomas Neff, Christopher D. Twigg, Cem Keskin, Robert Wang, Markus Steinberger

Abstract: Convolutional neural network inference on video input is computationally expensive and requires high memory bandwidth. Recently, DeltaCNN managed to reduce the cost by only processing pixels with significant updates over the previous frame. However, DeltaCNN relies on static camera input. Moving cameras add new challenges in how to fuse newly unveiled image regions with already processed regions e… ▽ More Convolutional neural network inference on video input is computationally expensive and requires high memory bandwidth. Recently, DeltaCNN managed to reduce the cost by only processing pixels with significant updates over the previous frame. However, DeltaCNN relies on static camera input. Moving cameras add new challenges in how to fuse newly unveiled image regions with already processed regions efficiently to minimize the update rate - without increasing memory overhead and without knowing the camera extrinsics of future frames. In this work, we propose MotionDeltaCNN, a sparse CNN inference framework that supports moving cameras. We introduce spherical buffers and padded convolutions to enable seamless fusion of newly unveiled regions and previously processed regions -- without increasing memory footprint. Our evaluation shows that we outperform DeltaCNN by up to 90% for moving camera videos. △ Less

Submitted 14 August, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

arXiv:2204.07268 [pdf, other]

Visual Pressure Estimation and Control for Soft Robotic Grippers

Authors: Patrick Grady, Jeremy A. Collins, Samarth Brahmbhatt, Christopher D. Twigg, Chengcheng Tang, James Hays, Charles C. Kemp

Abstract: Soft robotic grippers facilitate contact-rich manipulation, including robust gras** of varied objects. Yet the beneficial compliance of a soft gripper also results in significant deformation that can make precision manipulation challenging. We present visual pressure estimation & control (VPEC), a method that infers pressure applied by a soft gripper using an RGB image from an external camera. W… ▽ More Soft robotic grippers facilitate contact-rich manipulation, including robust gras** of varied objects. Yet the beneficial compliance of a soft gripper also results in significant deformation that can make precision manipulation challenging. We present visual pressure estimation & control (VPEC), a method that infers pressure applied by a soft gripper using an RGB image from an external camera. We provide results for visual pressure inference when a pneumatic gripper and a tendon-actuated gripper make contact with a flat surface. We also show that VPEC enables precision manipulation via closed-loop control of inferred pressure images. In our evaluation, a mobile manipulator (Stretch RE1 from Hello Robot) uses visual servoing to make contact at a desired pressure; follow a spatial pressure trajectory; and grasp small low-profile objects, including a microSD card, a penny, and a pill. Overall, our results show that visual estimates of applied pressure can enable a soft gripper to perform precision manipulation. △ Less

Submitted 9 August, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

Comments: IROS 2022

arXiv:2203.10385 [pdf, other]

PressureVision: Estimating Hand Pressure from a Single RGB Image

Authors: Patrick Grady, Chengcheng Tang, Samarth Brahmbhatt, Christopher D. Twigg, Chengde Wan, James Hays, Charles C. Kemp

Abstract: People often interact with their surroundings by applying pressure with their hands. While hand pressure can be measured by placing pressure sensors between the hand and the environment, doing so can alter contact mechanics, interfere with human tactile perception, require costly sensors, and scale poorly to large environments. We explore the possibility of using a conventional RGB camera to infer… ▽ More People often interact with their surroundings by applying pressure with their hands. While hand pressure can be measured by placing pressure sensors between the hand and the environment, doing so can alter contact mechanics, interfere with human tactile perception, require costly sensors, and scale poorly to large environments. We explore the possibility of using a conventional RGB camera to infer hand pressure, enabling machine perception of hand pressure from uninstrumented hands and surfaces. The central insight is that the application of pressure by a hand results in informative appearance changes. Hands share biomechanical properties that result in similar observable phenomena, such as soft-tissue deformation, blood distribution, hand pose, and cast shadows. We collected videos of 36 participants with diverse skin tone applying pressure to an instrumented planar surface. We then trained a deep model (PressureVisionNet) to infer a pressure image from a single RGB image. Our model infers pressure for participants outside of the training data and outperforms baselines. We also show that the output of our model depends on the appearance of the hand and cast shadows near contact regions. Overall, our results suggest the appearance of a previously unobserved human hand can be used to accurately infer applied pressure. Data, code, and models are available online. △ Less

Submitted 30 September, 2022; v1 submitted 19 March, 2022; originally announced March 2022.

Comments: ECCV 2022 oral

arXiv:2203.03996 [pdf, other]

DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos

Authors: Mathias Parger, Chengcheng Tang, Christopher D. Twigg, Cem Keskin, Robert Wang, Markus Steinberger

Abstract: Convolutional neural network inference on video data requires powerful hardware for real-time processing. Given the inherent coherence across consecutive frames, large parts of a video typically change little. By skip** identical image regions and truncating insignificant pixel updates, computational redundancy can in theory be reduced significantly. However, these theoretical savings have been… ▽ More Convolutional neural network inference on video data requires powerful hardware for real-time processing. Given the inherent coherence across consecutive frames, large parts of a video typically change little. By skip** identical image regions and truncating insignificant pixel updates, computational redundancy can in theory be reduced significantly. However, these theoretical savings have been difficult to translate into practice, as sparse updates hamper computational consistency and memory access coherence; which are key for efficiency on real hardware. With DeltaCNN, we present a sparse convolutional neural network framework that enables sparse frame-by-frame updates to accelerate video inference in practice. We provide sparse implementations for all typical CNN layers and propagate sparse feature updates end-to-end - without accumulating errors over time. DeltaCNN is applicable to all convolutional neural networks without retraining. To the best of our knowledge, we are the first to significantly outperform the dense reference, cuDNN, in practical settings, achieving speedups of up to 7x with only marginal differences in accuracy. △ Less

Submitted 2 September, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: CVPR 2022

arXiv:2109.15299 [pdf, other]

Identity-Disentangled Neural Deformation Model for Dynamic Meshes

Authors: Binbin Xu, Lingni Ma, Yuting Ye, Tanner Schmidt, Christopher D. Twigg, Steven Lovegrove

Abstract: Neural shape models can represent complex 3D shapes with a compact latent space. When applied to dynamically deforming shapes such as the human hands, however, they would need to preserve temporal coherence of the deformation as well as the intrinsic identity of the subject. These properties are difficult to regularize with manually designed loss functions. In this paper, we learn a neural deforma… ▽ More Neural shape models can represent complex 3D shapes with a compact latent space. When applied to dynamically deforming shapes such as the human hands, however, they would need to preserve temporal coherence of the deformation as well as the intrinsic identity of the subject. These properties are difficult to regularize with manually designed loss functions. In this paper, we learn a neural deformation model that disentangles the identity-induced shape variations from pose-dependent deformations using implicit neural functions. We perform template-free unsupervised learning on 3D scans without explicit mesh correspondence or semantic correspondences of shapes across subjects. We can then apply the learned model to reconstruct partial dynamic 4D scans of novel subjects performing unseen actions. We propose two methods to integrate global pose alignment with our neural deformation model. Experiments demonstrate the efficacy of our method in the disentanglement of identities and pose. Our method also outperforms traditional skeleton-driven models in reconstructing surface details such as palm prints or tendons without limitations from a fixed template. △ Less

Submitted 4 October, 2021; v1 submitted 30 September, 2021; originally announced September 2021.

arXiv:2104.07267 [pdf, other]

ContactOpt: Optimizing Contact to Improve Grasps

Authors: Patrick Grady, Chengcheng Tang, Christopher D. Twigg, Minh Vo, Samarth Brahmbhatt, Charles C. Kemp

Abstract: Physical contact between hands and objects plays a critical role in human grasps. We show that optimizing the pose of a hand to achieve expected contact with an object can improve hand poses inferred via image-based methods. Given a hand mesh and an object mesh, a deep model trained on ground truth contact data infers desirable contact across the surfaces of the meshes. Then, ContactOpt efficientl… ▽ More Physical contact between hands and objects plays a critical role in human grasps. We show that optimizing the pose of a hand to achieve expected contact with an object can improve hand poses inferred via image-based methods. Given a hand mesh and an object mesh, a deep model trained on ground truth contact data infers desirable contact across the surfaces of the meshes. Then, ContactOpt efficiently optimizes the pose of the hand to achieve desirable contact using a differentiable contact model. Notably, our contact model encourages mesh interpenetration to approximate deformable soft tissue in the hand. In our evaluations, our methods result in grasps that better match ground truth contact, have lower kinematic error, and are significantly preferred by human participants. Code and models are available online. △ Less

Submitted 15 April, 2021; originally announced April 2021.

Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2021

arXiv:2007.09545 [pdf, other]

ContactPose: A Dataset of Grasps with Object Contact and Hand Pose

Authors: Samarth Brahmbhatt, Chengcheng Tang, Christopher D. Twigg, Charles C. Kemp, James Hays

Abstract: Gras** is natural for humans. However, it involves complex hand configurations and soft tissue deformation that can result in complicated regions of contact between the hand and the object. Understanding and modeling this contact can potentially improve hand models, AR/VR experiences, and robotic gras**. Yet, we currently lack datasets of hand-object contact paired with other data modalities,… ▽ More Gras** is natural for humans. However, it involves complex hand configurations and soft tissue deformation that can result in complicated regions of contact between the hand and the object. Understanding and modeling this contact can potentially improve hand models, AR/VR experiences, and robotic gras**. Yet, we currently lack datasets of hand-object contact paired with other data modalities, which is crucial for develo** and evaluating contact modeling techniques. We introduce ContactPose, the first dataset of hand-object contact paired with hand pose, object pose, and RGB-D images. ContactPose has 2306 unique grasps of 25 household objects grasped with 2 functional intents by 50 participants, and more than 2.9 M RGB-D grasp images. Analysis of ContactPose data reveals interesting relationships between hand pose and contact. We use this data to rigorously evaluate various data representations, heuristics from the literature, and learning methods for contact modeling. Data, code, and trained models are available at https://contactpose.cc.gatech.edu. △ Less

Submitted 18 July, 2020; originally announced July 2020.

Comments: The European Conference on Computer Vision (ECCV) 2020

Showing 1–8 of 8 results for author: Twigg, C D