-
Transferable Tactile Transformers for Representation Learning Across Diverse Sensors and Tasks
Authors:
Jialiang Zhao,
Yuxiang Ma,
Lirui Wang,
Edward H. Adelson
Abstract:
This paper presents T3: Transferable Tactile Transformers, a framework for tactile representation learning that scales across multi-sensors and multi-tasks. T3 is designed to overcome the contemporary issue that camera-based tactile sensing is extremely heterogeneous, i.e. sensors are built into different form factors, and existing datasets were collected for disparate tasks. T3 captures the shared latent information across different sensor-task pairings by constructing a shared trunk transformer with sensor-specific encoders and task-specific decoders. The pre-training of T3 utilizes a novel Foundation Tactile (FoTa) dataset, which is aggregated from several open-sourced datasets and it contains over 3 million data points gathered from 13 sensors and 11 tasks. FoTa is the largest and most diverse dataset in tactile sensing to date and it is made publicly available in a unified format. Across various sensors and tasks, experiments show that T3 pre-trained with FoTa achieved zero-shot transferability in certain sensor-task pairings, can be further fine-tuned with small amounts of domain-specific data, and its performance scales with bigger network sizes. T3 is also effective as a tactile encoder for long horizon contact-rich manipulation. Results from sub-millimeter multi-pin electronics insertion tasks show that T3 achieved a task success rate 25% higher than that of policies trained with tactile encoders trained from scratch, or 53% higher than without tactile sensing. Data, code, and model checkpoints are open-sourced at https://t3.alanz.info.
Submitted 19 June, 2024;
originally announced June 2024.
-
A Passively Bendable, Compliant Tactile Palm with RObotic Modular Endoskeleton Optical (ROMEO) Fingers
Authors:
Sandra Q. Liu,
Edward H. Adelson
Abstract:
Many robotic hands currently rely on extremely dexterous robotic fingers and a thumb joint to envelop themselves around an object. Few hands focus on the palm even though human hands greatly benefit from their central fold and soft surface. As such, we develop a novel structurally compliant soft palm, which enables more surface area contact for the objects that are pressed into it. Moreover, this design, along with the development of a new low-cost, flexible illumination system, is able to incorporate a high-resolution tactile sensing system inspired by the GelSight sensors. Concurrently, we design RObotic Modular Endoskeleton Optical (ROMEO) fingers, which are underactuated two-segment soft fingers that are able to house the new illumination system, and we integrate them into these various palm configurations. The resulting robotic hand is slightly bigger than a baseball and represents one of the first soft robotic hands with actuated fingers and a passively compliant palm, all of which have high-resolution tactile sensing. This design also potentially helps researchers discover and explore more soft-rigid tactile robotic hand designs with greater capabilities in the future.
The supplementary video can be found here: https://youtu.be/RKfIFiewqsg
Submitted 11 April, 2024;
originally announced April 2024.
-
Scalable, Simulation-Guided Compliant Tactile Finger Design
Authors:
Yuxiang Ma,
Arpit Agarwal,
Sandra Q. Liu,
Wenzhen Yuan,
Edward H. Adelson
Abstract:
Compliant grippers enable robots to work with humans in unstructured environments. In general, these grippers can improve with tactile sensing to estimate the state of objects around them to precisely manipulate objects. However, co-designing compliant structures with high-resolution tactile sensing is a challenging task. We propose a simulation framework for the end-to-end forward design of GelSight Fin Ray sensors. Our simulation framework consists of mechanical simulation using the finite element method (FEM) and optical simulation including physically based rendering (PBR). To simulate the fluorescent paint used in these GelSight Fin Rays, we propose an efficient method that can be directly integrated in PBR. Using the simulation framework, we investigate design choices available in the compliant grippers, namely gel pad shapes, illumination conditions, Fin Ray gripper sizes, and Fin Ray stiffness. This infrastructure enables faster design and prototype time frames of new Fin Ray sensors that have various sensing areas, ranging from 48 mm $\times$ 18 mm to 70 mm $\times$ 35 mm. Given the parameters we choose, we can thus optimize different Fin Ray designs and show their utility in grasping day-to-day objects.
Submitted 7 March, 2024;
originally announced March 2024.
-
PoCo: Policy Composition from and for Heterogeneous Robot Learning
Authors:
Lirui Wang,
Jialiang Zhao,
Yilun Du,
Edward H. Adelson,
Russ Tedrake
Abstract:
Training general robotic policies from heterogeneous data for different tasks is a significant challenge. Existing robotic datasets vary in different modalities such as color, depth, tactile, and proprioceptive information, and are collected in different domains such as simulation, real robots, and human videos. Current methods usually collect and pool all data from one domain to train a single policy to handle such heterogeneity in tasks and domains, which is prohibitively expensive and difficult. In this work, we present a flexible approach, dubbed Policy Composition, to combine information across such diverse modalities and domains for learning scene-level and task-level generalized manipulation skills, by composing different data distributions represented with diffusion models. Our method can use task-level composition for multi-task manipulation and be composed with analytic cost functions to adapt policy behaviors at inference time. We train our method on simulation, human, and real robot data and evaluate in tool-use tasks. The composed policy achieves robust and dexterous performance under varying scenes and tasks and outperforms baselines from a single data source in both simulation and real-world experiments. See https://liruiw.github.io/policycomp for more details.
Submitted 27 May, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
GelSight Svelte Hand: A Three-finger, Two-DoF, Tactile-rich, Low-cost Robot Hand for Dexterous Manipulation
Authors:
Jialiang Zhao,
Edward H. Adelson
Abstract:
This paper presents GelSight Svelte Hand, a novel 3-finger 2-DoF tactile robotic hand that is capable of performing precision grasps, power grasps, and intermediate grasps. Rich tactile signals are obtained from one camera on each finger, with an extended sensing area similar to the full length of a human finger. Each finger of GelSight Svelte Hand is supported by a semi-rigid endoskeleton and covered with soft silicone materials, which provide both rigidity and compliance. We describe the design, fabrication, functionalities, and tactile sensing capability of GelSight Svelte Hand in this paper. More information is available on our website: https://gelsight-svelte.alanz.info
Submitted 19 September, 2023;
originally announced September 2023.
-
GelSight Svelte: A Human Finger-shaped Single-camera Tactile Robot Finger with Large Sensing Coverage and Proprioceptive Sensing
Authors:
Jialiang Zhao,
Edward H. Adelson
Abstract:
Camera-based tactile sensing is a low-cost, popular approach to obtain highly detailed contact geometry information. However, most existing camera-based tactile sensors are fingertip sensors, and longer fingers often require extraneous elements to obtain an extended sensing area similar to the full length of a human finger. Moreover, existing methods to estimate proprioceptive information such as total forces and torques applied on the finger from camera-based tactile sensors are not effective when the contact geometry is complex. We introduce GelSight Svelte, a curved, human finger-sized, single-camera tactile sensor that is capable of both tactile and proprioceptive sensing over a large area. GelSight Svelte uses curved mirrors to achieve the desired shape and sensing coverage. Proprioceptive information, such as the total bending and twisting torques applied on the finger, is reflected as deformations on the flexible backbone of GelSight Svelte, which are also captured by the camera. We train a convolutional neural network to estimate the bending and twisting torques from the captured images. We conduct gel deformation experiments at various locations of the finger to evaluate the tactile sensing capability and proprioceptive sensing accuracy. To demonstrate the capability and potential uses of GelSight Svelte, we conduct an object holding task with three different grasping modes that utilize different areas of the finger. More information is available on our website: https://gelsight-svelte.alanz.info
Submitted 19 September, 2023;
originally announced September 2023.
-
GelSight360: An Omnidirectional Camera-Based Tactile Sensor for Dexterous Robotic Manipulation
Authors:
Megha H. Tippur,
Edward H. Adelson
Abstract:
Camera-based tactile sensors have shown great promise in enhancing a robot's ability to perform a variety of dexterous manipulation tasks. Advantages of their use can be attributed to the high resolution tactile data and 3D depth map reconstructions they can provide. Unfortunately, many of these tactile sensors use either a flat sensing surface, sense on only one side of the sensor's body, or have a bulky form-factor, making it difficult to integrate the sensors with a variety of robotic grippers. Of the camera-based sensors that do have all-around, curved sensing surfaces, many cannot provide 3D depth maps; those that do often require optical designs specified to a particular sensor geometry. In this work, we introduce GelSight360, a fingertip-like, omnidirectional, camera-based tactile sensor capable of producing depth maps of objects deforming the sensor's surface. In addition, we introduce a novel cross-LED lighting scheme that can be implemented in different all-around sensor geometries and sizes, allowing the sensor to easily be reconfigured and attached to different grippers of varying DOFs. With this work, we enable roboticists to quickly and easily customize high resolution tactile sensors to fit their robotic system's needs.
Submitted 9 April, 2023;
originally announced April 2023.
-
GelSight EndoFlex: A Soft Endoskeleton Hand with Continuous High-Resolution Tactile Sensing
Authors:
Sandra Q. Liu,
Leonardo Zamora Yañez,
Edward H. Adelson
Abstract:
We describe a novel three-finger robot hand that has high resolution tactile sensing along the entire length of each finger. The fingers are compliant, constructed with a soft shell supported with a flexible endoskeleton. Each finger contains two cameras, allowing tactile data to be gathered along the front and side surfaces of the fingers. The gripper can perform an enveloping grasp of an object and extract a large amount of rich tactile data in a single grasp. By capturing data from many parts of the grasped object at once, we can do object recognition with a single grasp rather than requiring multiple touches. We describe our novel design and construction techniques which allow us to simultaneously satisfy the requirements of compliance and strength, and high resolution tactile sensing over large areas. The supplementary video can be found here: https://youtu.be/H1OYADtgj9k
Submitted 31 March, 2023;
originally announced March 2023.
-
GelSight Baby Fin Ray: A Compact, Compliant, Flexible Finger with High-Resolution Tactile Sensing
Authors:
Sandra Q. Liu,
Yuxiang Ma,
Edward H. Adelson
Abstract:
The synthesis of tactile sensing with compliance is essential to many fields, from agricultural usages like fruit picking, to sustainability practices such as sorting recycling, to the creation of safe home-care robots for the elderly to age with dignity. From tactile sensing, we can discern material properties, recognize textures, and determine softness, while with compliance, we are able to securely and safely interact with the objects and the environment around us. These two abilities can culminate into a useful soft robotic gripper, such as the original GelSight Fin Ray, which is able to grasp a large variety of different objects and also perform a simple household manipulation task: wine glass reorientation. Although the original GelSight Fin Ray solves the problem of interfacing a generally rigid, high-resolution sensor with a soft, compliant structure, we can improve the robustness of the sensor and implement techniques that make such camera-based tactile sensors applicable to a wider variety of soft robot designs. We first integrate flexible mirrors and incorporate the rigid electronic components into the base of the gripper, which greatly improves the compliance of the Fin Ray structure. Then, we synthesize a flexible and high-elongation silicone adhesive-based fluorescent paint, which can provide good quality 2D tactile localization results for our sensor. Finally, we incorporate all of these techniques into a new design: the Baby Fin Ray, which we use to dig through clutter, and perform successful classification of nuts in their shells. The supplementary video can be found here: https://youtu.be/_oD_QFtYTPM
Submitted 26 March, 2023;
originally announced March 2023.
-
FingerSLAM: Closed-loop Unknown Object Localization and Reconstruction from Visuo-tactile Feedback
Authors:
Jialiang Zhao,
Maria Bauza,
Edward H. Adelson
Abstract:
In this paper, we address the problem of using visuo-tactile feedback for 6-DoF localization and 3D reconstruction of unknown in-hand objects. We propose FingerSLAM, a closed-loop factor graph-based pose estimator that combines local tactile sensing at finger-tip and global vision sensing from a wrist-mount camera. FingerSLAM is constructed with two constituent pose estimators: a multi-pass refined tactile-based pose estimator that captures movements from detailed local textures, and a single-pass vision-based pose estimator that predicts from a global view of the object. We also design a loop closure mechanism that actively matches current vision and tactile images to previously stored key-frames to reduce accumulated error. FingerSLAM incorporates the two sensing modalities of tactile and vision, as well as the loop closure mechanism with a factor graph-based optimization framework. Such a framework produces an optimized pose estimation solution that is more accurate than the standalone estimators. The estimated poses are then used to reconstruct the shape of the unknown object incrementally by stitching the local point clouds recovered from tactile images. We train our system on real-world data collected with 20 objects. We demonstrate reliable visuo-tactile pose estimation and shape reconstruction through quantitative and qualitative real-world evaluations on 6 objects that are unseen during training.
Submitted 14 March, 2023;
originally announced March 2023.
-
GelSight Fin Ray: Incorporating Tactile Sensing into a Soft Compliant Robotic Gripper
Authors:
Sandra Q. Liu,
Edward H. Adelson
Abstract:
To adapt to constantly changing environments and be safe for human interaction, robots should have compliant and soft characteristics as well as the ability to sense the world around them. Even so, the incorporation of tactile sensing into a soft compliant robot, like the Fin Ray finger, is difficult due to its deformable structure. Not only does the frame need to be modified to allow room for a vision sensor, which enables intricate tactile sensing, the robot must also retain its original mechanically compliant properties. However, adding high-resolution tactile sensors to soft fingers is difficult since many sensorized fingers, such as GelSight-based ones, are rigid and function under the assumption that changes in the sensing region are only from tactile contact and not from finger compliance. A sensorized soft robotic finger needs to be able to separate its overall proprioceptive changes from its tactile information. To this end, this paper introduces the novel design of a GelSight Fin Ray, which embodies both the ability to passively adapt to any object it grasps and the ability to perform high-resolution tactile reconstruction, object orientation estimation, and marker tracking for shear and torsional forces. Having these capabilities allows soft and compliant robots to perform more manipulation tasks that require sensing. One such task the finger is able to perform successfully is a kitchen task: wine glass reorientation and placement, which is difficult to do with external vision sensors but is easy with tactile sensing. The development of this sensing technology could also potentially be applied to other soft compliant grippers, increasing their viability in many different fields.
Submitted 14 April, 2022;
originally announced April 2022.
-
3D Shape Perception from Monocular Vision, Touch, and Shape Priors
Authors:
Shaoxiong Wang,
Jiajun Wu,
Xingyuan Sun,
Wenzhen Yuan,
William T. Freeman,
Joshua B. Tenenbaum,
Edward H. Adelson
Abstract:
Perceiving accurate 3D object shape is important for robots to interact with the physical world. Current research along this direction has been primarily relying on visual observations. Vision, however useful, has inherent limitations due to occlusions and the 2D-3D ambiguities, especially for perception with a monocular camera. In contrast, touch gets precise local shape information, though its efficiency for reconstructing the entire shape could be low. In this paper, we propose a novel paradigm that efficiently perceives accurate 3D object shape by incorporating visual and tactile observations, as well as prior knowledge of common object shapes learned from large-scale shape repositories. We use vision first, applying neural networks with learned shape priors to predict an object's 3D shape from a single-view color image. We then use tactile sensing to refine the shape; the robot actively touches the object regions where the visual prediction has high uncertainty. Our method efficiently builds the 3D shape of common objects from a color image and a small number of tactile explorations (around 10). Our setup is easy to apply and has the potential to help robots better perform grasping or manipulation tasks on real-world objects.
Submitted 9 August, 2018;
originally announced August 2018.
-
More Than a Feeling: Learning to Grasp and Regrasp using Vision and Touch
Authors:
Roberto Calandra,
Andrew Owens,
Dinesh Jayaraman,
Justin Lin,
Wenzhen Yuan,
Jitendra Malik,
Edward H. Adelson,
Sergey Levine
Abstract:
For humans, the process of grasping an object relies heavily on rich tactile feedback. Most recent robotic grasping work, however, has been based only on visual input, and thus cannot easily benefit from feedback after initiating contact. In this paper, we investigate how a robot can learn to use tactile information to iteratively and efficiently adjust its grasp. To this end, we propose an end-to-end action-conditional model that learns regrasping policies from raw visuo-tactile data. This model -- a deep, multimodal convolutional network -- predicts the outcome of a candidate grasp adjustment, and then executes a grasp by iteratively selecting the most promising actions. Our approach requires neither calibration of the tactile sensors, nor any analytical modeling of contact forces, thus reducing the engineering effort required to obtain efficient grasping policies. We train our model with data from about 6,450 grasping trials on a two-finger gripper equipped with GelSight high-resolution tactile sensors on each finger. Across extensive experiments, our approach outperforms a variety of baselines at (i) estimating grasp adjustment outcomes, (ii) selecting efficient grasp adjustments for quick grasping, and (iii) reducing the amount of force applied at the fingers, while maintaining competitive performance. Finally, we study the choices made by our model and show that it has successfully acquired useful and interpretable grasping behaviors.
Submitted 26 July, 2018; v1 submitted 28 May, 2018;
originally announced May 2018.
-
The Feeling of Success: Does Touch Sensing Help Predict Grasp Outcomes?
Authors:
Roberto Calandra,
Andrew Owens,
Manu Upadhyaya,
Wenzhen Yuan,
Justin Lin,
Edward H. Adelson,
Sergey Levine
Abstract:
A successful grasp requires careful balancing of the contact forces. Deducing whether a particular grasp will be successful from indirect measurements, such as vision, is therefore quite challenging, and direct sensing of contacts through touch sensing provides an appealing avenue toward more successful and consistent robotic grasping. However, in order to fully evaluate the value of touch sensing for grasp outcome prediction, we must understand how touch sensing can influence outcome prediction accuracy when combined with other modalities. Doing so using conventional model-based techniques is exceptionally difficult. In this work, we investigate the question of whether touch sensing aids in predicting grasp outcomes within a multimodal sensing framework that combines vision and touch. To that end, we collected more than 9,000 grasping trials using a two-finger gripper equipped with GelSight high-resolution tactile sensors on each finger, and evaluated visuo-tactile deep neural network models to directly predict grasp outcomes from either modality individually, and from both modalities together. Our experimental results indicate that incorporating tactile readings substantially improves grasping performance.
Submitted 16 October, 2017;
originally announced October 2017.
-
Shape-independent Hardness Estimation Using Deep Learning and a GelSight Tactile Sensor
Authors:
Wenzhen Yuan,
Chenzhuo Zhu,
Andrew Owens,
Mandayam A. Srinivasan,
Edward H. Adelson
Abstract:
Hardness is among the most important attributes of an object that humans learn about through touch. However, approaches for robots to estimate hardness are limited, due to the lack of information provided by current tactile sensors. In this work, we address these limitations by introducing a novel method for hardness estimation, based on the GelSight tactile sensor, and the method does not require accurate control of contact conditions or the shape of objects. A GelSight has a soft contact interface, and provides high resolution tactile images of contact geometry, as well as contact force and slip conditions. In this paper, we try to use the sensor to measure hardness of objects with multiple shapes, under a loosely controlled contact condition. The contact is made manually or by a robot hand, while the force and trajectory are unknown and uneven. We analyze the data using a deep convolutional (and recurrent) neural network. Experiments show that the neural net model can estimate the hardness of objects with different shapes and hardness ranging from 8 to 87 on the Shore 00 scale.
Submitted 12 April, 2017;
originally announced April 2017.
-
Visually Indicated Sounds
Authors:
Andrew Owens,
Phillip Isola,
Josh McDermott,
Antonio Torralba,
Edward H. Adelson,
William T. Freeman
Abstract:
Objects make distinctive sounds when they are hit or scratched. These sounds reveal aspects of an object's material properties, as well as the actions that produced them. In this paper, we propose the task of predicting what sound an object makes when struck as a way of studying physical interactions within a visual scene. We present an algorithm that synthesizes sound from silent videos of people hitting and scratching objects with a drumstick. This algorithm uses a recurrent neural network to predict sound features from videos and then produces a waveform from these features with an example-based synthesis procedure. We show that the sounds predicted by our model are realistic enough to fool participants in a "real or fake" psychophysical experiment, and that they convey significant information about material properties and physical interactions.
Submitted 29 April, 2016; v1 submitted 28 December, 2015;
originally announced December 2015.
-
Learning visual groups from co-occurrences in space and time
Authors:
Phillip Isola,
Daniel Zoran,
Dilip Krishnan,
Edward H. Adelson
Abstract:
We propose a self-supervised framework that learns to group visual entities based on their rate of co-occurrence in space and time. To model statistical dependencies between the entities, we set up a simple binary classification problem in which the goal is to predict if two visual primitives occur in the same spatial or temporal context. We apply this framework to three domains: learning patch affinities from spatial adjacency in images, learning frame affinities from temporal adjacency in videos, and learning photo affinities from geospatial proximity in image collections. We demonstrate that in each case the learned affinities uncover meaningful semantic groupings. From patch affinities we generate object proposals that are competitive with state-of-the-art supervised methods. From frame affinities we generate movie scene segmentations that correlate well with DVD chapter structure. Finally, from geospatial affinities we learn groups that relate well to semantic place categories.
Submitted 20 November, 2015;
originally announced November 2015.
-
Sparkle Vision: Seeing the World through Random Specular Microfacets
Authors:
Zhengdong Zhang,
Phillip Isola,
Edward H. Adelson
Abstract:
In this paper, we study the problem of reproducing the world lighting from a single image of an object covered with random specular microfacets on the surface. We show that such reflectors can be interpreted as a randomized mapping from the lighting to the image. Such specular objects have very different optical properties from both diffuse surfaces and smooth specular objects like metals, so we design a special imaging system to robustly and effectively photograph them. We present simple yet reliable algorithms to calibrate the proposed system and do the inference. We conduct experiments to verify the correctness of our model assumptions and prove the effectiveness of our pipeline.
Submitted 25 December, 2014;
originally announced December 2014.