-
PhotoBot: Reference-Guided Interactive Photography via Natural Language
Authors:
Oliver Limoyo,
Jimmy Li,
Dmitriy Rivkin,
Jonathan Kelly,
Gregory Dudek
Abstract:
We introduce PhotoBot, a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via reference images that are selected from a curated gallery. We leverage a visual language model (VLM) and an object detector to characterize the reference images via textu…
▽ More
We introduce PhotoBot, a framework for fully automated photo acquisition based on an interplay between high-level human language guidance and a robot photographer. We propose to communicate photography suggestions to the user via reference images that are selected from a curated gallery. We leverage a visual language model (VLM) and an object detector to characterize the reference images via textual descriptions and then use a large language model (LLM) to retrieve relevant reference images based on a user's language query through text-based reasoning. To correspond the reference image and the observed scene, we exploit pre-trained features from a vision transformer capable of capturing semantic similarity across marked appearance variations. Using these features, we compute suggested pose adjustments for an RGB-D camera by solving a perspective-n-point (PnP) problem. We demonstrate our approach using a manipulator equipped with a wrist camera. Our user studies show that photos taken by PhotoBot are often more aesthetically pleasing than those taken by users themselves, as measured by human feedback. We also show that PhotoBot can generalize to other reference sources such as paintings.
△ Less
Submitted 4 July, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
Working Backwards: Learning to Place by Picking
Authors:
Oliver Limoyo,
Abhisek Konar,
Trevor Ablett,
Jonathan Kelly,
Francois R. Hogan,
Gregory Dudek
Abstract:
We present placing via picking (PvP), a method to autonomously collect real-world demonstrations for a family of placing tasks in which objects must be manipulated to specific, contact-constrained locations. With PvP, we approach the collection of robotic object placement demonstrations by reversing the gras** process and exploiting the inherent symmetry of the pick and place problems. Specifica…
▽ More
We present placing via picking (PvP), a method to autonomously collect real-world demonstrations for a family of placing tasks in which objects must be manipulated to specific, contact-constrained locations. With PvP, we approach the collection of robotic object placement demonstrations by reversing the gras** process and exploiting the inherent symmetry of the pick and place problems. Specifically, we obtain placing demonstrations from a set of grasp sequences of objects initially located at their target placement locations. Our system can collect hundreds of demonstrations in contact-constrained environments without human intervention using two modules: compliant control for gras** and tactile regras**. We train a policy directly from visual observations through behavioural cloning, using the autonomously-collected demonstrations. By doing so, the policy can generalize to object placement scenarios outside of the training environment without privileged information (e.g., placing a plate picked up from a table). We validate our approach in home robot scenarios that include dishwasher loading and table setting. Our approach yields robotic placing policies that outperform policies trained with kinesthetic teaching, both in terms of success rate and data efficiency, while requiring no human supervision.
△ Less
Submitted 9 July, 2024; v1 submitted 4 December, 2023;
originally announced December 2023.
-
Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor
Authors:
Trevor Ablett,
Oliver Limoyo,
Adam Sigal,
Affan Jilani,
Jonathan Kelly,
Kaleem Siddiqi,
Francois Hogan,
Gregory Dudek
Abstract:
Contact-rich tasks continue to present a variety of challenges for robotic manipulation. In this work, we leverage a multimodal visuotactile sensor within the framework of imitation learning (IL) to perform contact rich tasks that involve relative motion (slip**/sliding) between the end-effector and object. We introduce two algorithmic contributions, tactile force matching and learned mode switc…
▽ More
Contact-rich tasks continue to present a variety of challenges for robotic manipulation. In this work, we leverage a multimodal visuotactile sensor within the framework of imitation learning (IL) to perform contact rich tasks that involve relative motion (slip**/sliding) between the end-effector and object. We introduce two algorithmic contributions, tactile force matching and learned mode switching, as complimentary methods for improving IL. Tactile force matching enhances kinesthetic teaching by reading approximate forces during the demonstration and generating an adapted robot trajectory that recreates the recorded forces. Learned mode switching uses IL to couple visual and tactile sensor modes with the learned motion policy, simplifying the transition from reaching to contacting. We perform robotic manipulation experiments on four door opening tasks with a variety of observation and method configurations to study the utility of our proposed improvements and multimodal visuotactile sensing. Our results show that the inclusion of force matching raises average policy success rates by 62.5%, visuotactile mode switching by 30.3%, and visuotactile data as a policy input by 42.5%, emphasizing the value of see-through tactile sensing for IL, both for data collection to allow force matching, and for policy execution to allow accurate task feedback.
△ Less
Submitted 26 June, 2024; v1 submitted 2 November, 2023;
originally announced November 2023.
-
Euclidean Equivariant Models for Generative Graphical Inverse Kinematics
Authors:
Oliver Limoyo,
Filip Marić,
Matthew Giamou,
Petra Alexson,
Ivan Petrović,
Jonathan Kelly
Abstract:
Quickly and reliably finding accurate inverse kinematics (IK) solutions remains a challenging problem for robotic manipulation. Existing numerical solvers typically produce a single solution only and rely on local search techniques to minimize a highly nonconvex objective function. Recently, learning-based approaches that approximate the entire feasible set of solutions have shown promise as a mea…
▽ More
Quickly and reliably finding accurate inverse kinematics (IK) solutions remains a challenging problem for robotic manipulation. Existing numerical solvers typically produce a single solution only and rely on local search techniques to minimize a highly nonconvex objective function. Recently, learning-based approaches that approximate the entire feasible set of solutions have shown promise as a means to generate multiple fast and accurate IK results in parallel. However, existing learning-based techniques have a significant drawback: each robot of interest requires a specialized model that must be trained from scratch. To address this shortcoming, we investigate a novel distance-geometric robot representation coupled with a graph structure that allows us to leverage the flexibility of graph neural networks (GNNs). We use this approach to train a generative graphical inverse kinematics solver (GGIK) that is able to produce a large number of diverse solutions in parallel while also generalizing well -- a single learned model can be used to produce IK solutions for a variety of different robots. The graphical formulation elegantly exposes the symmetry and Euclidean equivariance of the IK problem that stems from the spatial nature of robot manipulators. We exploit this symmetry by encoding it into the architecture of our learned model, yielding a flexible solver that is able to produce sets of IK solutions for multiple robots.
△ Less
Submitted 4 July, 2023;
originally announced July 2023.
-
ANSEL Photobot: A Robot Event Photographer with Semantic Intelligence
Authors:
Dmitriy Rivkin,
Gregory Dudek,
Nikhil Kakodkar,
David Meger,
Oliver Limoyo,
Xue Liu,
Francois Hogan
Abstract:
Our work examines the way in which large language models can be used for robotic planning and sampling, specifically the context of automated photographic documentation. Specifically, we illustrate how to produce a photo-taking robot with an exceptional level of semantic awareness by leveraging recent advances in general purpose language (LM) and vision-language (VLM) models. Given a high-level de…
▽ More
Our work examines the way in which large language models can be used for robotic planning and sampling, specifically the context of automated photographic documentation. Specifically, we illustrate how to produce a photo-taking robot with an exceptional level of semantic awareness by leveraging recent advances in general purpose language (LM) and vision-language (VLM) models. Given a high-level description of an event we use an LM to generate a natural-language list of photo descriptions that one would expect a photographer to capture at the event. We then use a VLM to identify the best matches to these descriptions in the robot's video stream. The photo portfolios generated by our method are consistently rated as more appropriate to the event by human evaluators than those generated by existing methods.
△ Less
Submitted 15 February, 2023;
originally announced February 2023.
-
Generative Graphical Inverse Kinematics
Authors:
Oliver Limoyo,
Filip Marić,
Matthew Giamou,
Petra Alexson,
Ivan Petrović,
Jonathan Kelly
Abstract:
Quickly and reliably finding accurate inverse kinematics (IK) solutions remains a challenging problem for many robot manipulators. Existing numerical solvers are broadly applicable but typically only produce a single solution and rely on local search techniques to minimize nonconvex objective functions. More recent learning-based approaches that approximate the entire feasible set of solutions hav…
▽ More
Quickly and reliably finding accurate inverse kinematics (IK) solutions remains a challenging problem for many robot manipulators. Existing numerical solvers are broadly applicable but typically only produce a single solution and rely on local search techniques to minimize nonconvex objective functions. More recent learning-based approaches that approximate the entire feasible set of solutions have shown promise as a means to generate multiple fast and accurate IK results in parallel. However, existing learning-based techniques have a significant drawback: each robot of interest requires a specialized model that must be trained from scratch. To address this key shortcoming, we propose a novel distance-geometric robot representation coupled with a graph structure that allows us to leverage the sample efficiency of Euclidean equivariant functions and the generalizability of graph neural networks (GNNs). Our approach is generative graphical inverse kinematics (GGIK), the first learned IK solver able to accurately and efficiently produce a large number of diverse solutions in parallel while also displaying the ability to generalize -- a single learned model can be used to produce IK solutions for a variety of different robots. When compared to several other learned IK methods, GGIK provides more accurate solutions with the same amount of data. GGIK can generalize reasonably well to robot manipulators unseen during training. Additionally, GGIK can learn a constrained distribution that encodes joint limits and scales efficiently to larger robots and a high number of sampled solutions. Finally, GGIK can be used to complement local IK solvers by providing reliable initializations for a local optimization process.
△ Less
Submitted 24 March, 2024; v1 submitted 19 September, 2022;
originally announced September 2022.
-
Learning Sequential Latent Variable Models from Multimodal Time Series Data
Authors:
Oliver Limoyo,
Trevor Ablett,
Jonathan Kelly
Abstract:
Sequential modelling of high-dimensional data is an important problem that appears in many domains including model-based reinforcement learning and dynamics identification for control. Latent variable models applied to sequential data (i.e., latent dynamics models) have been shown to be a particularly effective probabilistic approach to solve this problem, especially when dealing with images. Howe…
▽ More
Sequential modelling of high-dimensional data is an important problem that appears in many domains including model-based reinforcement learning and dynamics identification for control. Latent variable models applied to sequential data (i.e., latent dynamics models) have been shown to be a particularly effective probabilistic approach to solve this problem, especially when dealing with images. However, in many application areas (e.g., robotics), information from multiple sensing modalities is available -- existing latent dynamics methods have not yet been extended to effectively make use of such multimodal sequential data. Multimodal sensor streams can be correlated in a useful manner and often contain complementary information across modalities. In this work, we present a self-supervised generative modelling framework to jointly learn a probabilistic latent state representation of multimodal data and the respective dynamics. Using synthetic and real-world datasets from a multimodal robotic planar pushing task, we demonstrate that our approach leads to significant improvements in prediction and representation quality. Furthermore, we compare to the common learning baseline of concatenating each modality in the latent space and show that our principled probabilistic formulation performs better. Finally, despite being fully self-supervised, we demonstrate that our method is nearly as effective as an existing supervised approach that relies on ground truth labels.
△ Less
Submitted 20 January, 2023; v1 submitted 21 April, 2022;
originally announced April 2022.
-
Heteroscedastic Uncertainty for Robust Generative Latent Dynamics
Authors:
Oliver Limoyo,
Bryan Chan,
Filip Marić,
Brandon Wagstaff,
Rupam Mahmood,
Jonathan Kelly
Abstract:
Learning or identifying dynamics from a sequence of high-dimensional observations is a difficult challenge in many domains, including reinforcement learning and control. The problem has recently been studied from a generative perspective through latent dynamics: high-dimensional observations are embedded into a lower-dimensional space in which the dynamics can be learned. Despite some successes, l…
▽ More
Learning or identifying dynamics from a sequence of high-dimensional observations is a difficult challenge in many domains, including reinforcement learning and control. The problem has recently been studied from a generative perspective through latent dynamics: high-dimensional observations are embedded into a lower-dimensional space in which the dynamics can be learned. Despite some successes, latent dynamics models have not yet been applied to real-world robotic systems where learned representations must be robust to a variety of perceptual confounds and noise sources not seen during training. In this paper, we present a method to jointly learn a latent state representation and the associated dynamics that is amenable for long-term planning and closed-loop control under perceptually difficult conditions. As our main contribution, we describe how our representation is able to capture a notion of heteroscedastic or input-specific uncertainty at test time by detecting novel or out-of-distribution (OOD) inputs. We present results from prediction and control experiments on two image-based tasks: a simulated pendulum balancing task and a real-world robotic manipulator reaching task. We demonstrate that our model produces significantly more accurate predictions and exhibits improved control performance, compared to a model that assumes homoscedastic uncertainty only, in the presence of varying degrees of input degradation.
△ Less
Submitted 11 July, 2022; v1 submitted 18 August, 2020;
originally announced August 2020.
-
Fast Manipulability Maximization Using Continuous-Time Trajectory Optimization
Authors:
Filip Marić,
Oliver Limoyo,
Luka Petrović,
Trevor Ablett,
Ivan Petrović,
Jonathan Kelly
Abstract:
A significant challenge in manipulation motion planning is to ensure agility in the face of unpredictable changes during task execution. This requires the identification and possible modification of suitable joint-space trajectories, since the joint velocities required to achieve a specific endeffector motion vary with manipulator configuration. For a given manipulator configuration, the joint spa…
▽ More
A significant challenge in manipulation motion planning is to ensure agility in the face of unpredictable changes during task execution. This requires the identification and possible modification of suitable joint-space trajectories, since the joint velocities required to achieve a specific endeffector motion vary with manipulator configuration. For a given manipulator configuration, the joint space-to-task space velocity map** is characterized by a quantity known as the manipulability index. In contrast to previous control-based approaches, we examine the maximization of manipulability during planning as a way of achieving adaptable and safe joint space-to-task space motion map**s in various scenarios. By representing the manipulator trajectory as a continuous-time Gaussian process (GP), we are able to leverage recent advances in trajectory optimization to maximize the manipulability index during trajectory generation. Moreover, the sparsity of our chosen representation reduces the typically large computational cost associated with maximizing manipulability when additional constraints exist. Results from simulation studies and experiments with a real manipulator demonstrate increases in manipulability, while maintaining smooth trajectories with more dexterous (and therefore more agile) arm configurations.
△ Less
Submitted 1 May, 2020; v1 submitted 8 August, 2019;
originally announced August 2019.
-
Manipulability Maximization Using Continuous-Time Gaussian Processes
Authors:
Filip Marić,
Oliver Limoyo,
Luka Petrović,
Ivan Petrović,
Jonathan Kelly
Abstract:
A significant challenge in motion planning is to avoid being in or near \emph{singular configurations} (\textit{singularities}), that is, joint configurations that result in the loss of the ability to move in certain directions in task space. A robotic system's capacity for motion is reduced even in regions that are in close proximity to (i.e., neighbouring) a singularity. In this work we examine…
▽ More
A significant challenge in motion planning is to avoid being in or near \emph{singular configurations} (\textit{singularities}), that is, joint configurations that result in the loss of the ability to move in certain directions in task space. A robotic system's capacity for motion is reduced even in regions that are in close proximity to (i.e., neighbouring) a singularity. In this work we examine singularity avoidance in a motion planning context, finding trajectories which minimize proximity to singular regions, subject to constraints. We define a manipulability-based likelihood associated with singularity avoidance over a continuous trajectory representation, which we then maximize using a \textit{maximum a posteriori} (MAP) estimator. Viewing the MAP problem as inference on a factor graph, we use gradient information from interpolated states to maximize the trajectory's overall manipulability. Both qualitative and quantitative analyses of experimental data show increases in manipulability that result in smooth trajectories with visibly more dexterous arm configurations.
△ Less
Submitted 11 September, 2018; v1 submitted 26 March, 2018;
originally announced March 2018.
-
Self-Calibration of Mobile Manipulator Kinematic and Sensor Extrinsic Parameters Through Contact-Based Interaction
Authors:
Oliver Limoyo,
Trevor Ablett,
Filip Marić,
Luke Volpatti,
Jonathan Kelly
Abstract:
We present a novel approach for mobile manipulator self-calibration using contact information. Our method, based on point cloud registration, is applied to estimate the extrinsic transform between a fixed vision sensor mounted on a mobile base and an end effector. Beyond sensor calibration, we demonstrate that the method can be extended to include manipulator kinematic model parameters, which invo…
▽ More
We present a novel approach for mobile manipulator self-calibration using contact information. Our method, based on point cloud registration, is applied to estimate the extrinsic transform between a fixed vision sensor mounted on a mobile base and an end effector. Beyond sensor calibration, we demonstrate that the method can be extended to include manipulator kinematic model parameters, which involves a non-rigid registration process. Our procedure uses on-board sensing exclusively and does not rely on any external measurement devices, fiducial markers, or calibration rigs. Further, it is fully automatic in the general case. We experimentally validate the proposed method on a custom mobile manipulator platform, and demonstrate centimetre-level post-calibration accuracy in positioning of the end effector using visual guidance only. We also discuss the stability properties of the registration algorithm, in order to determine the conditions under which calibration is possible.
△ Less
Submitted 22 October, 2018; v1 submitted 16 March, 2018;
originally announced March 2018.