Video Diffusion Alignment via Reward Gradients

Authors: Mihir Prabhudesai, Russell Mendonca, Zheyang Qin, Katerina Fragkiadaki, Deepak Pathak

Abstract: We have made significant progress towards building foundational video diffusion models. As these models are trained using large-scale unsupervised data, it has become crucial to adapt these models to specific downstream tasks. Adapting these models via supervised fine-tuning requires collecting target datasets of videos, which is challenging and tedious. In this work, we utilize pre-trained reward models that are learned via preferences on top of powerful vision discriminative models to adapt video diffusion models. These models contain dense gradient information with respect to generated RGB pixels, which is critical to efficient learning in complex search spaces, such as videos. We show that backpropagating gradients from these reward models to a video diffusion model can allow for compute and sample efficient alignment of the video diffusion model. We show results across a variety of reward models and video diffusion models, demonstrating that our approach can learn much more efficiently in terms of reward queries and computation than prior gradient-free approaches. Our code, model weights, and more visualizations are available at https://vader-vid.github.io.

Submitted 11 July, 2024; originally announced July 2024.

Comments: Project Webpage: https://vader-vid.github.io; Code available at: https://github.com/mihirp1998/VADER

arXiv:2401.14403 [pdf, other]

Adaptive Mobile Manipulation for Articulated Objects In the Open World

Authors: Haoyu Xiong, Russell Mendonca, Kenneth Shaw, Deepak Pathak

Abstract: Deploying robots in open-ended unstructured environments such as homes has been a long-standing research problem. However, robots are often studied only in closed-off lab settings, and prior mobile manipulation work is restricted to pick-move-place, which is arguably just the tip of the iceberg in this area. In this paper, we introduce Open-World Mobile Manipulation System, a full-stack approach to tackle realistic articulated object operation, e.g. real-world doors, cabinets, drawers, and refrigerators in open-ended unstructured environments. The robot utilizes an adaptive learning framework to initially learn from a small set of data through behavior cloning, followed by learning from online practice on novel objects that fall outside the training distribution. We also develop a low-cost mobile manipulation hardware platform capable of safe and autonomous online adaptation in unstructured environments with a cost of around 20,000 USD. In our experiments we utilize 20 articulated objects across 4 buildings on the CMU campus. With less than an hour of online learning for each object, the system is able to increase success rate from 50% of BC pre-training to 95% using online adaptation. Video results at https://open-world-mobilemanip.github.io/

Submitted 28 January, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: Website at https://open-world-mobilemanip.github.io/

arXiv:2310.08864 [pdf, other]

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Authors: Open X-Embodiment Collaboration, Abby O'Neill, Abdul Rehman, Abhinav Gupta, Abhiram Maddukuri, Abhishek Gupta, Abhishek Padalkar, Abraham Lee, Acorn Pooley, Agrim Gupta, Ajay Mandlekar, Ajinkya Jain, Albert Tung, Alex Bewley, Alex Herzog, Alex Irpan, Alexander Khazatsky, Anant Rai, Anchit Gupta, Andrew Wang, Andrey Kolobov, Anikait Singh, Animesh Garg, Aniruddha Kembhavi, Annie Xie, et al. (267 additional authors not shown)

Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes in efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a generalist X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. More details can be found on the project website https://robotics-transformer-x.github.io.

Submitted 1 June, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: Project website: https://robotics-transformer-x.github.io

arXiv:2309.02435 [pdf, other]

Efficient RL via Disentangled Environment and Agent Representations

Authors: Kevin Gmelin, Shikhar Bahl, Russell Mendonca, Deepak Pathak

Abstract: Agents that are aware of the separation between themselves and their environments can leverage this understanding to form effective representations of visual input. We propose an approach for learning such structured representations for RL algorithms, using visual knowledge of the agent, such as its shape or mask, which is often inexpensive to obtain. This is incorporated into the RL objective using a simple auxiliary loss. We show that our method, Structured Environment-Agent Representations, outperforms state-of-the-art model-free approaches over 18 different challenging visual simulation environments spanning 5 different robots. Website at https://sear-rl.github.io/

Submitted 5 September, 2023; originally announced September 2023.

Comments: ICML 2023. Website at https://sear-rl.github.io/

arXiv:2308.10901 [pdf, other]

Structured World Models from Human Videos

Authors: Russell Mendonca, Shikhar Bahl, Deepak Pathak

Abstract: We tackle the problem of learning complex, general behaviors directly in the real world. We propose an approach for robots to efficiently learn manipulation skills using only a handful of real-world interaction trajectories from many different settings. Inspired by the success of learning from large-scale datasets in the fields of computer vision and natural language, our belief is that in order to efficiently learn, a robot must be able to leverage internet-scale, human video data. Humans interact with the world in many interesting ways, which can allow a robot to not only build an understanding of useful actions and affordances but also how these actions affect the world for manipulation. Our approach builds a structured, human-centric action space grounded in visual affordances learned from human videos. Further, we train a world model on human videos and fine-tune on a small amount of robot interaction data without any task supervision. We show that this approach of affordance-space world models enables different robots to learn various manipulation skills in complex settings, in under 30 minutes of interaction. Videos can be found at https://human-world-model.github.io

Submitted 21 August, 2023; originally announced August 2023.

Comments: RSS 2023. Website at https://human-world-model.github.io

arXiv:2304.08488 [pdf, other]

Affordances from Human Videos as a Versatile Representation for Robotics

Authors: Shikhar Bahl, Russell Mendonca, Lili Chen, Unnat Jain, Deepak Pathak

Abstract: Building a robot that can understand and learn to interact by watching humans has inspired several vision problems. However, despite some successful results on static datasets, it remains unclear how current models can be used on a robot directly. In this paper, we aim to bridge this gap by leveraging videos of human interactions in an environment-centric manner. Utilizing internet videos of human behavior, we train a visual affordance model that estimates where and how in the scene a human is likely to interact. The structure of these behavioral affordances directly enables the robot to perform many complex tasks. We show how to seamlessly integrate our affordance model with four robot learning paradigms including offline imitation learning, exploration, goal-conditioned learning, and action parameterization for reinforcement learning. We show the efficacy of our approach, which we call VRB, across 4 real-world environments, over 10 different tasks, and 2 robotic platforms operating in the wild. Results, visualizations and videos at https://robo-affordances.github.io/

Submitted 17 April, 2023; originally announced April 2023.

Comments: Accepted at CVPR 2023. Website at https://robo-affordances.github.io/

arXiv:2302.06604 [pdf, other]

ALAN: Autonomously Exploring Robotic Agents in the Real World

Authors: Russell Mendonca, Shikhar Bahl, Deepak Pathak

Abstract: Robotic agents that operate autonomously in the real world need to continuously explore their environment and learn from the data collected, with minimal human supervision. While it is possible to build agents that can learn in such a manner without supervision, current methods struggle to scale to the real world. Thus, we propose ALAN, an autonomously exploring robotic agent that can perform tasks in the real world with little training and interaction time. This is enabled by measuring environment change, which reflects object movement and ignores changes in the robot position. We use this metric directly as an environment-centric signal, and also maximize the uncertainty of predicted environment change, which provides an agent-centric exploration signal. We evaluate our approach on two different real-world play kitchen settings, enabling a robot to efficiently explore and discover manipulation skills, and perform tasks specified via goal images. Website at https://robo-explorer.github.io/

Submitted 13 February, 2023; originally announced February 2023.

Comments: ICRA 2023. Website at https://robo-explorer.github.io/

arXiv:2110.09514 [pdf, other]

Discovering and Achieving Goals via World Models

Authors: Russell Mendonca, Oleh Rybkin, Kostas Daniilidis, Danijar Hafner, Deepak Pathak

Abstract: How can artificial agents learn to solve many diverse tasks in complex visual environments in the absence of any supervision? We decompose this question into two problems: discovering new goals and learning to reliably achieve them. We introduce Latent Explorer Achiever (LEXA), a unified solution to these that learns a world model from image inputs and uses it to train an explorer and an achiever policy from imagined rollouts. Unlike prior methods that explore by reaching previously visited states, the explorer plans to discover unseen surprising states through foresight, which are then used as diverse targets for the achiever to practice. After the unsupervised phase, LEXA solves tasks specified as goal images zero-shot without any additional learning. LEXA substantially outperforms previous approaches to unsupervised goal-reaching, both on prior benchmarks and on a new challenging benchmark with a total of 40 test tasks spanning across four standard robotic manipulation and locomotion domains. LEXA further achieves goals that require interacting with multiple objects in sequence. Finally, to demonstrate the scalability and generality of LEXA, we train a single general agent across four distinct environments. Code and videos at https://orybkin.github.io/lexa/

Submitted 18 October, 2021; originally announced October 2021.

Comments: NeurIPS 2021. First two authors contributed equally. Website at https://orybkin.github.io/lexa/

arXiv:2104.14357 [pdf, other]

BlockColdChain: Vaccine Cold Chain Blockchain

Authors: Ronan D. Mendonça, Otávio S. Gomes, Luiz F. M. Vieira, Marcos A. M. Vieira, Alex B. Vieira, José A. M. Nacif

Abstract: In this paper, we propose a blockchain-based cold chain technology for tracking vaccine cooling. The COVID-19 pandemic has caused the death of millions of people. An important step towards ending the pandemic is vaccination. Vaccines must be kept at a controlled temperature during the whole process, from fabrication to the hands of the health professionals who will immunize the population. However, there are numerous reports of vaccine loss due to temperature variations, and, currently, people getting vaccinated have no way to verify whether their vaccine was kept safe. Blockchain is a technology solution that can provide public and verifiable records. We review the World Health Organization (WHO) cold chain and blockchain technology. Moreover, we describe current IoT temperature monitoring devices and propose BlockColdChain to track the vaccine cold chain using blockchain, thus providing an unalterable vaccine temperature history. Our experimental results using smart contracts demonstrate the system's feasibility.

Submitted 28 April, 2021; originally announced April 2021.

Comments: 10 pages, 6 figures

arXiv:2101.01229 [pdf, other]

A Survey on Embedding Dynamic Graphs

Authors: Claudio D. T. Barros, Matheus R. F. Mendonça, Alex B. Vieira, Artur Ziviani

Abstract: Embedding static graphs in low-dimensional vector spaces plays a key role in network analytics and inference, supporting applications like node classification, link prediction, and graph visualization. However, many real-world networks present dynamic behavior, including topological evolution, feature evolution, and diffusion. Therefore, several methods for embedding dynamic graphs have been proposed to learn network representations over time, facing novel challenges, such as time-domain modeling, temporal features to be captured, and the temporal granularity to be embedded. In this survey, we overview dynamic graph embedding, discussing its fundamentals and the recent advances developed so far. We introduce the formal definition of dynamic graph embedding, focusing on the problem setting and introducing a novel taxonomy for dynamic graph embedding input and output. We further explore different dynamic behaviors that may be encompassed by embeddings, classifying by topological evolution, feature evolution, and processes on networks. Afterward, we describe existing techniques and propose a taxonomy for dynamic graph embedding techniques based on algorithmic approaches, from matrix and tensor factorization to deep learning, random walks, and temporal point processes. We also elucidate main applications, including dynamic link prediction, anomaly detection, and diffusion prediction, and we further state some promising research directions in the area.

Submitted 21 July, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

Comments: 41 pages, 10 figures

MSC Class: 37E25 (Primary) 68T30; 05C62; 58D10 (Secondary) ACM Class: A.1; I.2.6

arXiv:2011.13518 [pdf, other]

Efficient Information Diffusion in Time-Varying Graphs through Deep Reinforcement Learning

Authors: Matheus R. F. Mendonça, André M. S. Barreto, Artur Ziviani

Abstract: Network seeding for efficient information diffusion over time-varying graphs (TVGs) is a challenging task with many real-world applications. There are several ways to model this spatio-temporal influence maximization problem, but the ultimate goal is to determine the best moment for a node to start the diffusion process. In this context, we propose Spatio-Temporal Influence Maximization (STIM), a model trained with Reinforcement Learning and Graph Embedding over a set of artificial TVGs that is capable of learning the temporal behavior and connectivity pattern of each node, allowing it to predict the best moment to start a diffusion through the TVG. We also develop a special set of artificial TVGs used for training that simulate a stochastic diffusion process in TVGs, showing that the STIM network can learn an efficient policy even over a non-deterministic environment. STIM is also evaluated with a real-world TVG, where it also manages to efficiently propagate information through the nodes. Finally, we also show that the STIM model has a time complexity of $O(|E|)$. STIM, therefore, presents a novel approach for efficient information diffusion in TVGs, being highly versatile, where one can change the goal of the model by simply changing the adopted reward function.

Submitted 26 November, 2020; originally announced November 2020.

Comments: 17 pages, 5 figures

arXiv:2006.16392 [pdf, other]

doi 10.1109/TNSE.2020.3035352

Approximating Network Centrality Measures Using Node Embedding and Machine Learning

Authors: Matheus R. F. Mendonça, André M. S. Barreto, Artur Ziviani

Abstract: Extracting information from real-world large networks is a key challenge nowadays. For instance, computing a node centrality may become infeasible depending on the intended centrality due to its computational cost. One solution is to develop fast methods capable of approximating network centralities. Here, we propose an approach for efficiently approximating node centralities for large networks using Neural Networks and Graph Embedding techniques. Our proposed model, entitled Network Centrality Approximation using Graph Embedding (NCA-GE), uses the adjacency matrix of a graph and a set of features for each node (here, we use only the degree) as input and computes the approximate desired centrality rank for every node. NCA-GE has a time complexity of $O(|E|)$, $E$ being the set of edges of a graph, making it suitable for large networks. NCA-GE also trains quickly, requiring only a set of a thousand small synthetic scale-free graphs (ranging from 100 to 1000 nodes each), and it works well for different node centralities, network sizes, and topologies. Finally, we compare our approach to the state-of-the-art method that approximates centrality ranks using the degree and eigenvector centralities as input, where we show that NCA-GE outperforms the former in a variety of scenarios.

Submitted 1 November, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

Comments: 16 pages, 4 figures

arXiv:2006.07178 [pdf, other]

Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling

Authors: Russell Mendonca, Xinyang Geng, Chelsea Finn, Sergey Levine

Abstract: Reinforcement learning algorithms can acquire policies for complex tasks autonomously. However, the number of samples required to learn a diverse set of skills can be prohibitively large. While meta-reinforcement learning methods have enabled agents to leverage prior experience to adapt quickly to new tasks, their performance depends crucially on how close the new task is to the previously experienced tasks. Current approaches are either not able to extrapolate well, or can do so at the expense of requiring extremely large amounts of data for on-policy meta-training. In this work, we present model identification and experience relabeling (MIER), a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time. Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data, more easily than policies and value functions. These dynamics models can then be used to continue training policies and value functions for out-of-distribution tasks without using meta-reinforcement learning at all, by generating synthetic experience for the new task.

Submitted 15 June, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

arXiv:1904.00956 [pdf, ps, other]

Guided Meta-Policy Search

Authors: Russell Mendonca, Abhishek Gupta, Rosen Kralev, Pieter Abbeel, Sergey Levine, Chelsea Finn

Abstract: Reinforcement learning (RL) algorithms have demonstrated promising results on complex tasks, yet often require impractical numbers of samples since they learn from scratch. Meta-RL aims to address this challenge by leveraging experience from previous tasks so as to more quickly solve new tasks. However, in practice, these algorithms generally also require large amounts of on-policy experience during the meta-training process, making them impractical for use in many problems. To this end, we propose to learn a reinforcement learning procedure in a federated way, where individual off-policy learners can solve the individual meta-training tasks, and then consolidate these solutions into a single meta-learner. Since the central meta-learner learns by imitating the solutions to the individual tasks, it can accommodate either the standard meta-RL problem setting or a hybrid setting where some or all tasks are provided with example demonstrations. The former results in an approach that can leverage policies learned for previous tasks without significant amounts of on-policy data during meta-training, whereas the latter is particularly useful in cases where demonstrations are easy for a person to provide. Across a number of continuous control meta-RL problems, we demonstrate significant improvements in meta-RL sample efficiency in comparison to prior work as well as the ability to scale to domains with visual observations.

Submitted 27 October, 2020; v1 submitted 1 April, 2019; originally announced April 2019.

Comments: Published at NeurIPS 2019. Website: https://sites.google.com/berkeley.edu/guided-metapolicy-search

arXiv:1812.07613 [pdf]

Proceedings of the Workshop on Social Robots in Therapy: Focusing on Autonomy and Ethical Challenges

Authors: Pablo G. Esteban, Daniel Hernández García, Hee Rin Lee, Pauline Chevalier, Paul Baxter, Cindy L. Bethel, Jainendra Shukla, Joan Oliver, Domènec Puig, Jason R. Wilson, Linda Tickle-Degnen, Madeleine Bartlett, Tony Belpaeme, Serge Thill, Kim Baraka, Francisco S. Melo, Manuela Veloso, David Becerra, Maja Matarić, Eduard Fosch-Villaronga, Jordi Albo-Canals, Gloria Beraldo, Emanuele Menegatti, Valentina De Tommasi, Roberto Mancin , et al. (13 additional authors not shown)

Abstract: Robot-Assisted Therapy (RAT) has successfully been used in HRI research by including social robots in health-care interventions by virtue of their ability to engage human users in both social and emotional dimensions. Research projects on this topic exist all over the globe in the USA, Europe, and Asia. All of these projects have the overall ambitious goal to increase the well-being of a vulnerable population. Typical work in RAT is performed using remote-controlled robots, a technique called Wizard-of-Oz (WoZ). The robot is usually controlled, unbeknownst to the patient, by a human operator. However, WoZ has been demonstrated to not be a sustainable technique in the long term. Providing the robots with autonomy (while remaining under the supervision of the therapist) has the potential to lighten the therapist's burden, not only in the therapeutic session itself but also in longer-term diagnostic tasks. Therefore, there is a need for exploring several degrees of autonomy in social robots used in therapy. Increasing the autonomy of robots might also bring about a new set of challenges. In particular, there will be a need to answer new ethical questions regarding the use of robots with a vulnerable population, as well as a need to ensure ethically-compliant robot behaviours. Therefore, in this workshop we want to gather findings and explore which degree of autonomy might help to improve health-care interventions and how we can overcome the ethical challenges inherent to it.

Submitted 18 December, 2018; originally announced December 2018.

Comments: 25 pages, editors for the proceedings: Pablo G. Esteban, Daniel Hernández García, Hee Rin Lee, Pauline Chevalier, Paul Baxter, Cindy Bethel

arXiv:1812.05806 [pdf, other]

A Self-Supervised Bootstrap Method for Single-Image 3D Face Reconstruction

Authors: Yifan Xing, Rahul Tewari, Paulo R. S. Mendonca

Abstract: State-of-the-art methods for 3D reconstruction of faces from a single image require 2D-3D pairs of ground-truth data for supervision. Such data is costly to acquire, and most datasets available in the literature are restricted to pairs for which the input 2D images depict faces in a near fronto-parallel pose. Therefore, many data-driven methods for single-image 3D facial reconstruction perform poorly on profile and near-profile faces. We propose a method to improve the performance of single-image 3D facial reconstruction networks by utilizing the network to synthesize its own training data for fine-tuning, comprising: (i) single-image 3D reconstruction of faces in near-frontal images without ground-truth 3D shape; (ii) application of a rigid-body transformation to the reconstructed face model; (iii) rendering of the face model from new viewpoints; and (iv) use of the rendered image and corresponding 3D reconstruction as additional data for supervised fine-tuning. The new 2D-3D pairs thus produced have the same high quality observed for near fronto-parallel reconstructions, thereby nudging the network towards more uniform performance as a function of the viewing angle of input faces. Application of the proposed technique to the fine-tuning of a state-of-the-art single-image 3D-reconstruction network for faces demonstrates the usefulness of the method, with particularly significant gains for profile or near-profile views.

Submitted 17 December, 2018; v1 submitted 14 December, 2018; originally announced December 2018.

arXiv:1806.08331 [pdf, other]

doi 10.1007/s10846-019-01033-x

Monocular Trail Detection and Tracking Aided by Visual SLAM for Small Unmanned Aerial Vehicles

Authors: André Silva, Ricardo Mendonça, Pedro Santana

Abstract: This paper presents a monocular vision system suitable for installation in small and medium-sized unmanned aerial vehicles built to perform missions in forest environments (e.g., search and rescue). The proposed system extends a previous monocular-based technique for trail detection and tracking so as to take into account volumetric data acquired from a Visual SLAM algorithm and, as a result, to increase its robustness on challenging trails. The experimental results, obtained via a set of 12 videos recorded with a camera installed in a tele-operated, small-sized unmanned aerial vehicle, show the ability of the proposed system to overcome some of the difficulties of the original detector, attaining a success rate of 97.8%.

Submitted 21 June, 2018; originally announced June 2018.

Journal ref: Journal of Intelligent & Robotic Systems, 2019

arXiv:1802.07245 [pdf, other]

Meta-Reinforcement Learning of Structured Exploration Strategies

Authors: Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine

Abstract: Exploration is a fundamental challenge in reinforcement learning (RL). Many of the current exploration methods for deep RL use task-agnostic objectives, such as information gain or bonuses based on state visitation. However, many practical applications of RL involve learning more than a single task, and prior tasks can be used to inform how exploration should be performed in new tasks. In this work, we explore how prior tasks can inform an agent about how to explore effectively in new situations. We introduce a novel gradient-based fast adaptation algorithm -- model agnostic exploration with structured noise (MAESN) -- to learn exploration strategies from prior experience. The prior experience is used both to initialize a policy and to acquire a latent exploration space that can inject structured stochasticity into a policy, producing exploration strategies that are informed by prior knowledge and are more effective than random action-space noise. We show that MAESN is more effective at learning exploration strategies when compared to prior meta-RL methods, RL without learned exploration strategies, and task-agnostic exploration methods. We evaluate our method on a variety of simulated tasks: locomotion with a wheeled robot, locomotion with a quadrupedal walker, and object manipulation.

Submitted 20 February, 2018; originally announced February 2018.

Showing 1–18 of 18 results for author: Mendonça, R