-
Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation
Authors:
Fengdi Che,
Chenjun Xiao,
Jincheng Mei,
Bo Dai,
Ramki Gummadi,
Oscar A Ramirez,
Christopher K Harris,
A. Rupam Mahmood,
Dale Schuurmans
Abstract:
We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data. Our condition is naturally satisfied for expected updates over the entire state-action space or learning with a batch of complete trajectories from episodic Markov decision processes. Notably, using only a target network or an over-parameterized model does not provide such a convergence guarantee. Additionally, we extend our results to learning with truncated trajectories, showing that convergence is achievable for all tasks with minor modifications, akin to value truncation for the final states in trajectories. Our primary result focuses on temporal difference estimation for prediction, providing high-probability value estimation error bounds and empirical analysis on Baird's counterexample and a Four-room task. Furthermore, we explore the control setting, demonstrating that similar convergence conditions apply to Q-learning.
Submitted 31 May, 2024;
originally announced May 2024.
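As a rough illustration of the setting this abstract describes, the sketch below runs temporal difference prediction with a frozen target parameter vector and a linear value function that has more features than states. It is a toy under stated assumptions, not the paper's algorithm or its convergence condition: the two-state episodic chain, the feature vectors, the step size, and the names `td_with_target_network` and `dot` are all hypothetical choices made for this example.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def td_with_target_network(transitions, features, gamma=0.9, lr=0.1,
                           inner_sweeps=200, outer_iters=20):
    """TD prediction with a periodically refreshed target parameter vector.

    transitions: list of (state, reward, next_state); next_state is None
                 when the episode terminates (assumed toy data format).
    features:    dict mapping each state to its feature vector.
    """
    dim = len(next(iter(features.values())))
    theta = [0.0] * dim
    for _ in range(outer_iters):
        # Freeze a copy of the parameters; bootstrapped targets use only this copy.
        target_theta = list(theta)
        for _ in range(inner_sweeps):
            for state, reward, next_state in transitions:
                phi = features[state]
                bootstrap = 0.0 if next_state is None else dot(features[next_state], target_theta)
                td_error = reward + gamma * bootstrap - dot(phi, theta)
                # Semi-gradient step on the online parameters only.
                theta = [t + lr * td_error * p for t, p in zip(theta, phi)]
    return theta


# Hypothetical episodic chain s0 -> s1 -> terminal, reward 1 on the last step.
# Three features for two states, so the linear model is over-parameterized.
features = {"s0": [1.0, 0.0, 1.0], "s1": [0.0, 1.0, 1.0]}
transitions = [("s0", 0.0, "s1"), ("s1", 1.0, None)]

theta = td_with_target_network(transitions, features)
v_s0 = dot(features["s0"], theta)
v_s1 = dot(features["s1"], theta)
```

In this toy the batch consists of one complete trajectory, and the estimates `v_s0` and `v_s1` should approach the true values 0.9 and 1.0. The example says nothing about off-policy data or truncated trajectories, which are the cases the abstract addresses.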
-
Feature extraction using Spectral Clustering for Gene Function Prediction using Hierarchical Multi-label Classification
Authors:
Miguel Romero,
Oscar Ramírez,
Jorge Finke,
Camilo Rocha
Abstract:
Gene annotation addresses the problem of predicting unknown associations between genes and functions (e.g., biological processes) of a specific organism. Despite recent advances, the cost and time demanded by annotation procedures that rely largely on in vivo biological experiments remain prohibitively high. This paper presents a novel in silico approach to the annotation problem that combines cluster analysis and hierarchical multi-label classification (HMC). The approach uses spectral clustering to extract new features from the gene co-expression network (GCN) and enrich the prediction task. HMC is used to build multiple estimators that consider the hierarchical structure of gene functions. The proposed approach is applied to a case study on Zea mays, one of the most dominant and productive crops in the world. The results illustrate how in silico approaches are key to reducing the time and costs of gene annotation. More specifically, they highlight the importance of: (i) building new features that represent the structure of gene relationships in GCNs to annotate genes; and (ii) taking into account the structure of biological processes to obtain consistent predictions.
Submitted 28 April, 2022; v1 submitted 25 March, 2022;
originally announced March 2022.
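The abstract does not spell out how the spectral features are computed, so the following is only a sketch of one common variant: embedding each node of a graph with the low-order eigenvectors of the unnormalized graph Laplacian. The function name `spectral_features`, the choice of Laplacian, and the six-node example graph are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def spectral_features(adjacency, k):
    """Return k spectral coordinates per node from a symmetric adjacency matrix.

    Uses the unnormalized Laplacian L = D - A (an assumed choice) and skips
    the first eigenvector, which is constant on a connected graph.
    """
    a = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(a.sum(axis=1)) - a
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, 1:k + 1]


# Hypothetical co-expression graph: two triangles joined by one edge (2-3).
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
feats = spectral_features(adjacency, k=1)
```

Here the single returned coordinate has one sign on nodes 0 to 2 and the opposite sign on nodes 3 to 5, so it separates the two triangles. Columns like this could be appended to per-gene feature vectors before training a classifier.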
-
Implicit Behavioral Cloning
Authors:
Pete Florence,
Corey Lynch,
Andy Zeng,
Oscar Ramirez,
Ayzaan Wahid,
Laura Downs,
Adrian Wong,
Johnny Lee,
Igor Mordatch,
Jonathan Tompson
Abstract:
We find that across a wide range of robot policy learning scenarios, treating supervised policy learning with an implicit model generally performs better, on average, than commonly used explicit models. We present extensive experiments on this finding, and we provide both intuitive insight and theoretical arguments distinguishing the properties of implicit models compared to their explicit counterparts, particularly with respect to approximating complex, potentially discontinuous and multi-valued (set-valued) functions. On robotic policy learning tasks we show that implicit behavioral cloning policies with energy-based models (EBM) often outperform common explicit (Mean Square Error, or Mixture Density) behavioral cloning policies, including on tasks with high-dimensional action spaces and visual image inputs. We find these policies provide competitive results or outperform state-of-the-art offline reinforcement learning methods on the challenging human-expert tasks from the D4RL benchmark suite, despite using no reward information. In the real world, robots with implicit policies can learn complex and remarkably subtle behaviors on contact-rich tasks from human demonstrations, including tasks with high combinatorial complexity and tasks requiring 1mm precision.
Submitted 31 August, 2021;
originally announced September 2021.
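To make the implicit versus explicit distinction concrete, the sketch below shows inference with an implicit policy: rather than mapping an observation directly to an action, it picks the action that minimizes an energy function over sampled candidates. The hand-written `toy_energy`, the uniform sampling scheme, and the name `implicit_policy` are assumptions for illustration; a learned energy-based model and the optimizer used in the paper are not reproduced here.

```python
import random


def implicit_policy(energy, observation, action_low, action_high,
                    num_samples=1024, seed=0):
    """Return the sampled action with the lowest energy for this observation."""
    rng = random.Random(seed)
    candidates = [rng.uniform(action_low, action_high) for _ in range(num_samples)]
    return min(candidates, key=lambda action: energy(observation, action))


def toy_energy(observation, action):
    """Hypothetical energy whose minimizer jumps from -1 to +1 at observation 0."""
    target = 1.0 if observation >= 0 else -1.0
    return (action - target) ** 2


action_right = implicit_policy(toy_energy, 0.3, -2.0, 2.0)
action_left = implicit_policy(toy_energy, -0.3, -2.0, 2.0)
```

The selected action switches abruptly between roughly +1 and -1 as the observation crosses zero. This is the kind of discontinuous mapping the abstract says implicit models are better suited to than explicit regressors.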
-
Park4U Mate: Context-Aware Digital Assistant for Personalized Autonomous Parking
Authors:
Antonyo Musabini,
Evin Bozbayir,
Hervé Marcasuzaa,
Omar Adair Islas Ramírez
Abstract:
People park their vehicles depending on interior and exterior contexts. They do it naturally, even unconsciously. For instance, with a baby seat in the rear, the driver might leave more space on one side to be able to get the baby out easily; or when grocery shopping, s/he may position the vehicle so that the trunk remains accessible. Autonomous vehicles are becoming technically effective at driving from A to B and parking in a proper spot in a default way. However, in order to satisfy users' expectations and to become trustworthy, they will also need to park or make a temporary stop in a way appropriate to the given situation. In addition, users want to understand better the capabilities of their driving assistance features, such as automated parking systems. A voice-based interface can help with this and even ease the adoption of these features. Therefore, we developed a voice-based in-car assistant (Park4U Mate) that is aware of interior and exterior contexts (thanks to a variety of sensors) and that is able to park autonomously in a smart way (with a constraint-minimization strategy). The solution was demonstrated to thirty-five users in test drives, and their feedback was collected on the system's decision-making capability as well as on the human-machine interaction. The results show that: (1) the proposed optimization algorithm is efficient at deciding the best parking strategy; hence, autonomous vehicles can adopt it; (2) a voice-based digital assistant for autonomous parking is perceived as a clear and effective interaction method. However, the interaction speed remained the most important criterion for users. In addition, they clearly wish not to be limited to voice interaction alone when using the automated parking function, and would rather have a multi-modal interaction.
Submitted 16 July, 2021;
originally announced July 2021.
-
The Distracting Control Suite -- A Challenging Benchmark for Reinforcement Learning from Pixels
Authors:
Austin Stone,
Oscar Ramirez,
Kurt Konolige,
Rico Jonschkowski
Abstract:
Robots have to face challenging perceptual settings, including changes in viewpoint, lighting, and background. Current simulated reinforcement learning (RL) benchmarks such as DM Control provide visual input without such complexity, which limits the transfer of well-performing methods to the real world. In this paper, we extend DM Control with three kinds of visual distractions (variations in background, color, and camera pose) to produce a new challenging benchmark for vision-based control, and we analyze state of the art RL algorithms in these settings. Our experiments show that current RL methods for vision-based control perform poorly under distractions, and that their performance decreases with increasing distraction complexity, showing that new methods are needed to cope with the visual complexities of the real world. We also find that combinations of multiple distraction types are more difficult than a mere combination of their individual effects.
Submitted 7 January, 2021;
originally announced January 2021.
-
PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning
Authors:
Aleksandra Faust,
Oscar Ramirez,
Marek Fiser,
Kenneth Oslund,
Anthony Francis,
James Davidson,
Lydia Tapia
Abstract:
We present PRM-RL, a hierarchical method for long-range navigation task completion that combines sampling based path planning with reinforcement learning (RL). The RL agents learn short-range, point-to-point navigation policies that capture robot dynamics and task constraints without knowledge of the large-scale topology. Next, the sampling-based planners provide roadmaps which connect robot configurations that can be successfully navigated by the RL agent. The same RL agents are used to control the robot under the direction of the planning, enabling long-range navigation. We use the Probabilistic Roadmaps (PRMs) for the sampling-based planner. The RL agents are constructed using feature-based and deep neural net policies in continuous state and action spaces. We evaluate PRM-RL, both in simulation and on-robot, on two navigation tasks with non-trivial robot dynamics: end-to-end differential drive indoor navigation in office environments, and aerial cargo delivery in urban environments with load displacement constraints. Our results show improvement in task completion over both RL agents on their own and traditional sampling-based planners. In the indoor navigation task, PRM-RL successfully completes up to 215 m long trajectories under noisy sensor conditions, and the aerial cargo delivery completes flights over 1000 m without violating the task constraints in an environment 63 million times larger than used in training.
Submitted 16 May, 2018; v1 submitted 11 October, 2017;
originally announced October 2017.
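The hierarchical idea in this abstract can be sketched as a roadmap whose edges exist only where a short-range policy can travel between two configurations, followed by a graph search for the long-range route. Everything concrete below is an assumption for illustration: `short_range_ok` is a distance threshold standing in for rollouts of a learned RL agent, the nodes are fixed points rather than random samples, and edges are treated as undirected.

```python
import math
from collections import deque


def build_roadmap(nodes, can_navigate):
    """Connect every pair of nodes that the short-range policy can traverse."""
    roadmap = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if can_navigate(nodes[i], nodes[j]):
                roadmap[i].append(j)
                roadmap[j].append(i)
    return roadmap


def shortest_path(roadmap, start, goal):
    """Breadth-first search; returns a list of node indices, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in roadmap[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def short_range_ok(a, b):
    """Hypothetical stand-in for 'the RL agent can navigate from a to b'."""
    return math.dist(a, b) <= 1.5


nodes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
roadmap = build_roadmap(nodes, short_range_ok)
path = shortest_path(roadmap, 0, 3)
```

With these four collinear points, only neighbouring nodes are connected, so the planner must chain three short hops to cover a distance that the short-range check alone would reject. In a fuller version, the same policy used to validate each edge would then execute it on the robot.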