Search | arXiv e-print repository

Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning

Authors: Juan Rocamonde, Victoriano Montesinos, Elvis Nava, Ethan Perez, David Lindner

Abstract: Reinforcement learning (RL) requires either manually specifying a reward function, which is often infeasible, or learning a reward model from a large amount of human feedback, which is often very expensive. We study a more sample-efficient alternative: using pretrained vision-language models (VLMs) as zero-shot reward models (RMs) to specify tasks via natural language. We propose a natural and general approach to using VLMs as reward models, which we call VLM-RMs. We use VLM-RMs based on CLIP to train a MuJoCo humanoid to learn complex tasks without a manually specified reward function, such as kneeling, doing the splits, and sitting in a lotus position. For each of these tasks, we only provide a single sentence text prompt describing the desired task with minimal prompt engineering. We provide videos of the trained agents at: https://sites.google.com/view/vlm-rm. We can improve performance by providing a second "baseline" prompt and projecting out parts of the CLIP embedding space irrelevant to distinguish between goal and baseline. Further, we find a strong scaling effect for VLM-RMs: larger VLMs trained with more compute and data are better reward models. The failure modes of VLM-RMs we encountered are all related to known capability limitations of current VLMs, such as limited spatial reasoning ability or visually unrealistic environments that are far off-distribution for the VLM. We find that VLM-RMs are remarkably robust as long as the VLM is large enough. This suggests that future VLMs will become more and more useful reward models for a wide range of RL applications.

Submitted 14 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

Comments: Presented at International Conference on Learning Representations (ICLR) 2024
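
The baseline-prompt idea described in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: the embeddings are plain Python lists standing in for CLIP outputs, and the names `cosine_reward`, `projected_reward`, and the mixing weight `alpha` are hypothetical choices made for this illustration. The sketch assumes "projecting out irrelevant parts" means projecting the state embedding onto the line through the baseline and goal embeddings before measuring its distance to the goal.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_reward(state_emb, goal_emb):
    """Plain zero-shot reward: cosine similarity of state and goal embeddings."""
    s, g = normalize(state_emb), normalize(goal_emb)
    return sum(a * b for a, b in zip(s, g))


def projected_reward(state_emb, goal_emb, baseline_emb, alpha=1.0):
    """Reward after (partially) projecting the state onto the baseline-goal line.

    alpha is a hypothetical mixing weight: 0 keeps the raw state embedding,
    1 uses only its projection onto the line through baseline and goal.
    """
    s, g, b = normalize(state_emb), normalize(goal_emb), normalize(baseline_emb)
    # Direction that separates the baseline prompt from the goal prompt.
    d = [gi - bi for gi, bi in zip(g, b)]
    dd = sum(x * x for x in d)
    # Orthogonal projection of s onto the line {b + t * d}.
    t = sum((si - bi) * di for si, bi, di in zip(s, b, d)) / dd
    proj = [bi + t * di for bi, di in zip(b, d)]
    mixed = [alpha * p + (1.0 - alpha) * si for p, si in zip(proj, s)]
    # For unit vectors, 1 - 0.5 * ||x - g||^2 equals the cosine similarity.
    return 1.0 - 0.5 * sum((m - gi) ** 2 for m, gi in zip(mixed, g))


goal = [1.0, 0.0, 0.0]
baseline = [0.0, 1.0, 0.0]
# A state that matches the goal but also varies along an unrelated axis.
state = [1.0, 0.0, 1.0]
print(cosine_reward(state, goal), projected_reward(state, goal, baseline))
```

With `alpha=0` the projected reward reduces to the plain cosine similarity, so under these assumptions the weight interpolates between the unregularized reward and the fully projected one.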

arXiv:2305.12489

Lookdown construction for a Moran seed-bank model

Authors: Maria Clara Fittipaldi, Adrián González Casanova, Julio Ernesto Nava

Abstract: We present a lookdown construction for a Moran seed-bank model with variable active and inactive population sizes and we show that the empirical measure of our model coincides with that of the Seed-Bank-Moran Model with latency of Greven, den Hollander and Oomen, 2022. Furthermore, we prove that the time to the most recent common ancestor, starting from $N$ individuals with stationary distribution over its state (active or inactive), has the same asymptotic order as the largest inactivity period. We then obtain an asymptotic distribution of the TMRCA, and use this result to find the first order of the asymptotic distribution of the fixation time of a single beneficial mutant conditioned to invade the whole population, which surprisingly is of order $\ln(N)$.

Submitted 21 May, 2023; originally announced May 2023.

Comments: 12 pages, 1 figure

MSC Class: 60K35; 92D15

arXiv:2210.08942

Meta-Learning via Classifier(-free) Diffusion Guidance

Authors: Elvis Nava, Sei** Kobayashi, Yifei Yin, Robert K. Katzschmann, Benjamin F. Grewe

Abstract: We introduce meta-learning algorithms that perform zero-shot weight-space adaptation of neural network models to unseen tasks. Our methods repurpose the popular generative image synthesis techniques of natural language guidance and diffusion models to generate neural network weights adapted for tasks. We first train an unconditional generative hypernetwork model to produce neural network weights; then we train a second "guidance" model that, given a natural language task description, traverses the hypernetwork latent space to find high-performance task-adapted weights in a zero-shot manner. We explore two alternative approaches for latent space guidance: "HyperCLIP"-based classifier guidance and a conditional Hypernetwork Latent Diffusion Model ("HyperLDM"), which we show to benefit from the classifier-free guidance technique common in image generation. Finally, we demonstrate that our approaches outperform existing multi-task and meta-learning methods in a series of zero-shot learning experiments on our Meta-VQA dataset.

Submitted 31 January, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

arXiv:2204.12584

Fast Aquatic Swimmer Optimization with Differentiable Projective Dynamics and Neural Network Hydrodynamic Models

Authors: Elvis Nava, John Z. Zhang, Mike Y. Michelis, Tao Du, **chuan Ma, Benjamin F. Grewe, Wojciech Matusik, Robert K. Katzschmann

Abstract: Aquatic locomotion is a classic fluid-structure interaction (FSI) problem of interest to biologists and engineers. Solving the fully coupled FSI equations for incompressible Navier-Stokes and finite elasticity is computationally expensive. Optimizing robotic swimmer design within such a system generally involves cumbersome, gradient-free procedures on top of the already costly simulation. To address this challenge we present a novel, fully differentiable hybrid approach to FSI that combines a 2D direct numerical simulation for the deformable solid structure of the swimmer and a physics-constrained neural network surrogate to capture hydrodynamic effects of the fluid. For the deformable solid simulation of the swimmer's body, we use state-of-the-art techniques from the field of computer graphics to speed up the finite-element method (FEM). For the fluid simulation, we use a U-Net architecture trained with a physics-based loss function to predict the flow field at each time step. The pressure and velocity field outputs from the neural network are sampled around the boundary of our swimmer using an immersed boundary method (IBM) to compute its swimming motion accurately and efficiently. We demonstrate the computational efficiency and differentiability of our hybrid simulator on a 2D carangiform swimmer. Due to differentiability, the simulator can be used for computational design of controls for soft bodies immersed in fluids via direct gradient-based optimization.

Submitted 22 June, 2022; v1 submitted 30 March, 2022; originally announced April 2022.

Comments: ICML 2022

arXiv:2110.11665

Diversified Sampling for Batched Bayesian Optimization with Determinantal Point Processes

Authors: Elvis Nava, Mojmír Mutný, Andreas Krause

Abstract: In Bayesian Optimization (BO) we study black-box function optimization with noisy point evaluations and Bayesian priors. Convergence of BO can be greatly sped up by batching, where multiple evaluations of the black-box function are performed in a single round. The main difficulty in this setting is to propose at the same time diverse and informative batches of evaluation points. In this work, we introduce DPP-Batch Bayesian Optimization (DPP-BBO), a universal framework for inducing batch diversity in sampling based BO by leveraging the repulsive properties of Determinantal Point Processes (DPP) to naturally diversify the batch sampling procedure. We illustrate this framework by formulating DPP-Thompson Sampling (DPP-TS) as a variant of the popular Thompson Sampling (TS) algorithm and introducing a Markov Chain Monte Carlo procedure to sample from it. We then prove novel Bayesian simple regret bounds for both classical batched TS as well as our counterpart DPP-TS, with the latter bound being tighter. Our real-world, as well as synthetic, experiments demonstrate improved performance of DPP-BBO over classical batching methods with Gaussian process and Cox process models.

Submitted 8 February, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

Comments: To be published in AISTATS 2022
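
The "repulsive properties" of Determinantal Point Processes mentioned in the abstract above can be made concrete with a short sketch. It is not the DPP-TS algorithm from the paper: it only evaluates the unnormalized DPP weight of a batch, which is the determinant of a kernel matrix restricted to the batch points. The RBF kernel, the 1-D inputs, and the names `rbf_kernel_matrix`, `determinant`, and `dpp_batch_score` are assumptions chosen for illustration.

```python
import math


def rbf_kernel_matrix(points, length_scale=1.0):
    """Kernel matrix K[i][j] = exp(-(x_i - x_j)^2 / (2 * length_scale^2))."""
    return [
        [math.exp(-((x - y) ** 2) / (2.0 * length_scale ** 2)) for y in points]
        for x in points
    ]


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def dpp_batch_score(points, length_scale=1.0):
    """Unnormalized DPP weight of a batch: det of its kernel submatrix."""
    return determinant(rbf_kernel_matrix(points, length_scale))


spread_batch = [0.0, 1.0, 2.0]
clustered_batch = [0.0, 0.1, 0.2]
print(dpp_batch_score(spread_batch), dpp_batch_score(clustered_batch))
```

Nearby points produce nearly identical kernel rows, which drives the determinant toward zero, so a sampler weighted by this score tends to prefer spread-out batches over clustered ones.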

arXiv:2109.14855

doi 10.1109/IROS47612.2022.9981338

Sim2Real for Soft Robotic Fish via Differentiable Simulation

Authors: John Z. Zhang, Yu Zhang, **chuan Ma, Elvis Nava, Tao Du, Philip Arm, Wojciech Matusik, Robert K. Katzschmann

Abstract: Accurate simulation of soft mechanisms under dynamic actuation is critical for the design of soft robots. We address this gap with our differentiable simulation tool by learning the material parameters of our soft robotic fish. On the example of a soft robotic fish, we demonstrate an experimentally-verified, fast optimization pipeline for learning the material parameters from quasi-static data via differentiable simulation and apply it to the prediction of dynamic performance. Our method identifies physically plausible Young's moduli for various soft silicone elastomers and stiff acetal copolymers used in creation of our three different robotic fish tail designs. We show that our method is compatible with varying internal geometry of the actuators, such as the number of hollow cavities. Our framework allows high fidelity prediction of dynamic behavior for composite bi-morph bending structures in real hardware to millimeter-accuracy and within 3 percent error normalized to actuator length. We provide a differentiable and robust estimate of the thrust force using a neural network thrust predictor; this estimate allows for accurate modeling of our experimental setup measuring bollard pull. This work presents a prototypical hardware and simulation problem solved using our differentiable framework; the framework can be applied to higher dimensional parameter inference, learning control policies, and computational design due to its differentiable character.

Submitted 8 November, 2022; v1 submitted 30 September, 2021; originally announced September 2021.

Comments: Published at IROS 2022. 8 pages, 9 figures

arXiv:1912.07752

The Boundedness of Alternative General Gaussian Singular Integrals with respect to the Gaussian measure

Authors: Eduard Nava, Ebner Pineda, Wilfredo Urbina

Abstract: In this paper we introduce a new class of Gaussian singular integrals, the general alternative Gaussian singular integrals, and study their boundedness in $L^p$, $p > 1$, and their weak (1,1) boundedness with respect to the Gaussian measure, following the works of S. Pérez and of H. Aimar, L. Forzani and R. Scotto, respectively.

Submitted 20 June, 2020; v1 submitted 16 December, 2019; originally announced December 2019.

MSC Class: Primary 42B25; 42B35; Secondary 46E30; 47G10

Showing 1–7 of 7 results for author: Nava, E