-
VADER: Visual Affordance Detection and Error Recovery for Multi Robot Human Collaboration
Authors:
Michael Ahn,
Montserrat Gonzalez Arenas,
Matthew Bennice,
Noah Brown,
Christine Chan,
Byron David,
Anthony Francis,
Gavin Gonzalez,
Rainer Hessmer,
Tomas Jackson,
Nikhil J Joshi,
Daniel Lam,
Tsang-Wei Edward Lee,
Alex Luong,
Sharath Maddineni,
Harsh Patel,
Jodilyn Peralta,
Jornell Quiambao,
Diego Reyes,
Rosario M Jauregui Ruano,
Dorsa Sadigh,
Pannag Sanketi,
Leila Takayama,
Pavel Vodenski,
Fei Xia
Abstract:
Robots today can exploit the rich world knowledge of large language models to chain simple behavioral skills into long-horizon tasks. However, robots often get interrupted during long-horizon tasks due to primitive skill failures and dynamic environments. We propose VADER, a plan, execute, detect framework with seeking help as a new skill that enables robots to recover and complete long-horizon tasks with the help of humans or other robots. VADER leverages visual question answering (VQA) modules to detect visual affordances and recognize execution errors. It then generates prompts for a language model planner (LMP) which decides when to seek help from another robot or human to recover from errors in long-horizon task execution. We show the effectiveness of VADER with two long-horizon robotic tasks. Our pilot study showed that VADER is capable of performing complex long-horizon tasks by asking for help from another robot to clear a table. Our user study showed that VADER is capable of performing complex long-horizon tasks by asking for help from a human to clear a path. We gathered feedback from people (N=19) about the performance of VADER vs. a robot that did not ask for help. https://google-vader.github.io/
Submitted 30 May, 2024; v1 submitted 24 May, 2024;
originally announced May 2024.
-
Identifying phase transitions in physical systems with neural networks: a neural architecture search perspective
Authors:
Rodrigo Carmo Terin,
Zochil González Arenas,
Roberto Santana
Abstract:
The use of machine learning algorithms to investigate phase transitions in physical systems is a valuable way to better understand the characteristics of these systems. Neural networks have been used to extract information about phases and phase transitions directly from many-body configurations. However, one limitation of neural networks is that they require the definition of the model architecture and parameters prior to their application, and such determination is itself a difficult problem. In this paper, we investigate for the first time the relationship between the accuracy of neural networks for information about phases and the network configuration (that comprises the architecture and hyperparameters). We formulate the phase analysis as a regression task, address the question of generating data that reflects the different states of the physical system, and evaluate the performance of neural architecture search for this task. After obtaining the optimized architectures, we further implement smart data processing and analytics by means of neuron coverage metrics, assessing the capability of these metrics to estimate phase transitions. Our results identify the neuron coverage metric as promising for detecting phase transitions in physical systems.
Submitted 23 April, 2024;
originally announced April 2024.
-
Effects of term weighting approach with and without stop words removing on Arabic text classification
Authors:
Esra'a Alhenawi,
Ruba Abu Khurma,
Pedro A. Castillo,
Maribel G. Arenas
Abstract:
Classifying text is a method for categorizing documents into pre-established groups. Text documents must be prepared and represented in a way that is appropriate for the algorithms used for data mining prior to classification. As a result, a number of term weighting strategies have been created in the literature to enhance text categorization algorithms' functionality. This study compares the effects of binary and term frequency (TF) feature weighting on text classification, both with and without stop word removal. To assess the effects of these feature weighting approaches on classification results in terms of accuracy, recall, precision, and F-measure values, we used an Arabic data set made up of 322 documents divided into six main topics (agriculture, economy, health, politics, science, and sport), each of which contains 50 documents, with the exception of the health category, which contains 61 documents. The results demonstrate that for all metrics, the term frequency feature weighting approach with stop word removal outperforms the binary approach, while for accuracy, recall, and F-measure, the binary approach outperforms the TF approach without stop word removal. However, for precision, the two approaches produce results that are very similar. Additionally, it is clear from the data that, using the same term weighting approach, stop word removal increases classification accuracy.
Submitted 21 February, 2024;
originally announced February 2024.
-
Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Authors:
Jacky Liang,
Fei Xia,
Wenhao Yu,
Andy Zeng,
Montserrat Gonzalez Arenas,
Maria Attarian,
Maria Bauza,
Matthew Bennice,
Alex Bewley,
Adil Dostmohamed,
Chuyuan Kelly Fu,
Nimrod Gileadi,
Marissa Giustina,
Keerthana Gopalakrishnan,
Leonard Hasenclever,
Jan Humplik,
Jasmine Hsu,
Nikhil Joshi,
Ben Jyenis,
Chase Kew,
Sean Kirmani,
Tsang-Wei Edward Lee,
Kuang-Huei Lee,
Assaf Hurwitz Michaely,
Joss Moore
, et al. (25 additional authors not shown)
Abstract:
Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for only as long as it fits within the context size of the LLM, and can be forgotten over longer interactions. In this work, we investigate fine-tuning the robot code-writing LLMs to remember their in-context interactions and improve their teachability, i.e., how efficiently they adapt to human inputs (measured by average number of corrections before the user considers the task successful). Our key observation is that when human-robot interactions are viewed as a partially observable Markov decision process (in which human language inputs are observations, and robot code outputs are actions), then training an LLM to complete previous interactions is training a transition dynamics model -- that can be combined with classic robotics techniques such as model predictive control (MPC) to discover shorter paths to success. This gives rise to Language Model Predictive Control (LMPC), a framework that fine-tunes PaLM 2 to improve its teachability on 78 tasks across 5 robot embodiments -- improving non-expert teaching success rates of unseen tasks by 26.9% while reducing the average number of human corrections from 2.4 to 1.9. Experiments show that LMPC also produces strong meta-learners, improving the success rate of in-context learning new tasks on unseen robot embodiments and APIs by 31.5%. See videos, code, and demos at: https://robot-teaching.github.io/.
Submitted 31 May, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Determining the significance and relative importance of parameters of a simulated quenching algorithm using statistical tools
Authors:
Pedro A. Castillo,
Maribel García Arenas,
Nuria Rico,
Antonio Miguel Mora,
Pablo García-Sánchez,
Juan Luis Jiménez Laredo,
Juan Julián Merelo Guervós
Abstract:
When search methods are being designed it is very important to know which parameters have the greatest influence on the behaviour and performance of the algorithm. To this end, algorithm parameters are commonly calibrated by means of either theoretic analysis or intensive experimentation. When undertaking a detailed statistical analysis of the influence of each parameter, the designer should pay attention mostly to the parameters that are statistically significant. In this paper the ANOVA (ANalysis Of the VAriance) method is used to carry out an exhaustive analysis of a simulated annealing based method and the different parameters it requires. Following this idea, the significance and relative importance of the parameters regarding the obtained results, as well as suitable values for each of these, were obtained using ANOVA and post-hoc Tukey HSD test, on four well known function optimization problems and the likelihood function that is used to estimate the parameters involved in the lognormal diffusion process. Through this statistical study we have verified the adequacy of parameter values available in the bibliography using parametric hypothesis tests.
Submitted 8 February, 2024;
originally announced February 2024.
-
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents
Authors:
Michael Ahn,
Debidatta Dwibedi,
Chelsea Finn,
Montse Gonzalez Arenas,
Keerthana Gopalakrishnan,
Karol Hausman,
Brian Ichter,
Alex Irpan,
Nikhil Joshi,
Ryan Julian,
Sean Kirmani,
Isabel Leal,
Edward Lee,
Sergey Levine,
Yao Lu,
Isabel Leal,
Sharath Maddineni,
Kanishka Rao,
Dorsa Sadigh,
Pannag Sanketi,
Pierre Sermanet,
Quan Vuong,
Stefan Welker,
Fei Xia,
Ted Xiao
, et al. (3 additional authors not shown)
Abstract:
Foundation models that incorporate language, vision, and more recently actions have revolutionized the ability to harness internet scale data to reason about useful tasks. However, one of the key challenges of training embodied foundation models is the lack of data grounded in the physical world. In this paper, we propose AutoRT, a system that leverages existing foundation models to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision. AutoRT leverages vision-language models (VLMs) for scene understanding and grounding, and further uses large language models (LLMs) for proposing diverse and novel instructions to be performed by a fleet of robots. Guiding data collection by tapping into the knowledge of foundation models enables AutoRT to effectively reason about autonomy tradeoffs and safety while significantly scaling up data collection for robot learning. We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies. We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows for instruction following data collection robots that can align to human preferences.
Submitted 1 July, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections
Authors:
Lihan Zha,
Yuchen Cui,
Li-Heng Lin,
Minae Kwon,
Montserrat Gonzalez Arenas,
Andy Zeng,
Fei Xia,
Dorsa Sadigh
Abstract:
Today's robot policies exhibit subpar performance when faced with the challenge of generalizing to novel environments. Human corrective feedback is a crucial form of guidance to enable such generalization. However, adapting to and learning from online human corrections is a non-trivial endeavor: not only do robots need to remember human feedback over time to retrieve the right information in new settings and reduce the intervention rate, but also they would need to be able to respond to feedback that can be arbitrary corrections about high-level human preferences to low-level adjustments to skill parameters. In this work, we present Distillation and Retrieval of Online Corrections (DROC), a large language model (LLM)-based system that can respond to arbitrary forms of language feedback, distill generalizable knowledge from corrections, and retrieve relevant past experiences based on textual and visual similarity for improving performance in novel settings. DROC is able to respond to a sequence of online language corrections that address failures in both high-level task plans and low-level skill primitives. We demonstrate that DROC effectively distills the relevant information from the sequence of online corrections in a knowledge base and retrieves that knowledge in settings with new task or object instances. DROC outperforms other techniques that directly generate robot code via LLMs by using only half of the total number of corrections needed in the first round and requires little to no corrections after two iterations. We show further results, videos, prompts and code on https://sites.google.com/stanford.edu/droc .
Submitted 21 March, 2024; v1 submitted 17 November, 2023;
originally announced November 2023.
-
RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches
Authors:
Jiayuan Gu,
Sean Kirmani,
Paul Wohlhart,
Yao Lu,
Montserrat Gonzalez Arenas,
Kanishka Rao,
Wenhao Yu,
Chuyuan Fu,
Keerthana Gopalakrishnan,
Zhuo Xu,
Priya Sundaresan,
Peng Xu,
Hao Su,
Karol Hausman,
Chelsea Finn,
Quan Vuong,
Ted Xiao
Abstract:
Generalization remains one of the most important desiderata for robust robot learning systems. While recently proposed approaches show promise in generalization to novel objects, semantic concepts, or visual distribution shifts, generalization to new tasks remains challenging. For example, a language-conditioned policy trained on pick-and-place tasks will not be able to generalize to a folding task, even if the arm trajectory of folding is similar to pick-and-place. Our key insight is that this kind of generalization becomes feasible if we represent the task through rough trajectory sketches. We propose a policy conditioning method using such rough trajectory sketches, which we call RT-Trajectory, that is practical, easy to specify, and allows the policy to effectively perform new tasks that would otherwise be challenging to perform. We find that trajectory sketches strike a balance between being detailed enough to express low-level motion-centric guidance while being coarse enough to allow the learned policy to interpret the trajectory sketch in the context of situational visual observations. In addition, we show how trajectory sketches can provide a useful interface to communicate with robotic policies: they can be specified through simple human inputs like drawings or videos, or through automated methods such as modern image-generating or waypoint-generating methods. We evaluate RT-Trajectory at scale on a variety of real-world robotic tasks, and find that RT-Trajectory is able to perform a wider range of tasks compared to language-conditioned and goal-conditioned policies, when provided the same training data.
Submitted 6 November, 2023; v1 submitted 3 November, 2023;
originally announced November 2023.
-
Analyzing the contribution of different passively collected data to predict Stress and Depression
Authors:
Irene Bonafonte,
Cristina Bustos,
Abraham Larrazolo,
Gilberto Lorenzo Martinez Luna,
Adolfo Guzman Arenas,
Xavier Baro,
Isaac Tourgeman,
Mercedes Balcells,
Agata Lapedriza
Abstract:
The possibility of recognizing diverse aspects of human behavior and environmental context from passively captured data motivates its use for mental health assessment. In this paper, we analyze the contribution of different passively collected sensor data types (WiFi, GPS, Social interaction, Phone Log, Physical Activity, Audio, and Academic features) to predict daily self-reported stress and PHQ-9 depression score. First, we compute 125 mid-level features from the original raw data. These 125 features include groups of features from the different sensor data types. Then, we evaluate the contribution of each feature type by comparing the performance of Neural Network models trained with all features against Neural Network models trained with specific feature groups. Our results show that WiFi features (which encode mobility patterns) and Phone Log features (which encode information correlated with sleep patterns) provide significant information for stress and depression prediction.
Submitted 20 October, 2023;
originally announced October 2023.
-
False Negative/Positive Control for SAM on Noisy Medical Images
Authors:
Xing Yao,
Han Liu,
Dewei Hu,
Daiwei Lu,
Ange Lou,
Hao Li,
Ruining Deng,
Gabriel Arenas,
Baris Oguz,
Nadav Schwartz,
Brett C Byram,
Ipek Oguz
Abstract:
The Segment Anything Model (SAM) is a recently developed all-range foundation model for image segmentation. It can use sparse manual prompts such as bounding boxes to generate pixel-level segmentation in natural images but struggles in medical images such as low-contrast, noisy ultrasound images. We propose a refined test-phase prompt augmentation technique designed to improve SAM's performance in medical image segmentation. The method couples multi-box prompt augmentation and an aleatoric uncertainty-based false-negative (FN) and false-positive (FP) correction (FNPC) strategy. We evaluate the method on two ultrasound datasets and show improvement in SAM's performance and robustness to inaccurate prompts, without the necessity for further training or tuning. Moreover, we present the Single-Slice-to-Volume (SS2V) method, enabling 3D pixel-level segmentation using only the bounding box annotation from a single 2D slice. Our results allow efficient use of SAM in even noisy, low-contrast medical images. The source code will be released soon.
Submitted 20 August, 2023;
originally announced August 2023.
-
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Authors:
Anthony Brohan,
Noah Brown,
Justice Carbajal,
Yevgen Chebotar,
Xi Chen,
Krzysztof Choromanski,
Tianli Ding,
Danny Driess,
Avinava Dubey,
Chelsea Finn,
Pete Florence,
Chuyuan Fu,
Montse Gonzalez Arenas,
Keerthana Gopalakrishnan,
Kehang Han,
Karol Hausman,
Alexander Herzog,
Jasmine Hsu,
Brian Ichter,
Alex Irpan,
Nikhil Joshi,
Ryan Julian,
Dmitry Kalashnikov,
Yuheng Kuang,
Isabel Leal
, et al. (29 additional authors not shown)
Abstract:
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web. To this end, we propose to co-fine-tune state-of-the-art vision-language models on both robotic trajectory data and Internet-scale vision-language tasks, such as visual question answering. In contrast to other approaches, we propose a simple, general recipe to achieve this goal: in order to fit both natural language responses and robotic actions into the same format, we express the actions as text tokens and incorporate them directly into the training set of the model in the same way as natural language tokens. We refer to this category of models as vision-language-action models (VLA) and instantiate an example of such a model, which we call RT-2. Our extensive evaluation (6k evaluation trials) shows that our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training. This includes significantly improved generalization to novel objects, the ability to interpret commands not present in the robot training data (such as placing an object onto a particular number or icon), and the ability to perform rudimentary reasoning in response to user commands (such as picking up the smallest or largest object, or the one closest to another object). We further show that incorporating chain of thought reasoning allows RT-2 to perform multi-stage semantic reasoning, for example figuring out which object to pick up for use as an improvised hammer (a rock), or which type of drink is best suited for someone who is tired (an energy drink).
Submitted 28 July, 2023;
originally announced July 2023.
-
Large Language Models as General Pattern Machines
Authors:
Suvir Mirchandani,
Fei Xia,
Pete Florence,
Brian Ichter,
Danny Driess,
Montserrat Gonzalez Arenas,
Kanishka Rao,
Dorsa Sadigh,
Andy Zeng
Abstract:
We observe that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences -- from arbitrary ones procedurally generated by probabilistic context-free grammars (PCFG), to more rich spatial patterns found in the Abstraction and Reasoning Corpus (ARC), a general AI benchmark, prompted in the style of ASCII art. Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary. These results suggest that without any additional training, LLMs can serve as general sequence modelers, driven by in-context learning. In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics -- from extrapolating sequences of numbers that represent states over time to complete simple motions, to least-to-most prompting of reward-conditioned trajectories that can discover and represent closed-loop policies (e.g., a stabilizing controller for CartPole). While difficult to deploy today for real systems due to latency, context size limitations, and compute costs, the approach of using LLMs to drive low-level control may provide an exciting glimpse into how the patterns among words could be transferred to actions.
Submitted 25 October, 2023; v1 submitted 10 July, 2023;
originally announced July 2023.
-
To be or not to be: a translation reception study of a literary text translated into Dutch and Catalan using machine translation
Authors:
Ana Guerberof Arenas,
Antonio Toral
Abstract:
This article presents the results of a study involving the reception of a fictional story by Kurt Vonnegut translated from English into Catalan and Dutch in three conditions: machine-translated (MT), post-edited (PE) and translated from scratch (HT). 223 participants were recruited who rated the reading conditions using three scales: Narrative Engagement, Enjoyment and Translation Reception. The results show that HT presented a higher engagement, enjoyment and translation reception in Catalan if compared to PE and MT. However, the Dutch readers show higher scores in PE than in both HT and MT, and the highest engagement and enjoyment scores are reported when reading the original English version. We hypothesize that when reading a fictional story in translation, not only the condition and the quality of the translations are key to understanding its reception, but also the participants' reading patterns, reading language, and, perhaps, language status in their own societies.
Submitted 5 July, 2023;
originally announced July 2023.
-
Language to Rewards for Robotic Skill Synthesis
Authors:
Wenhao Yu,
Nimrod Gileadi,
Chuyuan Fu,
Sean Kirmani,
Kuang-Huei Lee,
Montse Gonzalez Arenas,
Hao-Tien Lewis Chiang,
Tom Erez,
Leonard Hasenclever,
Jan Humplik,
Brian Ichter,
Ted Xiao,
Peng Xu,
Andy Zeng,
Tingnan Zhang,
Nicolas Heess,
Dorsa Sadigh,
Jie Tan,
Yuval Tassa,
Fei Xia
Abstract:
Large language models (LLMs) have demonstrated exciting progress in acquiring diverse new capabilities through in-context learning, ranging from logical reasoning to code-writing. Robotics researchers have also explored using LLMs to advance the capabilities of robotic control. However, since low-level robot actions are hardware-dependent and underrepresented in LLM training corpora, existing efforts in applying LLMs to robotics have largely treated LLMs as semantic planners or relied on human-engineered control primitives to interface with the robot. On the other hand, reward functions are shown to be flexible representations that can be optimized for control policies to achieve diverse tasks, while their semantic richness makes them suitable to be specified by LLMs. In this work, we introduce a new paradigm that harnesses this realization by utilizing LLMs to define reward parameters that can be optimized and accomplish a variety of robotic tasks. Using reward as the intermediate interface generated by LLMs, we can effectively bridge the gap between high-level language instructions or corrections and low-level robot actions. Meanwhile, combining this with a real-time optimizer, MuJoCo MPC, empowers an interactive behavior creation experience where users can immediately observe the results and provide feedback to the system. To systematically evaluate the performance of our proposed method, we designed a total of 17 tasks for a simulated quadruped robot and a dexterous manipulator robot. We demonstrate that our proposed method reliably tackles 90% of the designed tasks, while a baseline using primitive skills as the interface with Code-as-policies achieves 50% of the tasks. We further validated our method on a real robot arm where complex manipulation skills such as non-prehensile pushing emerge through our interactive system.
Submitted 16 June, 2023; v1 submitted 14 June, 2023;
originally announced June 2023.
-
Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators
Authors:
Alexander Herzog,
Kanishka Rao,
Karol Hausman,
Yao Lu,
Paul Wohlhart,
Mengyuan Yan,
Jessica Lin,
Montserrat Gonzalez Arenas,
Ted Xiao,
Daniel Kappler,
Daniel Ho,
Jarek Rettinghouse,
Yevgen Chebotar,
Kuang-Huei Lee,
Keerthana Gopalakrishnan,
Ryan Julian,
Adrian Li,
Chuyuan Kelly Fu,
Bob Wei,
Sangeetha Ramesh,
Khem Holden,
Kim Kleiven,
David Rendleman,
Sean Kirmani,
Jeff Bingham
, et al. (15 additional authors not shown)
Abstract:
We describe a system for deep reinforcement learning of robotic manipulation skills applied to a large-scale real-world task: sorting recyclables and trash in office buildings. Real-world deployment of deep RL policies requires not only effective training algorithms, but the ability to bootstrap real-world training and enable broad generalization. To this end, our system combines scalable deep RL from real-world data with bootstrapping from training in simulation, and incorporates auxiliary inputs from existing computer vision systems as a way to boost generalization to novel objects, while retaining the benefits of end-to-end training. We analyze the tradeoffs of different design decisions in our system, and present a large-scale empirical validation that includes training on real-world data gathered over the course of 24 months of experimentation, across a fleet of 23 robots in three office buildings, with a total training set of 9527 hours of robotic experience. Our final validation also consists of 4800 evaluation trials across 240 waste station configurations, in order to evaluate in detail the impact of the design decisions in our system, the scaling effects of including more real-world data, and the performance of the method on novel objects. The project's website and videos can be found at http://rl-at-scale.github.io.
Submitted 5 May, 2023;
originally announced May 2023.
-
DivEMT: Neural Machine Translation Post-Editing Effort Across Typologically Diverse Languages
Authors:
Gabriele Sarti,
Arianna Bisazza,
Ana Guerberof Arenas,
Antonio Toral
Abstract:
We introduce DivEMT, the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages. Using a strictly controlled setup, 18 professional translators were instructed to translate or post-edit the same set of English documents into Arabic, Dutch, Italian, Turkish, Ukrainian, and Vietnamese. During the process, their edits, keystrokes, editing times and pauses were recorded, enabling an in-depth, cross-lingual evaluation of NMT quality and post-editing effectiveness. Using this new dataset, we assess the impact of two state-of-the-art NMT systems, Google Translate and the multilingual mBART-50 model, on translation productivity. We find that post-editing is consistently faster than translation from scratch. However, the magnitude of productivity gains varies widely across systems and languages, highlighting major disparities in post-editing effectiveness for languages at different degrees of typological relatedness to English, even when controlling for system architecture and training data size. We publicly release the complete dataset including all collected behavioral data, to foster new research on the translation capabilities of NMT systems for typologically diverse languages.
Submitted 18 October, 2022; v1 submitted 24 May, 2022;
originally announced May 2022.
-
Creativity in translation: machine translation as a constraint for literary texts
Authors:
Ana Guerberof Arenas,
Antonio Toral
Abstract:
This article presents the results of a study involving the translation of a short story by Kurt Vonnegut from English to Catalan and Dutch using three modalities: machine-translation (MT), post-editing (PE) and translation without aid (HT). Our aim is to explore creativity, understood to involve novelty and acceptability, from a quantitative perspective. The results show that HT has the highest creativity score, followed by PE, and lastly, MT, and this is unanimous from all reviewers. A neural MT system trained on literary data does not currently have the necessary capabilities for a creative translation; it renders literal solutions to translation problems. More importantly, using MT to post-edit raw output constrains the creativity of translators, resulting in a poorer translation often not fit for publication, according to experts.
Submitted 12 April, 2022;
originally announced April 2022.
-
The Impact of Post-editing and Machine Translation on Creativity and Reading Experience
Authors:
Ana Guerberof Arenas,
Antonio Toral
Abstract:
This article presents the results of a study involving the translation of a fictional story from English into Catalan in three modalities: machine-translated (MT), post-edited (MTPE) and translated without aid (HT). Each translation was analysed to evaluate its creativity. Subsequently, a cohort of 88 Catalan participants read the story in a randomly assigned modality and completed a survey. The results show that HT presented a higher creativity score if compared to MTPE and MT. HT also ranked higher in narrative engagement, and translation reception, while MTPE ranked marginally higher in enjoyment. HT and MTPE show no statistically significant differences in any category, whereas MT does in all variables tested. We conclude that creativity is highest when professional translators intervene in the process, especially when working without any aid. We hypothesize that creativity in translation could be the factor that enhances reading engagement and the reception of translated literary texts.
Submitted 15 January, 2021;
originally announced January 2021.
-
Personalized Speech recognition on mobile devices
Authors:
Ian McGraw,
Rohit Prabhavalkar,
Raziel Alvarez,
Montse Gonzalez Arenas,
Kanishka Rao,
David Rybach,
Ouais Alsharif,
Hasim Sak,
Alexander Gruenstein,
Francoise Beaufays,
Carolina Parada
Abstract:
We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone. We employ a quantized Long Short-Term Memory (LSTM) acoustic model trained with connectionist temporal classification (CTC) to directly predict phoneme targets, and further reduce its memory footprint using an SVD-based compression scheme. Additionally, we minimize our memory footprint by using a single language model for both dictation and voice command domains, constructed using Bayesian interpolation. Finally, in order to properly handle device-specific information, such as proper names and other context-dependent information, we inject vocabulary items into the decoder graph and bias the language model on-the-fly. Our system achieves 13.5% word error rate on an open-ended dictation task, running with a median speed that is seven times faster than real-time.
Submitted 11 March, 2016; v1 submitted 10 March, 2016;
originally announced March 2016.
-
Measuring the local GitHub developer community
Authors:
J. J. Merelo,
Nuria Rico,
Israel Blancas,
M. G. Arenas,
Fernando Tricas,
José Antonio Vacas
Abstract:
Creating rankings might seem like a vain exercise in belly-button gazing, even more so for people as averse to that kind of thing as programmers. However, in this paper we try to show how creating city-based (or province-based) rankings in Spain has led to all kinds of interesting effects, including increased productivity and community building. We describe the methodology we have used to search for programmers residing in a particular province, focusing on those where most of the population is concentrated, and apply different measures to show how these communities differ in structure, number and productivity.
Submitted 27 January, 2015;
originally announced January 2015.
-
SOAP vs REST: Comparing a master-slave GA implementation
Authors:
P. A. Castillo,
J. L. Bernier,
M. G. Arenas,
J. J. Merelo,
P. Garcia-Sanchez
Abstract:
In this paper, a high-level comparison of SOAP (Simple Object Access Protocol) and REST (Representational State Transfer) is made. These are the two main approaches for interfacing to the web with web services. The two approaches differ, and each presents advantages and disadvantages for interfacing to web services: SOAP is conceptually more difficult (has a steeper learning curve) and more "heavy-weight" than REST, although REST lacks standards support for security. In order to test their efficiency (in time), two experiments have been performed using both technologies: a client-server model implementation and a master-slave based genetic algorithm (GA). The results obtained show clear differences in time between the SOAP and REST implementations. Although both techniques are suitable for developing parallel systems, SOAP is heavier than REST, mainly due to the verbosity of SOAP communications (XML increases the time taken to parse the messages).
Submitted 25 May, 2011;
originally announced May 2011.
-
Distributed Evolutionary Computation using REST
Authors:
P. A. Castillo,
M. G. Arenas,
A. M. Mora,
J. L. J. Laredo,
G. Romero,
V. M Rivas,
J. J. Merelo
Abstract:
This paper analyses distributed evolutionary computation based on the Representational State Transfer (REST) protocol, which overlays a farming model on evolutionary computation. An approach to the distributed evolutionary optimisation of multilayer perceptrons (MLP) using REST and the Perl language has been developed. In these experiments, a master-slave based evolutionary algorithm (EA) has been implemented, where slave processes evaluate the costly fitness function (training an MLP to solve a classification problem). The results show that the parallel version of the developed programs obtains similar or better results in much less time than the sequential version, achieving a good speedup.
Submitted 25 May, 2011;
originally announced May 2011.
-
Lamarckian Evolution and the Baldwin Effect in Evolutionary Neural Networks
Authors:
P. A. Castillo,
M. G. Arenas,
J. G. Castellano,
J. J. Merelo,
A. Prieto,
V. Rivas,
G. Romero
Abstract:
Hybrid neuro-evolutionary algorithms may be inspired by Darwinian or Lamarckian evolution. In the case of Darwinian evolution, the Baldwin effect, that is, the progressive incorporation of learned characteristics to the genotypes, can be observed and leveraged to improve the search. The purpose of this paper is to carry out an experimental study into how learning can improve G-Prop genetic search. Two ways of combining learning and genetic search are explored: one exploits the Baldwin effect, while the other uses a Lamarckian strategy. Our experiments show that using a Lamarckian operator makes the algorithm find networks with a low error rate and the smallest size, while using the Baldwin effect obtains MLPs with the smallest error rate and a larger size, taking longer to reach a solution. Both approaches obtain a lower average error than other BP-based algorithms like RPROP, other evolutionary methods and fuzzy logic based methods.
Submitted 1 March, 2006;
originally announced March 2006.