The Effect of Model Size on LLM Post-hoc Explainability via LIME

Authors: Henning Heyen, Amy Widdicombe, Noah Y. Siegel, Maria Perez-Ortiz, Philip Treleaven

Abstract: Large language models (LLMs) are becoming bigger to boost performance. However, little is known about how explainability is affected by this trend. This work explores LIME explanations for DeBERTaV3 models of four different sizes on natural language inference (NLI) and zero-shot classification (ZSC) tasks. We evaluate the explanations based on their faithfulness to the models' internal decision processes and their plausibility, i.e. their agreement with human explanations. The key finding is that increased model size does not correlate with plausibility despite improved model performance, suggesting a misalignment between the LIME explanations and the models' internal processes as model size increases. Our results further suggest limitations regarding faithfulness metrics in NLI contexts.

Submitted 8 May, 2024; originally announced May 2024.

Comments: Published at ICLR 2024 Workshop on Secure and Trustworthy Large Language Models

arXiv:2404.03189 [pdf, other]

The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models

Authors: Noah Y. Siegel, Oana-Maria Camburu, Nicolas Heess, Maria Perez-Ortiz

Abstract: In order to oversee advanced AI systems, it is important to understand their underlying decision-making process. When prompted, large language models (LLMs) can provide natural language explanations or reasoning traces that sound plausible and receive high ratings from human annotators. However, it is unclear to what extent these explanations are faithful, i.e., truly capture the factors responsible for the model's predictions. In this work, we introduce Correlational Explanatory Faithfulness (CEF), a metric that can be used in faithfulness tests based on input interventions. Previous metrics used in such tests take into account only binary changes in the predictions. Our metric accounts for the total shift in the model's predicted label distribution, more accurately reflecting the explanations' faithfulness. We then introduce the Correlational Counterfactual Test (CCT) by instantiating CEF on the Counterfactual Test (CT) from Atanasova et al. (2023). We evaluate the faithfulness of free-text explanations generated by few-shot-prompted LLMs from the Llama2 family on three NLP tasks. We find that our metric measures aspects of faithfulness which the CT misses.

Submitted 7 June, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: To be published in ACL 2024. 19 pages, 2 figures
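
The abstract above contrasts binary prediction changes with the total shift in the predicted label distribution. The following is a minimal sketch of that contrast, not the paper's implementation: it assumes total variation distance as the shift measure and Pearson correlation against a 0/1 "explanation mentions the intervention" flag, and the function names and all probability values are hypothetical.

```python
from math import sqrt
from statistics import mean


def total_variation(p, q):
    """Total variation distance between two label distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def label_flipped(p, q):
    """Binary signal: 1 if the argmax label changed, else 0."""
    argmax = lambda d: max(range(len(d)), key=d.__getitem__)
    return int(argmax(p) != argmax(q))


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


# Hypothetical interventions: (distribution before, distribution after,
# 1 if the explanation mentions the inserted text, else 0).
interventions = [
    ([0.90, 0.05, 0.05], [0.55, 0.40, 0.05], 1),  # large shift, no label flip
    ([0.60, 0.30, 0.10], [0.40, 0.50, 0.10], 1),  # label flip
    ([0.70, 0.20, 0.10], [0.69, 0.21, 0.10], 0),  # negligible shift
    ([0.50, 0.30, 0.20], [0.48, 0.32, 0.20], 0),  # negligible shift
]

shifts = [total_variation(p, q) for p, q, _ in interventions]
flips = [label_flipped(p, q) for p, q, _ in interventions]
mentions = [m for _, _, m in interventions]

# Correlate each impact signal with whether the explanation mentioned it.
correlation_shift = pearson(shifts, mentions)
correlation_flip = pearson(flips, mentions)
```

In this toy data the first intervention moves a lot of probability mass without changing the predicted label, so the binary signal treats it as having no impact while the distribution shift registers it.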

arXiv:2312.16259 [pdf, ps, other]

On the General Dead-Ending Universe of Partizan Games

Authors: Aaron N. Siegel

Abstract: The universe $\mathcal{E}$ of dead-ending partizan games has emerged as an important structure in the study of misère play. Here we attempt a systematic investigation of the structure of $\mathcal{E}$ and its subuniverses. We begin by showing that the dead-ends exhibit a rich "absolute" structure, in the sense that they behave identically in any universe in which they appear. We will use this result to construct an uncountable family of dead-ending universes and show that they collectively admit an uncountable family of distinct comparison relations. We will then show that whenever the ends of a universe $\mathcal{U} \subset \mathcal{E}$ are computable, then there is a constructive test for comparison modulo $\mathcal{U}$. Finally, we propose a new type of generalized simplest form that works for arbitrary universes (including universes that are not dead-ending), and that is computable whenever comparison modulo $\mathcal{U}$ is computable. In particular, this gives a complete constructive theory for subuniverses of $\mathcal{E}$ with computable ends. This theory has been implemented in cgsuite as a proof of concept. As an application of these results, we will characterize the universe generated by misère Domineering, and we will compute the misère simplest forms of $2 \times n$ Domineering rectangles for small values of $n$.

Submitted 26 December, 2023; originally announced December 2023.

MSC Class: 05A99; 91A46

arXiv:2309.15534 [pdf]

Universal click-chemistry approach for the DNA functionalization of nanoparticles

Authors: Nicole Siegel, Hiroaki Hasebe, German Chiarelli, Denis Garoli, Hiroshi Sugimoto, Minoru Fujii, Guillermo P. Acuna, Karol Kolataj

Abstract: Nanotechnology has revolutionized the fabrication of hybrid species with tailored functionalities. A milestone in this field is the DNA conjugation of nanoparticles, introduced almost 30 years ago, which typically exploits the affinity between thiol groups and metallic surfaces. Over the last decades, developments in colloidal research have enabled the synthesis of an assortment of non-metallic structures, such as high-index dielectric nanoparticles, with unique properties not previously accessible with traditional metallic nanoparticles. However, to stabilize, integrate and provide further functionality to non-metallic nanoparticles, reliable techniques for their functionalization with DNA will be crucial. Here, we combine well-established dibenzylcyclooctyne-azide click-chemistry with a simple freeze-thaw method to achieve the functionalization of silica and silicon nanoparticles, which form exceptionally stable colloids with a high DNA surface density of 0.2 molecules/nm$^2$. Furthermore, we demonstrate that these functionalized colloids can be self-assembled into high-index dielectric optical antennas with a yield of up to 78% via the use of DNA origami. Finally, we extend this method to functionalize other important nanomaterials, including oxides, polymers, core-shell and metal nanostructures. Our results indicate that the method presented herein serves as a crucial complement to conventional thiol functionalization chemistry and thus greatly expands the toolbox of DNA-functionalized nanoparticles currently available.

Submitted 29 December, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

arXiv:2304.13653 [pdf, other]

doi 10.1126/scirobotics.adi8022

Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning

Authors: Tuomas Haarnoja, Ben Moran, Guy Lever, Sandy H. Huang, Dhruva Tirumala, Jan Humplik, Markus Wulfmeier, Saran Tunyasuvunakool, Noah Y. Siegel, Roland Hafner, Michael Bloesch, Kristian Hartikainen, Arunkumar Byravan, Leonard Hasenclever, Yuval Tassa, Fereshteh Sadeghi, Nathan Batchelor, Federico Casarini, Stefano Saliceti, Charles Game, Neil Sreendra, Kushal Patel, Marlon Gwira, Andrea Huber, Nicole Hurley, et al. (3 additional authors not shown)

Abstract: We investigate whether Deep Reinforcement Learning (Deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies in dynamic environments. We used Deep RL to train a humanoid robot with 20 actuated joints to play a simplified one-versus-one (1v1) soccer game. The resulting agent exhibits robust and dynamic movement skills such as rapid fall recovery, walking, turning, kicking and more; and it transitions between them in a smooth, stable, and efficient manner. The agent's locomotion and tactical behavior adapt to specific game contexts in a way that would be impractical to manually design. The agent also developed a basic strategic understanding of the game, and learned, for instance, to anticipate ball movements and to block opponent shots. Our agent was trained in simulation and transferred to real robots zero-shot. We found that a combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training in simulation enabled good-quality transfer. Although the robots are inherently fragile, basic regularization of the behavior during training led the robots to learn safe and effective movements while still performing in a dynamic and agile way -- well beyond what is intuitively expected from the robot. Indeed, in experiments, they walked 181% faster, turned 302% faster, took 63% less time to get up, and kicked a ball 34% faster than a scripted baseline, while efficiently combining the skills to achieve the longer term objectives.

Submitted 11 April, 2024; v1 submitted 26 April, 2023; originally announced April 2023.

Comments: Project website: https://sites.google.com/view/op3-soccer

arXiv:2304.04869 [pdf, other]

doi 10.1088/1538-3873/acd1b5

The James Webb Space Telescope Mission

Authors: Jonathan P. Gardner, John C. Mather, Randy Abbott, James S. Abell, Mark Abernathy, Faith E. Abney, John G. Abraham, Roberto Abraham, Yasin M. Abul-Huda, Scott Acton, Cynthia K. Adams, Evan Adams, David S. Adler, Maarten Adriaensen, Jonathan Albert Aguilar, Mansoor Ahmed, Nasif S. Ahmed, Tanjira Ahmed, Rüdeger Albat, Loïc Albert, Stacey Alberts, David Aldridge, Mary Marsha Allen, Shaune S. Allen, Martin Altenburg, et al. (983 additional authors not shown)

Abstract: Twenty-six years ago a small committee report, building on earlier studies, expounded a compelling and poetic vision for the future of astronomy, calling for an infrared-optimized space telescope with an aperture of at least $4m$. With the support of their governments in the US, Europe, and Canada, 20,000 people realized that vision as the $6.5m$ James Webb Space Telescope. A generation of astronomers will celebrate their accomplishments for the life of the mission, potentially as long as 20 years, and beyond. This report and the scientific discoveries that follow are extended thank-you notes to the 20,000 team members. The telescope is working perfectly, with much better image quality than expected. In this and accompanying papers, we give a brief history, describe the observatory, outline its objectives and current observing program, and discuss the inventions and people who made it possible. We cite detailed reports on the design and the measured performance on orbit.

Submitted 10 April, 2023; originally announced April 2023.

Comments: Accepted by PASP for the special issue on The James Webb Space Telescope Overview, 29 pages, 4 figures

arXiv:2211.14275 [pdf, other]

Solving math word problems with process- and outcome-based feedback

Authors: Jonathan Uesato, Nate Kushman, Ramana Kumar, Francis Song, Noah Siegel, Lisa Wang, Antonia Creswell, Geoffrey Irving, Irina Higgins

Abstract: Recent work has shown that asking language models to generate reasoning steps improves performance on many reasoning tasks. When moving beyond prompting, this raises the question of how we should supervise such models: outcome-based approaches which supervise the final result, or process-based approaches which supervise the reasoning process itself? Differences between these approaches might naturally be expected not just in final-answer errors but also in reasoning errors, which can be difficult to detect and are problematic in many real-world domains such as education. We run the first comprehensive comparison between process- and outcome-based approaches trained on a natural language task, GSM8K. We find that pure outcome-based supervision produces similar final-answer error rates with less label supervision. However, for correct reasoning steps we find it necessary to use process-based supervision or supervision from learned reward models that emulate process-based feedback. In total, we improve the previous best results from 16.8% $\to$ 12.7% final-answer error and 14.0% $\to$ 3.4% reasoning error among final-answer-correct solutions.

Submitted 25 November, 2022; originally announced November 2022.
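
The error rates quoted in the abstract above are absolute percentages; it can help to also read them as relative reductions. The short sketch below does that arithmetic using only the four numbers from the abstract; the helper name `relative_reduction` is ours, not the paper's.

```python
def relative_reduction(before, after):
    """Fraction of the original error rate that was eliminated."""
    return (before - after) / before


# Error rates (in percent) as quoted in the abstract.
final_answer = relative_reduction(16.8, 12.7)
reasoning = relative_reduction(14.0, 3.4)

print(f"final-answer error: {final_answer:.1%} relative reduction")
print(f"reasoning error:    {reasoning:.1%} relative reduction")
```

That is roughly a quarter fewer final-answer errors (24.4%) and about three quarters fewer reasoning errors (75.7%) among final-answer-correct solutions.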

arXiv:2203.17138 [pdf, other]

Imitate and Repurpose: Learning Reusable Robot Movement Skills From Human and Animal Behaviors

Authors: Steven Bohez, Saran Tunyasuvunakool, Philemon Brakel, Fereshteh Sadeghi, Leonard Hasenclever, Yuval Tassa, Emilio Parisotto, Jan Humplik, Tuomas Haarnoja, Roland Hafner, Markus Wulfmeier, Michael Neunert, Ben Moran, Noah Siegel, Andrea Huber, Francesco Romano, Nathan Batchelor, Federico Casarini, Josh Merel, Raia Hadsell, Nicolas Heess

Abstract: We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a movement skill module. Once learned, this skill module can be reused for complex downstream tasks. Importantly, due to the prior imposed by the MoCap data, our approach does not require extensive reward engineering to produce sensible and natural looking behavior at the time of reuse. This makes it easy to create well-regularized, task-oriented controllers that are suitable for deployment on real robots. We demonstrate how our skill module can be used for imitation, and train controllable walking and ball dribbling policies for both the ANYmal quadruped and OP3 humanoid. These policies are then deployed on hardware via zero-shot simulation-to-reality transfer. Accompanying videos are available at https://bit.ly/robot-npmp.

Submitted 31 March, 2022; originally announced March 2022.

Comments: 30 pages, 9 figures, 8 tables, 14 videos at https://bit.ly/robot-npmp, submitted to Science Robotics

arXiv:2112.02721 [pdf, other]

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Authors: Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, et al. (101 additional authors not shown)

Abstract: Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).

Submitted 11 October, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

Comments: 39 pages, repository at https://github.com/GEM-benchmark/NL-Augmenter

arXiv:2105.12196 [pdf, other]

From Motor Control to Team Play in Simulated Humanoid Football

Authors: Siqi Liu, Guy Lever, Zhe Wang, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, Thore Graepel, Nicolas Heess

Abstract: Intelligent behaviour in the physical world exhibits structure at multiple spatial and temporal scales. Although movements are ultimately executed at the level of instantaneous muscle tensions or joint torques, they must be selected to serve goals defined on much longer timescales, and in terms of relations that extend far beyond the body itself, ultimately involving coordination with other agents. Recent research in artificial intelligence has shown the promise of learning-based approaches to the respective problems of complex movement, longer-term planning and multi-agent coordination. However, there is limited research aimed at their integration. We study this problem by training teams of physically simulated humanoid avatars to play football in a realistic virtual environment. We develop a method that combines imitation learning, single- and multi-agent reinforcement learning and population-based training, and makes use of transferable representations of behaviour for decision making at different levels of abstraction. In a sequence of stages, players first learn to control a fully articulated body to perform realistic, human-like movements such as running and turning; they then acquire mid-level football skills such as dribbling and shooting; finally, they develop awareness of others and play as a team, bridging the gap between low-level motor control at a timescale of milliseconds, and coordinated goal-directed behaviour as a team at the timescale of tens of seconds. We investigate the emergence of behaviours at different levels of abstraction, as well as the representations that underlie these behaviours using several analysis techniques, including statistics from real-world sports analytics. Our work constitutes a complete demonstration of integrated decision-making at multiple scales in a physically embodied multi-agent setting. See project video at https://youtu.be/KHMwq9pv7mg.

Submitted 25 May, 2021; originally announced May 2021.

arXiv:2012.08554 [pdf, ps, other]

On the Structure of Misère Impartial Games

Authors: Aaron N. Siegel

Abstract: We consider the abstract structure of the monoid M of misère impartial game values. Several new results are presented, including a proof that the group of fractions of M is almost torsion-free; a method of calculating the number of distinct games born by day 7; and some new results on the structure of prime games. Also included are proofs of a few older results due to Conway, such as the Cancellation Theorem, that are essential to the analysis but whose proofs are not readily available in the literature. Much of the work presented here was done jointly with John Conway and Dan Hoey, and I dedicate this paper to their memory.

Submitted 3 September, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

MSC Class: 05A99; 91A46

arXiv:2010.15492 [pdf, other]

"What, not how": Solving an under-actuated insertion task from scratch

Authors: Giulia Vezzani, Michael Neunert, Markus Wulfmeier, Rae Jeong, Thomas Lampe, Noah Siegel, Roland Hafner, Abbas Abdolmaleki, Martin Riedmiller, Francesco Nori

Abstract: Robot manipulation requires a complex set of skills that need to be carefully combined and coordinated to solve a task. Yet, most Reinforcement Learning (RL) approaches in robotics study tasks which actually consist only of a single manipulation skill, such as grasping an object or inserting a pre-grasped object. As a result the skill ('how' to solve the task) but not the actual goal of a complete manipulation ('what' to solve) is specified. In contrast, we study a complex manipulation goal that requires an agent to learn and combine diverse manipulation skills. We propose a challenging, highly under-actuated peg-in-hole task with a free, rotational asymmetrical peg, requiring a broad range of manipulation skills. While correct peg (re-)orientation is a requirement for successful insertion, there is no reward associated with it. Hence an agent needs to understand this pre-condition and learn the skill to fulfil it. The final insertion reward is sparse, allowing freedom in the solution and leading to complex emerging behaviour not envisioned during the task design. We tackle the problem in a multi-task RL framework using Scheduled Auxiliary Control (SAC-X) combined with Regularized Hierarchical Policy Optimization (RHPO) which successfully solves the task in simulation and from scratch on a single robot where data is severely limited.

Submitted 30 October, 2020; v1 submitted 29 October, 2020; originally announced October 2020.

arXiv:2007.15588 [pdf, other]

Data-efficient Hindsight Off-policy Option Learning

Authors: Markus Wulfmeier, Dushyant Rao, Roland Hafner, Thomas Lampe, Abbas Abdolmaleki, Tim Hertweck, Michael Neunert, Dhruva Tirumala, Noah Siegel, Nicolas Heess, Martin Riedmiller

Abstract: We introduce Hindsight Off-policy Options (HO2), a data-efficient option learning algorithm. Given any trajectory, HO2 infers likely option choices and backpropagates through the dynamic programming inference procedure to robustly train all policy components off-policy and end-to-end. The approach outperforms existing option learning methods on common benchmarks. To better understand the option framework and disentangle benefits from both temporal and action abstraction, we evaluate ablations with flat policies and mixture policies with comparable optimization. The results highlight the importance of both types of abstraction as well as off-policy training and trust-region constraints, particularly in challenging, simulated 3D robot manipulation tasks from raw pixel inputs. Finally, we intuitively adapt the inference step to investigate the effect of increased temporal abstraction on training with pre-trained options and from scratch.

Submitted 15 June, 2021; v1 submitted 30 July, 2020; originally announced July 2020.

Comments: Published at ICML 2021

arXiv:2006.15134 [pdf, other]

Critic Regularized Regression

Authors: Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas

Abstract: Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction. It addresses challenges with regard to the cost of data collection and safety, both of which are particularly pertinent to real-world applications of RL. Unfortunately, most off-policy algorithms perform poorly when learning from a fixed dataset. In this paper, we propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR). We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces -- outperforming several state-of-the-art offline RL algorithms by a significant margin on a wide range of benchmark tasks.

Submitted 22 September, 2021; v1 submitted 26 June, 2020; originally announced June 2020.

Comments: 24 pages; presented at NeurIPS 2020

arXiv:2005.07541 [pdf, other]

Simple Sensor Intentions for Exploration

Authors: Tim Hertweck, Martin Riedmiller, Michael Bloesch, Jost Tobias Springenberg, Noah Siegel, Markus Wulfmeier, Roland Hafner, Nicolas Heess

Abstract: Modern reinforcement learning algorithms can learn solutions to increasingly difficult control problems while at the same time reducing the amount of prior knowledge needed for their application. One of the remaining challenges is the definition of reward schemes that appropriately facilitate exploration without biasing the solution in undesirable ways, and that can be implemented on real robotic systems without expensive instrumentation. In this paper we focus on a setting in which goal tasks are defined via simple sparse rewards, and exploration is facilitated via agent-internal auxiliary tasks. We introduce the idea of simple sensor intentions (SSIs) as a generic way to define auxiliary tasks. SSIs reduce the amount of prior knowledge that is required to define suitable rewards. They can further be computed directly from raw sensor streams and thus do not require expensive and possibly brittle state estimation on real systems. We demonstrate that a learning system based on these rewards can solve complex robotic tasks in simulation and in real world settings. In particular, we show that a real robotic arm can learn to grasp and lift and solve a Ball-in-a-Cup task from scratch, when only raw sensor streams are used for both controller input and in the auxiliary reward definition.

Submitted 15 May, 2020; originally announced May 2020.

arXiv:2002.08396 [pdf, other]

Keep Doing What Worked: Behavioral Modelling Priors for Offline Reinforcement Learning

Authors: Noah Y. Siegel, Jost Tobias Springenberg, Felix Berkenkamp, Abbas Abdolmaleki, Michael Neunert, Thomas Lampe, Roland Hafner, Nicolas Heess, Martin Riedmiller

Abstract: Off-policy reinforcement learning algorithms promise to be applicable in settings where only a fixed data-set (batch) of environment interactions is available and no new experience can be acquired. This property makes these algorithms appealing for real world problems such as robot control. In practice, however, standard off-policy algorithms fail in the batch setting for continuous control. In this paper, we propose a simple solution to this problem. It admits the use of data generated by arbitrary behavior policies and uses a learned prior -- the advantage-weighted behavior model (ABM) -- to bias the RL policy towards actions that have previously been executed and are likely to be successful on the new task. Our method can be seen as an extension of recent work on batch-RL that enables stable learning from conflicting data-sources. We find improvements on competitive baselines in a variety of RL tasks -- including standard continuous control benchmarks and multi-task learning for simulated and real-world robots.

Submitted 17 June, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

ACM Class: I.2.6; I.2.9

Journal ref: ICLR 2020

arXiv:1912.10517 [pdf, other]

Memgames

Authors: Urban Larsson, Simon Rubinstein-Salzedo, Aaron N. Siegel

Abstract: In this article, we study the structure, and in particular the Grundy values, of a family of games known as memgames.

Submitted 30 October, 2023; v1 submitted 22 December, 2019; originally announced December 2019.

Comments: Feedback welcome!

arXiv:1910.04142 [pdf, other]

Imagined Value Gradients: Model-Based Policy Optimization with Transferable Latent Dynamics Models

Authors: Arunkumar Byravan, Jost Tobias Springenberg, Abbas Abdolmaleki, Roland Hafner, Michael Neunert, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller

Abstract: Humans are masters at quickly learning many complex tasks, relying on an approximate understanding of the dynamics of their environments. In much the same way, we would like our learning agents to quickly adapt to new tasks. In this paper, we explore how model-based Reinforcement Learning (RL) can facilitate transfer to new tasks. We develop an algorithm that learns an action-conditional, predictive model of expected future observations, rewards and values from which a policy can be derived by following the gradient of the estimated value along imagined trajectories. We show how robust policy optimization can be achieved in robot manipulation tasks even with approximate models that are learned directly from vision and proprioception. We evaluate the efficacy of our approach in a transfer learning scenario, re-using previously learned models on tasks with different reward structures and visual distractors, and show a significant improvement in learning speed compared to strong off-policy baselines. Videos with results can be found at https://sites.google.com/view/ivg-corl19

Submitted 9 October, 2019; originally announced October 2019.

Comments: To appear at the 3rd annual Conference on Robot Learning, Osaka, Japan (CoRL 2019). 24 pages including appendix (main paper - 8 pages)

arXiv:1906.11228 [pdf, other]

Compositional Transfer in Hierarchical Reinforcement Learning

Authors: Markus Wulfmeier, Abbas Abdolmaleki, Roland Hafner, Jost Tobias Springenberg, Michael Neunert, Tim Hertweck, Thomas Lampe, Noah Siegel, Nicolas Heess, Martin Riedmiller

Abstract: The successful application of general reinforcement learning algorithms to real-world robotics applications is often limited by their high data requirements. We introduce Regularized Hierarchical Policy Optimization (RHPO) to improve data-efficiency for domains with multiple dominant tasks and ultimately reduce required platform time. To this end, we employ compositional inductive biases on multiple levels and corresponding mechanisms for sharing off-policy transition data across low-level controllers and tasks as well as scheduling of tasks. The presented algorithm enables stable and fast learning for complex, real-world domains in the parallel multitask and sequential transfer case. We show that the investigated types of hierarchy enable positive transfer while partially mitigating negative interference and evaluate the benefits of additional incentives for efficient, compositional task solutions in single task domains. Finally, we demonstrate substantial data-efficiency and final performance gains over competitive baselines in a week-long, physical robot stacking experiment.

Submitted 19 May, 2020; v1 submitted 26 June, 2019; originally announced June 2019.

Comments: Robotics Science and Systems 2020

arXiv:1804.02445 [pdf, other]

doi 10.1145/3197026.3197040

Extracting Scientific Figures with Distantly Supervised Neural Networks

Authors: Noah Siegel, Nicholas Lourie, Russell Power, Waleed Ammar

Abstract: Non-textual components such as charts, diagrams and tables provide key information in many scientific documents, but the lack of large labeled datasets has impeded the development of data-driven methods for scientific figure extraction. In this paper, we induce high-quality training labels for the task of figure extraction in a large number of scientific documents, with no human intervention. To accomplish this we leverage the auxiliary data provided in two large web collections of scientific documents (arXiv and PubMed) to locate figures and their associated captions in the rasterized PDF. We share the resulting dataset of over 5.5 million induced labels---4,000 times larger than the previous largest figure extraction dataset---with an average precision of 96.8%, to enable the development of modern data-driven methods for this task. We use this dataset to train a deep neural network for end-to-end figure detection, yielding a model that can be more easily extended to new domains compared to previous work. The model was successfully deployed in Semantic Scholar, a large-scale academic search engine, and used to extract figures in 13 million scientific documents.

Submitted 30 May, 2018; v1 submitted 6 April, 2018; originally announced April 2018.

Comments: 10 pages, 5 figures, paper accepted at JCDL 2018

arXiv:0705.2404 [pdf, ps, other]

Misere quotients for impartial games: Supplementary material

Authors: Thane E. Plambeck, Aaron N. Siegel

Abstract: We provide supplementary appendices to the paper Misere quotients for impartial games. These include detailed solutions to many of the octal games discussed in the paper, and descriptions of the algorithms used to compute most of our solutions.

Submitted 16 May, 2007; originally announced May 2007.

Comments: Supplement to the paper Misere Quotients for Impartial Games. 17 pages

MSC Class: 91A46

arXiv:math/0703565 [pdf, ps, other]

Misère canonical forms of partizan games

Authors: Aaron N. Siegel

Abstract: We show that partizan games admit canonical forms in misère play. The proof is a synthesis of the canonical form theorems for normal-play partizan games and misère-play impartial games. It is fully constructive, and algorithms readily emerge for comparing misère games and calculating their canonical forms. We use these techniques to show that there are precisely 256 games born by day 2, and to obtain a bound on the number of games born by day 3.

Submitted 19 March, 2007; originally announced March 2007.

Comments: 12 pages

MSC Class: 91A46

arXiv:math/0703070 [pdf, ps, other]

The structure and classification of misère quotients

Authors: Aaron N. Siegel

Abstract: A \emph{bipartite monoid} is a commutative monoid $\Q$ together with an identified subset $\P \subset \Q$. In this paper we study a class of bipartite monoids, known as \emph{misère quotients}, that are naturally associated to impartial combinatorial games. We introduce a structure theory for misère quotients with $|\P| = 2$, and give a complete classification of all such quotients up to isomorphism. One consequence is that if $|\P| = 2$ and $\Q$ is finite, then $|\Q| = 2^n+2$ or $2^n+4$. We then develop computational techniques for enumerating misère quotients of small order, and apply them to count the number of non-isomorphic quotients of order at most~18. We also include a manual proof that there is exactly one quotient of order~8.

Submitted 2 March, 2007; originally announced March 2007.

Comments: 23 pages

MSC Class: 91A46

arXiv:math/0612616 [pdf, ps, other]

Misère Games and Misère Quotients

Authors: Aaron N. Siegel

Abstract: These lecture notes are based on a short course on misère quotients offered at the Weizmann Institute of Science in Rehovot, Israel, in November 2006. They include an introduction to impartial games, starting from the beginning; the basic misère quotient construction; a proof of the Guy--Smith--Plambeck Periodicity Theorem; and statements of some recent results and open problems in the subject.

Submitted 21 December, 2006; v1 submitted 20 December, 2006; originally announced December 2006.

Comments: 34 pages; fixed references

MSC Class: 91A46

arXiv:math/0609825 [pdf, ps, other]

Misere quotients for impartial games

Authors: Thane E. Plambeck, Aaron N. Siegel

Abstract: We announce misere-play solutions to several previously-unsolved combinatorial games. The solutions are described in terms of misere quotients--commutative monoids that encode the additive structure of specific misere-play games. We also introduce several advances in the structure theory of misere quotients, including a connection between the combinatorial structure of normal and misere play.

Submitted 13 August, 2007; v1 submitted 28 September, 2006; originally announced September 2006.

Comments: Paper has been split into two parts: this part, and a supplement at arXiv:0705.2404v1

MSC Class: 91A46; 20M14

Journal ref: Journal of Combinatorial Theory, Series A (May 2008) pp 593-622

Showing 1–25 of 25 results for author: Siegel, N