Search | arXiv e-print repository

Event-based Agile Object Catching with a Quadrupedal Robot

Authors: Benedek Forrai, Takahiro Miki, Daniel Gehrig, Marco Hutter, Davide Scaramuzza

Abstract: Quadrupedal robots are conquering various indoor and outdoor applications due to their ability to navigate challenging uneven terrains. Exteroceptive information greatly enhances this capability since perceiving their surroundings allows them to adapt their controller and thus achieve higher levels of robustness. However, sensors such as LiDARs and RGB cameras do not provide sufficient information… ▽ More Quadrupedal robots are conquering various indoor and outdoor applications due to their ability to navigate challenging uneven terrains. Exteroceptive information greatly enhances this capability since perceiving their surroundings allows them to adapt their controller and thus achieve higher levels of robustness. However, sensors such as LiDARs and RGB cameras do not provide sufficient information to quickly and precisely react in a highly dynamic environment since they suffer from a bandwidth-latency tradeoff. They require significant bandwidth at high frame rates while featuring significant perceptual latency at lower frame rates, thereby limiting their versatility on resource-constrained platforms. In this work, we tackle this problem by equip** our quadruped with an event camera, which does not suffer from this tradeoff due to its asynchronous and sparse operation. In leveraging the low latency of the events, we push the limits of quadruped agility and demonstrate high-speed ball catching for the first time. We show that our quadruped equipped with an event camera can catch objects with speeds up to 15 m/s from 4 meters, with a success rate of 83%. Using a VGA event camera, our method runs at 100 Hz on an NVIDIA Jetson Orin. △ Less

Submitted 6 April, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

Journal ref: IEEE International Conference on Robotics and Automation (ICRA) 2023, London

arXiv:2303.05486 [pdf, other]

Learning Arm-Assisted Fall Damage Reduction and Recovery for Legged Mobile Manipulators

Authors: Yuntao Ma, Farbod Farshidian, Marco Hutter

Abstract: Adaptive falling and recovery skills greatly extend the applicability of robot deployments. In the case of legged mobile manipulators, the robot arm could adaptively stop the fall and assist the recovery. Prior works on falling and recovery strategies for legged mobile manipulators usually rely on assumptions such as inelastic collisions and falling in defined directions to enable real-time comput… ▽ More Adaptive falling and recovery skills greatly extend the applicability of robot deployments. In the case of legged mobile manipulators, the robot arm could adaptively stop the fall and assist the recovery. Prior works on falling and recovery strategies for legged mobile manipulators usually rely on assumptions such as inelastic collisions and falling in defined directions to enable real-time computation. This paper presents a learning-based approach to reducing fall damage and recovery. An asymmetric actor-critic training structure is used to train a time-invariant policy with time-varying reward functions. In simulated experiments, the policy recovers from 98.9\% of initial falling configurations. It reduces base contact impulse, peak joint internal forces, and base acceleration during the fall compared to the baseline methods. The trained control policy is deployed and extensively tested on the ALMA robot hardware. A video summarizing the proposed method and the hardware tests is available at https://youtu.be/avwg2HqGi8s. △ Less

Submitted 9 March, 2023; originally announced March 2023.

arXiv:2303.01420 [pdf, other]

doi 10.55417/fr.2023013

ArtPlanner: Robust Legged Robot Navigation in the Field

Authors: Lorenz Wellhausen, Marco Hutter

Abstract: Due to the highly complex environment present during the DARPA Subterranean Challenge, all six funded teams relied on legged robots as part of their robotic team. Their unique locomotion skills of being able to step over obstacles require special considerations for navigation planning. In this work, we present and examine ArtPlanner, the navigation planner used by team CERBERUS during the Finals.… ▽ More Due to the highly complex environment present during the DARPA Subterranean Challenge, all six funded teams relied on legged robots as part of their robotic team. Their unique locomotion skills of being able to step over obstacles require special considerations for navigation planning. In this work, we present and examine ArtPlanner, the navigation planner used by team CERBERUS during the Finals. It is based on a sampling-based method that determines valid poses with a reachability abstraction and uses learned foothold scores to restrict areas considered safe for step**. The resulting planning graph is assigned learned motion costs by a neural network trained in simulation to minimize traversal time and limit the risk of failure. Our method achieves real-time performance with a bounded computation time. We present extensive experimental results gathered during the Finals event of the DARPA Subterranean Challenge, where this method contributed to team CERBERUS winning the competition. It powered navigation of four ANYmal quadrupeds for 90 minutes of autonomous operation without a single planning or locomotion failure. △ Less

Submitted 2 March, 2023; originally announced March 2023.

arXiv:2302.11434 [pdf, other]

iPlanner: Imperative Path Planning

Authors: Fan Yang, Chen Wang, Cesar Cadena, Marco Hutter

Abstract: The problem of path planning has been studied for years. Classic planning pipelines, including perception, map**, and path searching, can result in latency and compounding errors between modules. While recent studies have demonstrated the effectiveness of end-to-end learning methods in achieving high planning efficiency, these methods often struggle to match the generalization abilities of class… ▽ More The problem of path planning has been studied for years. Classic planning pipelines, including perception, map**, and path searching, can result in latency and compounding errors between modules. While recent studies have demonstrated the effectiveness of end-to-end learning methods in achieving high planning efficiency, these methods often struggle to match the generalization abilities of classic approaches in handling different environments. Moreover, end-to-end training of policies often requires a large number of labeled data or training iterations to reach convergence. In this paper, we present a novel Imperative Learning (IL) approach. This approach leverages a differentiable cost map to provide implicit supervision during policy training, eliminating the need for demonstrations or labeled trajectories. Furthermore, the policy training adopts a Bi-Level Optimization (BLO) process, which combines network update and metric-based trajectory optimization, to generate a smooth and collision-free path toward the goal based on a single depth measurement. The proposed method allows task-level costs of predicted trajectories to be backpropagated through all components to update the network through direct gradient descent. In our experiments, the method demonstrates around 4x faster planning than the classic approach and robustness against localization noise. Additionally, the IL approach enables the planner to generalize to various unseen environments, resulting in an overall 26-87% improvement in SPL performance compared to baseline learning methods. △ Less

Submitted 24 May, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: 9 pages, 11 figures, Robotics: Science and Systems (RSS) 2023

arXiv:2302.09579 [pdf, other]

Evaluating Representations with Readout Model Switching

Authors: Yazhe Li, Jorg Bornschein, Marcus Hutter

Abstract: Although much of the success of Deep Learning builds on learning good representations, a rigorous method to evaluate their quality is lacking. In this paper, we treat the evaluation of representations as a model selection problem and propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric. Contrary to the established practice of limiting the capacity of the rea… ▽ More Although much of the success of Deep Learning builds on learning good representations, a rigorous method to evaluate their quality is lacking. In this paper, we treat the evaluation of representations as a model selection problem and propose to use the Minimum Description Length (MDL) principle to devise an evaluation metric. Contrary to the established practice of limiting the capacity of the readout model, we design a hybrid discrete and continuous-valued model space for the readout models and employ a switching strategy to combine their predictions. The MDL score takes model complexity, as well as data efficiency into account. As a result, the most appropriate model for the specific task and representation will be chosen, making it a unified measure for comparison. The proposed metric can be efficiently computed with an online method and we present results for pre-trained vision encoders of various architectures (ResNet and ViT) and objective functions (supervised and self-supervised) on a range of downstream tasks. We compare our methods with accuracy-based approaches and show that the latter are inconsistent when multiple readout models are used. Finally, we discuss important properties revealed by our evaluations such as model scaling, preferred readout model, and data efficiency. △ Less

Submitted 19 February, 2023; originally announced February 2023.

Journal ref: International Conference on Learning Representations, 2023

arXiv:2302.06083 [pdf, ps, other]

Universal Agent Mixtures and the Geometry of Intelligence

Authors: Samuel Allen Alexander, David Quarel, Len Du, Marcus Hutter

Abstract: Inspired by recent progress in multi-agent Reinforcement Learning (RL), in this work we examine the collective intelligent behaviour of theoretical universal agents by introducing a weighted mixture operation. Given a weighted set of agents, their weighted mixture is a new agent whose expected total reward in any environment is the corresponding weighted average of the original agents' expected to… ▽ More Inspired by recent progress in multi-agent Reinforcement Learning (RL), in this work we examine the collective intelligent behaviour of theoretical universal agents by introducing a weighted mixture operation. Given a weighted set of agents, their weighted mixture is a new agent whose expected total reward in any environment is the corresponding weighted average of the original agents' expected total rewards in that environment. Thus, if RL agent intelligence is quantified in terms of performance across environments, the weighted mixture's intelligence is the weighted average of the original agents' intelligences. This operation enables various interesting new theorems that shed light on the geometry of RL agent intelligence, namely: results about symmetries, convex agent-sets, and local extrema. We also show that any RL agent intelligence measure based on average performance across environments, subject to certain weak technical conditions, is identical (up to a constant factor) to performance within a single environment dependent on said intelligence measure. △ Less

Submitted 12 February, 2023; originally announced February 2023.

Comments: 16 pages, accepted to AISTATS23

arXiv:2302.03067 [pdf, other]

Memory-Based Meta-Learning on Non-Stationary Distributions

Authors: Tim Genewein, Grégoire Delétang, Anian Ruoss, Li Kevin Wenliang, Elliot Catt, Vincent Dutordoir, Jordi Grau-Moya, Laurent Orseau, Marcus Hutter, Joel Veness

Abstract: Memory-based meta-learning is a technique for approximating Bayes-optimal predictors. Under fairly general conditions, minimizing sequential prediction error, measured by the log loss, leads to implicit meta-learning. The goal of this work is to investigate how far this interpretation can be realized by current sequence prediction models and training regimes. The focus is on piecewise stationary s… ▽ More Memory-based meta-learning is a technique for approximating Bayes-optimal predictors. Under fairly general conditions, minimizing sequential prediction error, measured by the log loss, leads to implicit meta-learning. The goal of this work is to investigate how far this interpretation can be realized by current sequence prediction models and training regimes. The focus is on piecewise stationary sources with unobserved switching-points, which arguably capture an important characteristic of natural language and action-observation sequences in partially observable environments. We show that various types of memory-based neural models, including Transformers, LSTMs, and RNNs can learn to accurately approximate known Bayes-optimal algorithms and behave as if performing Bayesian inference over the latent switching-points and the latent parameters governing the data distribution within each segment. △ Less

Submitted 25 May, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

arXiv:2302.02971 [pdf, other]

U-Clip: On-Average Unbiased Stochastic Gradient Clip**

Authors: Bryn Elesedy, Marcus Hutter

Abstract: U-Clip is a simple amendment to gradient clip** that can be applied to any iterative gradient optimization algorithm. Like regular clip**, U-Clip involves using gradients that are clipped to a prescribed size (e.g. with component wise or norm based clip**) but instead of discarding the clipped portion of the gradient, U-Clip maintains a buffer of these values that is added to the gradients o… ▽ More U-Clip is a simple amendment to gradient clip** that can be applied to any iterative gradient optimization algorithm. Like regular clip**, U-Clip involves using gradients that are clipped to a prescribed size (e.g. with component wise or norm based clip**) but instead of discarding the clipped portion of the gradient, U-Clip maintains a buffer of these values that is added to the gradients on the next iteration (before clip**). We show that the cumulative bias of the U-Clip updates is bounded by a constant. This implies that the clipped updates are unbiased on average. Convergence follows via a lemma that guarantees convergence with updates $u_i$ as long as $\sum_{i=1}^t (u_i - g_i) = o(t)$ where $g_i$ are the gradients. Extensive experimental exploration is performed on CIFAR10 with further validation given on ImageNet. △ Less

Submitted 6 February, 2023; originally announced February 2023.

arXiv:2301.04195 [pdf, other]

doi 10.1109/LRA.2023.3270034

Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments

Authors: Mayank Mittal, Calvin Yu, Qinxi Yu, **gzhou Liu, Nikita Rudin, David Hoeller, Jia Lin Yuan, Ritvik Singh, Yunrong Guo, Hammad Mazhar, Ajay Mandlekar, Buck Babich, Gavriel State, Marco Hutter, Animesh Garg

Abstract: We present Orbit, a unified and modular framework for robot learning powered by NVIDIA Isaac Sim. It offers a modular design to easily and efficiently create robotic environments with photo-realistic scenes and high-fidelity rigid and deformable body simulation. With Orbit, we provide a suite of benchmark tasks of varying difficulty -- from single-stage cabinet opening and cloth folding to multi-s… ▽ More We present Orbit, a unified and modular framework for robot learning powered by NVIDIA Isaac Sim. It offers a modular design to easily and efficiently create robotic environments with photo-realistic scenes and high-fidelity rigid and deformable body simulation. With Orbit, we provide a suite of benchmark tasks of varying difficulty -- from single-stage cabinet opening and cloth folding to multi-stage tasks such as room reorganization. To support working with diverse observations and action spaces, we include fixed-arm and mobile manipulators with different physically-based sensors and motion generators. Orbit allows training reinforcement learning policies and collecting large demonstration datasets from hand-crafted or expert solutions in a matter of minutes by leveraging GPU-based parallelization. In summary, we offer an open-sourced framework that readily comes with 16 robotic platforms, 4 sensor modalities, 10 motion generators, more than 20 benchmark tasks, and wrappers to 4 learning libraries. With this framework, we aim to support various research areas, including representation learning, reinforcement learning, imitation learning, and task and motion planning. We hope it helps establish interdisciplinary collaborations in these communities, and its modularity makes it easily extensible for more tasks and applications in the future. △ Less

Submitted 16 February, 2024; v1 submitted 10 January, 2023; originally announced January 2023.

Comments: Project website: https://isaac-orbit.github.io/

Journal ref: IEEE Robotics and Automation Letters (Volume: 8, Issue: 6, June 2023)

arXiv:2301.03509 [pdf, other]

doi 10.1109/LRA.2023.3234809

Learning-based Design and Control for Quadrupedal Robots with Parallel-Elastic Actuators

Authors: Filip Bjelonic, Joonho Lee, Philip Arm, Dhionis Sako, Davide Tateo, Jan Peters, Marco Hutter

Abstract: Parallel-elastic joints can improve the efficiency and strength of robots by assisting the actuators with additional torques. For these benefits to be realized, a spring needs to be carefully designed. However, designing robots is an iterative and tedious process, often relying on intuition and heuristics. We introduce a design optimization framework that allows us to co-optimize a parallel elasti… ▽ More Parallel-elastic joints can improve the efficiency and strength of robots by assisting the actuators with additional torques. For these benefits to be realized, a spring needs to be carefully designed. However, designing robots is an iterative and tedious process, often relying on intuition and heuristics. We introduce a design optimization framework that allows us to co-optimize a parallel elastic knee joint and locomotion controller for quadrupedal robots with minimal human intuition. We design a parallel elastic joint and optimize its parameters with respect to the efficiency in a model-free fashion. In the first step, we train a design-conditioned policy using model-free Reinforcement Learning, capable of controlling the quadruped in the predefined range of design parameters. Afterwards, we use Bayesian Optimization to find the best design using the policy. We use this framework to optimize the parallel-elastic spring parameters for the knee of our quadrupedal robot ANYmal together with the optimal controller. We evaluate the optimized design and controller in real-world experiments over various terrains. Our results show that the new system improves the torque-square efficiency of the robot by 33% compared to the baseline and reduces maximum joint torque by 30% without compromising tracking performance. The improved design resulted in 11% longer operation time on flat terrain. △ Less

Submitted 9 January, 2023; originally announced January 2023.

Journal ref: IEEE Robotics and Automation Letters (2023)

arXiv:2212.12532 [pdf, other]

Generalization Bounds for Few-Shot Transfer Learning with Pretrained Classifiers

Authors: Tomer Galanti, András György, Marcus Hutter

Abstract: We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes. Recent results in the literature show that representations learned by a single classifier over many classes are competitive on few-shot learning problems with representations learned by special-purpose algorithms designed for such problems. We offer a theoretical expl… ▽ More We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes. Recent results in the literature show that representations learned by a single classifier over many classes are competitive on few-shot learning problems with representations learned by special-purpose algorithms designed for such problems. We offer a theoretical explanation for this behavior based on the recently discovered phenomenon of class-feature-variability collapse, that is, that during the training of deep classification networks the feature embeddings of samples belonging to the same class tend to concentrate around their class means. More specifically, we show that the few-shot error of the learned feature map on new classes (defined as the classification error of the nearest class-center classifier using centers learned from a small number of random samples from each new class) is small in case of class-feature-variability collapse, under the assumption that the classes are selected independently from a fixed distribution. This suggests that foundation models can provide feature maps that are transferable to new downstream tasks, even with very few samples; to our knowledge, this is the first performance bound for transfer-learning that is non-vacuous in the few-shot setting. △ Less

Submitted 16 July, 2023; v1 submitted 23 December, 2022; originally announced December 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2112.15121

arXiv:2211.16335 [pdf, other]

X-ICP: Localizability-Aware LiDAR Registration for Robust Localization in Extreme Environments

Authors: Turcan Tuna, Julian Nubert, Yoshua Nava, Shehryar Khattak, Marco Hutter

Abstract: Modern robotic systems are required to operate in challenging environments, which demand reliable localization under challenging conditions. LiDAR-based localization methods, such as the Iterative Closest Point (ICP) algorithm, can suffer in geometrically uninformative environments that are known to deteriorate point cloud registration performance and push optimization toward divergence along weak… ▽ More Modern robotic systems are required to operate in challenging environments, which demand reliable localization under challenging conditions. LiDAR-based localization methods, such as the Iterative Closest Point (ICP) algorithm, can suffer in geometrically uninformative environments that are known to deteriorate point cloud registration performance and push optimization toward divergence along weakly constrained directions. To overcome this issue, this work proposes i) a robust fine-grained localizability detection module, and ii) a localizability-aware constrained ICP optimization module, which couples with the localizability detection module in a unified manner. The proposed localizability detection is achieved by utilizing the correspondences between the scan and the map to analyze the alignment strength against the principal directions of the optimization as part of its fine-grained LiDAR localizability analysis. In the second part, this localizability analysis is then integrated into the scan-to-map point cloud registration to generate drift-free pose updates by enforcing controlled updates or leaving the degenerate directions of the optimization unchanged. The proposed method is thoroughly evaluated and compared to state-of-the-art methods in simulated and real-world experiments, demonstrating the performance and reliability improvement in LiDAR-challenging environments. In all experiments, the proposed framework demonstrates accurate and generalizable localizability detection and robust pose estimation without environment-specific parameter tuning. △ Less

Submitted 18 February, 2024; v1 submitted 29 November, 2022; originally announced November 2022.

Comments: 20 Pages, 20 Figures Submitted to IEEE Transactions On Robotics. Supplementary Video: https://youtu.be/SviLl7q69aA Project Website: https://sites.google.com/leggedrobotics.com/x-icp

arXiv:2211.10270 [pdf, other]

Bayesian Multi-Task Learning MPC for Robotic Mobile Manipulation

Authors: Elena Arcari, Maria Vittoria Minniti, Anna Scampicchio, Andrea Carron, Farbod Farshidian, Marco Hutter, Melanie N. Zeilinger

Abstract: Mobile manipulation in robotics is challenging due to the need of solving many diverse tasks, such as opening a door or picking-and-placing an object. Typically, a basic first-principles system description of the robot is available, thus motivating the use of model-based controllers. However, the robot dynamics and its interaction with an object are affected by uncertainty, limiting the controller… ▽ More Mobile manipulation in robotics is challenging due to the need of solving many diverse tasks, such as opening a door or picking-and-placing an object. Typically, a basic first-principles system description of the robot is available, thus motivating the use of model-based controllers. However, the robot dynamics and its interaction with an object are affected by uncertainty, limiting the controller's performance. To tackle this problem, we propose a Bayesian multi-task learning model that uses trigonometric basis functions to identify the error in the dynamics. In this way, data from different but related tasks can be leveraged to provide a descriptive error model that can be efficiently updated online for new, unseen tasks. We combine this learning scheme with a model predictive controller, and extensively test the effectiveness of the proposed approach, including comparisons with available baseline controllers. We present simulation tests with a ball-balancing robot, and door-opening hardware experiments with a quadrupedal manipulator. △ Less

Submitted 21 March, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

Comments: Accepted for publication in the IEEE Robotics and Automation Letters (RA-L)

arXiv:2210.14997 [pdf, other]

LiDAR-guided object search and detection in Subterranean Environments

Authors: Manthan Patel, Gabriel Waibel, Shehryar Khattak, Marco Hutter

Abstract: Detecting objects of interest, such as human survivors, safety equipment, and structure access points, is critical to any search-and-rescue operation. Robots deployed for such time-sensitive efforts rely on their onboard sensors to perform their designated tasks. However, as disaster response operations are predominantly conducted under perceptually degraded conditions, commonly utilized sensors s… ▽ More Detecting objects of interest, such as human survivors, safety equipment, and structure access points, is critical to any search-and-rescue operation. Robots deployed for such time-sensitive efforts rely on their onboard sensors to perform their designated tasks. However, as disaster response operations are predominantly conducted under perceptually degraded conditions, commonly utilized sensors such as visual cameras and LiDARs suffer in terms of performance degradation. In response, this work presents a method that utilizes the complementary nature of vision and depth sensors to leverage multi-modal information to aid object detection at longer distances. In particular, depth and intensity values from sparse LiDAR returns are used to generate proposals for objects present in the environment. These proposals are then utilized by a Pan-Tilt-Zoom (PTZ) camera system to perform a directed search by adjusting its pose and zoom level for performing object detection and classification in difficult environments. The proposed work has been thoroughly verified using an ANYmal quadruped robot in underground settings and on datasets collected during the DARPA Subterranean Challenge finals. △ Less

Submitted 26 October, 2022; originally announced October 2022.

Comments: 6 pages, 5 Figures, 2 Tables, conference: IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR-2022), Seville, Spain

arXiv:2210.13856 [pdf, other]

A Framework for Collaborative Multi-Robot Map** using Spectral Graph Wavelets

Authors: Lukas Bernreiter, Shehryar Khattak, Lionel Ott, Roland Siegwart, Marco Hutter, Cesar Cadena

Abstract: The exploration of large-scale unknown environments can benefit from the deployment of multiple robots for collaborative map**. Each robot explores a section of the environment and communicates onboard pose estimates and maps to a central server to build an optimized global multi-robot map. Naturally, inconsistencies can arise between onboard and server estimates due to onboard odometry drift, f… ▽ More The exploration of large-scale unknown environments can benefit from the deployment of multiple robots for collaborative map**. Each robot explores a section of the environment and communicates onboard pose estimates and maps to a central server to build an optimized global multi-robot map. Naturally, inconsistencies can arise between onboard and server estimates due to onboard odometry drift, failures, or degeneracies. The map** server can correct and overcome such failure cases using computationally expensive operations such as inter-robot loop closure detection and multi-modal map**. However, the individual robots do not benefit from the collaborative map if the map** server provides no feedback. Although server updates from the multi-robot map can greatly alleviate the robotic mission strategically, most existing work lacks them, due to their associated computational and bandwidth-related costs. Motivated by this challenge, this paper proposes a novel collaborative map** framework that enables global map** consistency among robots and the map** server. In particular, we propose graph spectral analysis, at different spatial scales, to detect structural differences between robot and server graphs, and to generate necessary constraints for the individual robot pose graphs. Our approach specifically finds the nodes that correspond to the drift's origin rather than the nodes where the error becomes too large. We thoroughly analyze and validate our proposed framework using several real-world multi-robot field deployments where we show improvements of the onboard system up to 90\% and can recover the onboard estimation from localization failures and even from the degeneracies within its estimation. △ Less

Submitted 2 November, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

Comments: arXiv admin note: text overlap with arXiv:2203.00308

arXiv:2210.12392 [pdf, other]

Testing Independence of Exchangeable Random Variables

Authors: Marcus Hutter

Abstract: Given well-shuffled data, can we determine whether the data items are statistically (in)dependent? Formally, we consider the problem of testing whether a set of exchangeable random variables are independent. We will show that this is possible and develop tests that can confidently reject the null hypothesis that data is independent and identically distributed and have high power for (some) exchang… ▽ More Given well-shuffled data, can we determine whether the data items are statistically (in)dependent? Formally, we consider the problem of testing whether a set of exchangeable random variables are independent. We will show that this is possible and develop tests that can confidently reject the null hypothesis that data is independent and identically distributed and have high power for (some) exchangeable distributions. We will make no structural assumptions on the underlying sample space. One potential application is in Deep Learning, where data is often scraped from the whole internet, with duplications abound, which can render data non-iid and test-set evaluation prone to give wrong answers. △ Less

Submitted 22 October, 2022; originally announced October 2022.

Comments: 42 pages, 3 figures

arXiv:2210.07931 [pdf, other]

Sequential Learning Of Neural Networks for Prequential MDL

Authors: Jorg Bornschein, Yazhe Li, Marcus Hutter

Abstract: Minimum Description Length (MDL) provides a framework and an objective for principled model evaluation. It formalizes Occam's Razor and can be applied to data from non-stationary sources. In the prequential formulation of MDL, the objective is to minimize the cumulative next-step log-loss when sequentially going through the data and using previous observations for parameter estimation. It thus clo… ▽ More Minimum Description Length (MDL) provides a framework and an objective for principled model evaluation. It formalizes Occam's Razor and can be applied to data from non-stationary sources. In the prequential formulation of MDL, the objective is to minimize the cumulative next-step log-loss when sequentially going through the data and using previous observations for parameter estimation. It thus closely resembles a continual- or online-learning problem. In this study, we evaluate approaches for computing prequential description lengths for image classification datasets with neural networks. Considering the computational cost, we find that online-learning with rehearsal has favorable performance compared to the previously widely used block-wise estimation. We propose forward-calibration to better align the models predictions with the empirical observations and introduce replay-streams, a minibatch incremental training technique to efficiently implement approximate random replay while avoiding large in-memory replay buffers. As a result, we present description lengths for a suite of image classification datasets that improve upon previously reported results by large margins. △ Less

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2210.02750 [pdf, other]

doi 10.1109/LRA.2022.3211785

Meta Reinforcement Learning for Optimal Design of Legged Robots

Authors: Álvaro Belmonte-Baeza, Joonho Lee, Giorgio Valsecchi, Marco Hutter

Abstract: The process of robot design is a complex task and the majority of design decisions are still based on human intuition or tedious manual tuning. A more informed way of facing this task is computational design methods where design parameters are concurrently optimized with corresponding controllers. Existing approaches, however, are strongly influenced by predefined control rules or motion templates… ▽ More The process of robot design is a complex task and the majority of design decisions are still based on human intuition or tedious manual tuning. A more informed way of facing this task is computational design methods where design parameters are concurrently optimized with corresponding controllers. Existing approaches, however, are strongly influenced by predefined control rules or motion templates and cannot provide end-to-end solutions. In this paper, we present a design optimization framework using model-free meta reinforcement learning, and its application to the optimizing kinematics and actuator parameters of quadrupedal robots. We use meta reinforcement learning to train a locomotion policy that can quickly adapt to different designs. This policy is used to evaluate each design instance during the design optimization. We demonstrate that the policy can control robots of different designs to track random velocity commands over various rough terrains. With controlled experiments, we show that the meta policy achieves close-to-optimal performance for each design instance after adaptation. Lastly, we compare our results against a model-based baseline and show that our approach allows higher performance while not being constrained by predefined motions or gait patterns. △ Less

Submitted 6 October, 2022; originally announced October 2022.

Comments: 8 pages. To be published in IEEE Robotics and Automation Letters (RA-L) and ICRA 2023

arXiv:2210.02019 [pdf, other]

Atari-5: Distilling the Arcade Learning Environment down to Five Games

Authors: Matthew Aitchison, Penny Sweetser, Marcus Hutter

Abstract: The Arcade Learning Environment (ALE) has become an essential benchmark for assessing the performance of reinforcement learning algorithms. However, the computational cost of generating results on the entire 57-game dataset limits ALE's use and makes the reproducibility of many results infeasible. We propose a novel solution to this problem in the form of a principled methodology for selecting sma… ▽ More The Arcade Learning Environment (ALE) has become an essential benchmark for assessing the performance of reinforcement learning algorithms. However, the computational cost of generating results on the entire 57-game dataset limits ALE's use and makes the reproducibility of many results infeasible. We propose a novel solution to this problem in the form of a principled methodology for selecting small but representative subsets of environments within a benchmark suite. We applied our method to identify a subset of five ALE games, called Atari-5, which produces 57-game median score estimates within 10% of their true values. Extending the subset to 10-games recovers 80% of the variance for log-scores for all games within the 57-game set. We show this level of compression is possible due to a high degree of correlation between many of the games in ALE. △ Less

Submitted 5 October, 2022; originally announced October 2022.

arXiv:2209.15618 [pdf, other]

Beyond Bayes-optimality: meta-learning what you know you don't know

Authors: Jordi Grau-Moya, Grégoire Delétang, Markus Kunesch, Tim Genewein, Elliot Catt, Kevin Li, Anian Ruoss, Chris Cundy, Joel Veness, Jane Wang, Marcus Hutter, Christopher Summerfield, Shane Legg, Pedro Ortega

Abstract: Meta-training agents with memory has been shown to culminate in Bayes-optimal agents, which casts Bayes-optimality as the implicit solution to a numerical optimization problem rather than an explicit modeling assumption. Bayes-optimal agents are risk-neutral, since they solely attune to the expected return, and ambiguity-neutral, since they act in new situations as if the uncertainty were known. T… ▽ More Meta-training agents with memory has been shown to culminate in Bayes-optimal agents, which casts Bayes-optimality as the implicit solution to a numerical optimization problem rather than an explicit modeling assumption. Bayes-optimal agents are risk-neutral, since they solely attune to the expected return, and ambiguity-neutral, since they act in new situations as if the uncertainty were known. This is in contrast to risk-sensitive agents, which additionally exploit the higher-order moments of the return, and ambiguity-sensitive agents, which act differently when recognizing situations in which they lack knowledge. Humans are also known to be averse to ambiguity and sensitive to risk in ways that aren't Bayes-optimal, indicating that such sensitivity can confer advantages, especially in safety-critical situations. How can we extend the meta-learning protocol to generate risk- and ambiguity-sensitive agents? The goal of this work is to fill this gap in the literature by showing that risk- and ambiguity-sensitivity also emerge as the result of an optimization problem using modified meta-training algorithms, which manipulate the experience-generation process of the learner. We empirically test our proposed meta-training algorithms on agents exposed to foundational classes of decision-making experiments and demonstrate that they become sensitive to risk and ambiguity. △ Less

Submitted 12 October, 2022; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: 33 pages, 8 figures, technical report

arXiv:2209.15428 [pdf, other]

PyPose: A Library for Robot Learning with Physics-based Optimization

Authors: Chen Wang, Dasong Gao, Kuan Xu, Junyi Geng, Yaoyu Hu, Yuheng Qiu, Bowen Li, Fan Yang, Brady Moon, Abhinav Pandey, Aryan, Jiahe Xu, Tianhao Wu, Haonan He, Daning Huang, Zhongqiang Ren, Shibo Zhao, Taimeng Fu, Pranay Reddy, Xiao Lin, Wenshan Wang, **gnan Shi, Rajat Talak, Kun Cao, Yi Du , et al. (12 additional authors not shown)

Abstract: Deep learning has had remarkable success in robotic perception, but its data-centric nature suffers when it comes to generalizing to ever-changing environments. By contrast, physics-based optimization generalizes better, but it does not perform as well in complicated tasks due to the lack of high-level semantic information and reliance on manual parametric tuning. To take advantage of these two co… ▽ More Deep learning has had remarkable success in robotic perception, but its data-centric nature suffers when it comes to generalizing to ever-changing environments. By contrast, physics-based optimization generalizes better, but it does not perform as well in complicated tasks due to the lack of high-level semantic information and reliance on manual parametric tuning. To take advantage of these two complementary worlds, we present PyPose: a robotics-oriented, PyTorch-based library that combines deep perceptual models with physics-based optimization. PyPose's architecture is tidy and well-organized, it has an imperative style interface and is efficient and user-friendly, making it easy to integrate into real-world robotic applications. Besides, it supports parallel computing of any order gradients of Lie groups and Lie algebras and $2^{\text{nd}}$-order optimizers, such as trust region methods. Experiments show that PyPose achieves more than $10\times$ speedup in computation compared to the state-of-the-art libraries. To boost future research, we provide concrete examples for several fields of robot learning, including SLAM, planning, control, and inertial navigation. △ Less

Submitted 24 March, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

Comments: Project Website: https://pypose.org Documentation: https://pypose.org/docs/ Tutorial: https://pypose.org/tutorials/ Source code: https://github.com/pypose/pypose

Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

arXiv:2209.12827 [pdf, other]

Advanced Skills by Learning Locomotion and Local Navigation End-to-End

Authors: Nikita Rudin, David Hoeller, Marko Bjelonic, Marco Hutter

Abstract: The common approach for local navigation on challenging environments with legged robots requires path planning, path following and locomotion, which usually requires a locomotion control policy that accurately tracks a commanded velocity. However, by breaking down the navigation problem into these sub-tasks, we limit the robot's capabilities since the individual tasks do not consider the full solu… ▽ More The common approach for local navigation on challenging environments with legged robots requires path planning, path following and locomotion, which usually requires a locomotion control policy that accurately tracks a commanded velocity. However, by breaking down the navigation problem into these sub-tasks, we limit the robot's capabilities since the individual tasks do not consider the full solution space. In this work, we propose to solve the complete problem by training an end-to-end policy with deep reinforcement learning. Instead of continuously tracking a precomputed path, the robot needs to reach a target position within a provided time. The task's success is only evaluated at the end of an episode, meaning that the policy does not need to reach the target as fast as possible. It is free to select its path and the locomotion gait. Training a policy in this way opens up a larger set of possible solutions, which allows the robot to learn more complex behaviors. We compare our approach to velocity tracking and additionally show that the time dependence of the task reward is critical to successfully learn these new behaviors. Finally, we demonstrate the successful deployment of policies on a real quadrupedal robot. The robot is able to cross challenging terrains, which were not possible previously, while using a more energy-efficient gait and achieving a higher success rate. △ Less

Submitted 26 September, 2022; originally announced September 2022.

Comments: IROS 2022, Project website: https://sites.google.com/ leggedrobotics.com/end-to-end-loco-navigation

arXiv:2208.08373 [pdf, other]

Perceptive Locomotion through Nonlinear Model Predictive Control

Authors: Ruben Grandia, Fabian Jenelten, Shaohui Yang, Farbod Farshidian, Marco Hutter

Abstract: Dynamic locomotion in rough terrain requires accurate foot placement, collision avoidance, and planning of the underactuated dynamics of the system. Reliably optimizing for such motions and interactions in the presence of imperfect and often incomplete perceptive information is challenging. We present a complete perception, planning, and control pipeline, that can optimize motions for all degrees… ▽ More Dynamic locomotion in rough terrain requires accurate foot placement, collision avoidance, and planning of the underactuated dynamics of the system. Reliably optimizing for such motions and interactions in the presence of imperfect and often incomplete perceptive information is challenging. We present a complete perception, planning, and control pipeline, that can optimize motions for all degrees of freedom of the robot in real-time. To mitigate the numerical challenges posed by the terrain a sequence of convex inequality constraints is extracted as local approximations of foothold feasibility and embedded into an online model predictive controller. Steppability classification, plane segmentation, and a signed distance field are precomputed per elevation map to minimize the computational effort during the optimization. A combination of multiple-shooting, real-time iteration, and a filter-based line-search are used to solve the formulated problem reliably and at high rate. We validate the proposed method in scenarios with gaps, slopes, and step** stones in simulation and experimentally on the ANYmal quadruped platform, resulting in state-of-the-art dynamic climbing. △ Less

Submitted 17 August, 2022; originally announced August 2022.

arXiv:2208.01787 [pdf, other]

Present and Future of SLAM in Extreme Underground Environments

Authors: Kamak Ebadi, Lukas Bernreiter, Harel Biggie, Gavin Catt, Yun Chang, Arghya Chatterjee, Christopher E. Denniston, Simon-Pierre Deschênes, Kyle Harlow, Shehryar Khattak, Lucas Nogueira, Matteo Palieri, Pavel Petráček, Matěj Petrlík, Andrzej Reinke, Vít Krátký, Shibo Zhao, Ali-akbar Agha-mohammadi, Kostas Alexis, Christoffer Heckman, Kasra Khosoussi, Navinda Kottege, Benjamin Morrell, Marco Hutter, Fred Pauling , et al. (6 additional authors not shown)

Abstract: This paper reports on the state of the art in underground SLAM by discussing different SLAM strategies and results across six teams that participated in the three-year-long SubT competition. In particular, the paper has four main goals. First, we review the algorithms, architectures, and systems adopted by the teams; particular emphasis is put on lidar-centric SLAM solutions (the go-to approach fo… ▽ More This paper reports on the state of the art in underground SLAM by discussing different SLAM strategies and results across six teams that participated in the three-year-long SubT competition. In particular, the paper has four main goals. First, we review the algorithms, architectures, and systems adopted by the teams; particular emphasis is put on lidar-centric SLAM solutions (the go-to approach for virtually all teams in the competition), heterogeneous multi-robot operation (including both aerial and ground robots), and real-world underground operation (from the presence of obscurants to the need to handle tight computational constraints). We do not shy away from discussing the dirty details behind the different SubT SLAM systems, which are often omitted from technical papers. Second, we discuss the maturity of the field by highlighting what is possible with the current SLAM systems and what we believe is within reach with some good systems engineering. Third, we outline what we believe are fundamental open problems, that are likely to require further research to break through. Finally, we provide a list of open-source SLAM implementations and datasets that have been produced during the SubT challenge and related efforts, and constitute a useful resource for researchers and practitioners. △ Less

Submitted 2 August, 2022; originally announced August 2022.

Comments: 21 pages including references. This survey paper is submitted to IEEE Transactions on Robotics for pre-approval

arXiv:2208.01329 [pdf, other]

Self-Supervised Traversability Prediction by Learning to Reconstruct Safe Terrain

Authors: Robin Schmid, Deegan Atha, Frederik Schöller, Sharmita Dey, Seyed Fakoorian, Kyohei Otsu, Barry Ridge, Marko Bjelonic, Lorenz Wellhausen, Marco Hutter, Ali-akbar Agha-mohammadi

Abstract: Navigating off-road with a fast autonomous vehicle depends on a robust perception system that differentiates traversable from non-traversable terrain. Typically, this depends on a semantic understanding which is based on supervised learning from images annotated by a human expert. This requires a significant investment in human time, assumes correct expert classification, and small details can lea… ▽ More Navigating off-road with a fast autonomous vehicle depends on a robust perception system that differentiates traversable from non-traversable terrain. Typically, this depends on a semantic understanding which is based on supervised learning from images annotated by a human expert. This requires a significant investment in human time, assumes correct expert classification, and small details can lead to misclassification. To address these challenges, we propose a method for predicting high- and low-risk terrains from only past vehicle experience in a self-supervised fashion. First, we develop a tool that projects the vehicle trajectory into the front camera image. Second, occlusions in the 3D representation of the terrain are filtered out. Third, an autoencoder trained on masked vehicle trajectory regions identifies low- and high-risk terrains based on the reconstruction error. We evaluated our approach with two models and different bottleneck sizes with two different training and testing sites with a fourwheeled off-road vehicle. Comparison with two independent test sets of semantic labels from similar terrain as training sites demonstrates the ability to separate the ground as low-risk and the vegetation as high-risk with 81.1% and 85.1% accuracy. △ Less

Submitted 2 August, 2022; originally announced August 2022.

arXiv:2207.14745 [pdf, other]

Collision detection and identification for a legged manipulator

Authors: Jessie van Dam, Andreea Tulbure, Maria Vittoria Minniti, Firas Abi-Farraj, Marco Hutter

Abstract: To safely deploy legged robots in the real world it is necessary to provide them with the ability to reliably detect unexpected contacts and accurately estimate the corresponding contact force. In this paper, we propose a collision detection and identification pipeline for a quadrupedal manipulator. We first introduce an approach to estimate the collision time span based on band-pass filtering and… ▽ More To safely deploy legged robots in the real world it is necessary to provide them with the ability to reliably detect unexpected contacts and accurately estimate the corresponding contact force. In this paper, we propose a collision detection and identification pipeline for a quadrupedal manipulator. We first introduce an approach to estimate the collision time span based on band-pass filtering and show that this information is key for obtaining accurate collision force estimates. We then improve the accuracy of the identified force magnitude by compensating for model inaccuracies, unmodeled loads, and any other potential source of quasi-static disturbances acting on the robot. We validate our framework with extensive hardware experiments in various scenarios, including trotting and additional unmodeled load on the robot. △ Less

Submitted 29 July, 2022; originally announced July 2022.

Comments: International Conference on Intelligent Robots and Systems, IROS 2022, Kyoto, Japan, Oct 23 - Oct. 27, 2022

arXiv:2207.14635 [pdf, other]

Haptic Teleoperation of High-dimensional Robotic Systems Using a Feedback MPC Framework

Authors: ** Cheng, Firas Abi-Farraj, Farbod Farshidian, Marco Hutter

Abstract: Model Predictive Control (MPC) schemes have proven their efficiency in controlling high degree-of-freedom (DoF) complex robotic systems. However, they come at a high computational cost and an update rate of about tens of hertz. This relatively slow update rate hinders the possibility of stable haptic teleoperation of such systems since the slow feedback loops can cause instabilities and loss of tr… ▽ More Model Predictive Control (MPC) schemes have proven their efficiency in controlling high degree-of-freedom (DoF) complex robotic systems. However, they come at a high computational cost and an update rate of about tens of hertz. This relatively slow update rate hinders the possibility of stable haptic teleoperation of such systems since the slow feedback loops can cause instabilities and loss of transparency to the operator. This work presents a novel framework for transparent teleoperation of MPC-controlled complex robotic systems. In particular, we employ a feedback MPC approach and exploit its structure to account for the operator input at a fast rate which is independent of the update rate of the MPC loop itself. We demonstrate our framework on a mobile manipulator platform and show that it significantly improves haptic teleoperation's transparency and stability. We also highlight that the proposed feedback structure is constraint satisfactory and does not violate any constraints defined in the optimal control problem. To the best of our knowledge, this work is the first realization of the bilateral teleoperation of a legged manipulator using a whole-body MPC framework. △ Less

Submitted 29 July, 2022; originally announced July 2022.

arXiv:2207.09238 [pdf, other]

Formal Algorithms for Transformers

Authors: Mary Phuong, Marcus Hutter

Abstract: This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (*not* results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models. The reader is assumed to be familiar with basic ML terminology and simpler neural network architectures s… ▽ More This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (*not* results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models. The reader is assumed to be familiar with basic ML terminology and simpler neural network architectures such as MLPs. △ Less

Submitted 19 July, 2022; originally announced July 2022.

Comments: 16 pages, 15 algorithms

Journal ref: Latest 2022 version at http://www.hutter1.net/publ/transalg.pdf

arXiv:2207.04914 [pdf, other]

Team CERBERUS Wins the DARPA Subterranean Challenge: Technical Overview and Lessons Learned

Authors: Marco Tranzatto, Mihir Dharmadhikari, Lukas Bernreiter, Marco Camurri, Shehryar Khattak, Frank Mascarich, Patrick Pfreundschuh, David Wisth, Samuel Zimmermann, Mihir Kulkarni, Victor Reijgwart, Benoit Casseau, Timon Homberger, Paolo De Petris, Lionel Ott, Wayne Tubby, Gabriel Waibel, Huan Nguyen, Cesar Cadena, Russell Buchanan, Lorenz Wellhausen, Nikhil Khedekar, Olov Andersson, Lintong Zhang, Takahiro Miki , et al. (11 additional authors not shown)

Abstract: This article presents the CERBERUS robotic system-of-systems, which won the DARPA Subterranean Challenge Final Event in 2021. The Subterranean Challenge was organized by DARPA with the vision to facilitate the novel technologies necessary to reliably explore diverse underground environments despite the grueling challenges they present for robotic autonomy. Due to their geometric complexity, degrad… ▽ More This article presents the CERBERUS robotic system-of-systems, which won the DARPA Subterranean Challenge Final Event in 2021. The Subterranean Challenge was organized by DARPA with the vision to facilitate the novel technologies necessary to reliably explore diverse underground environments despite the grueling challenges they present for robotic autonomy. Due to their geometric complexity, degraded perceptual conditions combined with lack of GPS support, austere navigation conditions, and denied communications, subterranean settings render autonomous operations particularly demanding. In response to this challenge, we developed the CERBERUS system which exploits the synergy of legged and flying robots, coupled with robust control especially for overcoming perilous terrain, multi-modal and multi-robot perception for localization and map** in conditions of sensor degradation, and resilient autonomy through unified exploration path planning and local motion planning that reflects robot-specific limitations. Based on its ability to explore diverse underground environments and its high-level command and control by a single human supervisor, CERBERUS demonstrated efficient exploration, reliable detection of objects of interest, and accurate map**. In this article, we report results from both the preliminary runs and the final Prize Round of the DARPA Subterranean Challenge, and discuss highlights and challenges faced, alongside lessons learned for the benefit of the community. △ Less

Submitted 11 July, 2022; originally announced July 2022.

arXiv:2207.02098 [pdf, other]

Neural Networks and the Chomsky Hierarchy

Authors: Grégoire Delétang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, Li Kevin Wenliang, Elliot Catt, Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, Pedro A. Ortega

Abstract: Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (20'910 models, 15 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice… ▽ More Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (20'910 models, 15 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice. We demonstrate that grou** tasks according to the Chomsky hierarchy allows us to forecast whether certain architectures will be able to generalize to out-of-distribution inputs. This includes negative results where even extensive amounts of data and training time never lead to any non-trivial generalization, despite models having sufficient capacity to fit the training data perfectly. Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, LSTMs can solve regular and counter-language tasks, and only networks augmented with structured memory (such as a stack or memory tape) can successfully generalize on context-free and context-sensitive tasks. △ Less

Submitted 28 February, 2023; v1 submitted 5 July, 2022; originally announced July 2022.

arXiv:2206.15298 [pdf, other]

doi 10.1109/LRA.2022.3189166

Design and Motion Planning for a Reconfigurable Robotic Base

Authors: Johannes Pankert, Giorgio Valsecchi, Davide Baret, Jon Zehnder, Lukasz L. Pietrasik, Marko Bjelonic, Marco Hutter

Abstract: A robotic platform for mobile manipulation needs to satisfy two contradicting requirements for many real-world applications: A compact base is required to navigate through cluttered indoor environments, while the support needs to be large enough to prevent tumbling or tip over, especially during fast manipulation operations with heavy payloads or forceful interaction with the environment. This pap… ▽ More A robotic platform for mobile manipulation needs to satisfy two contradicting requirements for many real-world applications: A compact base is required to navigate through cluttered indoor environments, while the support needs to be large enough to prevent tumbling or tip over, especially during fast manipulation operations with heavy payloads or forceful interaction with the environment. This paper proposes a novel robot design that fulfills both requirements through a versatile footprint. It can reconfigure its footprint to a narrow configuration when navigating through tight spaces and to a wide stance when manipulating heavy objects. Furthermore, its triangular configuration allows for high-precision tasks on uneven ground by preventing support switches. A model predictive control strategy is presented that unifies planning and control for simultaneous navigation, reconfiguration, and manipulation. It converts task-space goals into whole-body motion plans for the new robot. The proposed design has been tested extensively with a hardware prototype. The footprint reconfiguration allows to almost completely remove manipulation-induced vibrations. The control strategy proves effective in both lab experiment and during a real-world construction task. △ Less

Submitted 5 April, 2023; v1 submitted 30 June, 2022; originally announced June 2022.

Comments: 8 pages, accepted for RA-L and IROS 2022

Journal ref: IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 9012-9019, Oct. 2022

arXiv:2206.14049 [pdf, other]

doi 10.1109/TRO.2022.3186804

TAMOLS: Terrain-Aware Motion Optimization for Legged Systems

Authors: Fabian Jenelten, Ruben Grandia, Farbod Farshidian, Marco Hutter

Abstract: Terrain geometry is, in general, non-smooth, non-linear, non-convex, and, if perceived through a robot-centric visual unit, appears partially occluded and noisy. This work presents the complete control pipeline capable of handling the aforementioned problems in real-time. We formulate a trajectory optimization problem that jointly optimizes over the base pose and footholds, subject to a heightmap.… ▽ More Terrain geometry is, in general, non-smooth, non-linear, non-convex, and, if perceived through a robot-centric visual unit, appears partially occluded and noisy. This work presents the complete control pipeline capable of handling the aforementioned problems in real-time. We formulate a trajectory optimization problem that jointly optimizes over the base pose and footholds, subject to a heightmap. To avoid converging into undesirable local optima, we deploy a graduated optimization technique. We embed a compact, contact-force free stability criterion that is compatible with the non-flat ground formulation. Direct collocation is used as transcription method, resulting in a non-linear optimization problem that can be solved online in less than ten milliseconds. To increase robustness in the presence of external disturbances, we close the tracking loop with a momentum observer. Our experiments demonstrate stair climbing, walking on step** stones, and over gaps, utilizing various dynamic gaits. △ Less

Submitted 5 July, 2022; v1 submitted 28 June, 2022; originally announced June 2022.

Comments: Accepted as regular T-RO paper

arXiv:2206.08077 [pdf, other]

Neural Scene Representation for Locomotion on Structured Terrain

Authors: David Hoeller, Nikita Rudin, Christopher Choy, Animashree Anandkumar, Marco Hutter

Abstract: We propose a learning-based method to reconstruct the local terrain for locomotion with a mobile robot traversing urban environments. Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the algorithm estimates the topography in the robot's vicinity. The raw measurements from these cameras are noisy and only provide partial and occluded observations that in man… ▽ More We propose a learning-based method to reconstruct the local terrain for locomotion with a mobile robot traversing urban environments. Using a stream of depth measurements from the onboard cameras and the robot's trajectory, the algorithm estimates the topography in the robot's vicinity. The raw measurements from these cameras are noisy and only provide partial and occluded observations that in many cases do not show the terrain the robot stands on. Therefore, we propose a 3D reconstruction model that faithfully reconstructs the scene, despite the noisy measurements and large amounts of missing data coming from the blind spots of the camera arrangement. The model consists of a 4D fully convolutional network on point clouds that learns the geometric priors to complete the scene from the context and an auto-regressive feedback to leverage spatio-temporal consistency and use evidence from the past. The network can be solely trained with synthetic data, and due to extensive augmentation, it is robust in the real world, as shown in the validation on a quadrupedal robot, ANYmal, traversing challenging settings. We run the pipeline on the robot's onboard low-power computer using an efficient sparse tensor implementation and show that the proposed method outperforms classical map representations. △ Less

Submitted 16 June, 2022; originally announced June 2022.

arXiv:2206.01192 [pdf, other]

Uniqueness and Complexity of Inverse MDP Models

Authors: Marcus Hutter, Steven Hansen

Abstract: What is the action sequence aa'a" that was likely responsible for reaching state s"' (from state s) in 3 steps? Addressing such questions is important in causal reasoning and in reinforcement learning. Inverse "MDP" models p(aa'a"|ss"') can be used to answer them. In the traditional "forward" view, transition "matrix" p(s'|sa) and policy π(a|s) uniquely determine "everything": the whole dynamics p… ▽ More What is the action sequence aa'a" that was likely responsible for reaching state s"' (from state s) in 3 steps? Addressing such questions is important in causal reasoning and in reinforcement learning. Inverse "MDP" models p(aa'a"|ss"') can be used to answer them. In the traditional "forward" view, transition "matrix" p(s'|sa) and policy π(a|s) uniquely determine "everything": the whole dynamics p(as'a's"a"...|s), and with it, the action-conditional state process p(s's"...|saa'a"), the multi-step inverse models p(aa'a"...|ss^i), etc. If the latter is our primary concern, a natural question, analogous to the forward case is to which extent 1-step inverse model p(a|ss') plus policy π(a|s) determine the multi-step inverse models or even the whole dynamics. In other words, can forward models be inferred from inverse models or even be side-stepped. This work addresses this question and variations thereof, and also whether there are efficient decision/inference algorithms for this. △ Less

Submitted 2 June, 2022; originally announced June 2022.

Comments: 34 pages, 4 fgures

arXiv:2205.05477 [pdf, other]

Marsupial Walking-and-Flying Robotic Deployment for Collaborative Exploration of Unknown Environments

Authors: Paolo De Petris, Shehryar Khattak, Mihir Dharmadhikari, Gabriel Waibel, Huan Nguyen, Markus Montenegro, Nikhil Khedekar, Kostas Alexis, Marco Hutter

Abstract: This work contributes a marsupial robotic system-of-systems involving a legged and an aerial robot capable of collaborative map** and exploration path planning that exploits the heterogeneous properties of the two systems and the ability to selectively deploy the aerial system from the ground robot. Exploiting the dexterous locomotion capabilities and long endurance of quadruped robots, the mars… ▽ More This work contributes a marsupial robotic system-of-systems involving a legged and an aerial robot capable of collaborative map** and exploration path planning that exploits the heterogeneous properties of the two systems and the ability to selectively deploy the aerial system from the ground robot. Exploiting the dexterous locomotion capabilities and long endurance of quadruped robots, the marsupial combination can explore within large-scale and confined environments involving rough terrain. However, as certain types of terrain or vertical geometries can render any ground system unable to continue its exploration, the marsupial system can - when needed - deploy the flying robot which, by exploiting its 3D navigation capabilities, can undertake a focused exploration task within its endurance limitations. Focusing on autonomy, the two systems can co-localize and map together by sharing LiDAR-based maps and plan exploration paths individually, while a tailored graph search onboard the legged robot allows it to identify where and when the ferried aerial platform should be deployed. The system is verified within multiple experimental studies demonstrating the expanded exploration capabilities of the marsupial system-of-systems and facilitating the exploration of otherwise individually unreachable areas. △ Less

Submitted 11 May, 2022; originally announced May 2022.

Comments: 6 pages, 6 figures, Submitted to the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2022

arXiv:2204.12876 [pdf, other]

Elevation Map** for Locomotion and Navigation using GPU

Authors: Takahiro Miki, Lorenz Wellhausen, Ruben Grandia, Fabian Jenelten, Timon Homberger, Marco Hutter

Abstract: Perceiving the surrounding environment is crucial for autonomous mobile robots. An elevation map provides a memory-efficient and simple yet powerful geometric representation for ground robots. The robots can use this information for navigation in an unknown environment or perceptive locomotion control over rough terrain. Depending on the application, various post processing steps may be incorporat… ▽ More Perceiving the surrounding environment is crucial for autonomous mobile robots. An elevation map provides a memory-efficient and simple yet powerful geometric representation for ground robots. The robots can use this information for navigation in an unknown environment or perceptive locomotion control over rough terrain. Depending on the application, various post processing steps may be incorporated, such as smoothing, inpainting or plane segmentation. In this work, we present an elevation map** pipeline leveraging GPU for fast and efficient processing with additional features both for navigation and locomotion. We demonstrated our map** framework through extensive hardware experiments. Our map** software was successfully deployed for underground exploration during DARPA Subterranean Challenge and for various experiments of quadrupedal locomotion. △ Less

Submitted 27 April, 2022; originally announced April 2022.

arXiv:2203.15854 [pdf, other]

doi 10.1109/IROS47612.2022.9982190

Locomotion Policy Guided Traversability Learning using Volumetric Representations of Complex Environments

Authors: Jonas Frey, David Hoeller, Shehryar Khattak, Marco Hutter

Abstract: Despite the progress in legged robotic locomotion, autonomous navigation in unknown environments remains an open problem. Ideally, the navigation system utilizes the full potential of the robots' locomotion capabilities while operating within safety limits under uncertainty. The robot must sense and analyze the traversability of the surrounding terrain, which depends on the hardware, locomotion co… ▽ More Despite the progress in legged robotic locomotion, autonomous navigation in unknown environments remains an open problem. Ideally, the navigation system utilizes the full potential of the robots' locomotion capabilities while operating within safety limits under uncertainty. The robot must sense and analyze the traversability of the surrounding terrain, which depends on the hardware, locomotion control, and terrain properties. It may contain information about the risk, energy, or time consumption needed to traverse the terrain. To avoid hand-crafted traversability cost functions we propose to collect traversability information about the robot and locomotion policy by simulating the traversal over randomly generated terrains using a physics simulator. Thousand of robots are simulated in parallel controlled by the same locomotion policy used in reality to acquire 57 years of real-world locomotion experience equivalent. For deployment on the real robot, a sparse convolutional network is trained to predict the simulated traversability cost, which is tailored to the deployed locomotion policy, from an entirely geometric representation of the environment in the form of a 3D voxel-occupancy map. This representation avoids the need for commonly used elevation maps, which are error-prone in the presence of overhanging obstacles and multi-floor or low-ceiling scenarios. The effectiveness of the proposed traversability prediction network is demonstrated for path planning for the legged robot ANYmal in various indoor and natural environments. △ Less

Submitted 21 August, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: accepted for 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022)

Journal ref: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2203.14912 [pdf, other]

Advanced Skills through Multiple Adversarial Motion Priors in Reinforcement Learning

Authors: Eric Vollenweider, Marko Bjelonic, Victor Klemm, Nikita Rudin, Joonho Lee, Marco Hutter

Abstract: In recent years, reinforcement learning (RL) has shown outstanding performance for locomotion control of highly articulated robotic systems. Such approaches typically involve tedious reward function tuning to achieve the desired motion style. Imitation learning approaches such as adversarial motion priors aim to reduce this problem by encouraging a pre-defined motion style. In this work, we presen… ▽ More In recent years, reinforcement learning (RL) has shown outstanding performance for locomotion control of highly articulated robotic systems. Such approaches typically involve tedious reward function tuning to achieve the desired motion style. Imitation learning approaches such as adversarial motion priors aim to reduce this problem by encouraging a pre-defined motion style. In this work, we present an approach to augment the concept of adversarial motion prior-based RL to allow for multiple, discretely switchable styles. We show that multiple styles and skills can be learned simultaneously without notable performance differences, even in combination with motion data-free skills. Our approach is validated in several real-world experiments with a wheeled-legged quadruped robot showing skills learned from existing RL controllers and trajectory optimization, such as ducking and walking, and novel skills such as switching between a quadrupedal and humanoid configuration. For the latter skill, the robot is required to stand up, navigate on two wheels, and sit down. Instead of tuning the sit-down motion, we verify that a reverse playback of the stand-up movement helps the robot discover feasible sit-down behaviors and avoids tedious reward function tuning. △ Less

Submitted 23 March, 2022; originally announced March 2022.

arXiv:2203.07429 [pdf, other]

Planar Bipedal Locomotion with Nonlinear Model Predictive Control: Online Gait Generation using Whole-Body Dynamics

Authors: Manuel Y. Galliker, Noel Csomay-Shanklin, Ruben Grandia, Andrew J. Taylor, Farbod Farshidian, Marco Hutter, Aaron D. Ames

Abstract: The ability to generate dynamic walking in real-time for bipedal robots with input constraints and underactuation has the potential to enable locomotion in dynamic, complex and unstructured environments. Yet, the high-dimensional nature of bipedal robots has limited the use of full-order rigid body dynamics to gaits which are synthesized offline and then tracked online. In this work we develop an… ▽ More The ability to generate dynamic walking in real-time for bipedal robots with input constraints and underactuation has the potential to enable locomotion in dynamic, complex and unstructured environments. Yet, the high-dimensional nature of bipedal robots has limited the use of full-order rigid body dynamics to gaits which are synthesized offline and then tracked online. In this work we develop an online nonlinear model predictive control approach that leverages the full-order dynamics to realize diverse walking behaviors. Additionally, this approach can be coupled with gaits synthesized offline via a desired reference to enable a shorter prediction horizon and rapid online re-planning, bridging the gap between online reactive control and offline gait planning. We demonstrate the proposed method, both with and without an offline gait, on the planar robot AMBER-3M in simulation and on hardware. △ Less

Submitted 3 November, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: 8 pages, 6 figures, accepted to Humanoids 2022

arXiv:2203.05698 [pdf, other]

Learning-based Localizability Estimation for Robust LiDAR Localization

Authors: Julian Nubert, Etienne Walther, Shehryar Khattak, Marco Hutter

Abstract: LiDAR-based localization and map** is one of the core components in many modern robotic systems due to the direct integration of range and geometry, allowing for precise motion estimation and generation of high quality maps in real-time. Yet, as a consequence of insufficient environmental constraints present in the scene, this dependence on geometry can result in localization failure, happening… ▽ More LiDAR-based localization and map** is one of the core components in many modern robotic systems due to the direct integration of range and geometry, allowing for precise motion estimation and generation of high quality maps in real-time. Yet, as a consequence of insufficient environmental constraints present in the scene, this dependence on geometry can result in localization failure, happening in self-symmetric surroundings such as tunnels. This work addresses precisely this issue by proposing a neural network-based estimation approach for detecting (non-)localizability during robot operation. Special attention is given to the localizability of scan-to-scan registration, as it is a crucial component in many LiDAR odometry estimation pipelines. In contrast to previous, mostly traditional detection approaches, the proposed method enables early detection of failure by estimating the localizability on raw sensor measurements without evaluating the underlying registration optimization. Moreover, previous approaches remain limited in their ability to generalize across environments and sensor types, as heuristic-tuning of degeneracy detection thresholds is required. The proposed approach avoids this problem by learning from a collection of different environments, allowing the network to function over various scenarios. Furthermore, the network is trained exclusively on simulated data, avoiding arduous data collection in challenging and degenerate, often hard-to-access, environments. The presented method is tested during field experiments conducted across challenging environments and on two different sensor types without any modifications. The observed detection performance is on par with state-of-the-art methods after environment-specific threshold tuning. △ Less

Submitted 1 August, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

Comments: 8 pages, 7 figures, 4 tables

arXiv:2203.02221 [pdf, other]

Whole-Body MPC and Dynamic Occlusion Avoidance: A Maximum Likelihood Visibility Approach

Authors: Ibrahim Ibrahim, Farbod Farshidian, Jan Preisig, Perry Franklin, Paolo Rocco, Marco Hutter

Abstract: This paper introduces a novel approach for whole-body motion planning and dynamic occlusion avoidance. The proposed approach reformulates the visibility constraint as a likelihood maximization of visibility probability. In this formulation, we augment the primary cost function of a whole-body model predictive control scheme through a relaxed log barrier function yielding a relaxed log-likelihood m… ▽ More This paper introduces a novel approach for whole-body motion planning and dynamic occlusion avoidance. The proposed approach reformulates the visibility constraint as a likelihood maximization of visibility probability. In this formulation, we augment the primary cost function of a whole-body model predictive control scheme through a relaxed log barrier function yielding a relaxed log-likelihood maximization formulation of visibility probability. The visibility probability is computed through a probabilistic shadow field that quantifies point light source occlusions. We provide the necessary algorithms to obtain such a field for both 2D and 3D cases. We demonstrate 2D implementations of this field in simulation and 3D implementations through real-time hardware experiments. We show that due to the linear complexity of our shadow field algorithm to the map size, we can achieve high update rates, which facilitates onboard execution on mobile platforms with limited computational power. Lastly, we evaluate the performance of the proposed MPC reformulation in simulation for a quadrupedal mobile manipulator. △ Less

Submitted 4 March, 2022; originally announced March 2022.

arXiv:2203.01389 [pdf, other]

Graph-based Multi-sensor Fusion for Consistent Localization of Autonomous Construction Robots

Authors: Julian Nubert, Shehryar Khattak, Marco Hutter

Abstract: Enabling autonomous operation of large-scale construction machines, such as excavators, can bring key benefits for human safety and operational opportunities for applications in dangerous and hazardous environments. To facilitate robot autonomy, robust and accurate state-estimation remains a core component to enable these machines for operation in a diverse set of complex environments. In this wor… ▽ More Enabling autonomous operation of large-scale construction machines, such as excavators, can bring key benefits for human safety and operational opportunities for applications in dangerous and hazardous environments. To facilitate robot autonomy, robust and accurate state-estimation remains a core component to enable these machines for operation in a diverse set of complex environments. In this work, a method for multi-modal sensor fusion for robot state-estimation and localization is presented, enabling operation of construction robots in real-world scenarios. The proposed approach presents a graph-based prediction-update loop that combines the benefits of filtering and smoothing in order to provide consistent state estimates at high update rate, while maintaining accurate global localization for large-scale earth-moving excavators. Furthermore, the proposed approach enables a flexible integration of asynchronous sensor measurements and provides consistent pose estimates even during phases of sensor dropout. For this purpose, a dual-graph design for switching between two distinct optimization problems is proposed, directly addressing temporary failure and the subsequent return of global position estimates. The proposed approach is implemented on-board two Menzi Muck walking excavators and validated during real-world tests conducted in representative operational environments. △ Less

Submitted 2 March, 2022; originally announced March 2022.

Comments: 7 pages, 6 figures, 2 tables

arXiv:2203.00308 [pdf, other]

Collaborative Robot Map** using Spectral Graph Analysis

Authors: Lukas Bernreiter, Shehryar Khattak, Lionel Ott, Roland Siegwart, Marco Hutter, Cesar Cadena

Abstract: In this paper, we deal with the problem of creating globally consistent pose graphs in a centralized multi-robot SLAM framework. For each robot to act autonomously, individual onboard pose estimates and maps are maintained, which are then communicated to a central server to build an optimized global map. However, inconsistencies between onboard and server estimates can occur due to onboard odometr… ▽ More In this paper, we deal with the problem of creating globally consistent pose graphs in a centralized multi-robot SLAM framework. For each robot to act autonomously, individual onboard pose estimates and maps are maintained, which are then communicated to a central server to build an optimized global map. However, inconsistencies between onboard and server estimates can occur due to onboard odometry drift or failure. Furthermore, robots do not benefit from the collaborative map if the server provides no feedback in a computationally tractable and bandwidth-efficient manner. Motivated by this challenge, this paper proposes a novel collaborative map** framework to enable accurate global map** among robots and server. In particular, structural differences between robot and server graphs are exploited at different spatial scales using graph spectral analysis to generate necessary constraints for the individual robot pose graphs. The proposed approach is thoroughly analyzed and validated using several real-world multi-robot field deployments where we show improvements of the onboard system up to 90%. △ Less

Submitted 1 March, 2022; originally announced March 2022.

Comments: Accepted for IEEE International Conference on Robotics and Automation, 2022

arXiv:2202.12385 [pdf, other]

A Collision-Free MPC for Whole-Body Dynamic Locomotion and Manipulation

Authors: Jia-Ruei Chiu, Jean-Pierre Sleiman, Mayank Mittal, Farbod Farshidian, Marco Hutter

Abstract: In this paper, we present a real-time whole-body planner for collision-free legged mobile manipulation. We enforce both self-collision and environment-collision avoidance as soft constraints within a Model Predictive Control (MPC) scheme that solves a multi-contact optimal control problem. By penalizing the signed distances among a set of representative primitive collision bodies, the robot is abl… ▽ More In this paper, we present a real-time whole-body planner for collision-free legged mobile manipulation. We enforce both self-collision and environment-collision avoidance as soft constraints within a Model Predictive Control (MPC) scheme that solves a multi-contact optimal control problem. By penalizing the signed distances among a set of representative primitive collision bodies, the robot is able to safely execute a variety of dynamic maneuvers while preventing any self-collisions. Moreover, collision-free navigation and manipulation in both static and dynamic environments are made viable through efficient queries of distances and their gradients via a euclidean signed distance field. We demonstrate through a comparative study that our approach only slightly increases the computational complexity of the MPC planning. Finally, we validate the effectiveness of our framework through a set of hardware experiments involving dynamic mobile manipulation tasks with potential collisions, such as locomotion balancing with the swinging arm, weight throwing, and autonomous door opening. △ Less

Submitted 24 February, 2022; originally announced February 2022.

Comments: Accepted in IEEE International Conference on Robotics and Automation (ICRA) 2022 in Philadelphia (PA), USA

arXiv:2201.08117 [pdf, other]

doi 10.1126/scirobotics.abk2822

Learning robust perceptive locomotion for quadrupedal robots in the wild

Authors: Takahiro Miki, Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, Marco Hutter

Abstract: Legged robots that can operate autonomously in remote and hazardous environments will greatly increase opportunities for exploration into under-explored areas. Exteroceptive perception is crucial for fast and energy-efficient locomotion: perceiving the terrain before making contact with it enables planning and adaptation of the gait ahead of time to maintain speed and stability. However, utilizing… ▽ More Legged robots that can operate autonomously in remote and hazardous environments will greatly increase opportunities for exploration into under-explored areas. Exteroceptive perception is crucial for fast and energy-efficient locomotion: perceiving the terrain before making contact with it enables planning and adaptation of the gait ahead of time to maintain speed and stability. However, utilizing exteroceptive perception robustly for locomotion has remained a grand challenge in robotics. Snow, vegetation, and water visually appear as obstacles on which the robot cannot step~-- or are missing altogether due to high reflectance. Additionally, depth perception can degrade due to difficult lighting, dust, fog, reflective or transparent surfaces, sensor occlusion, and more. For this reason, the most robust and general solutions to legged locomotion to date rely solely on proprioception. This severely limits locomotion speed, because the robot has to physically feel out the terrain before adapting its gait accordingly. Here we present a robust and general solution to integrating exteroceptive and proprioceptive perception for legged locomotion. We leverage an attention-based recurrent encoder that integrates proprioceptive and exteroceptive input. The encoder is trained end-to-end and learns to seamlessly combine the different perception modalities without resorting to heuristics. The result is a legged locomotion controller with high robustness and speed. The controller was tested in a variety of challenging natural and urban environments over multiple seasons and completed an hour-long hike in the Alps in the time recommended for human hikers. △ Less

Submitted 20 January, 2022; originally announced January 2022.

Journal ref: Science Robotics, 19 Jan 2022, Vol 7, Issue 62

arXiv:2201.07067 [pdf, other]

CERBERUS: Autonomous Legged and Aerial Robotic Exploration in the Tunnel and Urban Circuits of the DARPA Subterranean Challenge

Authors: Marco Tranzatto, Frank Mascarich, Lukas Bernreiter, Carolina Godinho, Marco Camurri, Shehryar Khattak, Tung Dang, Victor Reijgwart, Johannes Loeje, David Wisth, Samuel Zimmermann, Huan Nguyen, Marius Fehr, Lukas Solanka, Russell Buchanan, Marko Bjelonic, Nikhil Khedekar, Mathieu Valceschini, Fabian Jenelten, Mihir Dharmadhikari, Timon Homberger, Paolo De Petris, Lorenz Wellhausen, Mihir Kulkarni, Takahiro Miki , et al. (16 additional authors not shown)

Abstract: Autonomous exploration of subterranean environments constitutes a major frontier for robotic systems as underground settings present key challenges that can render robot autonomy hard to achieve. This has motivated the DARPA Subterranean Challenge, where teams of robots search for objects of interest in various underground environments. In response, the CERBERUS system-of-systems is presented as a… ▽ More Autonomous exploration of subterranean environments constitutes a major frontier for robotic systems as underground settings present key challenges that can render robot autonomy hard to achieve. This has motivated the DARPA Subterranean Challenge, where teams of robots search for objects of interest in various underground environments. In response, the CERBERUS system-of-systems is presented as a unified strategy towards subterranean exploration using legged and flying robots. As primary robots, ANYmal quadruped systems are deployed considering their endurance and potential to traverse challenging terrain. For aerial robots, both conventional and collision-tolerant multirotors are utilized to explore spaces too narrow or otherwise unreachable by ground systems. Anticipating degraded sensing conditions, a complementary multi-modal sensor fusion approach utilizing camera, LiDAR, and inertial data for resilient robot pose estimation is proposed. Individual robot pose estimates are refined by a centralized multi-robot map optimization approach to improve the reported location accuracy of detected objects of interest in the DARPA-defined coordinate frame. Furthermore, a unified exploration path planning policy is presented to facilitate the autonomous operation of both legged and aerial robots in complex underground networks. Finally, to enable communication between the robots and the base station, CERBERUS utilizes a ground rover with a high-gain antenna and an optical fiber connection to the base station, alongside breadcrumbing of wireless nodes by our legged robots. We report results from the CERBERUS system-of-systems deployment at the DARPA Subterranean Challenge Tunnel and Urban Circuits, along with the current limitations and the lessons learned for the benefit of the community. △ Less

Submitted 18 January, 2022; originally announced January 2022.

Comments: 50 pages, 25 figures. Accepted at Field Robotics, 2021

arXiv:2201.03871 [pdf, other]

Combining Learning-based Locomotion Policy with Model-based Manipulation for Legged Mobile Manipulators

Authors: Yuntao Ma, Farbod Farshidian, Takahiro Miki, Joonho Lee, Marco Hutter

Abstract: Deep reinforcement learning produces robust locomotion policies for legged robots over challenging terrains. To date, few studies have leveraged model-based methods to combine these locomotion skills with the precise control of manipulators. Here, we incorporate external dynamics plans into learning-based locomotion policies for mobile manipulation. We train the base policy by applying a random wr… ▽ More Deep reinforcement learning produces robust locomotion policies for legged robots over challenging terrains. To date, few studies have leveraged model-based methods to combine these locomotion skills with the precise control of manipulators. Here, we incorporate external dynamics plans into learning-based locomotion policies for mobile manipulation. We train the base policy by applying a random wrench sequence on the robot base in simulation and adding the noisified wrench sequence prediction to the policy observations. The policy then learns to counteract the partially-known future disturbance. The random wrench sequences are replaced with the wrench prediction generated with the dynamics plans from model predictive control to enable deployment. We show zero-shot adaptation for manipulators unseen during training. On the hardware, we demonstrate stable locomotion of legged robots with the prediction of the external wrench. △ Less

Submitted 11 January, 2022; originally announced January 2022.

Comments: 8 pages, accepted for RA-L 2022

arXiv:2112.15121 [pdf, other]

On the Role of Neural Collapse in Transfer Learning

Authors: Tomer Galanti, András György, Marcus Hutter

Abstract: We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes. Recent results in the literature show that representations learned by a single classifier over many classes are competitive on few-shot learning problems with representations learned by special-purpose algorithms designed for such problems. In this paper we provide an… ▽ More We study the ability of foundation models to learn representations for classification that are transferable to new, unseen classes. Recent results in the literature show that representations learned by a single classifier over many classes are competitive on few-shot learning problems with representations learned by special-purpose algorithms designed for such problems. In this paper we provide an explanation for this behavior based on the recently observed phenomenon that the features learned by overparameterized classification networks show an interesting clustering property, called neural collapse. We demonstrate both theoretically and empirically that neural collapse generalizes to new samples from the training classes, and -- more importantly -- to new classes as well, allowing foundation models to provide feature maps that work well in transfer learning and, specifically, in the few-shot setting. △ Less

Submitted 3 January, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

arXiv:2112.14586 [pdf, other]

Isotuning With Applications To Scale-Free Online Learning

Authors: Laurent Orseau, Marcus Hutter

Abstract: We extend and combine several tools of the literature to design fast, adaptive, anytime and scale-free online learning algorithms. Scale-free regret bounds must scale linearly with the maximum loss, both toward large losses and toward very small losses. Adaptive regret bounds demonstrate that an algorithm can take advantage of easy data and potentially have constant regret. We seek to develop fast… ▽ More We extend and combine several tools of the literature to design fast, adaptive, anytime and scale-free online learning algorithms. Scale-free regret bounds must scale linearly with the maximum loss, both toward large losses and toward very small losses. Adaptive regret bounds demonstrate that an algorithm can take advantage of easy data and potentially have constant regret. We seek to develop fast algorithms that depend on as few parameters as possible, in particular they should be anytime and thus not depend on the time horizon. Our first and main tool, isotuning, is a generalization of the idea of balancing the trade-off of the regret. We develop a set of tools to design and analyze such learning rates easily and show that they adapts automatically to the rate of the regret (whether constant, $O(\log T)$, $O(\sqrt{T})$, etc.) within a factor 2 of the optimal learning rate in hindsight for the same observed quantities. The second tool is an online correction, which allows us to obtain centered bounds for many algorithms, to prevent the regret bounds from being vacuous when the domain is overly large or only partially constrained. The last tool, null updates, prevents the algorithm from performing overly large updates, which could result in unbounded regret, or even invalid updates. We develop a general theory using these tools and apply it to several standard algorithms. In particular, we (almost entirely) restore the adaptivity to small losses of FTRL for unbounded domains, design and prove scale-free adaptive guarantees for a variant of Mirror Descent (at least when the Bregman divergence is convex in its second argument), extend Adapt-ML-Prod to scale-free guarantees, and provide several other minor contributions about Prod, AdaHedge, BOA and Soft-Bayes. △ Less

Submitted 11 July, 2023; v1 submitted 29 December, 2021; originally announced December 2021.

arXiv:2112.13386 [pdf, ps, other]

Reducing Planning Complexity of General Reinforcement Learning with Non-Markovian Abstractions

Authors: Sultan J. Majeed, Marcus Hutter

Abstract: The field of General Reinforcement Learning (GRL) formulates the problem of sequential decision-making from ground up. The history of interaction constitutes a "ground" state of the system, which never repeats. On the one hand, this generality allows GRL to model almost every domain possible, e.g.\ Bandits, MDPs, POMDPs, PSRs, and history-based environments. On the other hand, in general, the near… ▽ More The field of General Reinforcement Learning (GRL) formulates the problem of sequential decision-making from ground up. The history of interaction constitutes a "ground" state of the system, which never repeats. On the one hand, this generality allows GRL to model almost every domain possible, e.g.\ Bandits, MDPs, POMDPs, PSRs, and history-based environments. On the other hand, in general, the near-optimal policies in GRL are functions of complete history, which hinders not only learning but also planning in GRL. The usual way around for the planning part is that the agent is given a Markovian abstraction of the underlying process. So, it can use any MDP planning algorithm to find a near-optimal policy. The Extreme State Aggregation (ESA) framework has extended this idea to non-Markovian abstractions without compromising on the possibility of planning through a (surrogate) MDP. A distinguishing feature of ESA is that it proves an upper bound of $O\left(\varepsilon^{-A} \cdot (1-γ)^{-2A}\right)$ on the number of states required for the surrogate MDP (where $A$ is the number of actions, $γ$ is the discount-factor, and $\varepsilon$ is the optimality-gap) which holds \emph{uniformly} for \emph{all} domains. While the possibility of a universal bound is quite remarkable, we show that this bound is very loose. We propose a novel non-MDP abstraction which allows for a much better upper bound of $O\left(\varepsilon^{-1} \cdot (1-γ)^{-2} \cdot A \cdot 2^{A}\right)$. Furthermore, we show that this bound can be improved further to $O\left(\varepsilon^{-1} \cdot (1-γ)^{-2} \cdot \log^3 A \right)$ by using an action-sequentialization method. △ Less

Submitted 26 December, 2021; originally announced December 2021.

Showing 51–100 of 318 results for author: Hutter, M