\corrauth

Julius Jankowski, Idiap Research Institute, Rue Marconi 19, 1920 Martigny, Switzerland.

Robust Pushing: Exploiting Quasi-static Belief Dynamics and Contact-informed Optimization

Julius Jankowski1, 21, 2affiliationmark:    Lara Brudermüller33affiliationmark:    Nick Hawes33affiliationmark: and Sylvain Calinon1, 21, 2affiliationmark: 11affiliationmark: Idiap Research Institute, Switzerland
22affiliationmark: Ecole Polytechnique Fédérale de Lausanne, Switzerland
33affiliationmark: Oxford Robotics Institute, University of Oxford, UK
[email protected]
Abstract

Non-prehensile manipulation such as pushing is typically subject to uncertain, non-smooth dynamics. However, modeling the uncertainty of the dynamics typically results in intractable belief dynamics, making data-efficient planning under uncertainty difficult. This article focuses on the problem of efficiently generating robust open-loop pushing plans. First, we investigate how the belief over object configurations propagates through quasi-static contact dynamics. We exploit the simplified dynamics to predict the variance of the object configuration without sampling from a perturbation distribution. In a sampling-based trajectory optimization algorithm, the gain of the variance is constrained in order to enforce robustness of the plan. Second, we propose an informed trajectory sampling mechanism for drawing robot trajectories that are likely to make contact with the object. This sampling mechanism is shown to significantly improve chances of finding robust solutions, especially when making-and-breaking contacts is required. We demonstrate that the proposed approach is able to synthesize bi-manual pushing trajectories, resulting in successful long-horizon pushing maneuvers without exteroceptive feedback such as vision or tactile feedback. We furthermore deploy the proposed approach in a model-predictive control scheme, demonstrating additional robustness against unmodeled perturbations.

keywords:
Contact-rich Manipulation, Robust Manipulation, Open-loop Pushing, Stochastic Contact Dynamics.

1 Introduction

Enabling robots to physically interact with the world is a key challenge in robotics. In particular, the ability to move, reorient or localize objects through contact is a fundamental capability for robots to perform tasks in unstructured environments. Model-based planning techniques aim to synthesize robot control actions by using a given model to reason over anticipated outcomes of actions. However, there are two core challenges of model-based planning through contacts. First, contact dynamics are inherently discontinuous, i.e. a robot control action may have no effect on the state of a target object if the robot does not make contact. This translates into a vanishing gradient of the associated manipulation objective, making traditional gradient-based optimization difficult.

Refer to caption
Figure 1: Our model-based optimization approach synthesizes bi-manual pushing trajectories by controlling the variance of the object configuration, without explicitly modeling contact modes. We show that the robustness of the pushing trajectories is sufficient to successfully push an object over long horizons. The red cubes on the table and the yellow content of the can act as additional perturbations to the contact dynamics.

Second, the mechanics of non-prehensile contacts are subject to uncertainty. This is due to the fact that dynamic effects such as friction between surfaces are difficult to predict, especially when manipulating objects without knowing their physical properties, such as friction coefficients or mass distribution (Lynch and Mason (1995); Dogar and Srinivasa (2011); Ha et al. (2020)). Thus, when generating open-loop plans by assuming an accurate model of the dynamics, those plans are likely to fail as they do not take into account the underlying uncertainty. In Rodriguez (2021), this effect is described with an experiment of repeatedly picking-and-placing a queen chess piece, which fails if the queen is picked from the top. In contrast, picking the queen from the side stabilizes the repeated pick-and-place process. Uncertainty about physical parameters of objects is particularly unavoidable when robots interact with objects for the first time. This raises the question of how to let robots deal with this uncertainty autonomously. Deploying deterministic models in fast feedback loops is an implicit approach to compensating for uncertain dynamics, such as done in model-predictive control schemes. Yet they require accurate sensory measurements and achieve robustness against perturbations or modelling errors only by correcting the observed control errors. Modeling uncertainties and reasoning over an anticipated distribution of outcomes, i.e. a belief, is a more explicit way of co** with uncertain dynamics. Planning in belief-space enables open-loop execution of plans while still exhibiting robustness against modeled perturbations, thus not requiring sensors. Moreover, control errors are anticipated at planning time and can thus be prevented before they happen, potentially making manipulation under uncertainty more efficient. Closed-loop control in belief-space can then be achieved by continuously updating the belief based on observations and subsequent replanning, combining the benefits of reactivity with respect to unmodeled perturbations and the effectiveness of preventing control errors at planning time.

In this article, we present an approach for modeling the uncertainty of contact dynamics in order to synthesize robust manipulation behavior. Towards this end, we make the following contributions:

i) We study how a belief over an object’s configuration propagates through uncertain contact dynamics. We derive a prediction of the variance of object configurations upon contact, allowing us to anticipate the reduction of uncertainty without sampling perturbations to the contact dynamics.

ii) We introduce a contact prior for sampling candidate robot trajectories that are likely to create contacts between the robot and the object.

iii) Last, we propose a sampling-based trajectory optimization algorithm that constrains solutions to be robust based on the predicted variance (i)). The informed trajectory distribution (ii)) serves as a proposal distribution that guides the sampling-based optimization process.

In real-world experiments we demonstrate that the proposed approach is able to synthesize robust bi-manual pushing trajectories in only a few seconds of planning time, consisting of long-horizon (up to 100 seconds) open-loop pushing maneuvers that include making and breaking contacts (cf. Fig. 1). The experimental results show that the interplay of i) and ii) is crucial for synthesizing robust behavior. We furthermore show that the proposed planning algorithm is fast enough to be used in a model-predictive control loop, enabling a combination of reactivity and anticipatory robustness. To the best of our knowledge, the proposed algorithm is the first model-based planning approach that is able to synthesize robust plans for contact-rich manipulation without pre-defined manipulation primitives.

2 Related Work

2.1 Contact-rich Manipulation

To plan and control physical interactions such as contacts, model-based algorithms aim to exploit these models to synthesize manipulation behavior. Optimization-based approaches have shown successful manipulation planning capabilities when the desired behavior can be found via local optimization (Hogan and Rodriguez (2020); Aydinoglu and Posa (2022); Aydinoglu et al. (2022); Cleac’h et al. (2021)) or when contact modes can be represented as a small set of discrete decision variables (Marcucci et al. (2017); Migimatsu and Bohg (2020); Toussaint et al. (2020, 2022); Chen et al. (2021)). In the case of local optimization, however, most methods rely on gradients which require a smoothed approximation of the contact dynamics. Moreover, they rely on good initializations, as control actions that do not result in the robot making contact with the object will yield a vanishing gradient of the associated manipulation objective. In contrast, when reformulating the manipulation problem as finding the optimal sequence of contact modes, the difficulty lies in an combinatorial explosion when frequent switching of contact modes is necessary, i.e. making-and-breaking contacts, or if multiple contact points are involved.

Recently, sampling-based planning and control algorithms have been explored for contact-rich manipulation tasks as a gradient-free approach to cope with discontinuous cost landscapes. This offers a framework for combining stochastic sampling and optimization, supporting the search over contact-rich manipulation actions. In Pang et al. (2023), the authors propose to use a quasi-dynamic contact model to efficiently simulate physical interactions during manipulation. Plans are then synthesized with an adaptation of the RRT algorithm. While the planning algorithm is able to generate contact-rich manipulation plans including making-and-breaking contacts, the underlying contact model is assumed to be accurate, resulting in plans that are not robust to inaccuracies in the contact model. In another stream of research on controlling contact-rich manipulation, sampling-based optimization has been explored in model-predictive control schemes to exploit parallel computing opportunities (Bhardwaj et al. (2022); Howell et al. (2022); Jankowski et al. (2023)). In this article, we build upon our previous work on via-point-based stochastic trajectory optimization (VP-STO) as a tool for efficiently optimizing robot trajectories without requiring gradients of the manipulation cost with respect to the manipulation action (Jankowski et al. (2023)).

2.2 Robust Manipulation

Refer to caption
Figure 2: Analogies between active robot localization and robust manipulation. Both can be formulated as a belief space planning problem, where the objective is to decrease the uncertainty of the belief over time. While active localization approaches are based on observation models, robust manipulation exploits favorable contact dynamics to achieve the same goal.

In robotic manipulation, uncertainty is a dominant aspect that arises from the complex and hybrid nature of modeling physical interactions (Rodriguez (2021)). Robust manipulation aims at exploiting particular contact configurations that naturally reduce errors in manipulation tasks. In Erdmann and Mason (1988), the authors exploit the geometry of a tray with physical boundaries to reorient a tool without sensory feedback. Lynch and Mason (1995) exploit line contacts between a robot and polygonal objects to generate robust pushing plans. In Dogar and Srinivasa (2011), the geometry of a half-open gripper is exploited to actively funnel the probability distribution over object locations between the two fingers and the palm with a push-grasp. Ha et al. (2020) model the uncertainty in an underactuated system with additive noise on the robot control actions and approximate the probability distribution over state trajectories with a normal distribution around a particular contact mode. Logic geometric programming is then used to find the most robust contact mode from a set of pre-defined candidate modes. These strategies have in common that favorable contact geometries are used to create natural contact dynamics that effectively decrease the uncertainty of the manipulation system over time without the need for sensors. However, the above approaches rely on a pre-programmed set of robust behaviors that are tailored to the robot contact geometry.

In this article, we achieve robust manipulation as the result of optimizing over an object belief. Belief-space planning in robotics is concerned with modeling the state of the robot and its work space via probability distributions, e.g. for active localization. Sensor measurements of the robot are then used to reject or confirm possible states based on an observation model to reduce uncertainty. Fig. 2 illustrates the analogy of active localization and robust manipulation. Both can be formulated as a belief-space planning problem, where the common objective is to decrease the state-variance over time. While the goal is to minimize the uncertainty of the robot’s own state in localization problems, robust manipulation aims to minimize the uncertainty of the object state. An even more important difference lies in the way the belief is updated over time. Instead of using observation models to rule out possible states of the robot, robust manipulation exploits natural invariances in the contact dynamics to let possible object states converge to a single state.

2.3 Modeling Uncertainty in Contact Dynamics

Modeling the uncertainty in contact dynamics is a key aspect for synthesizing robust robot behavior. In many reinforcement learning approaches (Haarnoja et al. (2019); Schulman et al. (2015, 2017)), domain randomization is a natural way of informing the skill learning process about all the possibilities that may be encountered when moving from simulation to reality. Domain randomization may include probability distributions over parameters of the dynamics model such as friction coefficients, object mass or object geometry (Andrychowicz et al. (2020); Muratore et al. (2022)). By simulating a large number of combinations of policy samples and domain samples, the policy ideally converges to a behavior that is robust against the uncertainty modeled through domain randomization. We believe that domain randomization as an interface for modeling uncertainties in physical interactions is the key reason why reinforcement learning techniques show more advanced manipulation skills in the real world such as in-hand manipulation (Handa et al. (2023)) compared to model-based planning and control techniques that typically assume an accurate model. Hence, we aim to bridge the gap between modeling uncertainty in contact dynamics and ad-hoc planning and control techniques that do not require data-inefficient offline training cycles by optimizing over statistical properties of a belief without sampling perturbations.

3 Problem Formulation & Approach

We consider an underactuated manipulation system with ndofrsuperscriptsubscript𝑛dof𝑟n_{\mathrm{dof}}^{r}italic_n start_POSTSUBSCRIPT roman_dof end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT actuated degrees of freedom for the robot, and ndofosuperscriptsubscript𝑛dof𝑜n_{\mathrm{dof}}^{o}italic_n start_POSTSUBSCRIPT roman_dof end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT unactuated degrees of freedom for the object to be manipulated. We are interested in controlling the configuration of the object 𝒒ondofosuperscript𝒒𝑜superscriptsuperscriptsubscript𝑛dof𝑜\bm{q}^{o}\in\mathbb{R}^{n_{\mathrm{dof}}^{o}}bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT roman_dof end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT by executing robot control commands 𝒖ndofr𝒖superscriptsuperscriptsubscript𝑛dof𝑟\bm{u}\in\mathbb{R}^{n_{\mathrm{dof}}^{r}}bold_italic_u ∈ blackboard_R start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT roman_dof end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT. The initial object configuration and the contact dynamics are subject to uncertainty, such that the object configuration at time step k𝑘kitalic_k is a random variable that is described through the belief bk=p(𝒒ko)subscript𝑏𝑘𝑝subscriptsuperscript𝒒𝑜𝑘b_{k}=p(\bm{q}^{o}_{k})italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_p ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ). We formulate the planning problem as a stochastic optimal control problem over a horizon of K𝐾Kitalic_K time steps:

min𝒖0:K1EbK[(𝒒Ko𝒒deso)(𝒒Ko𝒒deso)].subscriptsubscript𝒖:0𝐾1subscriptEsubscript𝑏𝐾delimited-[]superscriptsubscriptsuperscript𝒒𝑜𝐾subscriptsuperscript𝒒𝑜destopsubscriptsuperscript𝒒𝑜𝐾subscriptsuperscript𝒒𝑜des\min_{\bm{u}_{0:K-1}}\mathrm{E}_{b_{K}}\left[\left(\bm{q}^{o}_{K}-\bm{q}^{o}_{% \mathrm{des}}\right)^{\scriptscriptstyle\top}\left(\bm{q}^{o}_{K}-\bm{q}^{o}_{% \mathrm{des}}\right)\right].roman_min start_POSTSUBSCRIPT bold_italic_u start_POSTSUBSCRIPT 0 : italic_K - 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT roman_E start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT - bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT - bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT ) ] . (1)

We are optimizing for an open-loop control trajectory 𝒖0:K1subscript𝒖:0𝐾1\bm{u}_{0:K-1}bold_italic_u start_POSTSUBSCRIPT 0 : italic_K - 1 end_POSTSUBSCRIPT such that the expected control error at time step K𝐾Kitalic_K is minimal. In order to evaluate the expected control error in (1) given a control trajectory, the belief over object positions is to be propagated through the stochastic contact dynamics. However, as contact dynamics are inherently non-smooth, propagating the belief through contacts in closed-form is intractable. Another way to explicitly evaluate the expected cost is to sample a large number of stochastic rollouts. Yet, this is problematic due to the fact that the evaluation of a single robot control trajectory becomes not only inefficient, but also stochastic.

A key to our approach is the separation of the stochastic optimal control problem in (1) into a mean control problem and a variance control problem (Okamoto et al. (2018); Shirai et al. (2023)). The objective of the stochastic optimal control problem is separated as follows:

min𝒖0:K1(EbK[𝒒Ko]𝒒deso)(EbK[𝒒Ko]𝒒deso)+VbK[𝒒Ko].\min_{\bm{u}_{0:K-1}}\left(\mathrm{E}_{b_{K}}\left[\bm{q}^{o}_{K}\right]-\bm{q% }^{o}_{\mathrm{des}}\right)^{\scriptscriptstyle\top}\left(\mathrm{E}_{b_{K}}% \left[\bm{q}^{o}_{K}\right]-\bm{q}^{o}_{\mathrm{des}}\right)+\mathrm{V}_{b_{K}% }\left[\bm{q}^{o}_{K}\right].roman_min start_POSTSUBSCRIPT bold_italic_u start_POSTSUBSCRIPT 0 : italic_K - 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( roman_E start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ] - bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( roman_E start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ] - bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT ) + roman_V start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ] . (2)

We provide a more detailed derivation from (1) to (2) in the appendix. The first term in (2) corresponds to the mean control problem, which refers to planning for the expected object configuration EbK[𝒒Ko]subscriptEsubscript𝑏𝐾delimited-[]subscriptsuperscript𝒒𝑜𝐾\mathrm{E}_{b_{K}}\left[\bm{q}^{o}_{K}\right]roman_E start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ], i.e. the configuration that is obtained if the mean initial configuration and the nominal contact dynamics are used. Evaluating the mean control objective does not require the propagation of the belief or the sampling of large numbers of stochastic rollouts of the state. In other words, the mean control problem resembles a deterministic optimal control problem that assumes an accurate dynamics model, while the stochasticity in the original problem is captured by the variance control problem represented by the second term in (2). Note that the variance control problem, i.e. minimizing VbK[𝒒Ko]subscriptVsubscript𝑏𝐾delimited-[]subscriptsuperscript𝒒𝑜𝐾\mathrm{V}_{b_{K}}\left[\bm{q}^{o}_{K}\right]roman_V start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ], is independent of the desired object configuration 𝒒desosubscriptsuperscript𝒒𝑜des\bm{q}^{o}_{\mathrm{des}}bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT. The variance of the object configuration is defined as:

Vbk[𝒒ko]=Ebk[(𝒒koEbk[𝒒ko])(𝒒koEbk[𝒒ko])].subscriptVsubscript𝑏𝑘delimited-[]subscriptsuperscript𝒒𝑜𝑘subscriptEsubscript𝑏𝑘delimited-[]superscriptsubscriptsuperscript𝒒𝑜𝑘subscriptEsubscript𝑏𝑘delimited-[]subscriptsuperscript𝒒𝑜𝑘topsubscriptsuperscript𝒒𝑜𝑘subscriptEsubscript𝑏𝑘delimited-[]subscriptsuperscript𝒒𝑜𝑘\mathrm{V}_{b_{k}}[\bm{q}^{o}_{k}]=\mathrm{E}_{b_{k}}\left[\left(\bm{q}^{o}_{k% }-\mathrm{E}_{b_{k}}[\bm{q}^{o}_{k}]\right)^{\scriptscriptstyle\top}\left(\bm{% q}^{o}_{k}-\mathrm{E}_{b_{k}}[\bm{q}^{o}_{k}]\right)\right].roman_V start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ] = roman_E start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - roman_E start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ] ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - roman_E start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ] ) ] . (3)

Thus, the stochastic optimal control problem can be solved by controlling the nominal object configuration while also steering its variance. The variance is a scalar measure of the second statistical moment of the belief and can be interpreted as the uncertainty that the system has about the object’s configuration. However, note that computing the variance at time step K𝐾Kitalic_K still requires the propagation of the belief or a Monte-Carlo approximation of the stochastic dynamics. In Sec. 4, we present an approach which approximates the variance of the object configuration over time without sampling perturbations to the contact dynamics, resulting in efficient and deterministic rollouts of the nominal dynamics and the approximated variance. In Sec. 5 and Sec. 6, we exploit the approximated variance in a sampling-based trajectory optimization scheme for synthesizing robust robot trajectories.

4 Belief Dynamics through Contacts

Given a robot control action and a belief over an object’s configuration, we are interested in predicting the mean and the variance of the object’s configuration at the consecutive time step. For this, we first introduce a quasi-static model for the object dynamics, i.e. predicting the nominal object configuration given a robot configuration. We then model the uncertainty of the object dynamics through additive perturbations on the object configuration if the robot makes contact with the object. Last, we derive a deterministic prediction of the variance given the current belief, a control action and the statistical properties of the perturbations.

4.1 Stochastic Quasi-Static Dynamics for Pushing

Quasi-static and quasi-dynamic models have been used to simplify the prediction of slow physical interactions between robots and objects (Mason (2001); Koval et al. (2016); Hogan and Rodriguez (2020); Cheng et al. (2021); Pang (2021); Pang et al. (2023)). Both classes of models assume that effects that are related to velocities and accelerations can be neglected as they do not affect the outcome of the prediction. This limits the range of applications to slow interactions such as pushing tasks, insertion tasks or in-hand manipulation. Yet, the benefits of quasi-static and quasi-dynamic models are the lower dimensionality of the system state, i.e. half the number of states compared to second-order dynamics models (e.g. Todorov et al. (2012)), and the lower temporal resolution required to compute stable predictions of the system state. Both aspects effectively allow for faster simulated rollouts of robot plans and thus for more efficient model-based trajectory optimization. Such models oftentimes denote the state as 𝒒=(𝒒r,𝒒o)𝒒superscript𝒒𝑟superscript𝒒𝑜\bm{q}=\left(\bm{q}^{r},\bm{q}^{o}\right)bold_italic_q = ( bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ). It decomposes into the position of the actuated degrees of freedom of the robot 𝒒rndofrsuperscript𝒒𝑟superscriptsuperscriptsubscript𝑛dof𝑟\bm{q}^{r}\in\mathbb{R}^{n_{\mathrm{dof}}^{r}}bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT roman_dof end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT, and the position of the unactuated degrees of freedom of the object(s) 𝒒ondofosuperscript𝒒𝑜superscriptsuperscriptsubscript𝑛dof𝑜\bm{q}^{o}\in\mathbb{R}^{n_{\mathrm{dof}}^{o}}bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT roman_dof end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT. The discretized dynamics are consequently given in the form of

(𝒒+r𝒒+o)=𝒇((𝒒r𝒒o),𝒖).matrixsubscriptsuperscript𝒒𝑟subscriptsuperscript𝒒𝑜𝒇matrixsuperscript𝒒𝑟superscript𝒒𝑜𝒖\begin{pmatrix}\bm{q}^{r}_{+}\\ \bm{q}^{o}_{+}\end{pmatrix}=\bm{f}\left(\begin{pmatrix}\bm{q}^{r}\\ \bm{q}^{o}\end{pmatrix},\bm{u}\right).( start_ARG start_ROW start_CELL bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ) = bold_italic_f ( ( start_ARG start_ROW start_CELL bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ) , bold_italic_u ) . (4)

𝒒+rsubscriptsuperscript𝒒𝑟\bm{q}^{r}_{+}bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT, 𝒒+osubscriptsuperscript𝒒𝑜\bm{q}^{o}_{+}bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT are the predicted robot and object configurations at the consecutive time step, respectively. In Pang et al. (2023), the input 𝒖ndofr𝒖superscriptsuperscriptsubscript𝑛dof𝑟\bm{u}\in\mathbb{R}^{n_{\mathrm{dof}}^{r}}bold_italic_u ∈ blackboard_R start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT roman_dof end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT is defined as the commanded robot configuration. The robot is assumed to be controlled by a low-level impedance controller (Hogan (1984)), such that the robot can be modeled as an impedance for the contact dynamics. In this article, we further simplify the quasi-dynamic contact dynamics in Pang et al. (2023) by assuming infinite stiffness of the controlled robot. As a consequence, contacts with objects are assumed to not affect the robot state itself, but only the configuration of the object. This assumption is particularly realistic when pushing lightweight objects with a stiff robot impedance controller. The high-stiffness assumption induces that the robot is able to reach a desired robot position even when being in contact with an object, i.e. 𝒒+r=𝒖subscriptsuperscript𝒒𝑟𝒖\bm{q}^{r}_{+}=\bm{u}bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT = bold_italic_u. The benefit of modeling the contact interactions in such a way is that the joint robot-object dynamics reduce to solely object dynamics. This further simplifies the simulation of contacts, as the non-penetration constraint in the dynamics now only applies to the object. This turns the quasi-dynamic contact dynamics into quasi-static contact dynamics, as we remove the dependency on time. We will further exploit this decoupling of the dynamics in the subsequent derivation of transition probabilities for the object. Yet, note that this assumption breaks when contacts between the robot and the environment significantly affect the robot state, e.g. when trying to move into a solid wall. Furthermore, enclosing grasps, i.e. making contact with one object from two opposing sides, will also break the high-stiffness assumption as the two contact points will affect each other. Due to these limitations, we focus the experiments on planar pushing under uncertainty.

4.1.1 Nominal Object Dynamics.

The discretized quasi-static contact dynamics of the object are

(𝒒+r𝒒+o)=(𝒖𝒒o+δ𝒒o);matrixsubscriptsuperscript𝒒𝑟subscriptsuperscript𝒒𝑜matrix𝒖superscript𝒒𝑜𝛿superscript𝒒𝑜\begin{pmatrix}\bm{q}^{r}_{+}\\ \bm{q}^{o}_{+}\end{pmatrix}=\begin{pmatrix}\bm{u}\\ \bm{q}^{o}+\delta\bm{q}^{o}\end{pmatrix};( start_ARG start_ROW start_CELL bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ) = ( start_ARG start_ROW start_CELL bold_italic_u end_CELL end_ROW start_ROW start_CELL bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT + italic_δ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ) ; (5a)
δ𝒒o=argminδ𝒒~oδ𝒒~o𝑴(𝒒o)δ𝒒~o,𝛿superscript𝒒𝑜subscriptargmin𝛿superscript~𝒒𝑜𝛿superscriptsuperscript~𝒒𝑜top𝑴superscript𝒒𝑜𝛿superscript~𝒒𝑜\delta\bm{q}^{o}=\operatorname*{arg\,min}_{\delta\tilde{\bm{q}}^{o}}{\delta% \tilde{\bm{q}}^{o}}^{\scriptscriptstyle\top}\bm{M}(\bm{q}^{o})\delta\tilde{\bm% {q}}^{o},italic_δ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT = start_OPERATOR roman_arg roman_min end_OPERATOR start_POSTSUBSCRIPT italic_δ over~ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_δ over~ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_M ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ) italic_δ over~ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , (5b)
s.t.d(𝒖,𝒒o+δ𝒒~o)0.\mathrm{s.t.}\quad d(\bm{u},\bm{q}^{o}+\delta\tilde{\bm{q}}^{o})\geq 0.roman_s . roman_t . italic_d ( bold_italic_u , bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT + italic_δ over~ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ) ≥ 0 . (5c)

The control inputs 𝒖ndofr𝒖superscriptsuperscriptsubscript𝑛dof𝑟\bm{u}\in\mathbb{R}^{n_{\mathrm{dof}}^{r}}bold_italic_u ∈ blackboard_R start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT roman_dof end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT to the system dynamics are defined by the commanded joint positions of the robot. In (5b), 𝑴(𝒒o)𝑴superscript𝒒𝑜\bm{M}(\bm{q}^{o})bold_italic_M ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ) is the inertia matrix of the object with respect to its current configuration 𝒒osuperscript𝒒𝑜\bm{q}^{o}bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT. Thus, the objective in (5b) aims to minimize the work required to overcome the friction between the object surface and the surface of the environment, e.g. the table the object is placed on. d(𝒒r,𝒒o)𝑑superscript𝒒𝑟superscript𝒒𝑜d(\bm{q}^{r},\bm{q}^{o})italic_d ( bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ) measures the shortest signed distance between the robot and the object and thus (5c) incorporates the non-penetration constraint, i.e. the robot does not penetrate a rigid object. As the quasi-static robot dynamics in (5a) are decoupled from the object configuration, the non-penetration constraint in (5c) does not depend on the previous robot state, but only on the control input 𝒖𝒖\bm{u}bold_italic_u. This simplification allows us to represent the system dynamics with the next desired robot position as the control input and the object position as the sole state variable. The quasi-static contact dynamics of the object in (5a)–(5c) can thus be summarized into the nominal forward dynamics of the object configuration:

𝒒+o=𝒇(𝒒o,𝒖).subscriptsuperscript𝒒𝑜𝒇superscript𝒒𝑜𝒖\bm{q}^{o}_{+}=\bm{f}(\bm{q}^{o},\bm{u}).bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT = bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) . (6)

In the following, we exploit the simplified mathematical structure of the quasi-static contact dynamics to model uncertainty and to analyze how object belief states propagate through the contact dynamics.

4.1.2 Noisy Object Dynamics.

We propose to model the uncertainty in the contact dynamics as additive noise that acts as a perturbation to the nominal quasi-static contact dynamics in (6). Hence, the noisy object dynamics are given by

𝒒+o=𝒇(𝒒o,𝒖)+η𝒘.subscriptsuperscript𝒒𝑜𝒇superscript𝒒𝑜𝒖𝜂𝒘\bm{q}^{o}_{+}=\bm{f}(\bm{q}^{o},\bm{u})+\eta\bm{w}.bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT = bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) + italic_η bold_italic_w . (7)

The perturbation 𝒘𝒘\bm{w}bold_italic_w is assumed to be sampled from a probability distribution p𝒘subscript𝑝𝒘p_{\bm{w}}italic_p start_POSTSUBSCRIPT bold_italic_w end_POSTSUBSCRIPT such that the resulting object configuration satisfies the non-penetration constraint in (5c). The noise coefficient η𝜂\etaitalic_η encodes if the robot is in contact with the object and is defined as

η={0ifd(𝒖,𝒒o)>0(nocontact),1else(contact).𝜂cases0if𝑑𝒖superscript𝒒𝑜0nocontactotherwise1elsecontactotherwise\eta=\begin{cases}0\quad\quad\mathrm{if}\,d(\bm{u},\bm{q}^{o})>0\;\mathrm{(no% \;contact)},\\ 1\hfill\mathrm{else}\;\mathrm{(contact)}.\end{cases}italic_η = { start_ROW start_CELL 0 roman_if italic_d ( bold_italic_u , bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ) > 0 ( roman_no roman_contact ) , end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL 1 roman_else ( roman_contact ) . end_CELL start_CELL end_CELL end_ROW (8)

Hence, the model only considers uncertainty in the prediction of the object configuration when the robot is manipulating the object. The presented formulation of the noisy discrete-time contact dynamics in (7) captures two distinct modes: one when the robot is in contact with the object (η=1𝜂1\eta=1italic_η = 1) and another when the robot is not in contact with the object (η=0𝜂0\eta=0italic_η = 0). Note that we do not model different contact modes. The contact mode η=1𝜂1\eta=1italic_η = 1 captures arbitrary numbers of contact points with arbitrary contact geometries. The object dynamics can thus be formulated with

𝒒+o={𝒒oifη=0(nocontact),𝒇(𝒒o,𝒖)+𝒘else(contact).subscriptsuperscript𝒒𝑜casessuperscript𝒒𝑜if𝜂0nocontactotherwise𝒇superscript𝒒𝑜𝒖𝒘elsecontactotherwise\bm{q}^{o}_{+}=\begin{cases}\bm{q}^{o}\quad\quad\quad\mathrm{if}\,\eta=0\;% \mathrm{(no\;contact)},\\ \bm{f}(\bm{q}^{o},\bm{u})+\bm{w}\hfill\mathrm{else}\;\mathrm{(contact)}.\end{cases}bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT = { start_ROW start_CELL bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT roman_if italic_η = 0 ( roman_no roman_contact ) , end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) + bold_italic_w roman_else ( roman_contact ) . end_CELL start_CELL end_CELL end_ROW (9)

The no-contact mode entails that the object configuration does not change. This mode is not subject to uncertainty.

4.2 Object Belief Dynamics

Refer to caption
Figure 3: Belief dynamics through contact in a one-dimensional example. a) Illustration of object samples from the uniformly distributed initial belief (orange). The ground truth object is depicted in non-transparent orange. b) After executing a push from left to right, all samples that were on the left of the push were pushed by the robot. c) The probability mass that was on the left-hand side of the push (light-blue) is now concentrated in a distribution at the contact point according to the perturbation distribution (blue). The probability mass on the right-hand side of the push does not change (blue). d) The variance of the predicted object position is therefore a function of the control action, where a robust control action may decrease the variance over time.

In order to analyze how the belief about an object configuration and its associated uncertainty propagate through contact dynamics over time, we use the notation of probabilistic state transitions. Building upon the derived contact dynamics in (7), the state of the stochastic system is the object configuration and the action is the next desired robot configuration. Accordingly, we denote the state transition probability with

𝒒+op(|𝒒o,𝒖),\bm{q}^{o}_{+}\sim p(\cdot|\bm{q}^{o},\bm{u}),bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT ∼ italic_p ( ⋅ | bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) , (10)

describing the probability distribution over object configurations at the next time step given the commanded robot configuration as well as the object configuration. Note that the random perturbation 𝒘𝒘\bm{w}bold_italic_w that directly acts on the object dynamics in (7) is captured by the stochasticity of the transition probability in (10). We furthermore denote the probability distribution over object configurations as belief b=p(𝒒o)𝑏𝑝superscript𝒒𝑜b=p(\bm{q}^{o})italic_b = italic_p ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ). Given the transition probability in (10), a control action 𝒖𝒖\bm{u}bold_italic_u and the object belief b𝑏bitalic_b, the resulting belief b+subscript𝑏b_{+}italic_b start_POSTSUBSCRIPT + end_POSTSUBSCRIPT can be predicted by marginalizing over the initial object configuration:

b+=p(𝒒+o|𝒖)=𝒬op(𝒒+o|𝒒o,𝒖)bd𝒒o.subscript𝑏𝑝conditionalsubscriptsuperscript𝒒𝑜𝒖subscriptsuperscript𝒬𝑜𝑝conditionalsubscriptsuperscript𝒒𝑜superscript𝒒𝑜𝒖𝑏differential-dsuperscript𝒒𝑜b_{+}=p(\bm{q}^{o}_{+}|\bm{u})=\int_{\mathcal{Q}^{o}}p(\bm{q}^{o}_{+}|\bm{q}^{% o},\bm{u})\;b\;\mathrm{d}\bm{q}^{o}.italic_b start_POSTSUBSCRIPT + end_POSTSUBSCRIPT = italic_p ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT | bold_italic_u ) = ∫ start_POSTSUBSCRIPT caligraphic_Q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT end_POSTSUBSCRIPT italic_p ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT | bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) italic_b roman_d bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT . (11)

This equation represents the belief dynamics, which can be used not only for predicting the most likely object configuration, but also the variance of the belief after executing action 𝒖𝒖\bm{u}bold_italic_u. However, solving the integral in (11) is typically intractable for non-linear dynamics and non-Gaussian beliefs. Yet, in the following, we start by showing an example problem for which we can obtain a closed-form solution of (11), followed by the introduction of a general approximation.

Example 1. (1D Box-Pushing). We illustrate the effects of contacts on the belief with a one-dimensional pushing example as illustrated in Fig. 3. The hand (i.e. the robot) can be controlled directly while the box on the table (i.e. the object) can only be moved by making contact. The quasi-static contact dynamics in this example simplify to

q+o={qoifqo>u(nocontact),u+welse(contact).subscriptsuperscript𝑞𝑜casessuperscript𝑞𝑜ifsuperscript𝑞𝑜𝑢nocontactotherwise𝑢𝑤elsecontactotherwiseq^{o}_{+}=\begin{cases}q^{o}\quad\,\mathrm{if}\,q^{o}>u\;\mathrm{(no\;contact)% },\\ u+w\quad\mathrm{else}\hfill\mathrm{(contact)}.\end{cases}italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT = { start_ROW start_CELL italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT roman_if italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT > italic_u ( roman_no roman_contact ) , end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL italic_u + italic_w roman_else ( roman_contact ) . end_CELL start_CELL end_CELL end_ROW (12)

These one-dimensional piece-wise linear dynamics move the object to the right-hand side of the contact. The object does not move if no contact has been made. In this example, we consider the perturbation of the contact dynamics to be uniformly distributed, i.e. w𝒰[0,α]similar-to𝑤subscript𝒰0𝛼w\sim\mathcal{U}_{[0,\alpha]}italic_w ∼ caligraphic_U start_POSTSUBSCRIPT [ 0 , italic_α ] end_POSTSUBSCRIPT. Sub-figures a) and b) illustrate a particular instance of the object being at qosuperscript𝑞𝑜q^{o}italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT, which is then moved to q+osubscriptsuperscript𝑞𝑜q^{o}_{+}italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT due to the robot reaching the commanded position u𝑢uitalic_u. However, in this example the initial box position is subject to uncertainty. The initial belief b𝑏bitalic_b over box positions being given as a uniform distribution on an interval between box position q¯osuperscript¯𝑞𝑜\underline{q}^{o}under¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT and q¯osuperscript¯𝑞𝑜\bar{q}^{o}over¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT, respectively. The interval is indicated by the vertical dashed lines in all sub-figures. The initial belief is

b=𝒰[q¯o,q¯o](qo).𝑏subscript𝒰superscript¯𝑞𝑜superscript¯𝑞𝑜superscript𝑞𝑜b=\mathcal{U}_{[\underline{q}^{o},\bar{q}^{o}]}(q^{o}).italic_b = caligraphic_U start_POSTSUBSCRIPT [ under¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , over¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ] end_POSTSUBSCRIPT ( italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ) . (13)

Given a control action u[q¯o,q¯o]𝑢superscript¯𝑞𝑜superscript¯𝑞𝑜u\in[\underline{q}^{o},\bar{q}^{o}]italic_u ∈ [ under¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , over¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ] as a commanded robot position, we can now predict the belief after contact by solving (11). While solving the integral in (11) is typically intractable, this example has a closed-form solution that is given by

b+=uq¯oq¯oq¯o𝒰[u,u+α](q+o)+q¯ouq¯oq¯o𝒰[u,q¯o](q+o).subscript𝑏𝑢superscript¯𝑞𝑜superscript¯𝑞𝑜superscript¯𝑞𝑜subscript𝒰𝑢𝑢𝛼subscriptsuperscript𝑞𝑜superscript¯𝑞𝑜𝑢superscript¯𝑞𝑜superscript¯𝑞𝑜subscript𝒰𝑢superscript¯𝑞𝑜subscriptsuperscript𝑞𝑜b_{+}=\frac{u-\underline{q}^{o}}{\bar{q}^{o}-\underline{q}^{o}}\mathcal{U}_{[u% ,u+\alpha]}(q^{o}_{+})+\frac{\bar{q}^{o}-u}{\bar{q}^{o}-\underline{q}^{o}}% \mathcal{U}_{[u,\bar{q}^{o}]}(q^{o}_{+}).italic_b start_POSTSUBSCRIPT + end_POSTSUBSCRIPT = divide start_ARG italic_u - under¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT end_ARG start_ARG over¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT - under¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT end_ARG caligraphic_U start_POSTSUBSCRIPT [ italic_u , italic_u + italic_α ] end_POSTSUBSCRIPT ( italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT ) + divide start_ARG over¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT - italic_u end_ARG start_ARG over¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT - under¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT end_ARG caligraphic_U start_POSTSUBSCRIPT [ italic_u , over¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ] end_POSTSUBSCRIPT ( italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT ) . (14)

We illustrate the belief dynamics in sub-figure c) by visualizing the initial belief in light-blue and the resulting belief in dark-blue. It can be seen that the contact dynamics result in a concentration of probability mass around the contact point. This is due to the fact that all probability mass to the left of the contact has moved to the same position interval, i.e. the contact point subject to perturbation. In the extreme case of pushing all the way through the interval of the uniform distribution, i.e. u=q¯o𝑢superscript¯𝑞𝑜u=\bar{q}^{o}italic_u = over¯ start_ARG italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT, the object belief is equivalent to the perturbation distribution with the mean shifted to the contact point. The closed-form belief dynamics in (14) show that the probability distribution can be controlled by the actions a robot takes. This is underlined by sub-figure d) plotting the variance VV\mathrm{V}roman_V of the object position after the push as a function of how far the robot pushed. Note that while in this specific example any control action u𝑢uitalic_u that makes contact with the object reduces the variance of the belief over its position, this may not be the case in general. An unfavorable control action could generally also increase the variance of the belief.

4.3 Variance Prediction.

While the 1D-example above showed a closed-form solution to the belief dynamics, this is typically intractable. Consequently, predicting the variance with

V+=Vb+[𝒒o],subscriptVsubscriptVsubscript𝑏delimited-[]superscript𝒒𝑜\mathrm{V}_{+}=\mathrm{V}_{b_{+}}\left[\bm{q}^{o}\right],roman_V start_POSTSUBSCRIPT + end_POSTSUBSCRIPT = roman_V start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT + end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ] , (15)

in closed-form is also intractable. A common approach is to use a Monte-Carlo approximation of the belief dynamics in (11) in order to compute the variance of the approximated belief update (Kappen (2015); Shirai et al. (2023)). However, this is problematic since this induces stochasticity in the prediction of the variance V+subscriptV\mathrm{V}_{+}roman_V start_POSTSUBSCRIPT + end_POSTSUBSCRIPT. Evaluating whether a control action reduces or increases the variance may thus be subject to uncertainty itself, which introduces numerical issues in downstream optimization techniques. The induced noise furthermore depends on the number of samples used to approximate the belief. Thus, numerical problems and approximation errors may only be avoided by choosing a high number of samples, rendering the underlying planning technique inefficient. In the following, we derive a different way of predicting the variance of object configurations without depending on the transition probability. In other words, we eliminate the need to sample from the belief transition distribution in (10) in order to predict the variance given a robot action.

Refer to caption
Figure 4: Block diagram depicting one iteration of BS-VP-STO. The algorithm starts with sampling a population of latent candidate trajectory variables 𝜺𝜺\bm{\varepsilon}bold_italic_ε. These are then decoded into robot trajectories 𝒒0:Krsubscriptsuperscript𝒒𝑟:0𝐾\bm{q}^{r}_{0:K}bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 : italic_K end_POSTSUBSCRIPT using a contact prior. For each candidate trajectory the object belief is rolled-out using the nominal object dynamics. The variance gain together with the mean control cost is then used to compute the fitness of each candidate trajectory. Finally, the distribution of candidate trajectory variables weighted by the fitness is used to update the Gaussian approximation of the probability distribution using CMA-ES. After M𝑀Mitalic_M iterations, the algorithm returns the best performing candidate trajectory as solution.

Instead of predicting the updated variance after taking a control action by computing the variance of the predicted belief as in (15), we compute the variance of the noisy object dynamics in (7), i.e.

V+=Vb,pw[𝒇(𝒒o,𝒖)+η𝒘]=Vb[𝒇(𝒒o,𝒖)]+Vb,pw[η𝒘]+2Eb,pw[(𝒇(𝒒o,𝒖)Eb[𝒇(𝒒o,𝒖)])(η𝒘Eb,pw[η𝒘])].subscriptVsubscriptV𝑏subscript𝑝𝑤delimited-[]𝒇superscript𝒒𝑜𝒖𝜂𝒘subscriptV𝑏delimited-[]𝒇superscript𝒒𝑜𝒖subscriptV𝑏subscript𝑝𝑤delimited-[]𝜂𝒘2subscriptE𝑏subscript𝑝𝑤delimited-[]superscript𝒇superscript𝒒𝑜𝒖subscriptE𝑏delimited-[]𝒇superscript𝒒𝑜𝒖top𝜂𝒘subscriptE𝑏subscript𝑝𝑤delimited-[]𝜂𝒘\displaystyle\begin{split}\mathrm{V}_{+}&=\mathrm{V}_{b,p_{w}}[\bm{f}(\bm{q}^{% o},\bm{u})+\eta\bm{w}]\\ &=\mathrm{V}_{b}[\bm{f}(\bm{q}^{o},\bm{u})]+\mathrm{V}_{b,p_{w}}[\eta\bm{w}]+% \\ 2&\mathrm{E}_{b,p_{w}}\left[(\bm{f}(\bm{q}^{o},\bm{u})-\mathrm{E}_{b}[\bm{f}(% \bm{q}^{o},\bm{u})])^{\scriptscriptstyle\top}(\eta\bm{w}-\mathrm{E}_{b,p_{w}}[% \eta\bm{w}])\right].\end{split}start_ROW start_CELL roman_V start_POSTSUBSCRIPT + end_POSTSUBSCRIPT end_CELL start_CELL = roman_V start_POSTSUBSCRIPT italic_b , italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) + italic_η bold_italic_w ] end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = roman_V start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) ] + roman_V start_POSTSUBSCRIPT italic_b , italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_η bold_italic_w ] + end_CELL end_ROW start_ROW start_CELL 2 end_CELL start_CELL roman_E start_POSTSUBSCRIPT italic_b , italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ ( bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) - roman_E start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) ] ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_η bold_italic_w - roman_E start_POSTSUBSCRIPT italic_b , italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_η bold_italic_w ] ) ] . end_CELL end_ROW (16)

At this point, we introduce the assumption that the expectation of the perturbation 𝒘𝒘\bm{w}bold_italic_w is zero, i.e. Epw[𝒘]=𝟎subscriptEsubscript𝑝𝑤delimited-[]𝒘0\mathrm{E}_{p_{w}}[\bm{w}]=\bm{0}roman_E start_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_w ] = bold_0. As a result, perturbations are modeled as zero-mean noise in the tangential directions with respect to the contact point and zero noise in the normal direction of the contact. We furthermore note that the contact indicator η𝜂\etaitalic_η and the perturbation are independent, such that the expectation of the product of the contact indicator and the perturbation is zero, i.e. Eb,pw[η𝒘]=Eb[η]Epw[𝒘]=𝟎subscriptE𝑏subscript𝑝𝑤delimited-[]𝜂𝒘subscriptE𝑏delimited-[]𝜂subscriptEsubscript𝑝𝑤delimited-[]𝒘0\mathrm{E}_{b,p_{w}}[\eta\bm{w}]=\mathrm{E}_{b}[\eta]\mathrm{E}_{p_{w}}[\bm{w}% ]=\bm{0}roman_E start_POSTSUBSCRIPT italic_b , italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_η bold_italic_w ] = roman_E start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ italic_η ] roman_E start_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_w ] = bold_0. Consequently, the third term in (16) simplifies to

Eb,pw[(𝒇(𝒒o,𝒖)Eb[𝒇(𝒒o,𝒖)])(η𝒘Eb,pw[η𝒘])]=Eb,pw[η(𝒇(𝒒o,𝒖)Eb[𝒇(𝒒o,𝒖)])𝒘]=Eb[η(𝒇(𝒒o,𝒖)Eb[𝒇(𝒒o,𝒖)])]Epw[𝒘]=𝟎.subscriptE𝑏subscript𝑝𝑤delimited-[]superscript𝒇superscript𝒒𝑜𝒖subscriptE𝑏delimited-[]𝒇superscript𝒒𝑜𝒖top𝜂𝒘subscriptE𝑏subscript𝑝𝑤delimited-[]𝜂𝒘subscriptE𝑏subscript𝑝𝑤delimited-[]𝜂superscript𝒇superscript𝒒𝑜𝒖subscriptE𝑏delimited-[]𝒇superscript𝒒𝑜𝒖top𝒘subscriptE𝑏superscriptdelimited-[]𝜂𝒇superscript𝒒𝑜𝒖subscriptE𝑏delimited-[]𝒇superscript𝒒𝑜𝒖topsubscriptEsubscript𝑝𝑤delimited-[]𝒘0\displaystyle\begin{split}&\mathrm{E}_{b,p_{w}}\left[(\bm{f}(\bm{q}^{o},\bm{u}% )-\mathrm{E}_{b}[\bm{f}(\bm{q}^{o},\bm{u})])^{\scriptscriptstyle\top}(\eta\bm{% w}-\mathrm{E}_{b,p_{w}}[\eta\bm{w}])\right]=\\ &\mathrm{E}_{b,p_{w}}\left[\eta(\bm{f}(\bm{q}^{o},\bm{u})-\mathrm{E}_{b}[\bm{f% }(\bm{q}^{o},\bm{u})])^{\scriptscriptstyle\top}\bm{w}\right]=\\ &\mathrm{E}_{b}\left[\eta(\bm{f}(\bm{q}^{o},\bm{u})-\mathrm{E}_{b}[\bm{f}(\bm{% q}^{o},\bm{u})])\right]^{\scriptscriptstyle\top}\mathrm{E}_{p_{w}}\left[\bm{w}% \right]=\bm{0}.\end{split}start_ROW start_CELL end_CELL start_CELL roman_E start_POSTSUBSCRIPT italic_b , italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ ( bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) - roman_E start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) ] ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_η bold_italic_w - roman_E start_POSTSUBSCRIPT italic_b , italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_η bold_italic_w ] ) ] = end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL roman_E start_POSTSUBSCRIPT italic_b , italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_η ( bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) - roman_E start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) ] ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_w ] = end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL roman_E start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ italic_η ( bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) - roman_E start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) ] ) ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_E start_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_w ] = bold_0 . end_CELL end_ROW (17)

As a result, the predicted variance in (16) is equivalent to the sum of the nominally predicted variance and the variance of the induced noise. The predicted variance is thus

V+=Vb[𝒇(𝒒o,𝒖)]+Vb,pw[η𝒘].subscriptVsubscriptV𝑏delimited-[]𝒇superscript𝒒𝑜𝒖subscriptV𝑏subscript𝑝𝑤delimited-[]𝜂𝒘\mathrm{V}_{+}=\mathrm{V}_{b}\left[\bm{f}(\bm{q}^{o},\bm{u})\right]+\mathrm{V}% _{b,p_{w}}\left[\eta\bm{w}\right].roman_V start_POSTSUBSCRIPT + end_POSTSUBSCRIPT = roman_V start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) ] + roman_V start_POSTSUBSCRIPT italic_b , italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_η bold_italic_w ] . (18)

The left-hand term is computed through the nominal object dynamics 𝒇𝒇\bm{f}bold_italic_f, thus not including perturbations. The right-hand term is the variance contribution from the noise acting on the object when making contact. With the expectation of pwsubscript𝑝𝑤p_{w}italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT being zero, i.e. Eb,pw[η𝒘]=𝟎subscriptE𝑏subscript𝑝𝑤delimited-[]𝜂𝒘0\mathrm{E}_{b,p_{w}}[\eta\bm{w}]=\bm{0}roman_E start_POSTSUBSCRIPT italic_b , italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_η bold_italic_w ] = bold_0, the variance of the noise is given with

Vb,pw[η𝒘]=Eb,pw[(η𝒘)(η𝒘)]=Eb,pw[η2𝒘𝒘]=Eb[η2]Epw[𝒘𝒘].subscriptV𝑏subscript𝑝𝑤delimited-[]𝜂𝒘subscriptE𝑏subscript𝑝𝑤delimited-[]superscript𝜂𝒘top𝜂𝒘subscriptE𝑏subscript𝑝𝑤delimited-[]superscript𝜂2superscript𝒘top𝒘subscriptE𝑏delimited-[]superscript𝜂2subscriptEsubscript𝑝𝑤delimited-[]superscript𝒘top𝒘\displaystyle\begin{split}&\mathrm{V}_{b,p_{w}}\left[\eta\bm{w}\right]=\mathrm% {E}_{b,p_{w}}\left[(\eta\bm{w})^{\scriptscriptstyle\top}(\eta\bm{w})\right]=\\ &\mathrm{E}_{b,p_{w}}\left[\eta^{2}\bm{w}^{\scriptscriptstyle\top}\bm{w}\right% ]=\mathrm{E}_{b}\left[\eta^{2}\right]\mathrm{E}_{p_{w}}\left[\bm{w}^{% \scriptscriptstyle\top}\bm{w}\right].\end{split}start_ROW start_CELL end_CELL start_CELL roman_V start_POSTSUBSCRIPT italic_b , italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_η bold_italic_w ] = roman_E start_POSTSUBSCRIPT italic_b , italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ ( italic_η bold_italic_w ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( italic_η bold_italic_w ) ] = end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL roman_E start_POSTSUBSCRIPT italic_b , italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_η start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT bold_italic_w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_w ] = roman_E start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ italic_η start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] roman_E start_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_w ] . end_CELL end_ROW (19)

With η=η2{0,1}𝜂superscript𝜂201\eta=\eta^{2}\in\{0,1\}italic_η = italic_η start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ∈ { 0 , 1 }, the expectation of the squared contact indicator is equal to the expected contact indicator, i.e. Eb[η2]=Eb[η]subscriptE𝑏delimited-[]superscript𝜂2subscriptE𝑏delimited-[]𝜂\mathrm{E}_{b}\left[\eta^{2}\right]=\mathrm{E}_{b}\left[\eta\right]roman_E start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ italic_η start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT ] = roman_E start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ italic_η ]. The expected value of the contact indicator is the probability of the robot making contact with the object given a belief over object positions and a control action. Furthermore, note that due to the zero-mean property of our noise distribution, the expectation of the squared perturbation is equivalent to the variance of the perturbation, i.e. Epw[𝒘𝒘]=V𝒘subscriptEsubscript𝑝𝑤delimited-[]superscript𝒘top𝒘subscriptV𝒘\mathrm{E}_{p_{w}}\left[\bm{w}^{\scriptscriptstyle\top}\bm{w}\right]=\mathrm{V% }_{\bm{w}}roman_E start_POSTSUBSCRIPT italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_w start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_w ] = roman_V start_POSTSUBSCRIPT bold_italic_w end_POSTSUBSCRIPT. Inserting these equalities in (19), the variance of the applied perturbation can be expressed with

Vb,pw[η𝒘]=Eb[η]V𝒘.subscriptV𝑏subscript𝑝𝑤delimited-[]𝜂𝒘subscriptE𝑏delimited-[]𝜂subscriptV𝒘\mathrm{V}_{b,p_{w}}\left[\eta\bm{w}\right]=\mathrm{E}_{b}\left[\eta\right]% \mathrm{V}_{\bm{w}}.roman_V start_POSTSUBSCRIPT italic_b , italic_p start_POSTSUBSCRIPT italic_w end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_η bold_italic_w ] = roman_E start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ italic_η ] roman_V start_POSTSUBSCRIPT bold_italic_w end_POSTSUBSCRIPT . (20)

As a result of this section, we predict the variance of the object configuration with

V+=Vb[𝒇(𝒒o,𝒖)]+Eb[η]V𝒘.subscriptVsubscriptV𝑏delimited-[]𝒇superscript𝒒𝑜𝒖subscriptE𝑏delimited-[]𝜂subscriptV𝒘\mathrm{V}_{+}=\mathrm{V}_{b}\left[\bm{f}(\bm{q}^{o},\bm{u})\right]+\mathrm{E}% _{b}\left[\eta\right]\mathrm{V}_{\bm{w}}.roman_V start_POSTSUBSCRIPT + end_POSTSUBSCRIPT = roman_V start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) ] + roman_E start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ italic_η ] roman_V start_POSTSUBSCRIPT bold_italic_w end_POSTSUBSCRIPT . (21)

The result in (21) is central to our contribution, as this allows us to predict the variance of the object configuration, i.e. the uncertainty, based on the variance of the perturbation V𝒘subscriptV𝒘\mathrm{V}_{\bm{w}}roman_V start_POSTSUBSCRIPT bold_italic_w end_POSTSUBSCRIPT that is assumed to be a constant value. Especially when those perturbations are constrained to the tangent space of the contact, sampling consistent perturbations involves expensive computations and makes the prediction of the variance stochastic.

4.3.1 Monte-Carlo Approximation of the Nominal Dynamics.

We approximate the variance contribution resulting from the nominal contact dynamics using a non-parametric representation of the belief, i.e. particles. We denote the approximated belief as a set of Npsubscript𝑁𝑝N_{p}italic_N start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT particles {𝒒oi,αi}i=1Npsuperscriptsubscriptsuperscriptsuperscript𝒒𝑜𝑖superscript𝛼𝑖𝑖1subscript𝑁𝑝\{{}^{i}\bm{q}^{o},{}^{i}\alpha\}_{i=1}^{N_{p}}{ start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT italic_α } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUPERSCRIPT, where each particle consists of a state sample 𝒒oisuperscriptsuperscript𝒒𝑜𝑖{}^{i}\bm{q}^{o}start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT and a corresponding weight αisuperscript𝛼𝑖{}^{i}\alphastart_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT italic_α. The weight of a particle, 0αi10superscript𝛼𝑖10\leq{}^{i}\alpha\leq 10 ≤ start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT italic_α ≤ 1, represents an approximate belief in the corresponding state sample 𝒒oisuperscriptsuperscript𝒒𝑜𝑖{}^{i}\bm{q}^{o}start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT. A prediction step consists of propagating each particle through the nominal contact dynamics, i.e.

𝒒+oi=𝒇(𝒒oi,𝒖),i{1,2,..,Np}.{}^{i}\bm{q}^{o}_{+}=\bm{f}({}^{i}\bm{q}^{o},\bm{u}),\,\forall i\in\{1,2,..,N_% {p}\}.start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT + end_POSTSUBSCRIPT = bold_italic_f ( start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) , ∀ italic_i ∈ { 1 , 2 , . . , italic_N start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT } . (22)

Since we do not take any measurements during planning, we assume that the particle weights are constant and equally distributed, such that αi=1/Np,isuperscript𝛼𝑖1subscript𝑁𝑝for-all𝑖{}^{i}\alpha=\nicefrac{{1}}{{N_{p}}},\;\forall istart_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT italic_α = / start_ARG 1 end_ARG start_ARG italic_N start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_ARG , ∀ italic_i. Based on the particle-representation of the belief, we estimate the variance of the object configuration with

V^b[𝒒o]=1Npi=1Np(𝒒oi𝝁o)(𝒒oi𝝁o).subscript^V𝑏delimited-[]superscript𝒒𝑜1subscript𝑁𝑝superscriptsubscript𝑖1subscript𝑁𝑝superscriptsuperscriptsuperscript𝒒𝑜𝑖superscript𝝁𝑜topsuperscriptsuperscript𝒒𝑜𝑖superscript𝝁𝑜\hat{\mathrm{V}}_{b}[\bm{q}^{o}]=\frac{1}{N_{p}}\sum_{i=1}^{N_{p}}\left({}^{i}% \bm{q}^{o}-\bm{\mu}^{o}\right)^{\scriptscriptstyle\top}\left({}^{i}\bm{q}^{o}-% \bm{\mu}^{o}\right).over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ] = divide start_ARG 1 end_ARG start_ARG italic_N start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUPERSCRIPT ( start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT - bold_italic_μ start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT - bold_italic_μ start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ) . (23)

The empirical mean object configuration is computed with 𝝁o=1/Npi=1Np𝒒oisuperscript𝝁𝑜1subscript𝑁𝑝superscriptsubscript𝑖1subscript𝑁𝑝superscriptsuperscript𝒒𝑜𝑖\bm{\mu}^{o}=\nicefrac{{1}}{{N_{p}}}\sum_{i=1}^{N_{p}}{}^{i}\bm{q}^{o}bold_italic_μ start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT = / start_ARG 1 end_ARG start_ARG italic_N start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT. The probability of making contact Eb[η]subscriptE𝑏delimited-[]𝜂\mathrm{E}_{b}\left[\eta\right]roman_E start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ italic_η ] is approximated as

E^b[η]=1Npi=1Npηi,subscript^E𝑏delimited-[]𝜂1subscript𝑁𝑝superscriptsubscript𝑖1subscript𝑁𝑝superscript𝜂𝑖\hat{\mathrm{E}}_{b}\left[\eta\right]=\frac{1}{N_{p}}\sum_{i=1}^{N_{p}}{}^{i}\eta,over^ start_ARG roman_E end_ARG start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ italic_η ] = divide start_ARG 1 end_ARG start_ARG italic_N start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_ARG ∑ start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT italic_η , (24)

where ηisuperscript𝜂𝑖{}^{i}\etastart_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT italic_η indicates if the i𝑖iitalic_i-th particle has an object configuration that is in contact with the robot. As a result, we approximate the overall variance prediction with

V^+=V^b[𝒇(𝒒o,𝒖)]+E^b[η]V𝒘.subscript^Vsubscript^V𝑏delimited-[]𝒇superscript𝒒𝑜𝒖subscript^E𝑏delimited-[]𝜂subscriptV𝒘\hat{\mathrm{V}}_{+}=\hat{\mathrm{V}}_{b}\left[\bm{f}(\bm{q}^{o},\bm{u})\right% ]+\hat{\mathrm{E}}_{b}\left[\eta\right]\mathrm{V}_{\bm{w}}.over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT + end_POSTSUBSCRIPT = over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u ) ] + over^ start_ARG roman_E end_ARG start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ italic_η ] roman_V start_POSTSUBSCRIPT bold_italic_w end_POSTSUBSCRIPT . (25)

When comparing the predicted variance V^+subscript^V\hat{\mathrm{V}}_{+}over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT + end_POSTSUBSCRIPT against the variance of the current time step V^b[𝒒o]subscript^V𝑏delimited-[]superscript𝒒𝑜\hat{\mathrm{V}}_{b}[\bm{q}^{o}]over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT italic_b end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ], the same set of particles is used to compute the empirical variance as in (23). Thus, evaluating the approximated variance dynamics does not involve sampling and is thus deterministic. We exploit this approximation in Sec. 5 when optimizing robot trajectories based on the predicted variance.

5 Stochastic Trajectory Optimization for Robust Manipulation

Given our objective to push an object into a desired goal configuration subject to stochastic contact dynamics (cf. (2)), this section presents a framework that optimizes for robust robot trajectories directly in the belief space over possible object configurations. This framework extends our previous work VP-STO (Jankowski et al. (2023)) to belief-space via-point-based stochastic trajectory optimization (BS-VP-STO). Our framework exploits the variance prediction developed in Sec. 4 for synthesizing robust manipulation behavior. Due to the quasi-static pushing model in (5a) - (5c), a robot trajectory is equivalent to a control trajectory, i.e. 𝒒1:Kr=𝒖0:K1subscriptsuperscript𝒒𝑟:1𝐾subscript𝒖:0𝐾1\bm{q}^{r}_{1:K}=\bm{u}_{0:K-1}bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 : italic_K end_POSTSUBSCRIPT = bold_italic_u start_POSTSUBSCRIPT 0 : italic_K - 1 end_POSTSUBSCRIPT. Thus, BS-VP-STO is a shooting method aiming at minimizing an objective that depends solely on the object configuration as in (1). Due to the non-smooth nature of the contact dynamics and the resulting non-smooth cost function with respect to the optimization variable, we approach the optimization problem with a gradient-free, i.e. zero-order, evolutionary optimization technique.

Fig. 4 illustrates the optimization loop that is based on zero-order optimization of the variable 𝜺𝜺\bm{\varepsilon}bold_italic_ε, which uniquely encodes a robot trajectory. At the beginning of the m𝑚mitalic_m-th iteration, we sample Ncandsubscript𝑁candN_{\mathrm{cand}}italic_N start_POSTSUBSCRIPT roman_cand end_POSTSUBSCRIPT candidate trajectories from a latent Gaussian distribution that represents the current solution to the optimization problem:

𝜺j𝒩(𝜺¯m,𝚺m),j{1,2,,Ncand}.formulae-sequencesimilar-tosuperscript𝜺𝑗𝒩subscript¯𝜺𝑚subscript𝚺𝑚for-all𝑗12subscript𝑁cand\bm{\varepsilon}^{j}\sim\mathcal{N}(\bar{\bm{\varepsilon}}_{m},\bm{\Sigma}_{m}% ),\,\forall j\in\{1,2,\ldots,N_{\mathrm{cand}}\}.bold_italic_ε start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT ∼ caligraphic_N ( over¯ start_ARG bold_italic_ε end_ARG start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , bold_Σ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ) , ∀ italic_j ∈ { 1 , 2 , … , italic_N start_POSTSUBSCRIPT roman_cand end_POSTSUBSCRIPT } . (26)

The latent variable 𝜺jsubscript𝜺𝑗\bm{\varepsilon}_{j}bold_italic_ε start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT translates to a robot trajectory through an affine map** g𝑔gitalic_g, i.e. 𝒖0:K1j=g(𝜺j)subscriptsuperscript𝒖𝑗:0𝐾1𝑔subscript𝜺𝑗\bm{u}^{j}_{0:K-1}=g(\bm{\varepsilon}_{j})bold_italic_u start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 : italic_K - 1 end_POSTSUBSCRIPT = italic_g ( bold_italic_ε start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ). This affine map** imposes a contact prior on the sampling of robot trajectories, which is presented in Sec. 5.2. Given a candidate robot trajectory 𝒖0:K1jsubscriptsuperscript𝒖𝑗:0𝐾1\bm{u}^{j}_{0:K-1}bold_italic_u start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 : italic_K - 1 end_POSTSUBSCRIPT and an initial belief over object positions b0=p(𝒒o)subscript𝑏0𝑝superscript𝒒𝑜b_{0}=p(\bm{q}^{o})italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = italic_p ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ), we compute the nominal belief dynamics. This results in a nominal belief trajectory b~0:Kjsubscriptsuperscript~𝑏𝑗:0𝐾\tilde{b}^{j}_{0:K}over~ start_ARG italic_b end_ARG start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 : italic_K end_POSTSUBSCRIPT for each candidate 𝜺jsubscript𝜺𝑗\bm{\varepsilon}_{j}bold_italic_ε start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT. Based on the nominal belief trajectory we compute the step-wise predicted variance as developed in Sec. 4. The predicted variance is subsequently used in a cost and a constraint to the optimization problem, which is further outlined in Sec. 5.1. Together with a cost for controlling the mean of the object configuration, the total cost of each candidate trajectory is used to update the parameters of the latent Gaussian distribution, i.e. 𝜺¯m+1,𝚺m+1subscript¯𝜺𝑚1subscript𝚺𝑚1\bar{\bm{\varepsilon}}_{m+1},\bm{\Sigma}_{m+1}over¯ start_ARG bold_italic_ε end_ARG start_POSTSUBSCRIPT italic_m + 1 end_POSTSUBSCRIPT , bold_Σ start_POSTSUBSCRIPT italic_m + 1 end_POSTSUBSCRIPT based on Covariance Matrix Adaptation (CMA-ES) (Hansen (2016)). After M𝑀Mitalic_M iterations, we return the best performing sample that returned the lowest cost.

5.1 Variance Gain Control

In Sec. 4 we derive an approximation of the one-step prediction of the variance of the object configuration. However, this does not enable the prediction of the variance after multiple time-steps, which is due to the fact that the prediction of the variance V^k+1subscript^V𝑘1\hat{\mathrm{V}}_{k+1}over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT in (25) requires the belief of the previous time step bksubscript𝑏𝑘b_{k}italic_b start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT to be known. Therefore, instead of directly controlling the variance at the end of the trajectory V^Ksubscript^V𝐾\hat{\mathrm{V}}_{K}over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT, we propose to control the predicted variance at each time step. Given a robot trajectory 𝒖0:K1jsubscriptsuperscript𝒖𝑗:0𝐾1\bm{u}^{j}_{0:K-1}bold_italic_u start_POSTSUPERSCRIPT italic_j end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 : italic_K - 1 end_POSTSUBSCRIPT, we thus compute the nominal belief over object configurations at each time step, i.e. b~ksubscript~𝑏𝑘\tilde{b}_{k}over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, via the the nominal object dynamics in (6). Given the particle set that approximates the initial belief b0=p(𝒒o)subscript𝑏0𝑝superscript𝒒𝑜b_{0}=p(\bm{q}^{o})italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = italic_p ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ) as an input to BS-VP-STO, the nominal belief rollout is computed by applying the nominal forward dynamics to all particles:

𝒒k+1oi=𝒇(𝒒koi,𝒖k),i{1,2,..,Np}.{}^{i}\bm{q}^{o}_{k+1}=\bm{f}\left({}^{i}\bm{q}^{o}_{k},\bm{u}_{k}\right),\,% \forall i\in\{1,2,..,N_{p}\}.start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT = bold_italic_f ( start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) , ∀ italic_i ∈ { 1 , 2 , . . , italic_N start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT } . (27)

The one-step prediction of the variance at each time step can then be computed as

V^k+1=V^b~k[𝒇(𝒒o,𝒖k)]+E^b~k[η]V𝒘=V^b~k+1[𝒒o]+E^b~k[η]V𝒘.subscript^V𝑘1subscript^Vsubscript~𝑏𝑘delimited-[]𝒇superscript𝒒𝑜subscript𝒖𝑘subscript^Esubscript~𝑏𝑘delimited-[]𝜂subscriptV𝒘subscript^Vsubscript~𝑏𝑘1delimited-[]superscript𝒒𝑜subscript^Esubscript~𝑏𝑘delimited-[]𝜂subscriptV𝒘\displaystyle\begin{split}\hat{\mathrm{V}}_{k+1}&=\hat{\mathrm{V}}_{\tilde{b}_% {k}}\left[\bm{f}(\bm{q}^{o},\bm{u}_{k})\right]+\hat{\mathrm{E}}_{\tilde{b}_{k}% }\left[\eta\right]\mathrm{V}_{\bm{w}}\\ &=\hat{\mathrm{V}}_{\tilde{b}_{k+1}}\left[\bm{q}^{o}\right]+\hat{\mathrm{E}}_{% \tilde{b}_{k}}\left[\eta\right]\mathrm{V}_{\bm{w}}.\end{split}start_ROW start_CELL over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_CELL start_CELL = over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_f ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) ] + over^ start_ARG roman_E end_ARG start_POSTSUBSCRIPT over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_η ] roman_V start_POSTSUBSCRIPT bold_italic_w end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ] + over^ start_ARG roman_E end_ARG start_POSTSUBSCRIPT over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_η ] roman_V start_POSTSUBSCRIPT bold_italic_w end_POSTSUBSCRIPT . end_CELL end_ROW (28)

In order to quantify the change of uncertainty due to a given control action 𝒖ksubscript𝒖𝑘\bm{u}_{k}bold_italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, we are interested in the amount of variance that is gained over one time step. Note that the variance of a continuous random variable is closely related to its differential entropy. While the variance as defined in (3) is equivalent to the trace of the covariance matrix, i.e. the sum over all eigenvalues, the upper bound of the differential entropy is monotonic in the determinant of the covariance matrix, i.e. the product over all eigenvalues (Cover and Thomas (2005)). As a result, the variance of a continuous random variable yields an upper bound for its entropy. Therefore, we introduce a new metric γ𝛾\gammaitalic_γ, which we call variance gain, measuring the relative change of the variance after applying an action 𝒖ksubscript𝒖𝑘\bm{u}_{k}bold_italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, i.e.

γk=V^k+1+V^b~k[𝒒o]+V𝒘=V^b~k+1[𝒒o]+E^b~k[η]V𝒘V^b~k[𝒒o]+V𝒘.subscript𝛾𝑘superscriptsubscript^V𝑘1subscript^Vsubscript~𝑏𝑘delimited-[]superscript𝒒𝑜subscriptV𝒘subscript^Vsubscript~𝑏𝑘1delimited-[]superscript𝒒𝑜subscript^Esubscript~𝑏𝑘delimited-[]𝜂subscriptV𝒘subscript^Vsubscript~𝑏𝑘delimited-[]superscript𝒒𝑜subscriptV𝒘\gamma_{k}=\frac{\hat{\mathrm{V}}_{k+1}^{+}}{\hat{\mathrm{V}}_{\tilde{b}_{k}}% \left[\bm{q}^{o}\right]+\mathrm{V}_{\bm{w}}}=\frac{\hat{\mathrm{V}}_{\tilde{b}% _{k+1}}\left[\bm{q}^{o}\right]+\hat{\mathrm{E}}_{\tilde{b}_{k}}\left[\eta% \right]\mathrm{V}_{\bm{w}}}{\hat{\mathrm{V}}_{\tilde{b}_{k}}\left[\bm{q}^{o}% \right]+\mathrm{V}_{\bm{w}}}.italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = divide start_ARG over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT + end_POSTSUPERSCRIPT end_ARG start_ARG over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ] + roman_V start_POSTSUBSCRIPT bold_italic_w end_POSTSUBSCRIPT end_ARG = divide start_ARG over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ] + over^ start_ARG roman_E end_ARG start_POSTSUBSCRIPT over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_η ] roman_V start_POSTSUBSCRIPT bold_italic_w end_POSTSUBSCRIPT end_ARG start_ARG over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ] + roman_V start_POSTSUBSCRIPT bold_italic_w end_POSTSUBSCRIPT end_ARG . (29)

The variance gain γ𝛾\gammaitalic_γ is the ratio of the output variance, i.e. the predicted variance V^k+1subscript^V𝑘1\hat{\mathrm{V}}_{k+1}over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT, to the input variance, i.e. V^k+V𝒘subscript^V𝑘subscriptV𝒘\hat{\mathrm{V}}_{k}+\mathrm{V}_{\bm{w}}over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT + roman_V start_POSTSUBSCRIPT bold_italic_w end_POSTSUBSCRIPT. Fig. 5 illustrates three different robot actions resulting in different variance gains. It shows that the contact geometry plays a crucial role when planning to make contact between the robot and an object. The variance gain γ𝛾\gammaitalic_γ reflects that the contact geometry affects the robustness of a contact, e.g. when pushing. The left sub-figure shows that using one finger for pushing a circular object with uncertain location results in an increase of variance (γ>1𝛾1\gamma>1italic_γ > 1). In contrast, the right sub-figure shows that using two fingers with one finger pushing the object towards the other finger results in a decrease of variance (γ<1𝛾1\gamma<1italic_γ < 1). Using a flat contact surface to push an object with uncertain location keeps the variance constant as shown in the middle sub-figure (γ=1𝛾1\gamma=1italic_γ = 1). Note that the variance gain is lower or equal to one for robot actions that have zero probability of making contact with the object. In this case, the belief does not change, i.e. V^b~k+1[𝒒o]=V^b~k[𝒒o]subscript^Vsubscript~𝑏𝑘1delimited-[]superscript𝒒𝑜subscript^Vsubscript~𝑏𝑘delimited-[]superscript𝒒𝑜\hat{\mathrm{V}}_{\tilde{b}_{k+1}}\left[\bm{q}^{o}\right]=\hat{\mathrm{V}}_{% \tilde{b}_{k}}\left[\bm{q}^{o}\right]over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ] = over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ], since no perturbation is injected into the belief, i.e. E^b~k[η]V𝒘=0subscript^Esubscript~𝑏𝑘delimited-[]𝜂subscriptV𝒘0\hat{\mathrm{E}}_{\tilde{b}_{k}}\left[\eta\right]\mathrm{V}_{\bm{w}}=0over^ start_ARG roman_E end_ARG start_POSTSUBSCRIPT over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ italic_η ] roman_V start_POSTSUBSCRIPT bold_italic_w end_POSTSUBSCRIPT = 0. Thus, a no-contact action results in

γnocontact=V^b~k[𝒒o]V^b~k[𝒒o]+V𝒘1.subscript𝛾nocontactsubscript^Vsubscript~𝑏𝑘delimited-[]superscript𝒒𝑜subscript^Vsubscript~𝑏𝑘delimited-[]superscript𝒒𝑜subscriptV𝒘1\gamma_{\mathrm{no-contact}}=\frac{\hat{\mathrm{V}}_{\tilde{b}_{k}}\left[\bm{q% }^{o}\right]}{\hat{\mathrm{V}}_{\tilde{b}_{k}}\left[\bm{q}^{o}\right]+\mathrm{% V}_{\bm{w}}}\leq 1.italic_γ start_POSTSUBSCRIPT roman_no - roman_contact end_POSTSUBSCRIPT = divide start_ARG over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ] end_ARG start_ARG over^ start_ARG roman_V end_ARG start_POSTSUBSCRIPT over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ] + roman_V start_POSTSUBSCRIPT bold_italic_w end_POSTSUBSCRIPT end_ARG ≤ 1 . (30)
Refer to caption
Figure 5: Belief dynamics through contact in a two-dimensional example. The three sub-figures on the right illustrate the predicted belief b+subscript𝑏b_{+}italic_b start_POSTSUBSCRIPT + end_POSTSUBSCRIPT via samples in orange. All three cases started from the same initial belief b𝑏bitalic_b that is depicted in the left-most sub-figure. The visualization of the prediction on the left shows an increase of the variance of the object position as a consequence of pushing with a single contact point (γ>1𝛾1\gamma>1italic_γ > 1). The second prediction shows a constant variance as a consequence of pushing with a flat contact surface (γ=1𝛾1\gamma=1italic_γ = 1). The right-most sub-figure shows a decrease of the variance of object position as a consequence of pushing with two contact points (γ<1𝛾1\gamma<1italic_γ < 1).

We propose to enforce robustness in the optimization problem by constraining the solution to variance gains smaller or equal to one at all steps, i.e.

γk1k.subscript𝛾𝑘1for-all𝑘\gamma_{k}\leq 1\quad\forall\,k.italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≤ 1 ∀ italic_k . (31)

Due to the zero-order technique that we use for optimizing the robot trajectory, we deploy the robustness constraint as a barrier cost, i.e. a discontinuous cost that is zero if the constraint is satisfied and returns a high value if the constraint is violated. We can further reduce the predicted variance by adding a cost term that is active if the constraint is already satisfied. This is encapsulated in the following cost term

crobust=λck=0K1e1γkK1,subscript𝑐robustsubscript𝜆𝑐superscriptsubscriptproduct𝑘0𝐾1superscript𝑒1subscript𝛾𝑘𝐾1c_{\mathrm{robust}}=\lambda_{c}\prod_{k=0}^{K-1}e^{-\frac{1-\gamma_{k}}{K-1}},italic_c start_POSTSUBSCRIPT roman_robust end_POSTSUBSCRIPT = italic_λ start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ∏ start_POSTSUBSCRIPT italic_k = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_K - 1 end_POSTSUPERSCRIPT italic_e start_POSTSUPERSCRIPT - divide start_ARG 1 - italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT end_ARG start_ARG italic_K - 1 end_ARG end_POSTSUPERSCRIPT , (32)

with a discontinuous cost weight

λc={1ifmaxkγk1,103else.subscript𝜆𝑐cases1ifsubscript𝑘subscript𝛾𝑘1otherwisesuperscript103elseotherwise\lambda_{c}=\begin{cases}1\quad\quad\mathrm{if}\max_{k}\gamma_{k}\leq 1,\\ 10^{3}\hfill\mathrm{else}.\end{cases}italic_λ start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT = { start_ROW start_CELL 1 roman_if roman_max start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ≤ 1 , end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL 10 start_POSTSUPERSCRIPT 3 end_POSTSUPERSCRIPT roman_else . end_CELL start_CELL end_CELL end_ROW (33)

The total cost of a candidate robot trajectory is computed as the sum of the variance gain control cost in (32) and a task-specific cost ctasksubscript𝑐taskc_{\mathrm{task}}italic_c start_POSTSUBSCRIPT roman_task end_POSTSUBSCRIPT that computes the deviation of the mean of the object configuration to the desired goal configuration.

5.2 Trajectory Sampling with a Contact Prior

Refer to caption
Figure 6: Comparison of uninformed and informed sampling of robot trajectories. The left sub-figure shows trajectory samples drawn from a probability distribution computed without a contact prior (𝑸𝒒=𝟎subscript𝑸𝒒0\bm{Q}_{\bm{q}}=\bm{0}bold_italic_Q start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT = bold_0). The right sub-figure shows trajectory samples drawn from a probability distribution computed with a contact prior (𝑸𝒒>𝟎subscript𝑸𝒒0\bm{Q}_{\bm{q}}>\bm{0}bold_italic_Q start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT > bold_0). The contact prior guides the sampling of robot trajectories towards regions where the robot is likely to make contact with the object given the object belief.

The efficiency of sampling-based optimization algorithms depend on the quality of the generated samples. In this section, we present how the object belief can be used to inform the sampling of robot trajectories for manipulation tasks. In general, it is desirable to sample trajectories with a likelihood that is proportional to the negative cost that can be expected from executing the trajectory. In the following, we denote a robot trajectory with 𝒒0:Kr=[𝒒0r,𝒒1r,,𝒒Kr]subscriptsuperscript𝒒𝑟:0𝐾subscriptsuperscript𝒒𝑟0subscriptsuperscript𝒒𝑟1subscriptsuperscript𝒒𝑟𝐾\bm{q}^{r}_{0:K}=\left[\bm{q}^{r}_{0},\bm{q}^{r}_{1},\ldots,\bm{q}^{r}_{K}\right]bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 : italic_K end_POSTSUBSCRIPT = [ bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , … , bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ], where K𝐾Kitalic_K corresponds to the number of discretized steps captured by the trajectory. Suppose that the cost is given as a function of the robot trajectory, i.e. c=fc(𝒒0:Kr)𝑐subscript𝑓𝑐subscriptsuperscript𝒒𝑟:0𝐾c=f_{c}(\bm{q}^{r}_{0:K})italic_c = italic_f start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ( bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 : italic_K end_POSTSUBSCRIPT ), then we would like to sample robot trajectories from a corresponding probability distribution with

p(𝒒0:Kr)exp(fc(𝒒0:Kr)).proportional-to𝑝subscriptsuperscript𝒒𝑟:0𝐾subscript𝑓𝑐subscriptsuperscript𝒒𝑟:0𝐾p(\bm{q}^{r}_{0:K})\propto\exp\left(-f_{c}(\bm{q}^{r}_{0:K})\right).italic_p ( bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 : italic_K end_POSTSUBSCRIPT ) ∝ roman_exp ( - italic_f start_POSTSUBSCRIPT italic_c end_POSTSUBSCRIPT ( bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 : italic_K end_POSTSUBSCRIPT ) ) . (34)

While we do not have access to such a generative probability distribution, we approximate it with a prior that improves the sample-efficiency of the optimization algorithm compared to sampling initial guesses from an uninformed probability distribution. The idea is to sample robot trajectories in regions where the robot is likely to make contact with the object given its belief. This is motivated by the observation that making contact is a necessary precondition of manipulating an object.

Figure 6 illustrates the impact of the contact prior on trajectory samples. Without the contact prior, a large portion of the candidate trajectories explores regions in which the robot does not move into the object belief, and thus does not contribute to the optimization process.

5.2.1 Via-point-based Trajectory Parameterization.

In order to efficiently synthesize robot trajectories in a low-dimensional space, we adopt the via-point-based trajectory representation as in Jankowski et al. (2023). The robot configuration is given with

𝒒r(t)=𝚽via(t)𝜽+ϕ0(t,𝒒0r,𝒒˙0r),superscript𝒒𝑟𝑡subscript𝚽via𝑡𝜽subscriptbold-italic-ϕ0𝑡superscriptsubscript𝒒0𝑟superscriptsubscript˙𝒒0𝑟\bm{q}^{r}(t)=\bm{\Phi}_{\mathrm{via}}(t)\bm{\theta}+\bm{\phi}_{0}(t,\bm{q}_{0% }^{r},\dot{\bm{q}}_{0}^{r}),bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ( italic_t ) = bold_Φ start_POSTSUBSCRIPT roman_via end_POSTSUBSCRIPT ( italic_t ) bold_italic_θ + bold_italic_ϕ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( italic_t , bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) , (35)

where the robot trajectory is parameterized with

𝜽=(𝒒via1𝒒viaN)Nndofr.𝜽matrixsubscriptsuperscript𝒒1viasuperscriptsubscript𝒒via𝑁superscript𝑁superscriptsubscript𝑛dof𝑟\bm{\theta}=\begin{pmatrix}\bm{q}^{1}_{\mathrm{via}}\\ \vdots\\ \bm{q}_{\mathrm{via}}^{N}\end{pmatrix}\in\mathbb{R}^{N\cdot n_{\mathrm{dof}}^{% r}}.bold_italic_θ = ( start_ARG start_ROW start_CELL bold_italic_q start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_via end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ⋮ end_CELL end_ROW start_ROW start_CELL bold_italic_q start_POSTSUBSCRIPT roman_via end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ) ∈ blackboard_R start_POSTSUPERSCRIPT italic_N ⋅ italic_n start_POSTSUBSCRIPT roman_dof end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT . (36)

The trajectory parameter 𝜽𝜽\bm{\theta}bold_italic_θ contains N𝑁Nitalic_N via-points the trajectory passes through. The basis functions 𝚽via(t)subscript𝚽via𝑡\bm{\Phi}_{\mathrm{via}}(t)bold_Φ start_POSTSUBSCRIPT roman_via end_POSTSUBSCRIPT ( italic_t ) enforce that the trajectory passes exactly through the via configurations while smoothly interpolating with minimal acceleration. The basis functions furthermore enforce that the velocity at the end of the trajectory is zero. Note that the last ndofrsuperscriptsubscript𝑛dof𝑟n_{\mathrm{dof}}^{r}italic_n start_POSTSUBSCRIPT roman_dof end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT elements of 𝜽𝜽\bm{\theta}bold_italic_θ are the final robot configuration at the end of the trajectory, i.e. 𝒒r(T)=𝒒Kr=𝒒viaNsuperscript𝒒𝑟𝑇superscriptsubscript𝒒𝐾𝑟superscriptsubscript𝒒via𝑁\bm{q}^{r}(T)=\bm{q}_{K}^{r}=\bm{q}_{\mathrm{via}}^{N}bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ( italic_T ) = bold_italic_q start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT = bold_italic_q start_POSTSUBSCRIPT roman_via end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT. The basis offset ϕ0(t,𝒒0r,𝒒˙0r)subscriptbold-italic-ϕ0𝑡superscriptsubscript𝒒0𝑟superscriptsubscript˙𝒒0𝑟\bm{\phi}_{0}(t,\bm{q}_{0}^{r},\dot{\bm{q}}_{0}^{r})bold_italic_ϕ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( italic_t , bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) incorporates the initial robot position 𝒒0rsuperscriptsubscript𝒒0𝑟\bm{q}_{0}^{r}bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT and velocity with 𝒒˙0rsuperscriptsubscript˙𝒒0𝑟\dot{\bm{q}}_{0}^{r}over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT. T𝑇Titalic_T denotes the duration of the trajectory. We use the time scaling algorithm in Jankowski et al. (2023) for computing the duration of a trajectory based on a given parameter 𝜽𝜽\bm{\theta}bold_italic_θ such that user-defined velocity and acceleration limits are enforced. For implementation details on how to compute the basis functions and offsets, please refer to Jankowski et al. (2022).

In the following, we are interested in computing a Gaussian distribution of the via-points 𝜽𝜽\bm{\theta}bold_italic_θ to efficiently sample from. Due to the affine map** from via-points to robot trajectories in (35), this corresponds to sampling from a Gaussian distribution of continuous robot trajectories.

5.2.2 Gaussian Contact Prior.

The contact prior is a probability distribution that guides the sampling of robot trajectories towards regions where the robot is likely to make contact with the object. To allow for variations in how to approach the contact with the object, we only consider the final robot configuration of the trajectory to be subject to the contact prior. This is incorporated by exploiting the parameterization of the robot trajectory in (35), where the final robot configuration is explicitly given by the last ndofrsuperscriptsubscript𝑛dof𝑟n_{\mathrm{dof}}^{r}italic_n start_POSTSUBSCRIPT roman_dof end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT elements of 𝜽𝜽\bm{\theta}bold_italic_θ.

Suppose that a conditional Gaussian distribution

pc(𝒒r|𝒒o)=𝒩(𝒇c(𝒒o),𝚺r|o)subscript𝑝cconditionalsuperscript𝒒𝑟superscript𝒒𝑜𝒩subscript𝒇csuperscript𝒒𝑜superscript𝚺conditional𝑟𝑜p_{\mathrm{c}}(\bm{q}^{r}|\bm{q}^{o})=\mathcal{N}(\bm{f}_{\mathrm{c}}(\bm{q}^{% o}),\bm{\Sigma}^{r|o})italic_p start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT ( bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT | bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ) = caligraphic_N ( bold_italic_f start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ) , bold_Σ start_POSTSUPERSCRIPT italic_r | italic_o end_POSTSUPERSCRIPT ) (37)

approximates the probability density of a robot configuration making contact with the object. Furthermore, suppose that the object configuration is Gaussian distributed as well with 𝒒o𝒩(𝝁o,𝚺o)similar-tosuperscript𝒒𝑜𝒩superscript𝝁𝑜superscript𝚺𝑜\bm{q}^{o}\sim\mathcal{N}(\bm{\mu}^{o},\bm{\Sigma}^{o})bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ∼ caligraphic_N ( bold_italic_μ start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT , bold_Σ start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ). Practically, we find the Gaussian distribution of object configurations by approximating the initial belief b0subscript𝑏0b_{0}italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT with a Gaussian distribution for computing the contact prior. This lets us compute a probability distribution over robot configurations indicating how likely it is to establish a contact between the robot and the object:

pc(𝒒r)=𝒩(𝒇c(𝝁o),𝚺r|o+𝑨𝚺o𝑨),subscript𝑝csuperscript𝒒𝑟𝒩subscript𝒇csuperscript𝝁𝑜superscript𝚺conditional𝑟𝑜𝑨superscript𝚺𝑜superscript𝑨topp_{\mathrm{c}}(\bm{q}^{r})=\mathcal{N}(\bm{f}_{\mathrm{c}}(\bm{\mu}^{o}),\bm{% \Sigma}^{r|o}+\bm{A}\bm{\Sigma}^{o}\bm{A}^{\scriptscriptstyle\top}),italic_p start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT ( bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) = caligraphic_N ( bold_italic_f start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT ( bold_italic_μ start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ) , bold_Σ start_POSTSUPERSCRIPT italic_r | italic_o end_POSTSUPERSCRIPT + bold_italic_A bold_Σ start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT bold_italic_A start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) , (38)

with 𝑨=𝒇c/𝒒o|𝝁o𝑨evaluated-atsubscript𝒇csuperscript𝒒𝑜superscript𝝁𝑜\bm{A}=\nicefrac{{\partial\bm{f}_{\mathrm{c}}}}{{\partial\bm{q}^{o}}}|_{\bm{% \mu}^{o}}bold_italic_A = / start_ARG ∂ bold_italic_f start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT end_ARG start_ARG ∂ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT end_ARG | start_POSTSUBSCRIPT bold_italic_μ start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT end_POSTSUBSCRIPT. The corresponding prior on the trajectory parameter 𝜽𝜽\bm{\theta}bold_italic_θ is then given by

pc(𝜽=(𝒒via1𝒒viaN))=𝒩((𝟎𝒒¯c),(𝟎𝟎𝟎𝑸𝒒)1),subscript𝑝c𝜽matrixsubscriptsuperscript𝒒1viasuperscriptsubscript𝒒via𝑁𝒩matrix0subscript¯𝒒csuperscriptmatrix000subscript𝑸𝒒1p_{\mathrm{c}}\left(\bm{\theta}{=}\begin{pmatrix}\bm{q}^{1}_{\mathrm{via}}\\ \vdots\\ \bm{q}_{\mathrm{via}}^{N}\end{pmatrix}\right)=\mathcal{N}\left(\begin{pmatrix}% \bm{0}\\ \vdots\\ \bar{\bm{q}}_{\mathrm{c}}\end{pmatrix},\begin{pmatrix}\bm{0}&\!\cdots&\!\!\bm{% 0}\\ \vdots&\!\ddots&\!\!\vdots\\ \bm{0}&\!\cdots&\!\!\bm{Q}_{\bm{q}}\end{pmatrix}^{-1}\right),italic_p start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT ( bold_italic_θ = ( start_ARG start_ROW start_CELL bold_italic_q start_POSTSUPERSCRIPT 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_via end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL ⋮ end_CELL end_ROW start_ROW start_CELL bold_italic_q start_POSTSUBSCRIPT roman_via end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ) ) = caligraphic_N ( ( start_ARG start_ROW start_CELL bold_0 end_CELL end_ROW start_ROW start_CELL ⋮ end_CELL end_ROW start_ROW start_CELL over¯ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ) , ( start_ARG start_ROW start_CELL bold_0 end_CELL start_CELL ⋯ end_CELL start_CELL bold_0 end_CELL end_ROW start_ROW start_CELL ⋮ end_CELL start_CELL ⋱ end_CELL start_CELL ⋮ end_CELL end_ROW start_ROW start_CELL bold_0 end_CELL start_CELL ⋯ end_CELL start_CELL bold_italic_Q start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) , (39)

where 𝑸𝒒=(𝚺r|o+𝑨𝚺o𝑨)1subscript𝑸𝒒superscriptsuperscript𝚺conditional𝑟𝑜𝑨superscript𝚺𝑜superscript𝑨top1\bm{Q}_{\bm{q}}=\left(\bm{\Sigma}^{r|o}+\bm{A}\bm{\Sigma}^{o}\bm{A}^{% \scriptscriptstyle\top}\right)^{-1}bold_italic_Q start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT = ( bold_Σ start_POSTSUPERSCRIPT italic_r | italic_o end_POSTSUPERSCRIPT + bold_italic_A bold_Σ start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT bold_italic_A start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT describes the precision matrix of the contact prior with respect to the mean contact configuration 𝒒¯c=𝒇c(𝝁o)subscript¯𝒒csubscript𝒇csuperscript𝝁𝑜\bar{\bm{q}}_{\mathrm{c}}=\bm{f}_{\mathrm{c}}(\bm{\mu}^{o})over¯ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT = bold_italic_f start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT ( bold_italic_μ start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ). We denote the contact prior with

pc(𝜽)=𝒩(𝜽¯c,𝑸𝜽1).subscript𝑝c𝜽𝒩subscript¯𝜽csuperscriptsubscript𝑸𝜽1p_{\mathrm{c}}(\bm{\theta})=\mathcal{N}(\bar{\bm{\theta}}_{\mathrm{c}},\bm{Q}_% {\bm{\theta}}^{-1}).italic_p start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT ( bold_italic_θ ) = caligraphic_N ( over¯ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT , bold_italic_Q start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) . (40)

Note that the resulting covariance matrix 𝑸𝜽1superscriptsubscript𝑸𝜽1\bm{Q}_{\bm{\theta}}^{-1}bold_italic_Q start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT is degenerated due to zero-precision values for the via-points except for 𝒒viaNsuperscriptsubscript𝒒via𝑁\bm{q}_{\mathrm{via}}^{N}bold_italic_q start_POSTSUBSCRIPT roman_via end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT. For resolving the degeneration, we regularize the covariance matrix by combining the contact prior with a smoothness prior as described in the following.

5.2.3 Gaussian Contact Prior in Joint Space.

Optimizing in the joint space of an articulated robot such as robot arms may be beneficial when kinematic and dynamic limitations are to be considered during planning. Sampling in joint space requires to represent the contact prior in joint space as well. Suppose that the robot’s end-effector, that is supposed to manipulate the object, has a configuration given by 𝒙rsuperscript𝒙𝑟\bm{x}^{r}bold_italic_x start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT which is computed from the robot’s joint positions 𝒒rsuperscript𝒒𝑟\bm{q}^{r}bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT via forward kinematics 𝒙r=𝒇fk(𝒒r)superscript𝒙𝑟subscript𝒇fksuperscript𝒒𝑟\bm{x}^{r}=\bm{f}_{\mathrm{fk}}(\bm{q}^{r})bold_italic_x start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT = bold_italic_f start_POSTSUBSCRIPT roman_fk end_POSTSUBSCRIPT ( bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ). We may adopt the contact prior in (38) to formulate a prior distribution over configurations of the end-effector, i.e.

pc(𝒙r)=𝒩(𝒇c(𝝁o),𝚺r|o+𝑨𝚺o𝑨).subscript𝑝csuperscript𝒙𝑟𝒩subscript𝒇csuperscript𝝁𝑜superscript𝚺conditional𝑟𝑜𝑨superscript𝚺𝑜superscript𝑨topp_{\mathrm{c}}(\bm{x}^{r})=\mathcal{N}(\bm{f}_{\mathrm{c}}(\bm{\mu}^{o}),\bm{% \Sigma}^{r|o}+\bm{A}\bm{\Sigma}^{o}\bm{A}^{\scriptscriptstyle\top}).italic_p start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT ( bold_italic_x start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) = caligraphic_N ( bold_italic_f start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT ( bold_italic_μ start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ) , bold_Σ start_POSTSUPERSCRIPT italic_r | italic_o end_POSTSUPERSCRIPT + bold_italic_A bold_Σ start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT bold_italic_A start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) . (41)

For computing a corresponding Gaussian distribution in joint space, the forward kinematics are linearized around a mean joint position 𝒒¯crsuperscriptsubscript¯𝒒c𝑟\bar{\bm{q}}_{\mathrm{c}}^{r}over¯ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT with

𝒙r𝒇fk(𝒒¯cr)+𝑱(𝒒¯cr)(𝒒r𝒒¯cr),superscript𝒙𝑟subscript𝒇fksuperscriptsubscript¯𝒒c𝑟𝑱superscriptsubscript¯𝒒c𝑟superscript𝒒𝑟superscriptsubscript¯𝒒c𝑟\bm{x}^{r}\approx\bm{f}_{\mathrm{fk}}(\bar{\bm{q}}_{\mathrm{c}}^{r})+\bm{J}(% \bar{\bm{q}}_{\mathrm{c}}^{r})\left(\bm{q}^{r}-\bar{\bm{q}}_{\mathrm{c}}^{r}% \right),bold_italic_x start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ≈ bold_italic_f start_POSTSUBSCRIPT roman_fk end_POSTSUBSCRIPT ( over¯ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) + bold_italic_J ( over¯ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) ( bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT - over¯ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) , (42)

where 𝑱(𝒒)=𝒙r/𝒒r|𝒒¯cr𝑱𝒒evaluated-atsuperscript𝒙𝑟superscript𝒒𝑟superscriptsubscript¯𝒒c𝑟\bm{J}(\bm{q})=\nicefrac{{\partial\bm{x}^{r}}}{{\partial\bm{q}^{r}}}|_{\bar{% \bm{q}}_{\mathrm{c}}^{r}}bold_italic_J ( bold_italic_q ) = / start_ARG ∂ bold_italic_x start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT end_ARG start_ARG ∂ bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT end_ARG | start_POSTSUBSCRIPT over¯ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT end_POSTSUBSCRIPT denotes the Jacobian with respect to the end-effector configuration. The mean joint position 𝒒¯crsuperscriptsubscript¯𝒒c𝑟\bar{\bm{q}}_{\mathrm{c}}^{r}over¯ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT can be computed via inverse kinematics with respect to the mean end-effector configuration 𝒇c(𝝁o)subscript𝒇csuperscript𝝁𝑜\bm{f}_{\mathrm{c}}(\bm{\mu}^{o})bold_italic_f start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT ( bold_italic_μ start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ). Consequently, the Gaussian contact prior for the end-effector can be locally transformed into the joint space, resulting in a Gaussian distribution pc(𝒒r)=𝒩(𝒒¯cr,𝑸𝒒1)subscript𝑝csuperscript𝒒𝑟𝒩superscriptsubscript¯𝒒c𝑟superscriptsubscript𝑸𝒒1p_{\mathrm{c}}(\bm{q}^{r})=\mathcal{N}(\bar{\bm{q}}_{\mathrm{c}}^{r},\bm{Q}_{% \bm{q}}^{-1})italic_p start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT ( bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) = caligraphic_N ( over¯ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , bold_italic_Q start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ). The joint space contact precision matrix is computed with

𝑸𝒒=𝑱(𝒒¯cr)(𝚺r|o+𝑨𝚺o𝑨)1𝑱(𝒒¯cr).subscript𝑸𝒒𝑱superscriptsuperscriptsubscript¯𝒒c𝑟topsuperscriptsuperscript𝚺conditional𝑟𝑜𝑨superscript𝚺𝑜superscript𝑨top1𝑱superscriptsubscript¯𝒒c𝑟\bm{Q}_{\bm{q}}=\bm{J}(\bar{\bm{q}}_{\mathrm{c}}^{r})^{\scriptscriptstyle\top}% \left(\bm{\Sigma}^{r|o}+\bm{A}\bm{\Sigma}^{o}\bm{A}^{\scriptscriptstyle\top}% \right)^{-1}\bm{J}(\bar{\bm{q}}_{\mathrm{c}}^{r}).bold_italic_Q start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT = bold_italic_J ( over¯ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_Σ start_POSTSUPERSCRIPT italic_r | italic_o end_POSTSUPERSCRIPT + bold_italic_A bold_Σ start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT bold_italic_A start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_italic_J ( over¯ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) . (43)

5.2.4 Gaussian Smoothness Prior.

Refer to caption
Figure 7: Smooth trajectories sampled from the product of the smoothness prior and the contact prior. The contact prior is indicated by the orange ellipses and circles with the mean of the contact prior in the center. The velocity profile of the trajectories is encoded through color with low velocities in blue and high velocities in yellow. Note that all trajectories start and end with exactly zero velocity.

The smoothness prior, introduced in our previous work (Jankowski et al. (2023)), incorporates temporal correlations between via-points by computing a Gaussian distribution that expresses a high likelihood for low-acceleration profiles. A typical objective in trajectory optimization is to minimize the integral over squared accelerations of the candidate trajectory, i.e.

Js=120T𝒒¨r(t)𝑹𝒒𝒒¨r(t)𝑑t.subscript𝐽s12superscriptsubscript0𝑇superscript¨𝒒limit-from𝑟top𝑡subscript𝑹𝒒superscript¨𝒒𝑟𝑡differential-d𝑡J_{\mathrm{s}}=\frac{1}{2}\int_{0}^{T}\ddot{\bm{q}}^{r{\scriptscriptstyle\top}% }(t)\bm{R}_{\bm{q}}\ddot{\bm{q}}^{r}(t)dt.italic_J start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT over¨ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_r ⊤ end_POSTSUPERSCRIPT ( italic_t ) bold_italic_R start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT over¨ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ( italic_t ) italic_d italic_t . (44)

The positive definite matrix 𝑹𝒒subscript𝑹𝒒\bm{R}_{\bm{q}}bold_italic_R start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT encodes the desired smoothing for the individual degrees of freedom. Using the parameterization in (35), this objective can be expressed using the via-point parameter 𝜽𝜽\bm{\theta}bold_italic_θ and the initial conditions for the trajectory 𝒒0r,𝒒˙0rsuperscriptsubscript𝒒0𝑟superscriptsubscript˙𝒒0𝑟\bm{q}_{0}^{r},\dot{\bm{q}}_{0}^{r}bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT, i.e. Js(𝜽,𝒒0r,𝒒˙0r)subscript𝐽s𝜽superscriptsubscript𝒒0𝑟superscriptsubscript˙𝒒0𝑟J_{\mathrm{s}}(\bm{\theta},\bm{q}_{0}^{r},\dot{\bm{q}}_{0}^{r})italic_J start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT ( bold_italic_θ , bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ). As a next step, we express the smoothness prior as a probability distribution parameterized with the negative objective in (44) with

ps(𝜽,𝒒0r,𝒒˙0r)eJs(𝜽,𝒒0r,𝒒˙0r).proportional-tosubscript𝑝s𝜽superscriptsubscript𝒒0𝑟superscriptsubscript˙𝒒0𝑟superscript𝑒subscript𝐽s𝜽superscriptsubscript𝒒0𝑟superscriptsubscript˙𝒒0𝑟p_{\mathrm{s}}(\bm{\theta},\bm{q}_{0}^{r},\dot{\bm{q}}_{0}^{r})\propto e^{-J_{% \mathrm{s}}(\bm{\theta},\bm{q}_{0}^{r},\dot{\bm{q}}_{0}^{r})}.italic_p start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT ( bold_italic_θ , bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) ∝ italic_e start_POSTSUPERSCRIPT - italic_J start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT ( bold_italic_θ , bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) end_POSTSUPERSCRIPT . (45)

Interestingly, this results in a joint Gaussian distribution over the trajectory parameter and the initial conditions. We then compute a Gaussian smoothness prior on the trajectory parameter by conditioning on the initial conditions, i.e.

ps(𝜽|𝒒0r,𝒒˙0r)=𝒩(𝜽¯s,𝑹𝜽1),subscript𝑝sconditional𝜽superscriptsubscript𝒒0𝑟superscriptsubscript˙𝒒0𝑟𝒩subscript¯𝜽ssuperscriptsubscript𝑹𝜽1p_{\mathrm{s}}(\bm{\theta}|\bm{q}_{0}^{r},\dot{\bm{q}}_{0}^{r})=\mathcal{N}(% \bar{\bm{\theta}}_{\mathrm{s}},\bm{R}_{\bm{\theta}}^{-1}),italic_p start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT ( bold_italic_θ | bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) = caligraphic_N ( over¯ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT , bold_italic_R start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) , (46)

with the precision matrix being computed with

𝑹𝜽=0T𝚽¨via(t)𝑹𝒒𝚽¨via(t)𝑑t.subscript𝑹𝜽superscriptsubscript0𝑇subscriptsuperscript¨𝚽topvia𝑡subscript𝑹𝒒subscript¨𝚽via𝑡differential-d𝑡\bm{R}_{\bm{\theta}}=\int_{0}^{T}\ddot{\bm{\Phi}}^{\scriptscriptstyle\top}_{% \mathrm{via}}(t)\bm{R}_{\bm{q}}\ddot{\bm{\Phi}}_{\mathrm{via}}(t)dt.bold_italic_R start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT = ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT over¨ start_ARG bold_Φ end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_via end_POSTSUBSCRIPT ( italic_t ) bold_italic_R start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT over¨ start_ARG bold_Φ end_ARG start_POSTSUBSCRIPT roman_via end_POSTSUBSCRIPT ( italic_t ) italic_d italic_t . (47)

Please refer to the appendix for the derivation of (47) and for details on how to compute the smoothness prior mean 𝜽¯ssubscript¯𝜽s\bar{\bm{\theta}}_{\mathrm{s}}over¯ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT. Note that the smoothness precision matrix 𝑹𝜽subscript𝑹𝜽\bm{R}_{\bm{\theta}}bold_italic_R start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT can be computed offline as it does not depend on the trajectory parameter 𝜽𝜽\bm{\theta}bold_italic_θ.

5.2.5 Product of Gaussian Priors.

Refer to caption
Figure 8: BS-VP-STO optimizing the trajectory of a rectangular robot to push a circular object into a goal region subject to uncertain initial object location (represented by a particle-based belief representation) and uncertain contact dynamics. The sub-figures show the best performing candidate solution with the corresponding velocity profile, as well as the other trajectory samples in light gray, after 1, 10, 40 and 120 iterations from left to right.

Given the two priors for making contact and smooth trajectories respectively, we find the informed via-point distribution by fusing the two probabilistic priors via computing the normalized product of the two priors, such that

p(𝜽)=𝒩(𝜽¯,𝚺𝜽)pc(𝜽)ps(𝜽|𝒒0r,𝒒˙0r).𝑝𝜽𝒩¯𝜽subscript𝚺𝜽proportional-tosubscript𝑝c𝜽subscript𝑝sconditional𝜽superscriptsubscript𝒒0𝑟superscriptsubscript˙𝒒0𝑟p(\bm{\theta})=\mathcal{N}(\bar{\bm{\theta}},\bm{\Sigma}_{\bm{\theta}})\propto p% _{\mathrm{c}}(\bm{\theta})p_{\mathrm{s}}(\bm{\theta}|\bm{q}_{0}^{r},\dot{\bm{q% }}_{0}^{r}).italic_p ( bold_italic_θ ) = caligraphic_N ( over¯ start_ARG bold_italic_θ end_ARG , bold_Σ start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ) ∝ italic_p start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT ( bold_italic_θ ) italic_p start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT ( bold_italic_θ | bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) . (48)

The product of two multivariate Gaussians is again a multivariate Gaussian, with the resulting parameters given by

𝚺𝜽=(𝑸𝜽+𝑹𝜽)1,subscript𝚺𝜽superscriptsubscript𝑸𝜽subscript𝑹𝜽1\bm{\Sigma}_{\bm{\theta}}=\left(\bm{Q}_{\bm{\theta}}+\bm{R}_{\bm{\theta}}% \right)^{-1},bold_Σ start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT = ( bold_italic_Q start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT + bold_italic_R start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT , (49a)
𝜽¯=𝚺𝜽(𝑸𝜽𝜽¯c+𝑹𝜽𝜽¯s).¯𝜽subscript𝚺𝜽subscript𝑸𝜽subscript¯𝜽csubscript𝑹𝜽subscript¯𝜽s\bar{\bm{\theta}}=\bm{\Sigma}_{\bm{\theta}}(\bm{Q}_{\bm{\theta}}\bar{\bm{% \theta}}_{\mathrm{c}}+\bm{R}_{\bm{\theta}}\bar{\bm{\theta}}_{\mathrm{s}}).over¯ start_ARG bold_italic_θ end_ARG = bold_Σ start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ( bold_italic_Q start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT over¯ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT roman_c end_POSTSUBSCRIPT + bold_italic_R start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT over¯ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT ) . (49b)

Fig. 7 illustrates trajectories sampled from the product of the contact prior, parameterized by 𝑸𝒒subscript𝑸𝒒\bm{Q}_{\bm{q}}bold_italic_Q start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT, and the smoothness prior, parameterized by 𝑹𝒒subscript𝑹𝒒\bm{R}_{\bm{q}}bold_italic_R start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT. Within each sub-figure, all trajectories are drawn from a single Gaussian distribution. Note that all trajectories start and end with zero velocity.

5.2.6 Optimizing and Sampling in Latent Space.

Given the product of priors, we use the informed generative via-point distribution p(𝜽)𝑝𝜽p(\bm{\theta})italic_p ( bold_italic_θ ) as a probabilistic initial guess for optimizing robot trajectories with CMA-ES. Instead of directly sampling trajectory candidates 𝜽𝜽\bm{\theta}bold_italic_θ from an uninformed distribution, e.g. white noise, we sample and optimize for 𝜺Nndofr𝜺superscript𝑁superscriptsubscript𝑛dof𝑟\bm{\varepsilon}\in\mathbb{R}^{Nn_{\mathrm{dof}}^{r}}bold_italic_ε ∈ blackboard_R start_POSTSUPERSCRIPT italic_N italic_n start_POSTSUBSCRIPT roman_dof end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT end_POSTSUPERSCRIPT. For a given 𝜺𝜺\bm{\varepsilon}bold_italic_ε, we compute 𝜽𝜽\bm{\theta}bold_italic_θ through an affine transformation as follows:

𝜽=𝜽¯+𝑳𝜽𝜺.𝜽¯𝜽subscript𝑳𝜽𝜺\bm{\theta}=\bar{\bm{\theta}}+\bm{L}_{\bm{\theta}}\bm{\varepsilon}.bold_italic_θ = over¯ start_ARG bold_italic_θ end_ARG + bold_italic_L start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT bold_italic_ε . (50)

Here, 𝑳𝜽subscript𝑳𝜽\bm{L}_{\bm{\theta}}bold_italic_L start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT is the Cholesky decomposition of the covariance matrix 𝚺𝜽subscript𝚺𝜽\bm{\Sigma}_{\bm{\theta}}bold_Σ start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT. The parameters 𝜽¯¯𝜽\bar{\bm{\theta}}over¯ start_ARG bold_italic_θ end_ARG and 𝚺𝜽subscript𝚺𝜽\bm{\Sigma}_{\bm{\theta}}bold_Σ start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT incorporate the prior as defined in (49). The idea of this additional transformation is to decouple the optimization variable 𝜺𝜺\bm{\varepsilon}bold_italic_ε from the particular prior. In each iteration m𝑚mitalic_m of BS-VP-STO, we obtain the new population of candidate solutions by sampling Ncandsubscript𝑁candN_{\mathrm{cand}}italic_N start_POSTSUBSCRIPT roman_cand end_POSTSUBSCRIPT robot trajectories via

𝜽𝒩(𝜽¯+𝑳𝜽𝜺¯m,𝑳𝜽𝚺m𝑳𝜽).similar-to𝜽𝒩¯𝜽subscript𝑳𝜽subscript¯𝜺𝑚subscript𝑳𝜽subscript𝚺𝑚superscriptsubscript𝑳𝜽top\bm{\theta}\sim\mathcal{N}\left(\bar{\bm{\theta}}+\bm{L}_{\bm{\theta}}\bar{\bm% {\varepsilon}}_{m},\bm{L}_{\bm{\theta}}\bm{\Sigma}_{m}\bm{L}_{\bm{\theta}}^{% \scriptscriptstyle\top}\right).bold_italic_θ ∼ caligraphic_N ( over¯ start_ARG bold_italic_θ end_ARG + bold_italic_L start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT over¯ start_ARG bold_italic_ε end_ARG start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , bold_italic_L start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT bold_Σ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT bold_italic_L start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) . (51)

When initializing the CMA-ES distribution as white noise, i.e. 𝜺¯0=𝟎subscript¯𝜺00\bar{\bm{\varepsilon}}_{0}=\bm{0}over¯ start_ARG bold_italic_ε end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = bold_0 and 𝚺0=𝑰subscript𝚺0𝑰\bm{\Sigma}_{0}=\bm{I}bold_Σ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = bold_italic_I, we effectively sample the first population from the informed distribution in (49), as inserting the initial parameters into (51) yields

𝜽𝒩(𝜽¯,𝑳𝜽𝑳𝜽)=𝒩(𝜽¯,𝚺𝜽).similar-to𝜽𝒩¯𝜽subscript𝑳𝜽superscriptsubscript𝑳𝜽top𝒩¯𝜽subscript𝚺𝜽\bm{\theta}\sim\mathcal{N}\left(\bar{\bm{\theta}},\bm{L}_{\bm{\theta}}\bm{L}_{% \bm{\theta}}^{\scriptscriptstyle\top}\right)=\mathcal{N}\left(\bar{\bm{\theta}% },\bm{\Sigma}_{\bm{\theta}}\right).bold_italic_θ ∼ caligraphic_N ( over¯ start_ARG bold_italic_θ end_ARG , bold_italic_L start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT bold_italic_L start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) = caligraphic_N ( over¯ start_ARG bold_italic_θ end_ARG , bold_Σ start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT ) . (52)

Eventually, given a sampled trajectory parameter 𝜽𝜽\bm{\theta}bold_italic_θ, we find the control trajectory with respect to the system in (6) by discretizing the robot trajectory in (35), i.e.

𝒖k=𝒒r(t=Tk+1K).subscript𝒖𝑘superscript𝒒𝑟𝑡𝑇𝑘1𝐾\bm{u}_{k}=\bm{q}^{r}\left(t=T\cdot\frac{k+1}{K}\right).bold_italic_u start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ( italic_t = italic_T ⋅ divide start_ARG italic_k + 1 end_ARG start_ARG italic_K end_ARG ) . (53)

Note that due to the quasi-static model in (6), the dynamics can be rolled out with an arbitrary temporal resolution.

Example 2. (Single-horizon Robust 2D Object-Pushing). We showcase the BS-VP-STO pipeline over multiple iterations for a 2D object pushing example, illustrated in Fig. 8. The task for the robot, a rectangular geometry, is to push the object, a circular geometry, into a target region. We consider two sources of uncertainty in this example: a) The initial position of the object is uncertain, which is reflected by an initial belief; and b) the contact dynamics are uncertain. This task requires exploring contact modes that are robust to these uncertainties, as implicitly done by the presented planning algorithm.

After the first iteration, the best solution corresponds to the robot making contact with its long side. Note that this solution corresponds to the best candidate of the initial population sampled from the product of Gaussian priors, without any CMA-ES updates. Due to the probabilistic contact prior, almost all of the 30 initial candidates bring the robot into contact with the object, enabling an informative sampling of the cost landscape. After 10 iterations of BS-VP-STO, the algorithm found a solution making robust contact with the object while moving it slightly towards the target area. The solution after 40 iterations enables the robot to almost push the object into the target area. After 120 iterations, the algorithm found a solution for pushing the object robustly into the target area, while also optimizing the overall motion duration which is incorporated into ctasksubscript𝑐taskc_{\mathrm{task}}italic_c start_POSTSUBSCRIPT roman_task end_POSTSUBSCRIPT. The tight distribution of candidate solutions after 120 iterations indicates that CMA-ES has converged.

6 Receding-horizon BS-VP-STO

Planning a pushing maneuver over a single long horizon is challenging for two reasons: i) The dimensionality of the solution space of the optimization problem grows as the solution requires higher expressiveness. In the presented algorithm, this is reflected by an increasing number of via-points N𝑁Nitalic_N that parameterize the robot trajectory. ii) In BS-VP-STO, the belief is rolled out using the nominal object dynamics, i.e. without inducing noise. For robust candidate trajectories, the belief may collapse to a Dirac-delta distribution after kcollapsesubscript𝑘collapsek_{\mathrm{collapse}}italic_k start_POSTSUBSCRIPT roman_collapse end_POSTSUBSCRIPT steps, i.e. b~k=δ(𝒒ko),kkcollapseformulae-sequencesubscript~𝑏𝑘𝛿subscriptsuperscript𝒒𝑜𝑘for-all𝑘subscript𝑘collapse\tilde{b}_{k}=\delta(\bm{q}^{o}_{k}),\forall k\geq k_{\mathrm{collapse}}over~ start_ARG italic_b end_ARG start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = italic_δ ( bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) , ∀ italic_k ≥ italic_k start_POSTSUBSCRIPT roman_collapse end_POSTSUBSCRIPT, resulting in zero variance. In this case, the variance gain at those time steps is either γk=0subscript𝛾𝑘0\gamma_{k}=0italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 0 if the robot does not touch the object, or γk=1subscript𝛾𝑘1\gamma_{k}=1italic_γ start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = 1 if the robot touches the object. Thus, the optimization is strongly biased towards not touching the object after kcollapsesubscript𝑘collapsek_{\mathrm{collapse}}italic_k start_POSTSUBSCRIPT roman_collapse end_POSTSUBSCRIPT steps.

Input: Robot configuration 𝒒0rsubscriptsuperscript𝒒𝑟0\bm{q}^{r}_{0}bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and velocity 𝒒˙0rsubscriptsuperscript˙𝒒𝑟0\dot{\bm{q}}^{r}_{0}over˙ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT,   object belief b0subscript𝑏0b_{0}italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, receding horizon length H𝐻Hitalic_H.
Output: Robot trajectory 𝒖𝒖\bm{u}bold_italic_u.
1 𝒖𝒖\bm{u}\leftarrow\emptysetbold_italic_u ← ∅
2 while task not solved do
3       𝒒0:Kr,𝒒˙0:Krsubscriptsuperscript𝒒𝑟:0𝐾subscriptsuperscript˙𝒒𝑟:0𝐾absent\bm{q}^{r*}_{0:K},\dot{\bm{q}}^{r*}_{0:K}\leftarrowbold_italic_q start_POSTSUPERSCRIPT italic_r ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 : italic_K end_POSTSUBSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_r ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 : italic_K end_POSTSUBSCRIPT ← BS-VP-STO(𝒒0r,𝒒˙0r,b0)subscriptsuperscript𝒒𝑟0subscriptsuperscript˙𝒒𝑟0subscript𝑏0(\bm{q}^{r}_{0},\dot{\bm{q}}^{r}_{0},b_{0})( bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT )
4       𝒖0:H1𝒒1:Hrsubscriptsuperscript𝒖:0𝐻1subscriptsuperscript𝒒𝑟:1𝐻\bm{u}^{*}_{0:H-1}\leftarrow\bm{q}^{r*}_{1:H}bold_italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 : italic_H - 1 end_POSTSUBSCRIPT ← bold_italic_q start_POSTSUPERSCRIPT italic_r ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 1 : italic_H end_POSTSUBSCRIPT
5       for k0𝑘0k\leftarrow 0italic_k ← 0 to H1𝐻1H-1italic_H - 1 do
             // Stochastic rollout
6             𝒒k+1oip(|𝒒koi,𝒖k),i{1,2,..,Np}{}^{i}\bm{q}^{o}_{k+1}\sim p(\cdot|{}^{i}\bm{q}^{o}_{k},\bm{u}^{*}_{k}),\,% \forall i\in\{1,2,..,N_{p}\}start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k + 1 end_POSTSUBSCRIPT ∼ italic_p ( ⋅ | start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , bold_italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ) , ∀ italic_i ∈ { 1 , 2 , . . , italic_N start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT }
7            
8       end for
9      𝒒0r,𝒒˙0r𝒒Hr,𝒒˙Hrformulae-sequencesubscriptsuperscript𝒒𝑟0subscriptsuperscript˙𝒒𝑟0subscriptsuperscript𝒒𝑟𝐻subscriptsuperscript˙𝒒𝑟𝐻\bm{q}^{r}_{0},\dot{\bm{q}}^{r}_{0}\leftarrow\bm{q}^{r*}_{H},\dot{\bm{q}}^{r*}% _{H}bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ← bold_italic_q start_POSTSUPERSCRIPT italic_r ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_H end_POSTSUBSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_r ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_H end_POSTSUBSCRIPT
10       b0{𝒒Hoi}i=1Npsubscript𝑏0superscriptsubscriptsuperscriptsubscriptsuperscript𝒒𝑜𝐻𝑖𝑖1subscript𝑁𝑝b_{0}\leftarrow\{{}^{i}\bm{q}^{o}_{H}\}_{i=1}^{N_{p}}italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ← { start_FLOATSUPERSCRIPT italic_i end_FLOATSUPERSCRIPT bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_H end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT end_POSTSUPERSCRIPT
11       𝒖𝒖absent\bm{u}\leftarrowbold_italic_u ← concatenate(𝒖,𝒖0:H1)𝒖subscriptsuperscript𝒖:0𝐻1\left(\bm{u},\bm{u}^{*}_{0:H-1}\right)( bold_italic_u , bold_italic_u start_POSTSUPERSCRIPT ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 : italic_H - 1 end_POSTSUBSCRIPT )
12      
13 end while
Algorithm 1 Receding-horizon BS-VP-STO

For these reasons, we propose receding-horizon BS-VP-STO, a planning scheme for pushing maneuvers over longer horizons. The scheme alternates between computing a robust push via BS-VP-STO and performing a stochastic rollout of the solution. This allows to plan pushing maneuvers over multiple shorter horizons, while optimizing for task progress, i.e. pushing the object towards the goal, and robustness over a single horizon.

Alg. 1 sketches the receding-horizon procedure for planning pushing maneuvers. Starting from the initial robot configuration 𝒒0rsubscriptsuperscript𝒒𝑟0\bm{q}^{r}_{0}bold_italic_q start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and velocity 𝒒˙0rsubscriptsuperscript˙𝒒𝑟0\dot{\bm{q}}^{r}_{0}over˙ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, a given initial belief b0subscript𝑏0b_{0}italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and the number of time steps for a receding horizon H𝐻Hitalic_H, BS-VP-STO is used to generate a robust pushing trajectory 𝒒0:Krsubscriptsuperscript𝒒𝑟:0𝐾\bm{q}^{r*}_{0:K}bold_italic_q start_POSTSUPERSCRIPT italic_r ∗ end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 : italic_K end_POSTSUBSCRIPT. In order to update the belief for the subsequent receding-horizon, we perform a stochastic rollout of the robust pushing trajectory, i.e. sampling from the transition probability in (10). Note that the update of the belief can be extended by taking observations into account. For this, line 9 in Alg. 1 may be replaced by a Bayesian state estimation update, e.g. a particle filter (Arulampalam et al. (2002)), thus turning the offline planning algorithm into an online re-planning approach. When planning offline, the optimized pushing trajectories of the receding-horizons are sequenced to form a single continuous trajectory over a long horizon. This process may be repeated until a task-specific termination criterion is satisfied, e.g. the mean object configuration is within bounds of the target.

Refer to caption
Figure 9: Receding-horizon BS-VP-STO optimizing the trajectory of a rectangular robot to push a circular object into a goal region subject to an uncertain initial object location and uncertain contact dynamics. The sub-figures show the robot trajectory optimized over the corresponding receding-horizon together with a stochastic rollout of the object dynamics. In each receding-horizon, the algorithm optimizes for a tradeoff between pushing progress and robustness. The resulting pushing maneuver is more robust than the solution found when optimizing over a single horizon as in Fig. 8.

Example 3. (Receding-horizon Robust 2D Object-Pushing). In this example, we solve the same problem as in Example 2, while iteratively optimizing over multiple receding horizons instead of running BS-VP-STO only once over the full horizon. For this, we only adapt the task-specific cost to reflect the pushing progress towards the goal. We measure this progress by computing the distance dk=E[𝒒ko]𝒒deso2subscript𝑑𝑘subscriptnormEdelimited-[]subscriptsuperscript𝒒𝑜𝑘subscriptsuperscript𝒒𝑜des2d_{k}=||\mathrm{E}[\bm{q}^{o}_{k}]-\bm{q}^{o}_{\mathrm{des}}||_{2}italic_d start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = | | roman_E [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT ] - bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT | | start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT between the target area center 𝒒desosubscriptsuperscript𝒒𝑜des\bm{q}^{o}_{\mathrm{des}}bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT and the mean object position at that time step. We compare the distance at the end of the receding horizon with the distance at the beginning of it in order to make the cost invariant to absolute distance to the target. Progress is thus defined as d0dKsubscript𝑑0subscript𝑑𝐾d_{0}-d_{K}italic_d start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT - italic_d start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT. Consequently, the task-cost ctasksubscript𝑐taskc_{\mathrm{task}}italic_c start_POSTSUBSCRIPT roman_task end_POSTSUBSCRIPT is defined as follows

ctask(𝒖0:K1)=eλ(d0dK).subscript𝑐tasksubscript𝒖:0𝐾1superscript𝑒𝜆subscript𝑑0subscript𝑑𝐾c_{\mathrm{task}}(\bm{u}_{0:K-1})=e^{-\lambda\left(d_{0}-d_{K}\right)}.italic_c start_POSTSUBSCRIPT roman_task end_POSTSUBSCRIPT ( bold_italic_u start_POSTSUBSCRIPT 0 : italic_K - 1 end_POSTSUBSCRIPT ) = italic_e start_POSTSUPERSCRIPT - italic_λ ( italic_d start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT - italic_d start_POSTSUBSCRIPT italic_K end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT . (54)

Fig. 9 illustrates each solution of the optimization over multiple shorter horizons. It can be seen that a single-horizon push moves the object belief towards the target area with high probability. The overall pushing maneuver, obtained by sequencing the robot trajectories of the individual horizons, has no constraints on the number of parameters, i.e. the number of via-points, as the number of receding horizon operations is not fixed but rather tied to a goal check. It is therefore expected to be more expressive than a single-horizon solution and thus more robust.

7 Experiments: Robust Pushing

This section presents robot experiments validating the theory and algorithmic approach developed in this article. We use objects that the robot has never interacted with before, with the geometry of the objects being the only information available. In all experiments, we compare the performance of the proposed approach against a baseline that only uses the nominal model of the contact dynamics without considering uncertainties. For this, the planner assumes that the initial object position is known and it uses the nominal object dynamics from Sec. 4 as a deterministic dynamics model. Consequently, we run the baseline without the cost and constraints on the variance gain.

7.1 Implementation

Refer to caption
Figure 10: Open-loop single-hand pushing experiment: The task for the robot is to use the rectangular geometry of its hand (highlighted in green) to push the object with a circular geometry (highlighted in orange) into a target position without sensory feedback. The experiment consists of repeating the same pushing plan open-loop until the object diverged off the path such that the robot does not make contact anymore.

We implement the contact dynamics as in (6) for the special case of circular and rectangular shapes in two dimensions. In order to approximate the initial belief via particles, we found that Np=20subscript𝑁𝑝20N_{p}=20italic_N start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT = 20 particles are sufficient to generate robust plans. For computing the CMA-ES updates to the Gaussian distribution over candidate trajectories, we use the Python package provided by the authors of Hansen (2016). We provide additional details on the cost design in the appendix. The planning algorithm was executed on a laptop with an Intel Core i9-14900HX CPU and 32GB of RAM. On average, generating a robust pushing plan for the full target path takes around seven seconds wall-clock time.

7.2 Open-Loop Single-Hand Pushing

The first experiment takes Example 3 from simulation into the real world. As an end-effector, the robot uses the rectangular-shaped hand of the Franka robot to push the target object, without considering the fingers for contacts. Fig. 10 illustrates the experimental setup showing the initial robot configuration, the initial object position and the target object position. We compare trajectories of 2D positions and yaw angles of the hand generated by our approach (receding-horizon BS-VP-STO) and a baseline approach. The baseline optimizes for efficient trajectories assuming that the nominal contact dynamics accurately predict the object trajectory. To evaluate the robustness of generated plans, we execute the plan to let the robot push the object into the target position and use the same plan for pushing the object back to the initial position. Subsequently, the same plan is executed repeatedly open-loop with the object located where the previous execution ended. We expect that uncertainties in the contact dynamics will lead to deviations from the nominal model, resulting in the accumulation of control errors. Hence, we measure robustness by counting the number of successive runs until the robot loses the object. For a video showing qualitative results, see Extension 1. In the following, we report quantitative results.

7.2.1 Deterministic Baseline

As the baseline does not account for uncertainties in the contact dynamics, the resulting optimal trajectory consists of pushing in a straight line towards the target position while kee** the long side of the hand perpendicular to the pushing direction. We execute 10 experiments with the optimal baseline plan, resulting in an average of 6.86.86.86.8 (min. 5555, max. 9999) successive runs until the robot loses the object. This is the result of deviations from the nominal model due to real-world perturbations such as imperfect friction surfaces and mass distributions, leading to an accumulation of errors in the object positions when pushing a circular object with a flat surface.

7.2.2 Receding-horizon BS-VP-STO

The proposed algorithm uses the same nominal model that was used in the baseline, while modeling additional uncertainty. The resulting optimal trajectory is executed in 10101010 open-loop experiments without additional perturbations to compare the result with the baseline. All experiments have been stopped after 40404040 successive runs as the system did not show any sign of accumulating errors. This indicates that the optimized trajectory actively controls the uncertainty in the object position by kee** track of the open-loop propagated belief.

7.3 Open-Loop Bi-manual Pushing

Refer to caption
Figure 11: Open-loop bi-manual pushing experiment: The two manipulators are considered as one bi-manual robot with two end-effectors, each equipped with a ball-shaped end-effector. The object (bottle, glass or can) is placed in front of the robot and the goal is to push the object along the target path (red circle). All objects were chosen due to their circular shape for the sake of a simple implementation of the quasi-static contact dynamics. The initial belief over the object position is Gaussian distributed with a mean equal to the initial object position and a covariance that reflects the uncertainty in the object detection. The contact dynamics are modeled probabilistically to reflect uncertainty. For the experiments with the can, we add additional perturbations by placing wooden cubes (red cubes) on the target path and by changing the center of mass to be off-center by placing a heavy tool in the can (yellow content of the can).

We choose to validate the proposed algorithm with a bi-manual pushing task, consisting of two Franka robot arms that are equipped with ball-shaped end-effectors. Note that we treat the two robot arms as one bi-manual robot. We plan 2D trajectories for the ball-shaped end-effectors in a plane parallel to the table. The dynamics are modeled in this 2D plane, where the two robots are abstracted as two independent circles. The objects are abstracted as circles as well. Fig. 11 shows the experimental setup. Initially, the object (bottle, glass or can) is placed in front the bi-manual robot and the goal for the robot is to push the object along a circular target path. The belief over object positions is initialized with a Gaussian distribution. We furthermore modeled the noise in the contact dynamics with a Gaussian distribution and we tuned the covariance to capture the stochasticity in the contact dynamics. To prevent collisions between the two end-effectors, we include a collision constraint enforcing that the two end-effectors do not touch. We also add a constraint that prevents the robot from crossing its arms. Note that the time, location and number of contacts is subject to planning, i.e. we do not impose any heuristic that forces the robot to use both end-effectors for pushing. Instead, the proposed cost and constraint on the variance gain drives the optimization algorithm to find stabilizing contact configurations and sequences, such as using both end-effectors for pushing.

7.3.1 Qualitative Planning Results.

Refer to caption
Figure 12: Qualitative comparison between the proposed receding-horizon BS-VP-STO algorithm and the deterministic baseline. We perform 1000 stochastic rollouts of each generated plan using the proposed stochastic object dynamics. The left and right end-effectors are visualized with black and blue circles, respectively. The target path is depicted in light-green. a) For testing the baseline plan, the object position is initialized with the expected position (orange circle). For the proposed approach, we initialize the object position according to the modeled uncertainty. b) Plans generated for an object with 5 cm radius, i.e. the bottle and the can. c) Plans generated for an object with 3 cm radius, i.e. the glass.

We illustrate plans generated with the deterministic baseline and with the proposed receding-horizon BS-VP-STO algorithm in Fig. 12. Each plan is presented with 1000 stochastic rollouts of the object dynamics used for optimization. For an evaluation of the real-world performance of the plans, please refer to Sec. 7.3.3.

Deterministic Baseline. In all plans generated with the deterministic baseline, the robot uses only one end-effector at a time for pushing the object. This is not surprising as the pushing progress, i.e. the mean control problem, is equally optimized when using one or two end-effectors. Thus, the baseline is not forced to discover the coordination between the two end-effectors and converges to the simpler solution, i.e. using one end-effector. When performing open-loop stochastic rollouts of the baseline plans, the object deviates from the planned trajectory after some time and thus the robot is not able to successfully push the object along the whole target path.

Receding-horizon BS-VP-STO. In contrast, we observe that the robot uses both end-effectors to push the object along the target path when planning with the receding-horizon BS-VP-STO algorithm. Note that the strategy of using two end-effectors for pushing deliberately emerges from planning in belief space. The proposed algorithm discovers the use of two end-effectors by penalizing sampled robot trajectories that result in an increasing uncertainty about the object position, i.e. using one end-effector. At the beginning of the pushing maneuver, the robot performs an action that reduces uncertainty by placing its end-effectors such that they enclose the initial belief. This effectively brings the particles closer together, resulting in a decreasing variance. The robot then starts to make contact with its two end-effectors side-by-side to push the object along the target path with high probability. After pushing the object along the first half of the circular target path, the no-collision constraint between the two end-effectors forces the robot to break the contact and to find a new contact configuration with which it can continue pushing. Consequently, the robot has to move its left end-effector around the object without touching it. Afterwards, the robot makes contact again with the object and continues pushing until it reaches the initial position again. When performing open-loop stochastic rollouts of the robust plans, the robot has a high probability of being successful at pushing the object along the target path despite the perturbations and the lack of feedback.

7.3.2 Quantitative Planning Results & Ablation Studies.

Refer to caption
Figure 13: Success rates of receding-horizon BS-VP-STO over the number of BS-VP-STO iterations M𝑀Mitalic_M. We compare the performance of the planning algorithm with the contact prior 𝑸𝒒>𝟎subscript𝑸𝒒0\bm{Q}_{\bm{q}}>\bm{0}bold_italic_Q start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT > bold_0 and without 𝑸𝒒=𝟎subscript𝑸𝒒0\bm{Q}_{\bm{q}}=\bm{0}bold_italic_Q start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT = bold_0; and we evaluate the scalability of the planning algorithm to many degrees of freedom (14 DoF). A planning run is considered successful if a valid solution is found within 500 single-horizon optimizations.

In the following, we present ablation studies of the different components of receding-horizon BS-VP-STO. We evaluate i) the dependence of the overall algorithmic performance on the number of iterations taken in each BS-VP-STO instance within the receding-horizon setting, ii) the relevance of a contact-prior compared to an uninformed proposal distribution, and iii) how the receding-horizon BS-VP-STO algorithm scales with the number of degrees of freedom of the robot. We summarize all findings in Fig. 13 which evaluates the success rate of the receding-horizon BS-VP-STO algorithm, i.e. if the planning algorithm finds a valid solution for the robot pushing an object with 5 cm radius. A plan is considered successful if the mean of the object belief is within 1 cm tolerance to the target location and if the plan is valid with respect to the robustness constraints on the variance gain. The planning algorithm is aborted after 500 iterations of receding-horizon BS-VP-STO and the plan is considered a failure. Sub-figure b) on the right-hand side in Fig. 12 illustrates a successful plan for the problem considered for the ablations. We instantiate the planning problem with a varying number M𝑀Mitalic_M of iterations for one BS-VP-STO instance within the receding-horizon scheme. For each M𝑀Mitalic_M, we run receding-horizon BS-VP-STO 50 times and measure the success rate of the planning algorithm.

Refer to caption
Figure 14: Snapshots of the robot behavior synthesized with the receding-horizon BS-VP-STO algorithm and executed on the real bi-manual system. Each snapshot shows an overlay of five experiments that were conducted with different initial positions of the bottle. The robot successfully pushed the bottle along the target path in all five experiments. At the beginning, i.e. in the first image, the robot encloses the initial belief with its two end-effectors such that robust pushing is possible. After moving along the first half of the circular target path, i.e. in the third image, the robot re-positions its two end-effectors to continue pushing the bottle along the target path while avoiding collisions between the two arms.

Impact of the Number of BS-VP-STO Iterations. Fig. 13 illustrates the statistics of success over the number of BS-VP-STO iterations. We observe that the success rate increases with the number of iterations M𝑀Mitalic_M and reaches around 100%percent100100\%100 % success rate with M=4𝑀4M=4italic_M = 4 BS-VP-STO iterations. Note that running BS-VP-STO for one iteration poses a special case of the algorithm since no CMA-ES update is performed, resulting in only sampling an initial candidate population and picking the best performing candidate. This procedure in fact corresponds to the predictive sampling algorithm introduced in Howell et al. (2022).

The planning time increases linearly with the number of iterations. For reference, performing one BS-VP-STO iteration for four degrees of freedom takes approximately 0.01 seconds wall-clock time. In this setup, we execute the pushing plan for an execution horizon H𝐻Hitalic_H that corresponds to 0.2 seconds. Thus, it would be possible to run the planning algorithm in an online receding horizon fashion with up to M=20𝑀20M=20italic_M = 20 BS-VP-STO iterations per receding-horizon.

Impact of the Contact Prior. To evaluate the impact of the contact prior, we set the contact precision matrix in BS-VP-STO to 𝑸𝒒=𝟎subscript𝑸𝒒0\bm{Q}_{\bm{q}}=\bm{0}bold_italic_Q start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT = bold_0 (cf. (39)). This corresponds to uninformed trajectory samples as depicted in the left sub-figure of Fig. 6. In Fig. 13 we observe that the success rate of the algorithm without the contact prior is significantly lower than the success rate of the algorithm with the contact prior. This indicates that the contact prior is improving the efficiency of the planning algorithm to find robust manipulation actions in few iterations. Computing the contact prior requires a matrix inversion to compute 𝑸𝒒subscript𝑸𝒒\bm{Q}_{\bm{q}}bold_italic_Q start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT with dimensionality equivalent to the number of degrees of freedom. Since this operation has to be performed only once for a single-horizon, the computational overhead of the contact prior is negligible.

Scalability to Many Degrees of Freedom. Last, we evaluate the scalability of the proposed planning algorithm to many degrees of freedom by moving from planning in the 2D plane to planning in the joint space of the bi-manual robot, which has 14 degrees of freedom (seven degrees of freedom per robot arm). Coordinating all degrees of freedom adds complexity to the planning problem, while at the same time increasing the dimensionality of the search space. In Fig. 13 we observe that the success rate for a given number of iterations M𝑀Mitalic_M drops when increasing the complexity of the problem, requiring more iterations to discover the required coordination between the 14 joints. While the contact prior in joint space (cf. Sec. 5.2.3) imposes coordination between the seven joints of the individual arms, BS-VP-STO is still required to find joint trajectories such that the two end-effectors touch the object at the right place at the right time. The planning time per iteration is not significantly increased compared to planning for four degrees of freedom, since only an additional forward kinematics computation for the belief rollout is required. However, note that the contact dynamics are still modeled using the ball-shaped abstraction of the end-effector. Modeling the whole kinematic chain of the robot arms as contact geometries adds additional computational complexity to the planner.

7.3.3 Real-world Pushing Results

Fig. 14 shows snapshots of the robot behavior planned with the receding-horizon BS-VP-STO algorithm. In addition, a full video of the experiment can be found in Extension 2. We executed the planned robot trajectories for each of the three objects (bottle, glass, can), where we conducted five experiments for each object by placing the object at different initial positions. Out of 15 executed plans, the robot successfully pushed the object along the target path in 14 experiments. The robot failed to push the glass in one experiment, where the object was too far away from the mean initial position such the robot did not enclose the object during the initial uncertainty-reducing action.

We furthermore evaluated the deterministic baseline planner with the same three objects, while placing the objects only at the expected location. In all three experiments, the actual object position deviated from the planned object position after a few pushes, resulting in the robot loosing contact with the object and thus failing to push the object along the target path. We show a video example of the motion generated by the baseline in Extension 2.

7.4 Closed-Loop Bi-manual Pushing

Refer to caption
Refer to caption
Figure 15: Closed-loop bi-manual pushing experiment: The left image shows the external perturbations applied by manually moving the object. A camera continuously provides noisy observations of the object position to the closed-loop controller. The right image shows how the proposed approach is able to generate robust plans in real-time, enabling uncertainty-aware model-predictive control loops.

Last, we show that receding-horizon BS-VP-STO can be used in a closed control loop, gaining additional robustness against out-of-distribution disturbances. We use the same bi-manual robot setup as in Sec. 7.3. However, instead of pushing an object along a target path, the task is to push the object to the center of the table while its position is perturbed by a human. In addition, noisy observations of the object position are provided by a camera for closing the loop. We deploy a particle filter for continuously updating the belief based on both the stochastic rollout and noisy observations of the object as described in Sec. 6. We qualitatively compare the resulting behavior of the robot when being controlled with receding-horizon BS-VP-STO against the behavior when using the deterministic nominal model in a model-predictive controller. Both controllers run at a control rate of 5555 Hz with M=6𝑀6M=6italic_M = 6 BS-VP-STO iterations per control step. Fig. 15 illustrates the external perturbations applied to the object and the resulting robot behavior when controlled with our proposed robust approach. A video of the resulting behavior of both control approaches can be found in Extension 3.

We observe that the robot robustly pushes the object back to the center of the table using both end-effectors when controlled with the proposed approach. The additional state estimation enables the robot to also react to out-of-distribution perturbations, re-generating robust plans given noisy measurement updates. For the deterministic baseline, we observe that the continuous feedback enables the robot to maintain contact with the object and to generate consistent pushes. However, we observe that the robot only uses one end-effector for pushing as also observed for the open-loop experiment in Sec. 7.3. In contrast to our approach, this leads to larger control errors during pushing that need to be corrected, resulting in the deterministic baseline taking more time to accurately bring the object back to the center of the table.

8 Discussion

In this article, we investigated the problem of planning robust manipulation actions subject to stochastic contact dynamics. The quasi-static model used to predict the contact dynamics is a simplification of the real-world contact dynamics tailored to the particular problem of slow pushing. In particular, reducing the dynamics from joint robot-object dynamics to solely object dynamics enables efficient reasoning over belief dynamics. However, the provided model excludes other categories of manipulation tasks involving effects such as gras** objects. We leave it to future work to investigate how other manipulation dynamics can be reduced to object-only dynamics.

Furthermore, we have shown that informed prior distributions for sampling candidate actions is beneficial, if not necessary, for sampling-based optimization for contact-rich manipulation. We used the product of Gaussian priors to bias the sampling towards smooth and contact-making trajectories. As the evolutionary optimization algorithm CMA-ES is based on sampling from and iterating on Gaussian distributions, we incorporated our Gaussian-distributed prior to initialize CMA-ES. Yet, we see great potential in the use of non-Gaussian priors that are further optimized with a stochastic optimization algorithm such as BS-VP-STO. Especially when the system has many degrees of freedom such as for two robot arms or articulated robotic hands, sampling from a proposal distribution that captures possibly non-Gaussian correlations between the degrees of freedom, e.g. correlations between fingers, is expected to be a key to scalable, real-time control through contacts.

Last, we show that a manipulation task such as pushing, which is typically approached with high-bandwidth closed-loop control, can also be stabilized by planning appropriate open-loop actions that deliberately optimize for robustness. However, if the robot is not able to anticipate perturbations or the statistical properties of the perturbations, feedback is required to be able to stabilize the manipulation. We thus show that the developed approach can be used in a closed-loop controller by integrating the open-loop planning algorithm in a receding-horizon control scheme and by feeding back observations of the object configuration. We furthermore show that a model-predictive controller based on receding-horizon BS-VP-STO even outperforms a controller based on a deterministic model for pushing tasks.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: JJ was supported by the Swiss National Science Foundation (SNSF) through the CODIMAN project. LB was supported by an Amazon Web Services Lighthouse scholarship. NH received EPSRC funding via the “From Sensing to Collaboration” programme grant [EP/V000748/1].

References

  • Andrychowicz et al. (2020) Andrychowicz OM, Baker B, Chociej M, Józefowicz R, McGrew B, Pachocki J, Petron A, Plappert M, Powell G, Ray A, Schneider J, Sidor S, Tobin J, Welinder P, Weng L and Zaremba W (2020) Learning dexterous in-hand manipulation. The International Journal of Robotics Research 39(1): 3–20. 10.1177/0278364919887447. URL https://doi.org/10.1177/0278364919887447.
  • Arulampalam et al. (2002) Arulampalam MS, Maskell S, Gordon N and Clapp T (2002) A tutorial on particle filters for online nonlinear/non-gaussian bayesian tracking. IEEE Transactions on signal processing 50(2): 174–188.
  • Aydinoglu and Posa (2022) Aydinoglu A and Posa M (2022) Real-time multi-contact model predictive control via admm. In: International Conference on Robotics and Automation (ICRA). p. 3414–3421.
  • Aydinoglu et al. (2022) Aydinoglu A, Sieg P, Preciado VM and Posa M (2022) Stabilization of complementarity systems via contact-aware controllers. IEEE Transactions on Robotics 38(3): 1735–1754. 10.1109/TRO.2021.3120931.
  • Bhardwaj et al. (2022) Bhardwaj M, Sundaralingam B, Mousavian A, Ratliff N, Fox D, Ramos F and Boots B (2022) STORM: An Integrated Framework for Fast Joint-Space Model-Predictive Control for Reactive Manipulation. In: Conference on Robot Learning (CoRL). pp. 750–759.
  • Chen et al. (2021) Chen C, Culbertson P, Lepert M, Schwager M and Bohg J (2021) Trajectotree: Trajectory optimization meets tree search for planning multi-contact dexterous manipulation. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 8262–8268. 10.1109/IROS51168.2021.9636346.
  • Cheng et al. (2021) Cheng X, Huang E, Hou Y and Mason MT (2021) Contact mode guided sampling-based planning for quasistatic dexterous manipulation in 2d. In: 2021 IEEE International Conference on Robotics and Automation (ICRA). pp. 6520–6526. 10.1109/ICRA48506.2021.9560766.
  • Cleac’h et al. (2021) Cleac’h SL, Howell T, Yang S, Lee CY, Zhang J, Bishop A, Schwager M and Manchester Z (2021) Fast contact-implicit model-predictive control. URL https://arxiv.longhoe.net/abs/2107.05616.
  • Cover and Thomas (2005) Cover TM and Thomas JA (2005) Differential Entropy, chapter 8. John Wiley & Sons, Ltd. ISBN 9780471748823, pp. 243–259. https://doi.org/10.1002/047174882X.ch8. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/047174882X.ch8.
  • Dogar and Srinivasa (2011) Dogar M and Srinivasa S (2011) A framework for push-gras** in clutter. In: Hugh Durrant-Whyte NR and Abbeel P (eds.) Proceedings of Robotics: Science and Systems (RSS ’11). MIT Press, pp. 65–73.
  • Erdmann and Mason (1988) Erdmann M and Mason M (1988) An exploration of sensorless manipulation. IEEE Journal on Robotics and Automation 4(4): 369–379. 10.1109/56.800.
  • Ha et al. (2020) Ha JS, Driess D and Toussaint M (2020) A probabilistic framework for constrained manipulations and task and motion planning under uncertainty. In: 2020 IEEE International Conference on Robotics and Automation (ICRA). pp. 6745–6751. 10.1109/ICRA40945.2020.9196840.
  • Haarnoja et al. (2019) Haarnoja T, Zhou A, Hartikainen K, Tucker G, Ha S, Tan J, Kumar V, Zhu H, Gupta A, Abbeel P and Levine S (2019) Soft actor-critic algorithms and applications.
  • Handa et al. (2023) Handa A, Allshire A, Makoviychuk V, Petrenko A, Singh R, Liu J, Makoviichuk D, Van Wyk K, Zhurkevich A, Sundaralingam B and Narang Y (2023) Dextreme: Transfer of agile in-hand manipulation from simulation to reality. In: 2023 IEEE International Conference on Robotics and Automation (ICRA). pp. 5977–5984. 10.1109/ICRA48891.2023.10160216.
  • Hansen (2016) Hansen N (2016) The CMA evolution strategy: A tutorial. arXiv preprint arXiv:1604.00772 .
  • Hogan and Rodriguez (2020) Hogan FR and Rodriguez A (2020) Reactive planar non-prehensile manipulation with hybrid model predictive control. The International Journal of Robotics Research 39(7): 755–773.
  • Hogan (1984) Hogan N (1984) Impedance control: An approach to manipulation. In: 1984 American Control Conference. pp. 304–313. 10.23919/ACC.1984.4788393.
  • Howell et al. (2022) Howell T, Gileadi N, Tunyasuvunakool S, Zakka K, Erez T and Tassa Y (2022) Predictive sampling: Real-time behaviour synthesis with mujoco. arXiv preprint arXiv:2212.00541 .
  • Jankowski et al. (2023) Jankowski J, Brudermüller L, Hawes N and Calinon S (2023) VP-STO: Via-point-based stochastic trajectory optimization for reactive robot behavior. International Conference on Robotics and Automation (ICRA) .
  • Jankowski et al. (2022) Jankowski J, Racca M and Calinon S (2022) From Key Positions to Optimal Basis Functions for Probabilistic Adaptive Control. IEEE Robotics and Automation Letters 7(2): 3242–3249.
  • Kappen (2015) Kappen HJ (2015) Adaptive importance sampling for control and inference. Journal of Statistical Physics 162: 1244–1266. URL https://api.semanticscholar.org/CorpusID:6382641.
  • Koval et al. (2016) Koval MC, Pollard NS and Srinivasa SS (2016) Pre- and post-contact policy decomposition for planar contact manipulation under uncertainty. The International Journal of Robotics Research 35(1-3): 244–264. 10.1177/0278364915594474. URL https://doi.org/10.1177/0278364915594474.
  • Lynch and Mason (1995) Lynch KM and Mason MT (1995) Stable pushing: Mechanics, controllability, and planning. The International Journal of Robotics Research 15: 533 – 556. URL https://api.semanticscholar.org/CorpusID:13041894.
  • Marcucci et al. (2017) Marcucci T, Deits R, Gabiccini M, Bicchi A and Tedrake R (2017) Approximate hybrid model predictive control for multi-contact push recovery in complex environments. In: 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids). pp. 31–38. 10.1109/HUMANOIDS.2017.8239534.
  • Mason (2001) Mason MT (2001) Mechanics of Robotic Manipulation. MIT Press.
  • Migimatsu and Bohg (2020) Migimatsu T and Bohg J (2020) Object-centric task and motion planning in dynamic environments. IEEE Robotics and Automation Letters 5(2): 844–851. 10.1109/LRA.2020.2965875.
  • Muratore et al. (2022) Muratore F, Ramos F, Turk G, Yu W, Gienger M and Peters J (2022) Robot learning from randomized simulations: A review. Frontiers in Robotics and AI 9. 10.3389/frobt.2022.799893. URL https://www.frontiersin.org/articles/10.3389/frobt.2022.799893.
  • Okamoto et al. (2018) Okamoto K, Goldshtein M and Tsiotras P (2018) Optimal covariance control for stochastic systems under chance constraints. IEEE Control Systems Letters 2(2): 266–271. 10.1109/LCSYS.2018.2826038.
  • Pang (2021) Pang T (2021) A convex quasistatic time-step** scheme for rigid multibody systems with contact and friction. 2021 IEEE International Conference on Robotics and Automation (ICRA) : 6614–6620URL https://api.semanticscholar.org/CorpusID:228095227.
  • Pang et al. (2023) Pang T, Suh HJT, Yang L and Tedrake R (2023) Global planning for contact-rich manipulation via local smoothing of quasi-dynamic contact models. IEEE Transactions on Robotics 39(6): 4691–4711. 10.1109/TRO.2023.3300230.
  • Rodriguez (2021) Rodriguez A (2021) The unstable queen: Uncertainty, mechanics, and tactile feedback. Science Robotics 6(54): eabi4667. 10.1126/scirobotics.abi4667. URL https://www.science.org/doi/abs/10.1126/scirobotics.abi4667.
  • Schulman et al. (2015) Schulman J, Levine S, Abbeel P, Jordan M and Moritz P (2015) Trust region policy optimization. In: Bach F and Blei D (eds.) Proceedings of the 32nd International Conference on Machine Learning, Proceedings of Machine Learning Research, volume 37. Lille, France: PMLR, pp. 1889–1897. URL https://proceedings.mlr.press/v37/schulman15.html.
  • Schulman et al. (2017) Schulman J, Wolski F, Dhariwal P, Radford A and Klimov O (2017) Proximal policy optimization algorithms.
  • Shirai et al. (2023) Shirai Y, Jha DK and Raghunathan AU (2023) Covariance steering for uncertain contact-rich systems. In: 2023 IEEE International Conference on Robotics and Automation (ICRA). pp. 7923–7929. 10.1109/ICRA48891.2023.10160249.
  • Todorov et al. (2012) Todorov E, Erez T and Tassa Y (2012) Mujoco: A physics engine for model-based control. In: 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. pp. 5026–5033. 10.1109/IROS.2012.6386109.
  • Toussaint et al. (2020) Toussaint M, Ha JS and Driess D (2020) Describing physics for physical reasoning: Force-based sequential manipulation planning. IEEE Robotics and Automation Letters 5(4): 6209–6216. 10.1109/LRA.2020.3010462.
  • Toussaint et al. (2022) Toussaint M, Harris J, Ha JS, Driess D and Hönig W (2022) Sequence-of-constraints mpc: Reactive timing-optimal control of sequential manipulation. 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) : 13753–13760URL https://api.semanticscholar.org/CorpusID:247362576.

Appendix

Separation of the Expected Cost

Suppose that the state 𝒙nx𝒙superscriptsubscript𝑛𝑥\bm{x}\in\mathbb{R}^{n_{x}}bold_italic_x ∈ blackboard_R start_POSTSUPERSCRIPT italic_n start_POSTSUBSCRIPT italic_x end_POSTSUBSCRIPT end_POSTSUPERSCRIPT that is to be controlled is a random variable that is distributed with 𝒙psimilar-to𝒙𝑝\bm{x}\sim pbold_italic_x ∼ italic_p. Furthermore suppose that the task is described by a quadratic cost of the state and a desired state 𝒙dessubscript𝒙des\bm{x}_{\mathrm{des}}bold_italic_x start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT, i.e.

Jdet(𝒙)=(𝒙𝒙des)(𝒙𝒙des).subscript𝐽det𝒙superscript𝒙subscript𝒙destop𝒙subscript𝒙desJ_{\mathrm{det}}(\bm{x})=\left(\bm{x}-\bm{x}_{\mathrm{des}}\right)^{% \scriptscriptstyle\top}\left(\bm{x}-\bm{x}_{\mathrm{des}}\right).italic_J start_POSTSUBSCRIPT roman_det end_POSTSUBSCRIPT ( bold_italic_x ) = ( bold_italic_x - bold_italic_x start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_italic_x - bold_italic_x start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT ) . (55)

The corresponding stochastic optimal control objective is given by the expectation of the quadratic cost of the state:

Jsto=Ep[Jdet(𝒙)]=Ep[(𝒙𝒙des)(𝒙𝒙des)]=Ep[𝒙𝒙]2Ep[𝒙]𝒙des+𝒙des𝒙des.subscript𝐽stosubscriptE𝑝delimited-[]subscript𝐽det𝒙subscriptE𝑝delimited-[]superscript𝒙subscript𝒙destop𝒙subscript𝒙dessubscriptE𝑝delimited-[]superscript𝒙top𝒙2subscriptE𝑝superscriptdelimited-[]𝒙topsubscript𝒙dessuperscriptsubscript𝒙destopsubscript𝒙des\displaystyle\begin{split}J_{\mathrm{sto}}&=\mathrm{E}_{p}\left[J_{\mathrm{det% }}(\bm{x})\right]\\ &=\mathrm{E}_{p}\left[\left(\bm{x}-\bm{x}_{\mathrm{des}}\right)^{% \scriptscriptstyle\top}\left(\bm{x}-\bm{x}_{\mathrm{des}}\right)\right]\\ &=\mathrm{E}_{p}\left[\bm{x}^{\scriptscriptstyle\top}\bm{x}\right]-2\mathrm{E}% _{p}\left[\bm{x}\right]^{\scriptscriptstyle\top}\bm{x}_{\mathrm{des}}+\bm{x}_{% \mathrm{des}}^{\scriptscriptstyle\top}\bm{x}_{\mathrm{des}}.\end{split}start_ROW start_CELL italic_J start_POSTSUBSCRIPT roman_sto end_POSTSUBSCRIPT end_CELL start_CELL = roman_E start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT [ italic_J start_POSTSUBSCRIPT roman_det end_POSTSUBSCRIPT ( bold_italic_x ) ] end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = roman_E start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT [ ( bold_italic_x - bold_italic_x start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_italic_x - bold_italic_x start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT ) ] end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = roman_E start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT [ bold_italic_x start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_x ] - 2 roman_E start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT [ bold_italic_x ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_x start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT + bold_italic_x start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_x start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT . end_CELL end_ROW (56)

In the following, we denote the mean of the state with

𝒙¯=Ep[𝒙].¯𝒙subscriptE𝑝delimited-[]𝒙\bar{\bm{x}}=\mathrm{E}_{p}\left[\bm{x}\right].over¯ start_ARG bold_italic_x end_ARG = roman_E start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT [ bold_italic_x ] . (57)

Furthermore, the variance of the state is defined as the expectation of the squared deviations from the mean:

Vp[𝒙]=Ep[(𝒙𝒙¯)(𝒙𝒙¯)],=Ep[𝒙𝒙+𝒙¯𝒙¯2𝒙¯𝒙],=Ep[𝒙𝒙]+𝒙¯𝒙¯2𝒙¯Ep[𝒙],=Ep[𝒙𝒙]𝒙¯𝒙¯.\displaystyle\begin{split}\mathrm{V}_{p}\left[\bm{x}\right]&=\mathrm{E}_{p}% \left[(\bm{x}-\bar{\bm{x}})^{\scriptscriptstyle\top}(\bm{x}-\bar{\bm{x}})% \right],\\ &=\mathrm{E}_{p}\left[\bm{x}^{\scriptscriptstyle\top}\bm{x}+\bar{\bm{x}}^{% \scriptscriptstyle\top}\bar{\bm{x}}-2\bar{\bm{x}}^{\scriptscriptstyle\top}\bm{% x}\right],\\ &=\mathrm{E}_{p}\left[\bm{x}^{\scriptscriptstyle\top}\bm{x}\right]+\bar{\bm{x}% }^{\scriptscriptstyle\top}\bar{\bm{x}}-2\bar{\bm{x}}^{\scriptscriptstyle\top}% \mathrm{E}_{p}\left[\bm{x}\right],\\ &=\mathrm{E}_{p}\left[\bm{x}^{\scriptscriptstyle\top}\bm{x}\right]-\bar{\bm{x}% }^{\scriptscriptstyle\top}\bar{\bm{x}}.\end{split}start_ROW start_CELL roman_V start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT [ bold_italic_x ] end_CELL start_CELL = roman_E start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT [ ( bold_italic_x - over¯ start_ARG bold_italic_x end_ARG ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( bold_italic_x - over¯ start_ARG bold_italic_x end_ARG ) ] , end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = roman_E start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT [ bold_italic_x start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_x + over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over¯ start_ARG bold_italic_x end_ARG - 2 over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_x ] , end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = roman_E start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT [ bold_italic_x start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_x ] + over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over¯ start_ARG bold_italic_x end_ARG - 2 over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT roman_E start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT [ bold_italic_x ] , end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = roman_E start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT [ bold_italic_x start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_x ] - over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over¯ start_ARG bold_italic_x end_ARG . end_CELL end_ROW (58)

As a result, the stochastic optimal control objective can be rewritten in terms of the deterministic quadratic cost of the mean state and the variance of the state with respect to its probability distribution p𝑝pitalic_p:

Jsto=𝒙¯𝒙¯2𝒙¯𝒙des+𝒙des𝒙des+Vp[𝒙]=(𝒙¯𝒙des)(𝒙¯𝒙des)+Vp[𝒙]=Jdet(𝒙¯)+Vp[𝒙].subscript𝐽stosuperscript¯𝒙top¯𝒙2superscript¯𝒙topsubscript𝒙dessuperscriptsubscript𝒙destopsubscript𝒙dessubscriptV𝑝delimited-[]𝒙superscript¯𝒙subscript𝒙destop¯𝒙subscript𝒙dessubscriptV𝑝delimited-[]𝒙subscript𝐽det¯𝒙subscriptV𝑝delimited-[]𝒙\displaystyle\begin{split}J_{\mathrm{sto}}&=\bar{\bm{x}}^{\scriptscriptstyle% \top}\bar{\bm{x}}-2\bar{\bm{x}}^{\scriptscriptstyle\top}\bm{x}_{\mathrm{des}}+% \bm{x}_{\mathrm{des}}^{\scriptscriptstyle\top}\bm{x}_{\mathrm{des}}+\mathrm{V}% _{p}\left[\bm{x}\right]\\ &=\left(\bar{\bm{x}}-\bm{x}_{\mathrm{des}}\right)^{\scriptscriptstyle\top}% \left(\bar{\bm{x}}-\bm{x}_{\mathrm{des}}\right)+\mathrm{V}_{p}\left[\bm{x}% \right]\\ &=J_{\mathrm{det}}(\bar{\bm{x}})+\mathrm{V}_{p}\left[\bm{x}\right].\end{split}start_ROW start_CELL italic_J start_POSTSUBSCRIPT roman_sto end_POSTSUBSCRIPT end_CELL start_CELL = over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over¯ start_ARG bold_italic_x end_ARG - 2 over¯ start_ARG bold_italic_x end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_x start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT + bold_italic_x start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_x start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT + roman_V start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT [ bold_italic_x ] end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = ( over¯ start_ARG bold_italic_x end_ARG - bold_italic_x start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ( over¯ start_ARG bold_italic_x end_ARG - bold_italic_x start_POSTSUBSCRIPT roman_des end_POSTSUBSCRIPT ) + roman_V start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT [ bold_italic_x ] end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = italic_J start_POSTSUBSCRIPT roman_det end_POSTSUBSCRIPT ( over¯ start_ARG bold_italic_x end_ARG ) + roman_V start_POSTSUBSCRIPT italic_p end_POSTSUBSCRIPT [ bold_italic_x ] . end_CELL end_ROW (59)

Smoothness Prior

The goal of this section is to prove that ps(𝒒via)subscript𝑝ssubscript𝒒viap_{\mathrm{s}}(\bm{q}_{\mathrm{via}})italic_p start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT ( bold_italic_q start_POSTSUBSCRIPT roman_via end_POSTSUBSCRIPT ) is the normalized probability density function of the unnormalized function eJs(𝒒via)superscript𝑒subscript𝐽ssubscript𝒒viae^{-J_{\mathrm{s}}(\bm{q}_{\mathrm{via}})}italic_e start_POSTSUPERSCRIPT - italic_J start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT ( bold_italic_q start_POSTSUBSCRIPT roman_via end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT. Jssubscript𝐽sJ_{\mathrm{s}}italic_J start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT is defined with

Js=120T𝒒¨r(t)𝑹𝒒𝒒¨r(t)𝑑t.subscript𝐽s12superscriptsubscript0𝑇superscript¨𝒒limit-from𝑟top𝑡subscript𝑹𝒒superscript¨𝒒𝑟𝑡differential-d𝑡J_{\mathrm{s}}=\frac{1}{2}\int_{0}^{T}\ddot{\bm{q}}^{r{\scriptscriptstyle\top}% }(t)\bm{R}_{\bm{q}}\ddot{\bm{q}}^{r}(t)dt.italic_J start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT = divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT over¨ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_r ⊤ end_POSTSUPERSCRIPT ( italic_t ) bold_italic_R start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT over¨ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ( italic_t ) italic_d italic_t . (60)

The acceleration is an affine function of the trajectory parameter, i.e.

𝒒¨r(t)=𝚽¨via(t)𝜽+ϕ¨0(t,𝒒0r,𝒒˙0r).superscript¨𝒒𝑟𝑡subscript¨𝚽via𝑡𝜽subscript¨bold-italic-ϕ0𝑡superscriptsubscript𝒒0𝑟superscriptsubscript˙𝒒0𝑟\ddot{\bm{q}}^{r}(t)=\ddot{\bm{\Phi}}_{\mathrm{via}}(t)\bm{\theta}{+}\ddot{\bm% {\phi}}_{0}(t,\bm{q}_{0}^{r},\dot{\bm{q}}_{0}^{r}).over¨ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ( italic_t ) = over¨ start_ARG bold_Φ end_ARG start_POSTSUBSCRIPT roman_via end_POSTSUBSCRIPT ( italic_t ) bold_italic_θ + over¨ start_ARG bold_italic_ϕ end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( italic_t , bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) . (61)

Now, suppose that the basis offset is computed as follows:

ϕ¨0(t,𝒒0r,𝒒˙0r)=𝚽¨𝒒0(t)𝒒0r+𝚽¨𝒒˙0(t)𝒒˙0r.subscript¨bold-italic-ϕ0𝑡superscriptsubscript𝒒0𝑟superscriptsubscript˙𝒒0𝑟subscript¨𝚽subscript𝒒0𝑡superscriptsubscript𝒒0𝑟subscript¨𝚽subscript˙𝒒0𝑡superscriptsubscript˙𝒒0𝑟\ddot{\bm{\phi}}_{0}(t,\bm{q}_{0}^{r},\dot{\bm{q}}_{0}^{r})=\ddot{\bm{\Phi}}_{% \bm{q}_{0}}(t)\bm{q}_{0}^{r}+\ddot{\bm{\Phi}}_{\dot{\bm{q}}_{0}}(t)\dot{\bm{q}% }_{0}^{r}.over¨ start_ARG bold_italic_ϕ end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT ( italic_t , bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) = over¨ start_ARG bold_Φ end_ARG start_POSTSUBSCRIPT bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_t ) bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT + over¨ start_ARG bold_Φ end_ARG start_POSTSUBSCRIPT over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_t ) over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT . (62)

We exploit this affine dependency on the initial position and velocity by rewriting the acceleration as a linear function in a combined trajectory parameter 𝝃𝝃\bm{\xi}bold_italic_ξ, i.e.

𝒒¨r(t)=𝚽¨(t)𝝃,superscript¨𝒒𝑟𝑡¨𝚽𝑡𝝃\ddot{\bm{q}}^{r}(t)=\ddot{\bm{\Phi}}(t)\bm{\xi},over¨ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ( italic_t ) = over¨ start_ARG bold_Φ end_ARG ( italic_t ) bold_italic_ξ , (63)

where the new basis function matrix and the new trajectory parameter are given by

𝚽¨(t)=(𝚽¨via(t)𝚽¨𝒒0(t)𝚽¨𝒒˙0(t)),𝝃=(𝜽𝒒0r𝒒˙0r).formulae-sequence¨𝚽𝑡matrixsubscript¨𝚽via𝑡subscript¨𝚽subscript𝒒0𝑡subscript¨𝚽subscript˙𝒒0𝑡𝝃matrix𝜽superscriptsubscript𝒒0𝑟superscriptsubscript˙𝒒0𝑟\ddot{\bm{\Phi}}(t)=\begin{pmatrix}\ddot{\bm{\Phi}}_{\mathrm{via}}(t)&\!\ddot{% \bm{\Phi}}_{\bm{q}_{0}}(t)&\!\ddot{\bm{\Phi}}_{\dot{\bm{q}}_{0}}(t)\end{% pmatrix},\quad\bm{\xi}=\begin{pmatrix}\bm{\theta}\\ \bm{q}_{0}^{r}\\ \dot{\bm{q}}_{0}^{r}\end{pmatrix}.over¨ start_ARG bold_Φ end_ARG ( italic_t ) = ( start_ARG start_ROW start_CELL over¨ start_ARG bold_Φ end_ARG start_POSTSUBSCRIPT roman_via end_POSTSUBSCRIPT ( italic_t ) end_CELL start_CELL over¨ start_ARG bold_Φ end_ARG start_POSTSUBSCRIPT bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_t ) end_CELL start_CELL over¨ start_ARG bold_Φ end_ARG start_POSTSUBSCRIPT over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( italic_t ) end_CELL end_ROW end_ARG ) , bold_italic_ξ = ( start_ARG start_ROW start_CELL bold_italic_θ end_CELL end_ROW start_ROW start_CELL bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT end_CELL end_ROW start_ROW start_CELL over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT end_CELL end_ROW end_ARG ) . (64)

As a result, the smoothness objective is equivalent to

Js=120T𝝃𝚽¨(t)𝑹𝒒𝚽¨(t)𝝃𝑑t,=12𝝃0T𝚽¨(t)𝑹𝒒𝚽¨(t)𝑑t𝝃,=12𝝃𝑹𝝃𝝃.\displaystyle\begin{split}J_{\mathrm{s}}&=\frac{1}{2}\int_{0}^{T}\bm{\xi}^{% \scriptscriptstyle\top}\ddot{\bm{\Phi}}(t)^{\scriptscriptstyle\top}\bm{R}_{\bm% {q}}\ddot{\bm{\Phi}}(t)\bm{\xi}\,dt,\\ &=\frac{1}{2}\bm{\xi}\int_{0}^{T}\ddot{\bm{\Phi}}(t)^{\scriptscriptstyle\top}% \bm{R}_{\bm{q}}\ddot{\bm{\Phi}}(t)\,dt\,\bm{\xi},\\ &=\frac{1}{2}\bm{\xi}\bm{R}_{\bm{\xi}}\bm{\xi}.\end{split}start_ROW start_CELL italic_J start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT end_CELL start_CELL = divide start_ARG 1 end_ARG start_ARG 2 end_ARG ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT bold_italic_ξ start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT over¨ start_ARG bold_Φ end_ARG ( italic_t ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_R start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT over¨ start_ARG bold_Φ end_ARG ( italic_t ) bold_italic_ξ italic_d italic_t , end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = divide start_ARG 1 end_ARG start_ARG 2 end_ARG bold_italic_ξ ∫ start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT over¨ start_ARG bold_Φ end_ARG ( italic_t ) start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT bold_italic_R start_POSTSUBSCRIPT bold_italic_q end_POSTSUBSCRIPT over¨ start_ARG bold_Φ end_ARG ( italic_t ) italic_d italic_t bold_italic_ξ , end_CELL end_ROW start_ROW start_CELL end_CELL start_CELL = divide start_ARG 1 end_ARG start_ARG 2 end_ARG bold_italic_ξ bold_italic_R start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT bold_italic_ξ . end_CELL end_ROW (65)

We obtain a smoothness metric

𝑹𝝃=(𝑹𝜽𝑹𝜽|𝒒0,𝒒˙0𝑹𝜽|𝒒0,𝒒˙0𝑹𝒒0,𝒒˙0),subscript𝑹𝝃matrixsubscript𝑹𝜽subscript𝑹conditional𝜽subscript𝒒0subscript˙𝒒0superscriptsubscript𝑹conditional𝜽subscript𝒒0subscript˙𝒒0topsubscript𝑹subscript𝒒0subscript˙𝒒0\bm{R}_{\bm{\xi}}=\begin{pmatrix}\bm{R}_{\bm{\theta}}&\bm{R}_{\bm{\theta}|\bm{% q}_{0},\dot{\bm{q}}_{0}}\\ \bm{R}_{\bm{\theta}|\bm{q}_{0},\dot{\bm{q}}_{0}}^{\scriptscriptstyle\top}&\bm{% R}_{\bm{q}_{0},\dot{\bm{q}}_{0}}\end{pmatrix},bold_italic_R start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT = ( start_ARG start_ROW start_CELL bold_italic_R start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT end_CELL start_CELL bold_italic_R start_POSTSUBSCRIPT bold_italic_θ | bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL bold_italic_R start_POSTSUBSCRIPT bold_italic_θ | bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT end_CELL start_CELL bold_italic_R start_POSTSUBSCRIPT bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ) , (66)

that is the result of the integral in the second equation in (65). With this result, the smoothness objective is in fact a quadratic function in 𝝃𝝃\bm{\xi}bold_italic_ξ and can thus be expressed as a zero-mean Gaussian distribution with

ps(𝝃)=ps(𝜽,𝒒0r,𝒒˙0r)=𝒩(𝟎,𝑹𝝃1).subscript𝑝s𝝃subscript𝑝s𝜽superscriptsubscript𝒒0𝑟superscriptsubscript˙𝒒0𝑟𝒩0superscriptsubscript𝑹𝝃1p_{\mathrm{s}}(\bm{\xi})=p_{\mathrm{s}}(\bm{\theta},\bm{q}_{0}^{r},\dot{\bm{q}% }_{0}^{r})=\mathcal{N}\left(\bm{0},\bm{R}_{\bm{\xi}}^{-1}\right).italic_p start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT ( bold_italic_ξ ) = italic_p start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT ( bold_italic_θ , bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) = caligraphic_N ( bold_0 , bold_italic_R start_POSTSUBSCRIPT bold_italic_ξ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) . (67)

This Gaussian distribution is a joint distribution over the original trajectory parameter 𝜽𝜽\bm{\theta}bold_italic_θ and the initial position and velocity. At the time of constructing the smoothness prior, the initial position and velocity are given. Consequently, we obtain the smoothness prior for 𝜽𝜽\bm{\theta}bold_italic_θ by conditioning on the initial conditions, i.e.

ps(𝜽|𝒒0r,𝒒˙0r)=𝒩(𝜽¯s,𝑹𝜽1),𝜽¯s=𝑹𝜽1𝑹𝜽|𝒒0,𝒒˙0(𝒒0𝒒˙0).formulae-sequencesubscript𝑝sconditional𝜽superscriptsubscript𝒒0𝑟superscriptsubscript˙𝒒0𝑟𝒩subscript¯𝜽ssuperscriptsubscript𝑹𝜽1subscript¯𝜽ssuperscriptsubscript𝑹𝜽1subscript𝑹conditional𝜽subscript𝒒0subscript˙𝒒0matrixsubscript𝒒0subscript˙𝒒0\displaystyle\begin{split}p_{\mathrm{s}}(\bm{\theta}|\bm{q}_{0}^{r},\dot{\bm{q% }}_{0}^{r})=\mathcal{N}\left(\bar{\bm{\theta}}_{\mathrm{s}},\bm{R}_{\bm{\theta% }}^{-1}\right),\\ \bar{\bm{\theta}}_{\mathrm{s}}=\bm{R}_{\bm{\theta}}^{-1}\bm{R}_{\bm{\theta}|% \bm{q}_{0},\dot{\bm{q}}_{0}}\begin{pmatrix}\bm{q}_{0}\\ \dot{\bm{q}}_{0}\end{pmatrix}.\end{split}start_ROW start_CELL italic_p start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT ( bold_italic_θ | bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_r end_POSTSUPERSCRIPT ) = caligraphic_N ( over¯ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT , bold_italic_R start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT ) , end_CELL end_ROW start_ROW start_CELL over¯ start_ARG bold_italic_θ end_ARG start_POSTSUBSCRIPT roman_s end_POSTSUBSCRIPT = bold_italic_R start_POSTSUBSCRIPT bold_italic_θ end_POSTSUBSCRIPT start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT bold_italic_R start_POSTSUBSCRIPT bold_italic_θ | bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT , over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT ( start_ARG start_ROW start_CELL bold_italic_q start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL over˙ start_ARG bold_italic_q end_ARG start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_CELL end_ROW end_ARG ) . end_CELL end_ROW (68)

Experiment: Implementation Details

In the following, we provide the details of the implementations for running the experiments as described in Sec. 7.

Path Tracking Cost.

In a single-horizon optimization problem, we define the task-specific cost based on the object belief only. Given a predicted trajectory of particles, we denote the initial belief with b0subscript𝑏0b_{0}italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT and the belief at the end of the horizon with bTsubscript𝑏𝑇b_{T}italic_b start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT. For both beliefs, we compute the expected object states, i.e. 𝒒¯0o=Eb0[𝒒o]subscriptsuperscript¯𝒒𝑜0subscriptEsubscript𝑏0delimited-[]superscript𝒒𝑜\bar{\bm{q}}^{o}_{0}=\mathrm{E}_{b_{0}}[\bm{q}^{o}]over¯ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT = roman_E start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ] and 𝒒¯To=EbT[𝒒o]subscriptsuperscript¯𝒒𝑜𝑇subscriptEsubscript𝑏𝑇delimited-[]superscript𝒒𝑜\bar{\bm{q}}^{o}_{T}=\mathrm{E}_{b_{T}}[\bm{q}^{o}]over¯ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT = roman_E start_POSTSUBSCRIPT italic_b start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT end_POSTSUBSCRIPT [ bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT ], for consequently computing the cost with respect to the candidate solution. The target path is defined as a circle with

𝒒patho(s)=rpath(cos(2πs)sin(2πs)),s[0,1].formulae-sequencesubscriptsuperscript𝒒𝑜path𝑠subscript𝑟pathmatrix2𝜋𝑠2𝜋𝑠𝑠01\bm{q}^{o}_{\mathrm{path}}(s)=r_{\mathrm{path}}\begin{pmatrix}\cos(2\pi s)\\ \sin(2\pi s)\end{pmatrix},\,s\in[0,1].bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_path end_POSTSUBSCRIPT ( italic_s ) = italic_r start_POSTSUBSCRIPT roman_path end_POSTSUBSCRIPT ( start_ARG start_ROW start_CELL roman_cos ( 2 italic_π italic_s ) end_CELL end_ROW start_ROW start_CELL roman_sin ( 2 italic_π italic_s ) end_CELL end_ROW end_ARG ) , italic_s ∈ [ 0 , 1 ] . (69)

The radius of the circle is rpath=0.15subscript𝑟path0.15r_{\mathrm{path}}=0.15italic_r start_POSTSUBSCRIPT roman_path end_POSTSUBSCRIPT = 0.15. Next, we find the point on the target path that is closest to the expected object states with

sk=argmins𝒒¯ko𝒒patho(s)2,s.t.s[0,1],formulae-sequencesubscript𝑠𝑘subscriptargmin𝑠subscriptnormsubscriptsuperscript¯𝒒𝑜𝑘subscriptsuperscript𝒒𝑜path𝑠2st𝑠01s_{k}=\operatorname*{arg\,min}_{s}||\bar{\bm{q}}^{o}_{k}-\bm{q}^{o}_{\mathrm{% path}}(s)||_{2},\,\mathrm{s.t.}\,s\in[0,1],italic_s start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT = start_OPERATOR roman_arg roman_min end_OPERATOR start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT | | over¯ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT - bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_path end_POSTSUBSCRIPT ( italic_s ) | | start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT , roman_s . roman_t . italic_s ∈ [ 0 , 1 ] , (70)

for k={0,T}𝑘0𝑇k=\{0,T\}italic_k = { 0 , italic_T }. Consequently, we compute the path tracking cost with

cpath=ewprogress(s0sT)+werror𝒒¯To𝒒patho(sT)22.subscript𝑐pathsuperscript𝑒subscript𝑤progresssubscript𝑠0subscript𝑠𝑇subscript𝑤errorsuperscriptsubscriptnormsubscriptsuperscript¯𝒒𝑜𝑇subscriptsuperscript𝒒𝑜pathsubscript𝑠𝑇22c_{\mathrm{path}}=e^{w_{\mathrm{progress}}(s_{0}-s_{T})}+w_{\mathrm{error}}||% \bar{\bm{q}}^{o}_{T}-\bm{q}^{o}_{\mathrm{path}}(s_{T})||_{2}^{2}.italic_c start_POSTSUBSCRIPT roman_path end_POSTSUBSCRIPT = italic_e start_POSTSUPERSCRIPT italic_w start_POSTSUBSCRIPT roman_progress end_POSTSUBSCRIPT ( italic_s start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT - italic_s start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT ) end_POSTSUPERSCRIPT + italic_w start_POSTSUBSCRIPT roman_error end_POSTSUBSCRIPT | | over¯ start_ARG bold_italic_q end_ARG start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT - bold_italic_q start_POSTSUPERSCRIPT italic_o end_POSTSUPERSCRIPT start_POSTSUBSCRIPT roman_path end_POSTSUBSCRIPT ( italic_s start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT ) | | start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT 2 end_POSTSUPERSCRIPT . (71)

The first cost term measures the tracking progress by comparing the progress variable s𝑠sitalic_s at the end of the horizon with the beginning of the horizon. We scale the first cost term with wprogress=100subscript𝑤progress100w_{\mathrm{progress}}=100italic_w start_POSTSUBSCRIPT roman_progress end_POSTSUBSCRIPT = 100. The second cost term penalizes deviations from the path, where the squared error term is weighted with werror=2000subscript𝑤error2000w_{\mathrm{error}}=2000italic_w start_POSTSUBSCRIPT roman_error end_POSTSUBSCRIPT = 2000.