-
Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts
Authors:
Saadullah Amin,
Noon Pokaratsiri Goldstein,
Morgan Kelly Wixted,
Alejandro García-Rudolph,
Catalina Martínez-Costa,
Günter Neumann
Abstract:
Despite the advances in digital healthcare systems offering curated structured knowledge, much of the critical information still lies in large volumes of unlabeled and unstructured clinical texts. These texts, which often contain protected health information (PHI), are exposed to information extraction tools for downstream applications, risking patient identification. Existing works in de-identification rely on using large-scale annotated corpora in English, which often are not suitable in real-world multilingual settings. Pre-trained language models (LM) have shown great potential for cross-lingual transfer in low-resource settings. In this work, we empirically show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource and real-world challenge of code-mixed (Spanish-Catalan) clinical notes de-identification in the stroke domain. We annotate a gold evaluation dataset to assess few-shot setting performance where we only use a few hundred labeled examples for training. Our model improves the zero-shot F1-score from 73.7% to 91.2% on the gold evaluation set when adapting Multilingual BERT (mBERT) (Devlin et al., 2019) from the MEDDOCAN (Marimon et al., 2019) corpus with our few-shot cross-lingual target corpus. When generalized to an out-of-sample test set, the best model achieves a human-evaluation F1-score of 97.2%.
Submitted 10 April, 2022;
originally announced April 2022.
-
Reactive Motion Generation on Learned Riemannian Manifolds
Authors:
Hadi Beik-Mohammadi,
Søren Hauberg,
Georgios Arvanitidis,
Gerhard Neumann,
Leonel Rozo
Abstract:
In recent decades, advancements in motion learning have enabled robots to acquire new skills and adapt to unseen conditions in both structured and unstructured environments. In practice, motion learning methods capture relevant patterns and adjust them to new conditions such as dynamic obstacle avoidance or variable targets. In this paper, we investigate the robot motion learning paradigm from a Riemannian manifold perspective. We argue that Riemannian manifolds may be learned via human demonstrations in which geodesics are natural motion skills. The geodesics are generated using a learned Riemannian metric produced by our novel variational autoencoder (VAE), which is especially intended to recover full-pose end-effector states and joint space configurations. In addition, we propose a technique for facilitating on-the-fly end-effector/multiple-limb obstacle avoidance by reshaping the learned manifold using an obstacle-aware ambient metric. The motion generated using these geodesics may naturally result in multiple-solution tasks that have not been explicitly demonstrated previously. We extensively tested our approach in task space and joint space scenarios using a 7-DoF robotic manipulator. We demonstrate that our method is capable of learning and generating motion skills based on complicated motion patterns demonstrated by a human operator. Additionally, we assess several obstacle avoidance strategies and generate trajectories in multiple-mode settings.
Submitted 17 August, 2023; v1 submitted 15 March, 2022;
originally announced March 2022.
-
What Matters For Meta-Learning Vision Regression Tasks?
Authors:
Ning Gao,
Hanna Ziesche,
Ngo Anh Vien,
Michael Volpp,
Gerhard Neumann
Abstract:
Meta-learning is widely used in few-shot classification and function regression due to its ability to quickly adapt to unseen tasks. However, it has not yet been well explored on regression tasks with high dimensional inputs such as images. This paper makes two main contributions that help understand this barely explored area. First, we design two new types of cross-category level vision regression tasks, namely object discovery and pose estimation of unprecedented complexity in the meta-learning domain for computer vision. To this end, we (i) exhaustively evaluate common meta-learning techniques on these tasks, and (ii) quantitatively analyze the effect of various deep learning techniques commonly used in recent meta-learning algorithms in order to strengthen the generalization capability: data augmentation, domain randomization, task augmentation and meta-regularization. Finally, we (iii) provide some insights and practical recommendations for training meta-learning algorithms on vision regression tasks. Second, we propose the addition of functional contrastive learning (FCL) over the task representations in Conditional Neural Processes (CNPs) and train in an end-to-end fashion. The experimental results show that the results of prior work are misleading as a consequence of a poor choice of the loss function as well as too small meta-training sets. Specifically, we find that CNPs outperform MAML on most tasks without fine-tuning. Furthermore, we observe that naive task augmentation without a tailored design results in underfitting.
Submitted 9 March, 2022;
originally announced March 2022.
-
Hierarchical Policy Learning for Mechanical Search
Authors:
Oussama Zenkri,
Ngo Anh Vien,
Gerhard Neumann
Abstract:
Retrieving objects from clutters is a complex task, which requires multiple interactions with the environment until the target object can be extracted. These interactions involve executing action primitives like grasping or pushing as well as setting priorities for the objects to manipulate and the actions to execute. Mechanical Search (MS) is a framework for object retrieval, which uses a heuristic algorithm for pushing and rule-based algorithms for high-level planning. While rule-based policies profit from human intuition in how they work, they usually perform sub-optimally in many cases. Deep reinforcement learning (RL) has shown great performance in complex tasks such as taking decisions through evaluating pixels, which makes it suitable for training policies in the context of object-retrieval. In this work, we first formulate the MS problem in a principled formulation as a hierarchical POMDP. Based on this formulation, we propose a hierarchical policy learning approach for the MS problem. For demonstration, we present two main parameterized sub-policies: a push policy and an action selection policy. When integrated into the hierarchical POMDP's policy, our proposed sub-policies increase the success rate of retrieving the target object from less than 32% to nearly 80%, while reducing the computation time for push actions from multiple seconds to less than 10 milliseconds.
Submitted 28 February, 2022;
originally announced February 2022.
-
Number of $k$-normal elements over a finite field
Authors:
Josimar J. R. Aguirre,
Victor G. L. Neumann
Abstract:
An element $α\in \mathbb{F}_{q^n}$ is a normal element over $\mathbb{F}_q$ if the conjugates $α^{q^i}$, $0 \leq i \leq n-1$, are linearly independent over $\mathbb{F}_q$. Hence a normal basis for $\mathbb{F}_{q^n}$ over $\mathbb{F}_q$ is of the form $\{α,α^q, \ldots, α^{q^{n-1}}\}$, where $α\in \mathbb{F}_{q^n}$ is normal over $\mathbb{F}_q$. In 2013, Huczynska, Mullen, Panario and Thomson introduced the concept of $k$-normal elements as a generalization of the notion of normal elements. In the last few years, several results have been obtained about the number of such elements. In this paper, we give an explicit combinatorial formula for the number of $k$-normal elements in the general case, answering an open problem proposed by Huczynska et al. (2013).
Submitted 20 February, 2022;
originally announced February 2022.
-
About $r$-primitive and $k$-normal elements in finite fields
Authors:
Cícero Carvalho,
Josimar J. R. Aguirre,
Victor G. L. Neumann
Abstract:
In 2013, Huczynska, Mullen, Panario and Thomson introduced the concept of $k$-normal elements: an element $α\in \mathbb{F}_{q^n}$ is $k$-normal over $\mathbb{F}_q$ if the greatest common divisor of the polynomials $g_α(x)= αx^{n-1}+α^qx^{n-2}+\ldots +α^{q^{n-2}}x+α^{q^{n-1}}$ and $x^n-1$ in $\mathbb{F}_{q^n}[x]$ has degree $k$, generalizing the concept of normal elements (normal in the usual sense is $0$-normal). In this paper we discuss the existence of $r$-primitive, $k$-normal elements in $\mathbb{F}_{q^n}$ over $\mathbb{F}_{q}$, where an element $α\in \mathbb{F}_{q^n}^*$ is $r$-primitive if its multiplicative order is $\frac{q^n-1}{r}$. We provide many general results about the existence of this class of elements and we work a numerical example over finite fields of characteristic $11$.
Submitted 24 December, 2021;
originally announced December 2021.
-
Specializing Versatile Skill Libraries using Local Mixture of Experts
Authors:
Onur Celik,
Dongzhuoran Zhou,
Ge Li,
Philipp Becker,
Gerhard Neumann
Abstract:
A long-cherished vision in robotics is to equip robots with skills that match the versatility and precision of humans. For example, when playing table tennis, a robot should be capable of returning the ball in various ways while precisely placing it at the desired location. A common approach to model such versatile behavior is to use a Mixture of Experts (MoE) model, where each expert is a contextual motion primitive. However, learning such MoEs is challenging as most objectives force the model to cover the entire context space, which prevents specialization of the primitives resulting in rather low-quality components. Starting from maximum entropy reinforcement learning (RL), we decompose the objective into optimizing an individual lower bound per mixture component. Further, we introduce a curriculum by allowing the components to focus on a local context region, enabling the model to learn highly accurate skill representations. To this end, we use local context distributions that are adapted jointly with the expert primitives. Our lower bound advocates an iterative addition of new components, where new components will concentrate on local context regions not covered by the current MoE. This local and incremental learning results in a modular MoE model of high accuracy and versatility, where both properties can be scaled by adding more components on the fly. We demonstrate this by an extensive ablation and on two challenging simulated robot skill learning tasks. We compare our achieved performance to LaDiPS and HiREPS, a known hierarchical policy search method for learning diverse skills.
Submitted 10 January, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
Human-machine Symbiosis: A Multivariate Perspective for Physically Coupled Human-machine Systems
Authors:
Jairo Inga,
Miriam Ruess,
Jan Heinrich Robens,
Thomas Nelius,
Sean Kille,
Philipp Dahlinger,
Roland Thomaschke,
Gerhard Neumann,
Sven Matthiesen,
Sören Hohmann,
Andrea Kiesel
Abstract:
The notion of symbiosis has been increasingly mentioned in research on physically coupled human-machine systems. Yet, a uniform specification on which aspects constitute human-machine symbiosis is missing. By combining the expertise of different disciplines, we elaborate on a multivariate perspective of symbiosis as the highest form of physically coupled human-machine systems. Four dimensions are considered: Task, interaction, performance, and experience. First, human and machine work together to accomplish a common task conceptualized on both a decision and an action level (task dimension). Second, each partner possesses an internal representation of own as well as the other partner's intentions and influence on the environment. This alignment, which is the core of the interaction, constitutes the symbiotic understanding between both partners, being the basis of a joint, highly coordinated and effective action (interaction dimension). Third, the symbiotic interaction leads to synergetic effects regarding the intention recognition and complementary strengths of the partners, resulting in a higher overall performance (performance dimension). Fourth, symbiotic systems specifically change the user's experiences, like flow, acceptance, sense of agency, and embodiment (experience dimension). This multivariate perspective is flexible and generic and is also applicable in diverse human-machine scenarios, helping to bridge barriers between different disciplines.
Submitted 29 November, 2021;
originally announced November 2021.
-
Switching Recurrent Kalman Networks
Authors:
Giao Nguyen-Quynh,
Philipp Becker,
Chen Qiu,
Maja Rudolph,
Gerhard Neumann
Abstract:
Forecasting driving behavior or other sensor measurements is an essential component of autonomous driving systems. Often real-world multivariate time series data is hard to model because the underlying dynamics are nonlinear and the observations are noisy. In addition, driving data can often be multimodal in distribution, meaning that there are distinct predictions that are likely, but averaging can hurt model performance. To address this, we propose the Switching Recurrent Kalman Network (SRKN) for efficient inference and prediction on nonlinear and multi-modal time-series data. The model switches among several Kalman filters that model different aspects of the dynamics in a factorized latent state. We empirically test the resulting scalable and interpretable deep state-space model on toy data sets and real driving data from taxis in Porto. In all cases, the model can capture the multimodal nature of the dynamics in the data.
Submitted 16 November, 2021;
originally announced November 2021.
-
Versatile Inverse Reinforcement Learning via Cumulative Rewards
Authors:
Niklas Freymuth,
Philipp Becker,
Gerhard Neumann
Abstract:
Inverse Reinforcement Learning infers a reward function from expert demonstrations, aiming to encode the behavior and intentions of the expert. Current approaches usually do this with generative and uni-modal models, meaning that they encode a single behavior. In the common setting, where there are various solutions to a problem and the experts show versatile behavior this severely limits the generalization capabilities of these methods. We propose a novel method for Inverse Reinforcement Learning that overcomes these problems by formulating the recovered reward as a sum of iteratively trained discriminators. We show on simulated tasks that our approach is able to recover general, high-quality reward functions and produces policies of the same quality as behavioral cloning approaches designed for versatile behavior.
Submitted 15 November, 2021;
originally announced November 2021.
-
Bugs in our Pockets: The Risks of Client-Side Scanning
Authors:
Hal Abelson,
Ross Anderson,
Steven M. Bellovin,
Josh Benaloh,
Matt Blaze,
Jon Callas,
Whitfield Diffie,
Susan Landau,
Peter G. Neumann,
Ronald L. Rivest,
Jeffrey I. Schiller,
Bruce Schneier,
Vanessa Teague,
Carmela Troncoso
Abstract:
Our increasing reliance on digital technology for personal, economic, and government affairs has made it essential to secure the communications and devices of private citizens, businesses, and governments. This has led to pervasive use of cryptography across society. Despite its evident advantages, law enforcement and national security agencies have argued that the spread of cryptography has hindered access to evidence and intelligence. Some in industry and government now advocate a new technology to access targeted data: client-side scanning (CSS). Instead of weakening encryption or providing law enforcement with backdoor keys to decrypt communications, CSS would enable on-device analysis of data in the clear. If targeted information were detected, its existence and, potentially, its source, would be revealed to the agencies; otherwise, little or no information would leave the client device. Its proponents claim that CSS is a solution to the encryption versus public safety debate: it offers privacy -- in the sense of unimpeded end-to-end encryption -- and the ability to successfully investigate serious crime. In this report, we argue that CSS neither guarantees efficacious crime prevention nor prevents surveillance. Indeed, the effect is the opposite. CSS by its nature creates serious security and privacy risks for all society while the assistance it can provide for law enforcement is at best problematic. There are multiple ways in which client-side scanning can fail, can be evaded, and can be abused.
Submitted 14 October, 2021;
originally announced October 2021.
-
Cooperative Assistance in Robotic Surgery through Multi-Agent Reinforcement Learning
Authors:
Paul Maria Scheikl,
Balázs Gyenes,
Tornike Davitashvili,
Rayan Younis,
André Schulze,
Beat P. Müller-Stich,
Gerhard Neumann,
Martin Wagner,
Franziska Mathis-Ullrich
Abstract:
Cognitive cooperative assistance in robot-assisted surgery holds the potential to increase quality of care in minimally invasive interventions. Automation of surgical tasks promises to reduce the mental exertion and fatigue of surgeons. In this work, multi-agent reinforcement learning is demonstrated to be robust to the distribution shift introduced by pairing a learned policy with a human team member. Multi-agent policies are trained directly from images in simulation to control multiple instruments in a sub task of the minimally invasive removal of the gallbladder. These agents are evaluated individually and in cooperation with humans to demonstrate their suitability as autonomous assistants. Compared to human teams, the hybrid teams with artificial agents perform better considering completion time (44.4% to 71.2% shorter) as well as number of collisions (44.7% to 98.0% fewer). Path lengths, however, increase under control of an artificial agent (11.4% to 33.5% longer). A multi-agent formulation of the learning problem was favored over a single-agent formulation on this surgical sub task, due to the sequential learning of the two instruments. This approach may be extended to other tasks that are difficult to formulate within the standard reinforcement learning framework. Multi-agent reinforcement learning may shift the paradigm of cognitive robotic surgery towards seamless cooperation between surgeons and assistive technologies.
Submitted 10 October, 2021;
originally announced October 2021.
-
A Study on Dense and Sparse (Visual) Rewards in Robot Policy Learning
Authors:
Abdalkarim Mohtasib,
Gerhard Neumann,
Heriberto Cuayahuitl
Abstract:
Deep Reinforcement Learning (DRL) is a promising approach for teaching robots new behaviour. However, one of its main limitations is the need for carefully hand-coded reward signals by an expert. We argue that it is crucial to automate the reward learning process so that new skills can be taught to robots by their users. To address such automation, we consider task success classifiers using visual observations to estimate the rewards in terms of task success. In this work, we study the performance of multiple state-of-the-art deep reinforcement learning algorithms under different types of reward: Dense, Sparse, Visual Dense, and Visual Sparse rewards. Our experiments in various simulation tasks (Pendulum, Reacher, Pusher, and Fetch Reach) show that while DRL agents can learn successful behaviours using visual rewards when the goal targets are distinguishable, their performance may decrease if the task goal is not clearly visible. Our results also show that visual dense rewards are more successful than visual sparse rewards and that there is no single best algorithm for all tasks.
Submitted 6 August, 2021;
originally announced August 2021.
-
A family of codes with variable locality and availability
Authors:
Cícero Carvalho,
Victor G. L. Neumann
Abstract:
In this work we present a class of locally recoverable codes, i.e. codes where an erasure at a position $P$ of a codeword may be recovered from the knowledge of the entries in the positions of a recovery set $R_P$. The codes in the class that we define have availability, meaning that for each position $P$ there are several distinct recovery sets. Also, the entry at position $P$ may be recovered even in the presence of erasures in some of the positions of the recovery sets, and the number of supported erasures may vary among the various recovery sets.
Submitted 28 July, 2021;
originally announced July 2021.
-
Navigate-and-Seek: a Robotics Framework for People Localization in Agricultural Environments
Authors:
Riccardo Polvara,
Francesco Del Duchetto,
Gerhard Neumann,
Marc Hanheide
Abstract:
The agricultural domain offers a working environment where many human laborers are nowadays employed to maintain or harvest crops, with huge potential for productivity gains through the introduction of robotic automation. Detecting and localizing humans reliably and accurately in such an environment, however, is a prerequisite to many services offered by fleets of mobile robots collaborating with human workers. Consequently, in this paper, we expand on the concept of a topological particle filter (TPF) to accurately and individually localize and track workers in a farm environment, integrating information from heterogeneous sensors and combining local active sensing (exploiting a robot's onboard sensing employing a Next-Best-Sense planning approach) and global localization (using affordable IoT GNSS devices). We validate the proposed approach in topologies created for the deployment of robotics fleets to support fruit pickers in a real farm environment. By combining multi-sensor observations on the topological level complemented by active perception through the NBS approach, we show that we can improve the accuracy of picker localization in comparison to prior work.
Submitted 8 July, 2021;
originally announced July 2021.
-
Differentiable Robust LQR Layers
Authors:
Ngo Anh Vien,
Gerhard Neumann
Abstract:
This paper proposes a differentiable robust LQR layer for reinforcement learning and imitation learning under model uncertainty and stochastic dynamics. The robust LQR layer can exploit the advantages of robust optimal control and model-free learning. It provides a new type of inductive bias for stochasticity and uncertainty modeling in control systems. In particular, we propose an efficient way to differentiate through a robust LQR optimization program by rewriting it as a convex program (i.e. semi-definite program) of the worst-case cost. Based on recent work on using convex optimization inside neural network layers, we develop a fully differentiable layer for optimizing this worst-case cost, i.e. we compute the derivative of a performance measure w.r.t. the model's unknown parameters, model uncertainty and stochasticity parameters. We demonstrate the proposed method on imitation learning and approximate dynamic programming on stochastic and uncertain domains. The experimental results show that the proposed method can optimize robust policies under uncertain situations and is able to achieve significantly better performance than existing methods that do not model uncertainty directly.
Submitted 10 June, 2021;
originally announced June 2021.
-
Learning Riemannian Manifolds for Geodesic Motion Skills
Authors:
Hadi Beik-Mohammadi,
Søren Hauberg,
Georgios Arvanitidis,
Gerhard Neumann,
Leonel Rozo
Abstract:
For robots to work alongside humans and perform in unstructured environments, they must learn new motion skills and adapt them to unseen situations on the fly. This demands learning models that capture relevant motion patterns, while offering enough flexibility to adapt the encoded skills to new requirements, such as dynamic obstacle avoidance. We introduce a Riemannian manifold perspective on this problem, and propose to learn a Riemannian manifold from human demonstrations on which geodesics are natural motion skills. We realize this with a variational autoencoder (VAE) over the space of positions and orientations of the robot end-effector. Geodesic motion skills let a robot plan movements from and to arbitrary points on the data manifold. They also provide a straightforward method to avoid obstacles by redefining the ambient metric in an online fashion. Moreover, geodesics naturally exploit the manifold resulting from multiple-mode tasks to design motions that were not explicitly demonstrated previously. We test our learning framework using a 7-DoF robotic manipulator, where the robot satisfactorily learns and reproduces realistic skills featuring elaborated motion patterns, avoids previously unseen obstacles, and generates novel movements in multiple-mode settings.
Submitted 1 July, 2021; v1 submitted 8 June, 2021;
originally announced June 2021.
-
Residual Feedback Learning for Contact-Rich Manipulation Tasks with Uncertainty
Authors:
Alireza Ranjbar,
Ngo Anh Vien,
Hanna Ziesche,
Joschka Boedecker,
Gerhard Neumann
Abstract:
While classic control theory offers state-of-the-art solutions in many problem scenarios, it is often desired to improve beyond the structure of such solutions and surpass their limitations. To this end, residual policy learning (RPL) offers a formulation to improve existing controllers with reinforcement learning (RL) by learning an additive "residual" to the output of a given controller. However, the applicability of such an approach highly depends on the structure of the controller. Often, internal feedback signals of the controller prevent an RL algorithm from adequately changing the policy and, hence, from learning the task. We propose a new formulation that addresses these limitations by also modifying the feedback signals to the controller with an RL policy and show superior performance of our approach on a contact-rich peg-insertion task under position and orientation uncertainty. In addition, we use a recent Cartesian impedance control architecture as the control framework, which can be available to us as a black box while assuming no knowledge about its input/output structure, and show the difficulties of standard RPL. Furthermore, we introduce an adaptive curriculum for the given task to gradually increase the task difficulty in terms of position and orientation uncertainty. A video showing the results can be found at https://youtu.be/SAZm_Krze7U .
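The additive "residual" idea behind standard RPL, as described in the abstract above, can be illustrated with a minimal sketch. The names `residual_action`, `p_controller`, `constant_residual`, and the `scale` parameter are hypothetical and chosen for illustration only; this is not the authors' implementation, and it does not cover their proposed modification of the controller's feedback signals.

```python
from typing import Callable, List

Vector = List[float]


def residual_action(
    state: Vector,
    base_controller: Callable[[Vector], Vector],
    residual_policy: Callable[[Vector], Vector],
    scale: float = 1.0,
) -> Vector:
    """Return the base controller output plus a (scaled) learned residual."""
    base = base_controller(state)
    residual = residual_policy(state)
    return [b + scale * r for b, r in zip(base, residual)]


def p_controller(state: Vector) -> Vector:
    # Hypothetical proportional controller that drives the state to the origin.
    return [-0.5 * x for x in state]


def constant_residual(state: Vector) -> Vector:
    # Stand-in for a learned RL policy; a real residual would be a trained network.
    return [0.25 for _ in state]


action = residual_action([2.0, -4.0], p_controller, constant_residual)
```

In practice the residual would be trained with an RL algorithm while the base controller stays fixed; the sketch only shows how the two outputs are combined.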
Submitted 6 August, 2021; v1 submitted 8 June, 2021;
originally announced June 2021.
-
Differentiable Trust Region Layers for Deep Reinforcement Learning
Authors:
Fabian Otto,
Philipp Becker,
Ngo Anh Vien,
Hanna Carolin Ziesche,
Gerhard Neumann
Abstract:
Trust region methods are a popular tool in reinforcement learning as they yield robust policy updates in continuous and discrete action spaces. However, enforcing such trust regions in deep reinforcement learning is difficult. Hence, many approaches, such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO), are based on approximations. Due to those approximations, they violate the constraints or fail to find the optimal solution within the trust region. Moreover, they are difficult to implement, often lack sufficient exploration, and have been shown to depend on seemingly unrelated implementation choices. In this work, we propose differentiable neural network layers to enforce trust regions for deep Gaussian policies via closed-form projections. Unlike existing methods, those layers formalize trust regions for each state individually and can complement existing reinforcement learning algorithms. We derive trust region projections based on the Kullback-Leibler divergence, the Wasserstein L2 distance, and the Frobenius norm for Gaussian distributions. We empirically demonstrate that those projection layers achieve similar or better results than existing methods while being almost agnostic to specific implementation choices. The code is available at https://git.io/Jthb0.
Submitted 9 March, 2021; v1 submitted 22 January, 2021;
originally announced January 2021.
-
A family of codes with locality containing optimal codes
Authors:
Bruno Andrade,
Cícero Carvalho,
Victor G. L. Neumann,
Antônio C. P. Veiga
Abstract:
Locally recoverable codes were introduced by Gopalan et al. in 2012, and in the same year Prakash et al. introduced the concept of codes with locality, which are a type of locally recoverable codes. In this work we introduce a new family of codes with locality, which are subcodes of a certain family of evaluation codes. We determine the dimension of these codes, and also bounds for the minimum distance. We present the true values of the minimum distance in special cases, and also show that elements of this family are "optimal codes", as defined by Prakash et al.
Submitted 19 January, 2021;
originally announced January 2021.
-
Action-Conditional Recurrent Kalman Networks For Forward and Inverse Dynamics Learning
Authors:
Vaisakh Shaj,
Philipp Becker,
Dieter Buchler,
Harit Pandya,
Niels van Duijkeren,
C. James Taylor,
Marc Hanheide,
Gerhard Neumann
Abstract:
Estimating accurate forward and inverse dynamics models is a crucial component of model-based control for sophisticated robots such as robots driven by hydraulics, artificial muscles, or robots dealing with different contact situations. Analytic models of such processes are often unavailable or inaccurate due to complex hysteresis effects, unmodelled friction and stiction phenomena, and unknown effects during contact situations. A promising approach is to obtain spatio-temporal models in a data-driven way using recurrent neural networks, as they can overcome those issues. However, such models often do not meet accuracy demands sufficiently, degenerate in performance for the required high sampling frequencies and cannot provide uncertainty estimates. We adopt a recent probabilistic recurrent neural network architecture, called Recurrent Kalman Networks (RKNs), to model learning by conditioning its transition dynamics on the control actions. RKNs outperform standard recurrent networks such as LSTMs on many state estimation tasks. Inspired by Kalman filters, the RKN provides an elegant way to achieve action conditioning within its recurrent cell by leveraging additive interactions between the current latent state and the action variables. We present two architectures, one for forward model learning and one for inverse model learning. Both architectures significantly outperform existing model learning frameworks as well as analytical models in terms of prediction performance on a variety of real robot dynamics models.
Submitted 5 November, 2020; v1 submitted 20 October, 2020;
originally announced October 2020.
-
Linguistically inspired morphological inflection with a sequence to sequence model
Authors:
Eleni Metheniti,
Guenter Neumann,
Josef van Genabith
Abstract:
Inflection is an essential part of every human language's morphology, yet little effort has been made to unify linguistic theory and computational methods in recent years. Methods of string manipulation are used to infer inflectional changes; our research question is whether a neural network would be capable of learning inflectional morphemes for inflection production in a similar way to a human in early stages of language acquisition. We use an inflectional corpus (Metheniti and Neumann, 2020) and a single-layer seq2seq model to test this hypothesis, in which the inflectional affixes are learned and predicted as a block and the word stem is modelled as a character sequence to account for infixation. Our character-morpheme-based model creates inflection by predicting the stem character-to-character and the inflectional affixes as character blocks. We conducted three experiments on creating an inflected form of a word given the lemma and a set of input and target features, comparing our architecture to a mainstream character-based model with the same hyperparameters, training and test sets. Overall, for 17 languages, we noticed small improvements on inflecting known lemmas (+0.68%) but steadily better performance of our model in predicting inflected forms of unknown words (+3.7%) and small improvements on predicting in a low-resource scenario (+1.09%).
Submitted 4 September, 2020;
originally announced September 2020.
-
LowFER: Low-rank Bilinear Pooling for Link Prediction
Authors:
Saadullah Amin,
Stalin Varanasi,
Katherine Ann Dunfield,
Günter Neumann
Abstract:
Knowledge graphs are incomplete by nature, with only a limited number of observed facts from the world knowledge being represented as structured relations between entities. To partly address this issue, an important task in statistical relational learning is that of link prediction or knowledge graph completion. Both linear and non-linear models have been proposed to solve the problem. Bilinear models, while expressive, are prone to overfitting and lead to quadratic growth of parameters in the number of relations. Simpler models have become more standard, with certain constraints on the bilinear map as relation parameters. In this work, we propose a factorized bilinear pooling model, commonly used in multi-modal learning, for better fusion of entities and relations, leading to an efficient and constraint-free model. We prove that our model is fully expressive, providing bounds on the embedding dimensionality and factorization rank. Our model naturally generalizes the Tucker-decomposition-based TuckER model, which has been shown to generalize other models, as an efficient low-rank approximation without substantially compromising the performance. Due to the low-rank approximation, the model complexity can be controlled by the factorization rank, avoiding the possible cubic growth of TuckER. Empirically, we evaluate on real-world datasets, reaching on-par or state-of-the-art performance. At extremely low ranks, the model preserves the performance while staying parameter-efficient.
Submitted 25 August, 2020;
originally announced August 2020.
-
Imitation Learning for Autonomous Trajectory Learning of Robot Arms in Space
Authors:
RB Ashith Shyam,
Zhou Hao,
Umberto Montanaro,
Gerhard Neumann
Abstract:
This work adds to the ongoing efforts to provide more autonomy to space robots. Here the concept of programming by demonstration or imitation learning is used for trajectory planning of manipulators mounted on small spacecraft. For greater autonomy in future space missions and minimal human intervention through ground control, a robot arm having 7 Degrees of Freedom (DoF) is envisaged for carrying out multiple tasks like debris removal, on-orbit servicing and assembly. Since actual hardware implementation of a microgravity environment is extremely expensive, the demonstration data for trajectory learning is generated using a model predictive controller (MPC) in a physics-based simulator. The data is then encoded compactly by Probabilistic Movement Primitives (ProMPs). This offline trajectory learning allows faster reproductions and also avoids any computationally expensive optimizations after deployment in a space environment. It is shown that the probabilistic distribution can be used to generate trajectories to previously unseen situations by conditioning the distribution. The motion of the robot (or manipulator) arm induces reaction forces on the spacecraft hub and hence its attitude changes, prompting the Attitude Determination and Control System (ADCS) to take large corrective action that drains energy out of the system. A robot arm with redundant DoF helps in finding several possible trajectories from the same start to the same target. This allows the ProMP trajectory generator to sample a trajectory that is obstacle-free and has minimal attitudinal disturbances, thereby reducing the load on the ADCS.
Submitted 10 August, 2020;
originally announced August 2020.
-
Non-Adversarial Imitation Learning and its Connections to Adversarial Methods
Authors:
Oleg Arenz,
Gerhard Neumann
Abstract:
Many modern methods for imitation learning and inverse reinforcement learning, such as GAIL or AIRL, are based on an adversarial formulation. These methods apply GANs to match the expert's distribution over states and actions with the implicit state-action distribution induced by the agent's policy. However, by framing imitation learning as a saddle point problem, adversarial methods can suffer from unstable optimization, and convergence can only be shown for small policy updates. We address these problems by proposing a framework for non-adversarial imitation learning. The resulting algorithms are similar to their adversarial counterparts and, thus, provide insights for adversarial imitation learning methods. Most notably, we show that AIRL is an instance of our non-adversarial formulation, which enables us to greatly simplify its derivations and obtain stronger convergence guarantees. We also show that our non-adversarial formulation can be used to derive novel algorithms by presenting a method for offline imitation learning that is inspired by the recent ValueDice algorithm, but does not rely on small policy updates for convergence. In our simulated robot experiments, our offline method for non-adversarial imitation learning seems to perform best when using many updates for policy and discriminator at each iteration and outperforms behavioral cloning and ValueDice.
Submitted 8 August, 2020;
originally announced August 2020.
-
Existence of primitive $2$-normal elements in finite fields
Authors:
Victor G. L. Neumann,
Josimar J. R. Aguirre
Abstract:
An element $α\in \mathbb{F}_{q^n}$ is normal over $\mathbb{F}_q$ if $\mathcal{B}=\{α, α^q, α^{q^2}, \cdots, α^{q^{n-1}}\}$ forms a basis of $\mathbb{F}_{q^n}$ as a vector space over $\mathbb{F}_q$. It is well known that $α\in \mathbb{F}_{q^n}$ is normal over $\mathbb{F}_q$ if and only if $g_α(x)=αx^{n-1}+α^q x^{n-2}+ \cdots + α^{q^{n-2}}x+α^{q^{n-1}}$ and $x^n-1$ are relatively prime over $\mathbb{F}_{q^n}$, that is, the degree of their greatest common divisor in $\mathbb{F}_{q^n}[x]$ is $0$. Using this equivalence, the notion of $k$-normal elements was introduced in Huczynska et al. ($2013$): an element $α\in \mathbb{F}_{q^n}$ is $k$-normal over $\mathbb{F}_q$ if the greatest common divisor of the polynomials $g_α(x)$ and $x^n-1$ in $\mathbb{F}_{q^n}[x]$ has degree $k$; so an element which is normal in the usual sense is $0$-normal.
Huczynska et al. posed the question of the pairs $(n,k)$ for which there exist primitive $k$-normal elements in $\mathbb{F}_{q^n}$ over $\mathbb{F}_q$, and they obtained a partial result for the case $k=1$; later, Reis and Thomson ($2018$) completed this case. The Primitive Normal Basis Theorem solves the case $k=0$. In this paper, we completely solve the case $k=2$ using estimates for Gauss sums and computer calculations; we also obtain a new condition for the existence of $k$-normal elements in $\mathbb{F}_{q^n}$.
Submitted 22 December, 2020; v1 submitted 21 July, 2020;
originally announced July 2020.
-
On the existence of pairs of primitive and normal elements over finite fields
Authors:
Cícero Carvalho,
João Paulo Guardieiro,
Victor G. L. Neumann,
Guilherme Tizziotti
Abstract:
Let $\mathbb{F}_{q^n}$ be a finite field with $q^n$ elements, and let $m_1$ and $m_2$ be positive integers. Given polynomials $f_1(x), f_2(x) \in \mathbb{F}_q[x]$ with $\textrm{deg}(f_i(x)) \leq m_i$, for $i = 1, 2$, and such that the rational function $f_1(x)/f_2(x)$ belongs to a certain set which we define, we present a sufficient condition for the existence of a primitive element $α\in \mathbb{F}_{q^n}$, normal over $\mathbb{F}_q$, such that $f_1(α)/f_2(α)$ is also primitive.
Submitted 14 March, 2021; v1 submitted 19 July, 2020;
originally announced July 2020.
-
Extended gravitational clock compass: new exact solutions and simulations
Authors:
Gerald Neumann,
Dirk Puetzfeld,
Guillermo F. Rubilar
Abstract:
By extending the framework of the gravitational clock compass we show how a suitably prepared set of clocks can be used to extract information about the gravitational field in the context of General Relativity. Conceptual differences between the extended and the standard clock compass are highlighted. Particular attention is paid to the influence of kinematic quantities on the measurement process and the setup of the compass. Additionally, we present results of simulations of the inference process for the acceleration and the curvature components. Several examples of different strategies for the computation of the posterior probability distributions of the curvature components are discussed. This allows us to anticipate the precision with which physical quantities could be determined in a realistic measurement.
Submitted 20 August, 2020; v1 submitted 17 June, 2020;
originally announced June 2020.
-
A Data-driven Approach for Noise Reduction in Distantly Supervised Biomedical Relation Extraction
Authors:
Saadullah Amin,
Katherine Ann Dunfield,
Anna Vechkaeva,
Günter Neumann
Abstract:
Fact triples are a common form of structured knowledge used within the biomedical domain. As the amount of unstructured scientific texts continues to grow, manual annotation of these texts for the task of relation extraction becomes increasingly expensive. Distant supervision offers a viable approach to combat this by quickly producing large amounts of labeled, but considerably noisy, data. We aim to reduce such noise by extending an entity-enriched relation classification BERT model to the problem of multiple instance learning, and defining a simple data encoding scheme that significantly reduces noise, reaching state-of-the-art performance for distantly-supervised biomedical relation extraction. Our approach further encodes knowledge about the direction of relation triples, allowing for increased focus on relation learning by reducing noise and alleviating the need for joint learning with knowledge graph completion.
Submitted 26 May, 2020;
originally announced May 2020.
-
Probabilistic approach to physical object disentangling
Authors:
Joni Pajarinen,
Oleg Arenz,
Jan Peters,
Gerhard Neumann
Abstract:
Physically disentangling entangled objects from each other is a problem encountered in waste segregation or in any task that requires disassembly of structures. Often there are no object models, and, especially with cluttered irregularly shaped objects, the robot cannot create a model of the scene due to occlusion. One of our key insights is that based on previous sensory input we are only interested in moving an object out of the entanglement around obstacles. That is, we only need to know where the robot can successfully move in order to plan the disentangling. Due to the uncertainty, we integrate information about blocked movements into a probability map. The map defines the probability of the robot successfully moving to a specific configuration. Using the failure probability of a sequence of movements as the cost, we can then plan and execute disentangling iteratively. Since our approach circumvents only previously encountered obstacles, new movements will yield information about unknown obstacles that block movement until the robot has learned to circumvent all obstacles and disentangling succeeds. In the experiments, we use a special probabilistic version of the Rapidly exploring Random Tree (RRT) algorithm for planning and demonstrate successful disentanglement of objects both in 2-D and 3-D simulation and on a KUKA LBR 7-DOF robot. Moreover, our approach outperforms baseline methods.
Submitted 12 April, 2021; v1 submitted 26 February, 2020;
originally announced February 2020.
-
On existence of some special pair of primitive elements over finite fields
Authors:
C. Carvalho,
J. P. G. Sousa,
V. G. L. Neumann,
G. Tizziotti
Abstract:
In this paper we generalize the results of Sharma, Awasthi and Gupta (see \cite{SAG}). We work over a field of any characteristic with $q = p^k$ elements and we give a sufficient condition for the existence of a primitive element $α\in \mathbb{F}_{p^k}$ such that $f(α)$ is also primitive in $\mathbb{F}_{p^k}$, where $f(x) \in \mathbb{F}_{p^k}(x)$ is a quotient of polynomials with some restrictions. We explicitly determine the values of $k$ for which such a pair exists for $p=2,3,5$ and $7$.
Submitted 5 February, 2020;
originally announced February 2020.
-
Expected Information Maximization: Using the I-Projection for Mixture Density Estimation
Authors:
Philipp Becker,
Oleg Arenz,
Gerhard Neumann
Abstract:
Modelling highly multi-modal data is a challenging problem in machine learning. Most algorithms are based on maximizing the likelihood, which corresponds to the M(oment)-projection of the data distribution to the model distribution. The M-projection forces the model to average over modes it cannot represent. In contrast, the I(nformation)-projection ignores such modes in the data and concentrates on the modes the model can represent. Such behavior is appealing whenever we deal with highly multi-modal data where modelling single modes correctly is more important than covering all the modes. Despite this advantage, the I-projection is rarely used in practice due to the lack of algorithms that can efficiently optimize it based on data. In this work, we present a new algorithm called Expected Information Maximization (EIM) for computing the I-projection solely based on samples for general latent variable models, where we focus on Gaussian mixture models and Gaussian mixtures of experts. Our approach applies a variational upper bound to the I-projection objective which decomposes the original objective into single objectives for each mixture component as well as for the coefficients, allowing an efficient optimization. Similar to GANs, our approach employs discriminators but uses a more stable optimization procedure, using a tight upper bound. We show that our algorithm is much more effective in computing the I-projection than recent GAN approaches and we illustrate the effectiveness of our approach for modelling multi-modal behavior on two pedestrian and traffic prediction datasets.
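For reference, the two projections contrasted in the abstract above are commonly written as Kullback-Leibler minimizations in opposite directions. The notation below ($p$ for the data distribution, $q_\theta$ for the model) is an assumption made for illustration and is not taken from the abstract itself.

```latex
\begin{align}
q_{\mathrm{M}} &= \arg\min_{q_\theta} \mathrm{KL}\left(p \,\|\, q_\theta\right)
  = \arg\max_{q_\theta} \mathbb{E}_{p(x)}\left[\log q_\theta(x)\right], \\
q_{\mathrm{I}} &= \arg\min_{q_\theta} \mathrm{KL}\left(q_\theta \,\|\, p\right)
  = \arg\min_{q_\theta} \mathbb{E}_{q_\theta(x)}\left[\log \frac{q_\theta(x)}{p(x)}\right].
\end{align}
```

The first line is the maximum-likelihood objective, whose expectation is taken under the data and therefore penalizes any mode the model misses. The second takes the expectation under the model, so it requires evaluating or estimating the ratio $q_\theta(x)/p(x)$, which is not directly available from samples of $p$ alone.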
Submitted 23 January, 2020;
originally announced January 2020.
-
Trust-Region Variational Inference with Gaussian Mixture Models
Authors:
Oleg Arenz,
Mingjun Zhong,
Gerhard Neumann
Abstract:
Many methods for machine learning rely on approximate inference from intractable probability distributions. Variational inference approximates such distributions by tractable models that can be subsequently used for approximate inference. Learning sufficiently accurate approximations requires a rich model family and careful exploration of the relevant modes of the target distribution. We propose a method for learning accurate GMM approximations of intractable probability distributions based on insights from policy search by using information-geometric trust regions for principled exploration. For efficient improvement of the GMM approximation, we derive a lower bound on the corresponding optimization objective enabling us to update the components independently. Our use of the lower bound ensures convergence to a stationary point of the original objective. The number of components is adapted online by adding new components in promising regions and by deleting components with negligible weight. We demonstrate on several domains that we can learn approximations of complex, multimodal distributions with a quality that is unmet by previous variational inference methods, and that the GMM approximation can be used for drawing samples that are on par with samples created by state-of-the-art MCMC samplers while requiring up to three orders of magnitude less computational resources.
Submitted 4 August, 2020; v1 submitted 10 July, 2019;
originally announced July 2019.
-
Recurrent Kalman Networks: Factorized Inference in High-Dimensional Deep Feature Spaces
Authors:
Philipp Becker,
Harit Pandya,
Gregor Gebhardt,
Cheng Zhao,
James Taylor,
Gerhard Neumann
Abstract:
In order to integrate uncertainty estimates into deep time-series modelling, Kalman Filters (KFs) (Kalman et al., 1960) have been integrated with deep learning models; however, such approaches typically rely on approximate inference techniques such as variational inference, which makes learning more complex and often less scalable due to approximation errors. We propose a new deep approach to Kalman filtering which can be learned directly in an end-to-end manner using backpropagation without additional approximations. Our approach uses a high-dimensional factorized latent state representation for which the Kalman updates simplify to scalar operations and thus avoids hard-to-backpropagate, computationally heavy and potentially unstable matrix inversions. Moreover, we use locally linear dynamic models to efficiently propagate the latent state to the next time step. The resulting network architecture, which we call Recurrent Kalman Network (RKN), can be used for any time-series data, similar to an LSTM (Hochreiter & Schmidhuber, 1997), but uses an explicit representation of uncertainty. As shown by our experiments, the RKN obtains much more accurate uncertainty estimates than an LSTM or Gated Recurrent Units (GRUs) (Cho et al., 2014) while also showing a slightly improved prediction performance, and outperforms various recent generative models on an image imputation task.
Submitted 17 May, 2019;
originally announced May 2019.
-
An extension of Delsarte, Goethals and Mac Williams theorem on minimal weight codewords to a class of Reed-Muller type codes
Authors:
Cicero Carvalho,
Victor G. L. Neumann
Abstract:
In 1970 Delsarte, Goethals and Mac Williams published a seminal paper on generalized Reed-Muller codes where, among many important results, they proved that the minimal weight codewords of these codes are obtained through the evaluation of certain polynomials which are a specific product of linear factors, which they describe. In the present paper we extend this result to a class of Reed-Muller type codes defined on a product of (possibly distinct) finite fields of the same characteristic. The paper also brings an expository section on the study of the structure of low weight codewords, not only for affine Reed-Muller type codes, but also for the projective ones.
Submitted 22 March, 2019;
originally announced March 2019.
-
Compatible Natural Gradient Policy Search
Authors:
Joni Pajarinen,
Hong Linh Thai,
Riad Akrour,
Jan Peters,
Gerhard Neumann
Abstract:
Trust-region methods have yielded state-of-the-art results in policy search. A common approach is to use KL-divergence to bound the region of trust resulting in a natural gradient policy update. We show that the natural gradient and trust region optimization are equivalent if we use the natural parameterization of a standard exponential policy distribution in combination with compatible value function approximation. Moreover, we show that standard natural gradient updates may reduce the entropy of the policy according to a wrong schedule leading to premature convergence. To control entropy reduction we introduce a new policy search method called compatible policy search (COPOS) which bounds entropy loss. The experimental results show that COPOS yields state-of-the-art results in challenging continuous control tasks and in discrete partially observable tasks.
Submitted 7 February, 2019;
originally announced February 2019.
-
An Algorithmic Perspective on Imitation Learning
Authors:
Takayuki Osa,
Joni Pajarinen,
Gerhard Neumann,
J. Andrew Bagnell,
Pieter Abbeel,
Jan Peters
Abstract:
As robots and other intelligent agents move from simple environments and problems to more complex, unstructured settings, manually programming their behavior has become increasingly challenging and expensive. Often, it is easier for a teacher to demonstrate a desired behavior rather than attempt to manually engineer it. This process of learning from demonstrations, and the study of algorithms to do so, is called imitation learning. This work provides an introduction to imitation learning. It covers the underlying assumptions, approaches, and how they relate; the rich set of algorithms developed to tackle the problem; and advice on effective tools and implementation.
We intend this paper to serve two audiences. First, we want to familiarize machine learning experts with the challenges of imitation learning, particularly those arising in robotics, and the interesting theoretical and practical distinctions between it and more familiar frameworks like statistical supervised learning theory and reinforcement learning. Second, we want to give roboticists and experts in applied artificial intelligence a broader appreciation for the frameworks and tools available for imitation learning.
Submitted 16 November, 2018;
originally announced November 2018.
-
Adaptation and Robust Learning of Probabilistic Movement Primitives
Authors:
Sebastian Gomez-Gonzalez,
Gerhard Neumann,
Bernhard Schölkopf,
Jan Peters
Abstract:
Probabilistic representations of movement primitives open important new possibilities for machine learning in robotics. These representations are able to capture the variability of the demonstrations from a teacher as a probability distribution over trajectories, providing a sensible region of exploration and the ability to adapt to changes in the robot environment. However, to be able to capture variability and correlations between different joints, a probabilistic movement primitive requires the estimation of a larger number of parameters than its deterministic counterparts, which focus on modeling only the mean behavior. In this paper, we make use of prior distributions over the parameters of a probabilistic movement primitive to make robust estimates of the parameters with few training instances. In addition, we introduce general purpose operators to adapt movement primitives in joint and task space. The proposed training method and adaptation operators are tested in a coffee preparation task and in a robot table tennis task. In the coffee preparation task we evaluate the generalization performance to changes in the location of the coffee grinder and brewing chamber in a target area, achieving the desired behavior after only two demonstrations. In the table tennis task we evaluate the hit and return rates, outperforming previous approaches while using fewer task specific heuristics.
Submitted 19 February, 2020; v1 submitted 31 August, 2018;
originally announced August 2018.
-
Towards Fine Grained Network Flow Prediction
Authors:
Patrick Jahnke,
Emmanuel Stapf,
Jonas Mieseler,
Gerhard Neumann,
Patrick Eugster
Abstract:
One main challenge for the design of networks is that traffic load is not generally known in advance. This makes it hard to adequately devote resources so as to best prevent or mitigate bottlenecks. While several authors have shown how to predict traffic in a coarse grained manner by aggregating flows, fine grained prediction of traffic at the level of individual flows, including bursty traffic, is widely considered to be impossible. This paper shows, to the best of our knowledge, the first approach to fine grained per flow traffic prediction. In short, we introduce the Frequency-based Kernel Kalman Filter (FKKF), which predicts individual flows' behavior based on measurements. Our FKKF relies on the well known Kalman Filter in combination with a kernel to support the prediction of nonlinear functions. Furthermore, we change the operating space from time to frequency space. In this space, into which we transform the input data via a Short-Time Fourier Transform (STFT), the peak structures of flows can be predicted after gleaning their key characteristics, with a Principal Component Analysis (PCA), from past and ongoing flows that stem from the same socket-to-socket connection. We demonstrate the effectiveness of our approach on popular benchmark traces from a university data center. Our approach predicts traffic on average across 17 out of 20 groups of flows with an average prediction error of 6.43% around 0.49 (average) seconds in advance, whilst existing coarse grained approaches exhibit prediction errors of 77% at best.
Submitted 20 August, 2018;
originally announced August 2018.
-
Deep Reinforcement Learning for Swarm Systems
Authors:
Maximilian Hüttenrauch,
Adrian Šošić,
Gerhard Neumann
Abstract:
Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions and a neural network learned end-to-end. We evaluate the representation on two well known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and locally observable setup. For the local setup we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents facilitating the development of more complex collective strategies.
Submitted 6 June, 2019; v1 submitted 17 July, 2018;
originally announced July 2018.
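The abstract above treats the agents as samples of a distribution and feeds the empirical mean embedding to the policy. The following is a minimal sketch of that representation using radial basis function features, one of the feature spaces the abstract names. It is an illustration under stated assumptions, not the authors' code: the function name `rbf_mean_embedding`, the `centers` grid and the `bandwidth` parameter are hypothetical.

```python
import numpy as np


def rbf_mean_embedding(agent_states, centers, bandwidth=1.0):
    """Empirical mean embedding of a set of agent states with RBF features.

    agent_states: array of shape (n_agents, d), one row per observed agent.
    centers:      array of shape (n_features, d), hypothetical RBF centers.
    Returns a vector of shape (n_features,) whose size does not depend on
    the number of agents.
    """
    states = np.asarray(agent_states, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Pairwise squared distances between agents and centers: (n_agents, n_features).
    diffs = states[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=-1)
    # RBF feature of every agent with respect to every center.
    features = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    # Averaging over agents makes the result independent of agent ordering.
    return features.mean(axis=0)
```

Because the features are averaged over agents, reordering the agents leaves the vector unchanged and its length is fixed by the number of centers, which reflects the two swarm properties listed in the abstract.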
-
Agricultural Robotics: The Future of Robotic Agriculture
Authors:
Tom Duckett,
Simon Pearson,
Simon Blackmore,
Bruce Grieve,
Wen-Hua Chen,
Grzegorz Cielniak,
Jason Cleaversmith,
Jian Dai,
Steve Davis,
Charles Fox,
Pål From,
Ioannis Georgilas,
Richie Gill,
Iain Gould,
Marc Hanheide,
Alan Hunter,
Fumiya Iida,
Lyudmila Mihalyova,
Samia Nefti-Meziani,
Gerhard Neumann,
Paolo Paoletti,
Tony Pridmore,
Dave Ross,
Melvyn Smith,
Martin Stoelen
, et al. (5 additional authors not shown)
Abstract:
Agri-Food is the largest manufacturing sector in the UK. It supports a food chain that generates over £108bn p.a., with 3.9m employees in a truly international industry and exports £20bn of UK manufactured goods. However, the global food chain is under pressure from population growth, climate change, political pressures affecting migration, population drift from rural to urban regions and the demographics of an aging global population. These challenges are recognised in the UK Industrial Strategy white paper and backed by significant investment via a Wave 2 Industrial Challenge Fund Investment ("Transforming Food Production: from Farm to Fork"). Robotics and Autonomous Systems (RAS) and associated digital technologies are now seen as enablers of this critical food chain transformation. To meet these challenges, this white paper reviews the state of the art in the application of RAS in Agri-Food production and explores research and innovation needs to ensure these technologies reach their full potential and deliver the necessary impacts in the Agri-Food sector.
Submitted 2 August, 2018; v1 submitted 18 June, 2018;
originally announced June 2018.
-
LightRel SemEval-2018 Task 7: Lightweight and Fast Relation Classification
Authors:
Tyler Renslow,
Günter Neumann
Abstract:
We present LightRel, a lightweight and fast relation classifier. Our goal is to develop a high baseline for different relation extraction tasks. By defining only very few data-internal, word-level features and external knowledge sources in the form of word clusters and word embeddings, we train a fast and simple linear classifier.
Submitted 19 April, 2018;
originally announced April 2018.
-
Local Communication Protocols for Learning Complex Swarm Behaviors with Deep Reinforcement Learning
Authors:
Maximilian Hüttenrauch,
Adrian Šošić,
Gerhard Neumann
Abstract:
Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. While it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building and building a communication link. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.
Submitted 18 July, 2018; v1 submitted 21 September, 2017;
originally announced September 2017.
-
Guided Deep Reinforcement Learning for Swarm Systems
Authors:
Maximilian Hüttenrauch,
Adrian Šošić,
Gerhard Neumann
Abstract:
In this paper, we investigate how to learn to control a group of cooperative agents with limited sensing capabilities such as robot swarms. The agents have only very basic sensor capabilities, yet in a group they can accomplish sophisticated tasks, such as distributed assembly or search and rescue tasks. Learning a policy for a group of agents is difficult due to distributed partial observability of the state. Here, we follow a guided approach where a critic has central access to the global state during learning, which simplifies the policy evaluation problem from a reinforcement learning point of view. For example, we can get the positions of all robots of the swarm using a camera image of a scene. This camera image is only available to the critic and not to the control policies of the robots. We follow an actor-critic approach, where the actors base their decisions only on locally sensed information. In contrast, the critic is learned based on the true global state. Our algorithm uses deep reinforcement learning to approximate both the Q-function and the policy. The performance of the algorithm is evaluated on two tasks with simple simulated 2D agents: 1) finding and maintaining a certain distance to each other and 2) locating a target.
Submitted 18 September, 2017;
originally announced September 2017.
-
How Robust Are Character-Based Word Embeddings in Tagging and MT Against Wrod Scramlbing or Randdm Nouse?
Authors:
Georg Heigold,
Günter Neumann,
Josef van Genabith
Abstract:
This paper investigates the robustness of NLP against perturbed word forms. While neural approaches can achieve (almost) human-like accuracy for certain tasks and conditions, they often are sensitive to small changes in the input such as non-canonical input (e.g., typos). Yet both stability and robustness are desired properties in applications involving user-generated content, all the more so as humans easily cope with such noisy or adversarial conditions. In this paper, we study the impact of noisy input. We consider different noise distributions (one type of noise, combination of noise types) and mismatched noise distributions for training and testing. Moreover, we empirically evaluate the robustness of different models (convolutional neural networks, recurrent neural networks, non-neural models), different basic units (characters, byte pair encoding units), and different NLP tasks (morphological tagging, machine translation).
Submitted 14 April, 2017;
originally announced April 2017.
-
Hybrid control trajectory optimization under uncertainty
Authors:
Joni Pajarinen,
Ville Kyrki,
Michael Koval,
Siddhartha Srinivasa,
Jan Peters,
Gerhard Neumann
Abstract:
Trajectory optimization is a fundamental problem in robotics. While optimization of continuous control trajectories is well developed, many applications require both discrete and continuous, i.e., hybrid, controls. Finding an optimal sequence of hybrid controls is challenging due to the exponential explosion of discrete control combinations. Our method, based on Differential Dynamic Programming (DDP), circumvents this problem by incorporating discrete actions inside DDP: we first optimize continuous mixtures of discrete actions, and, subsequently force the mixtures into fully discrete actions. Moreover, we show how our approach can be extended to partially observable Markov decision processes (POMDPs) for trajectory planning under uncertainty. We validate the approach in a car driving problem where the robot has to switch discrete gears and in a box pushing application where the robot can switch the side of the box to push. The pose and the friction parameters of the pushed box are initially unknown and only indirectly observable.
Submitted 2 March, 2017; v1 submitted 14 February, 2017;
originally announced February 2017.
-
On the next-to-minimal weight of projective Reed-Muller codes
Authors:
Cícero Carvalho,
Victor G. L. Neumann
Abstract:
In this paper we present several values for the next-to-minimal weights of projective Reed-Muller codes. We work over $\mathbb{F}_q$ with $q \geq 3$ since in IEEE-IT 62(11) p. 6300-6303 (2016) we have determined the complete values for the next-to-minimal weights of binary projective Reed-Muller codes. As in loc. cit. here we also find examples of codewords with next-to-minimal weight whose set of zeros is not in a hyperplane arrangement.
Submitted 6 January, 2017;
originally announced January 2017.
-
The next-to-minimal weights of binary projective Reed-Muller codes
Authors:
Cícero Carvalho,
Victor G. L. Neumann
Abstract:
Projective Reed-Muller codes were introduced by Lachaud in 1988, and their dimension and minimum distance were determined by Serre and Sørensen in 1991. In coding theory one is also interested in the higher Hamming weights, to study the code performance. Yet, not many values of the higher Hamming weights are known for these codes; not even the second lowest weight (also known as next-to-minimal weight) is completely determined. In this paper we determine all the values of the next-to-minimal weight for the binary projective Reed-Muller codes, which we show to be equal to the next-to-minimal weight of Reed-Muller codes in most, but not all, cases.
Submitted 6 January, 2017;
originally announced January 2017.
-
Policy Search with High-Dimensional Context Variables
Authors:
Voot Tangkaratt,
Herke van Hoof,
Simone Parisi,
Gerhard Neumann,
Jan Peters,
Masashi Sugiyama
Abstract:
Direct contextual policy search methods learn to improve policy parameters and simultaneously generalize these parameters to different context or task variables. However, learning from high-dimensional context variables, such as camera images, is still a prominent problem in many real-world tasks. A naive application of unsupervised dimensionality reduction methods to the context variables, such as principal component analysis, is insufficient as task-relevant input may be ignored. In this paper, we propose a contextual policy search method in the model-based relative entropy stochastic search framework with integrated dimensionality reduction. We learn a model of the reward that is locally quadratic in both the policy parameters and the context variables. Furthermore, we perform supervised linear dimensionality reduction on the context variables by nuclear norm regularization. The experimental results show that the proposed method outperforms naive dimensionality reduction via principal component analysis and a state-of-the-art contextual policy search method.
Submitted 10 November, 2016;
originally announced November 2016.
-
Model-Free Trajectory-based Policy Optimization with Monotonic Improvement
Authors:
Riad Akrour,
Abbas Abdolmaleki,
Hany Abdulsamad,
Jan Peters,
Gerhard Neumann
Abstract:
Many of the recent trajectory optimization algorithms alternate between linear approximation of the system dynamics around the mean trajectory and conservative policy update. One way of constraining the policy change is by bounding the Kullback-Leibler (KL) divergence between successive policies. These approaches already demonstrated great experimental success in challenging problems such as end-to-end control of physical systems. However, the linear approximation of the system dynamics can introduce a bias in the policy update and prevent convergence to the optimal policy. In this article, we propose a new model-free trajectory-based policy optimization algorithm with guaranteed monotonic improvement. The algorithm backpropagates a local, quadratic and time-dependent Q-function learned from trajectory data instead of a model of the system dynamics. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics. We experimentally demonstrate on highly non-linear control tasks the improvement in performance of our algorithm in comparison to approaches linearizing the system dynamics. In order to show the monotonic improvement of our algorithm, we additionally conduct a theoretical analysis of our policy update scheme to derive a lower bound of the change in policy return between successive iterations.
Submitted 2 July, 2018; v1 submitted 29 June, 2016;
originally announced June 2016.