-
Beyond Optimism: Exploration With Partially Observable Rewards
Authors:
Simone Parisi,
Alireza Kazemipour,
Michael Bowling
Abstract:
Exploration in reinforcement learning (RL) remains an open challenge. RL algorithms rely on observing rewards to train the agent, and if informative rewards are sparse the agent learns slowly or may not learn at all. To improve exploration and reward discovery, popular algorithms rely on optimism. But what if sometimes rewards are unobservable, e.g., situations of partial monitoring in bandits and the recent formalism of monitored Markov decision process? In this case, optimism can lead to suboptimal behavior that does not explore further to collapse uncertainty. With this paper, we present a novel exploration strategy that overcomes the limitations of existing methods and guarantees convergence to an optimal policy even when rewards are not always observable. We further propose a collection of tabular environments for benchmarking exploration in RL (with and without unobservable rewards) and show that our method outperforms existing ones.
Submitted 19 June, 2024;
originally announced June 2024.
-
Self-Sensing Feedback Control of an Electrohydraulic Robotic Shoulder
Authors:
Clemens C. Christoph,
Amirhossein Kazemipour,
Michel R. Vogt,
Yu Zhang,
Robert K. Katzschmann
Abstract:
The human shoulder, with its glenohumeral joint, tendons, ligaments, and muscles, allows for the execution of complex tasks with precision and efficiency. However, current robotic shoulder designs lack the compliance and compactness inherent in their biological counterparts. A major limitation of these designs is their reliance on external sensors like rotary encoders, which restrict mechanical joint design and introduce bulk to the system. To address this constraint, we present a bio-inspired antagonistic robotic shoulder with two degrees of freedom powered by self-sensing hydraulically amplified self-healing electrostatic actuators. Our artificial muscle design decouples the high-voltage electrostatic actuation from the pair of low-voltage self-sensing electrodes. This approach allows for proprioceptive feedback control of trajectories in the task space while eliminating the necessity for any additional sensors. We assess the platform's efficacy by comparing it to a feedback control based on position data provided by a motion capture system. The study demonstrates closed-loop controllable robotic manipulators based on an inherent self-sensing capability of electrohydraulic actuators. The proposed architecture can serve as a basis for complex musculoskeletal joint arrangements.
Submitted 5 April, 2024;
originally announced April 2024.
-
High-Frequency Capacitive Sensing for Electrohydraulic Soft Actuators
Authors:
Michel R. Vogt,
Maximilian Eberlein,
Clemens C. Christoph,
Felix Baumann,
Fabrice Bourquin,
Wim Wende,
Fabio Schaub,
Amirhossein Kazemipour,
Robert K. Katzschmann
Abstract:
The need for compliant and proprioceptive actuators has grown more evident in pursuing more adaptable and versatile robotic systems. Hydraulically Amplified Self-Healing Electrostatic (HASEL) actuators offer distinctive advantages with their inherent softness and flexibility, making them promising candidates for various robotic tasks, including delicate interactions with humans and animals, biomimetic locomotion, prosthetics, and exoskeletons. This has resulted in a growing interest in the capacitive self-sensing capabilities of HASEL actuators to create miniature displacement estimation circuitry that does not require external sensors. However, achieving HASEL self-sensing for actuation frequencies above 1 Hz and with miniature high-voltage power supplies has remained limited. In this paper, we introduce the F-HASEL actuator, which adds an additional electrode pair used exclusively for capacitive sensing to a Peano-HASEL actuator. We demonstrate displacement estimation of the F-HASEL during high-frequency actuation up to 20 Hz and during external loading using miniaturized circuitry comprised of low-cost off-the-shelf components and a miniature high-voltage power supply. Finally, we propose a circuitry to estimate the displacement of multiple F-HASELs and demonstrate it in a wearable application to track joint rotations of a virtual reality user in real-time.
Submitted 8 April, 2024; v1 submitted 5 April, 2024;
originally announced April 2024.
-
Monitored Markov Decision Processes
Authors:
Simone Parisi,
Montaser Mohammedalamen,
Alireza Kazemipour,
Matthew E. Taylor,
Michael Bowling
Abstract:
In reinforcement learning (RL), an agent learns to perform a task by interacting with an environment and receiving feedback (a numerical reward) for its actions. However, the assumption that rewards are always observable is often not applicable in real-world problems. For example, the agent may need to ask a human to supervise its actions or activate a monitoring system to receive feedback. There may even be a period of time before rewards become observable, or a period of time after which rewards are no longer given. In other words, there are cases where the environment generates rewards in response to the agent's actions but the agent cannot observe them. In this paper, we formalize a novel but general RL framework - Monitored MDPs - where the agent cannot always observe rewards. We discuss the theoretical and practical consequences of this setting, show challenges raised even in toy environments, and propose algorithms to begin to tackle this novel setting. This paper introduces a powerful new formalism that encompasses both new and existing problems and lays the foundation for future research.
Submitted 13 February, 2024; v1 submitted 9 February, 2024;
originally announced February 2024.
-
Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World
Authors:
Nico Gürtler,
Felix Widmaier,
Cansu Sancaktar,
Sebastian Blaes,
Pavel Kolev,
Stefan Bauer,
Manuel Wüthrich,
Markus Wulfmeier,
Martin Riedmiller,
Arthur Allshire,
Qiang Wang,
Robert McCarthy,
Hangyeol Kim,
Jongchan Baek,
Wookyong Kwon,
Shanliang Qian,
Yasunori Toshimitsu,
Mike Yan Michelis,
Amirhossein Kazemipour,
Arman Raayatsanati,
Hehui Zheng,
Barnabas Gavin Cangan,
Bernhard Schölkopf,
Georg Martius
Abstract:
Experimentation on real robots is demanding in terms of time and costs. For this reason, a large part of the reinforcement learning (RL) community uses simulators to develop and benchmark algorithms. However, insights gained in simulation do not necessarily translate to real robots, in particular for tasks involving complex interactions with the environment. The Real Robot Challenge 2022 therefore served as a bridge between the RL and robotics communities by allowing participants to experiment remotely with a real robot - as easily as in simulation.
In the last years, offline reinforcement learning has matured into a promising paradigm for learning from pre-collected datasets, alleviating the reliance on expensive online interactions. We therefore asked the participants to learn two dexterous manipulation tasks involving pushing, grasping, and in-hand orientation from provided real-robot datasets. An extensive software documentation and an initial stage based on a simulation of the real set-up made the competition particularly accessible. By giving each team plenty of access budget to evaluate their offline-learned policies on a cluster of seven identical real TriFinger platforms, we organized an exciting competition for machine learners and roboticists alike.
In this work we state the rules of the competition, present the methods used by the winning teams and compare their results with a benchmark of state-of-the-art offline RL algorithms on the challenge datasets.
Submitted 24 November, 2023; v1 submitted 15 August, 2023;
originally announced August 2023.
-
Low Voltage Electrohydraulic Actuators for Untethered Robotics
Authors:
Stephan-Daniel Gravert,
Elia Varini,
Amirhossein Kazemipour,
Mike Y. Michelis,
Thomas Buchner,
Ronan Hinchet,
Robert K. Katzschmann
Abstract:
Rigid robots can be precise in repetitive tasks, but struggle in unstructured environments. Nature's versatility in such environments inspires researchers to develop biomimetic robots that incorporate compliant and contracting artificial muscles. Among the recently proposed artificial muscle technologies, electrohydraulic actuators are promising since they offer performance comparable to that of mammalian muscles in terms of speed and power density. However, they require high driving voltages and have safety concerns due to exposed electrodes. These high voltages lead to either bulky or inefficient driving electronics that make untethered, high-degree-of-freedom bio-inspired robots difficult to realize. Here, we present hydraulically amplified low voltage electrostatic (HALVE) actuators that match mammalian skeletal muscles in average power density (50.5 W kg$^{-1}$) and peak strain rate (971 % s$^{-1}$) at a driving voltage of just 1100 V. This driving voltage is approx. 5-7 times lower compared to other electrohydraulic actuators using paraelectric dielectrics. Furthermore, HALVE actuators are safe to touch, waterproof, and self-clearing, which makes them easy to implement in wearables and robotics. We characterize, model, and physically validate key performance metrics of the actuator and compare its performance to state-of-the-art electrohydraulic designs. Finally, we demonstrate the utility of our actuators on two muscle-based electrohydraulic robots: an untethered soft robotic swimmer and a robotic gripper. We foresee that HALVE actuators can become a key building block for future highly-biomimetic untethered robots and wearables with many independent artificial muscles such as biomimetic hands, faces, or exoskeletons.
Submitted 30 August, 2023; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Kinematic Control of Redundant Robots with Online Handling of Variable Generalized Hard Constraints
Authors:
Amirhossein Kazemipour,
Maram Khatib,
Khaled Al Khudir,
Claudio Gaz,
Alessandro De Luca
Abstract:
We present a generalized version of the Saturation in the Null Space (SNS) algorithm for the task control of redundant robots when hard inequality constraints are simultaneously present both in the joint and in the Cartesian space. These hard bounds should never be violated, are treated equally and in a unified way by the algorithm, and may also be varied, inserted or deleted online. When a joint/Cartesian bound saturates, the robot redundancy is exploited to continue fulfilling the primary task. If no feasible solution exists, an optimal scaling procedure is applied to enforce directional consistency with the original task. Simulation and experimental results on different robotic systems demonstrate the efficiency of the approach. The proposed algorithm can be viewed as a generic platform that is easily applicable to any robotic application in which robots operate in an unstructured environment and online handling of joint and Cartesian constraints is critical.
Submitted 26 February, 2022;
originally announced February 2022.
-
Dynamic Task Space Control Enables Soft Manipulators to Perform Real-World Tasks
Authors:
Oliver Fischer,
Yasunori Toshimitsu,
Amirhossein Kazemipour,
Robert K. Katzschmann
Abstract:
Dynamic motions are a key feature of robotic arms, enabling them to perform tasks quickly and efficiently. Soft continuum manipulators do not currently consider dynamic parameters when operating in task space. This shortcoming makes existing soft robots slow and limits their ability to deal with external forces, especially during object manipulation. We address this issue by using dynamic operational space control. Our control approach takes into account the dynamic parameters of the 3D continuum arm and introduces new models that enable multi-segment soft manipulators to operate smoothly in task space. Advanced control methods, previously afforded only to rigid robots, are now adapted to soft robots; for example, potential field avoidance was previously only shown for rigid robots and is now extended to soft robots. Using our approach, a soft manipulator can now achieve a variety of tasks that were previously not possible: we evaluate the manipulator's performance in closed-loop controlled experiments such as pick-and-place, obstacle avoidance, throwing objects using an attached soft gripper, and deliberately applying forces to a surface by drawing with a grasped piece of chalk. Besides the newly enabled skills, our approach improves tracking accuracy by 59% and increases speed by a factor of 19.3 compared to state of the art for task space control. With these newfound abilities, soft robots can start to challenge rigid robots in the field of manipulation. Our inherently safe and compliant soft robot moves the future of robotic manipulation towards a cageless setup where humans and robots work in parallel.
Submitted 18 October, 2022; v1 submitted 6 January, 2022;
originally announced January 2022.
-
Motion Control of Redundant Robots with Generalised Inequality Constraints
Authors:
Amirhossein Kazemipour,
Maram Khatib,
Khaled Al Khudir,
Alessandro De Luca
Abstract:
We present an improved version of the Saturation in the Null Space (SNS) algorithm for redundancy resolution at the velocity level. In addition to hard bounds on joint space motion, we consider also Cartesian box constraints that cannot be violated at any time. The modified algorithm combines all bounds into a single augmented generalised vector and gives equal, highest priority to all inequality constraints. When needed, feasibility of the original task is enforced by the SNS task scaling procedure. Simulation results are reported for a 6R planar robot.
Submitted 4 October, 2021;
originally announced October 2021.
-
Adaptive Dynamic Sliding Mode Control of Soft Continuum Manipulators
Authors:
Amirhossein Kazemipour,
Oliver Fischer,
Yasunori Toshimitsu,
Ki Wan Wong,
Robert K. Katzschmann
Abstract:
Soft robots are made of compliant materials and perform tasks that are challenging for rigid robots. However, their continuum nature makes it difficult to develop model-based control strategies. This work presents a robust model-based control scheme for soft continuum robots. Our dynamic model is based on the Euler-Lagrange approach, but it uses a more accurate description of the robot's inertia and does not include oversimplified assumptions. Based on this model, we introduce an adaptive sliding mode control scheme, which is robust against model parameter uncertainties and unknown input disturbances. We perform a series of experiments with a physical soft continuum arm to evaluate the effectiveness of our controller at tracking task-space trajectory under different payloads. The tracking performance of the controller is around 38% more accurate than that of a state-of-the-art controller, i.e., the inverse dynamics method. Moreover, the proposed model-based control design is flexible and can be generalized to any continuum robotic arm with an arbitrary number of segments. With this control strategy, soft robotic object manipulation can become more accurate while remaining robust to disturbances.
Submitted 26 February, 2022; v1 submitted 23 September, 2021;
originally announced September 2021.
-
Measurement comparison among Time-Domain, FTIR and VNA-based spectrometers in the THz frequency range
Authors:
L. Oberto,
M. Bisi,
A. Kazemipour,
A. Steiger,
T. Kleine-Ostmann,
T. Schrader
Abstract:
In this paper we present the outcome of the first international comparison in the terahertz frequency range among three different kinds of spectrometers. A Fourier-Transform infrared spectrometer, a Vector Network Analyzer and a Time-Domain Spectrometer have been employed for measuring the complex refractive index of three travelling standards made of selected dielectric materials in order to offer a wide enough range of parameters to be measured. The three spectrometers have been compared in terms of measurement capability and uncertainty.
Submitted 10 February, 2021;
originally announced February 2021.
-
Avoiding Spurious Local Minima in Deep Quadratic Networks
Authors:
Abbas Kazemipour,
Brett W. Larsen,
Shaul Druckmann
Abstract:
Despite their practical success, a theoretical understanding of the loss landscape of neural networks has proven challenging due to the high-dimensional, non-convex, and highly nonlinear structure of such models. In this paper, we characterize the training landscape of the mean squared error loss for neural networks with quadratic activation functions. We prove existence of spurious local minima and saddle points which can be escaped easily with probability one when the number of neurons is greater than or equal to the input dimension and the norm of the training samples is used as a regressor. We prove that deep overparameterized neural networks with quadratic activations benefit from similar nice landscape properties. Our theoretical results are independent of data distribution and fill the existing gap in theory for two-layer quadratic neural networks. Finally, we empirically demonstrate convergence to a global minimum for these problems.
Submitted 19 July, 2020; v1 submitted 31 December, 2019;
originally announced January 2020.
-
Compressed Sensing Beyond the IID and Static Domains: Theory, Algorithms and Applications
Authors:
Abbas Kazemipour
Abstract:
Sparsity is a ubiquitous feature of many real-world signals such as natural images and neural spiking activities. Conventional compressed sensing utilizes sparsity to recover low-dimensional signal structures in high ambient dimensions using few measurements, where i.i.d. measurements are at disposal. However, real-world scenarios typically exhibit non-i.i.d. and dynamic structures and are confined by physical constraints, preventing applicability of the theoretical guarantees of compressed sensing and limiting its applications. In this thesis we develop new theory, algorithms and applications for non-i.i.d. and dynamic compressed sensing by considering such constraints. In the first part of this thesis we derive new optimal sampling-complexity tradeoffs for two processes commonly used to model dependent temporal structures: the autoregressive processes and self-exciting generalized linear models. Our theoretical results successfully recovered the temporal dependencies in neural activities, financial data and traffic data. Next, we develop a new framework for studying temporal dynamics by introducing compressible state-space models, which simultaneously utilize spatial and temporal sparsity. We develop a fast algorithm for optimal inference on such models and prove its optimal recovery guarantees. Our algorithm shows significant improvement in detecting sparse events in biological applications such as spindle detection and calcium deconvolution. Finally, we develop a sparse Poisson image reconstruction technique and the first compressive two-photon microscope which uses lines of excitation across the sample at multiple angles. We recovered diffraction-limited images from relatively few incoherently multiplexed measurements, at a rate of 1.5 billion voxels per second.
Submitted 28 June, 2018;
originally announced June 2018.
-
Efficient Estimation of Compressible State-Space Models with Application to Calcium Signal Deconvolution
Authors:
Abbas Kazemipour,
Ji Liu,
Patrick Kanold,
Min Wu,
Behtash Babadi
Abstract:
In this paper, we consider linear state-space models with compressible innovations and convergent transition matrices in order to model spatiotemporally sparse transient events. We perform parameter and state estimation using a dynamic compressed sensing framework and develop an efficient solution consisting of two nested Expectation-Maximization (EM) algorithms. Under suitable sparsity assumptions on the innovations, we prove recovery guarantees and derive confidence bounds for the state estimates. We provide simulation studies as well as application to spike deconvolution from calcium imaging data which verify our theoretical results and show significant improvement over existing algorithms.
Submitted 20 October, 2016;
originally announced October 2016.
-
Sampling Requirements for Stable Autoregressive Estimation
Authors:
Abbas Kazemipour,
Sina Miran,
Piya Pal,
Behtash Babadi,
Min Wu
Abstract:
We consider the problem of estimating the parameters of a linear univariate autoregressive model with sub-Gaussian innovations from a limited sequence of consecutive observations. Assuming that the parameters are compressible, we analyze the performance of the $\ell_1$-regularized least squares as well as a greedy estimator of the parameters and characterize the sampling trade-offs required for stable recovery in the non-asymptotic regime. In particular, we show that for a fixed sparsity level, stable recovery of AR parameters is possible when the number of samples scales sub-linearly with the AR order. Our results improve over existing sampling complexity requirements in AR estimation using the LASSO, when the sparsity level scales faster than the square root of the model order. We further derive sufficient conditions on the sparsity level that guarantee the minimax optimality of the $\ell_1$-regularized least squares estimate. Applying these techniques to simulated data as well as real-world datasets from crude oil prices and traffic speed data confirms our predicted theoretical performance gains in terms of estimation accuracy and model selection.
Submitted 17 January, 2017; v1 submitted 4 May, 2016;
originally announced May 2016.
-
Robust Estimation of Self-Exciting Generalized Linear Models with Application to Neuronal Modeling
Authors:
Abbas Kazemipour,
Min Wu,
Behtash Babadi
Abstract:
We consider the problem of estimating self-exciting generalized linear models from limited binary observations, where the history of the process serves as the covariate. We analyze the performance of two classes of estimators, namely the $\ell_1$-regularized maximum likelihood and greedy estimators, for a canonical self-exciting process and characterize the sampling tradeoffs required for stable recovery in the non-asymptotic regime. Our results extend those of compressed sensing for linear and generalized linear models with i.i.d. covariates to those with highly inter-dependent covariates. We further provide simulation studies as well as application to real spiking data from the mouse's lateral geniculate nucleus and the ferret's retinal ganglion cells which agree with our theoretical predictions.
Submitted 22 March, 2017; v1 submitted 14 July, 2015;
originally announced July 2015.