Search | arXiv e-print repository

Two-Channel Extended Kalman Filtering with Intermittent Measurements

Authors: Vicu-Mihalis Maer, Zsofia Lendek, Stefan Pirje, Domagoj Tolic, Antun Djuras, Vicko Prkacin, Ivana Palunko, Lucian Busoniu

Abstract: We consider two nonlinear state estimation problems in a setting where an extended Kalman filter receives measurements from two sets of sensors via two channels (2C). In the stochastic-2C problem, the channels drop measurements stochastically, whereas in 2C scheduling, the estimator chooses when to read each channel. In the first problem, we generalize linear-case 2C analysis to obtain, for a given pair of channel arrival rates, boundedness conditions for the trace of the error covariance, as well as a worst-case upper bound. For scheduling, an optimization problem is solved to find arrival rates that balance low channel usage with low trace bounds, and channels are read deterministically with the expected periods corresponding to these arrival rates. We validate both solutions in simulations for linear and nonlinear dynamics, as well as in a real experiment with an underwater robot whose position is being intermittently found in a UAV camera image.

Submitted 18 December, 2023; originally announced December 2023.
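
As a rough illustration of the intermittent-measurement setting described in the abstract above, the sketch below simulates the error variance of a scalar linear Kalman filter with two channels that each deliver a measurement with a fixed probability. It is not the paper's extended Kalman filter or its bound: the scalar model, the function name `covariance_trace`, the parameters `a`, `q`, `channels`, `rates`, and the initial variance are all assumptions made for this example.

```python
import random


def covariance_trace(a, q, channels, rates, steps, seed=0):
    """Error variance of a scalar Kalman filter with two lossy channels.

    a, q      -- dynamics x+ = a*x + w, with process-noise variance q (assumed model)
    channels  -- list of (c, r): measurement gain and noise variance per channel
    rates     -- arrival probability of each channel
    """
    rng = random.Random(seed)
    p = 1.0  # initial error variance (assumed)
    history = []
    for _ in range(steps):
        # Prediction: the variance is propagated through the dynamics.
        p = a * a * p + q
        # Correction: a channel is applied only if its measurement arrived.
        for (c, r), rate in zip(channels, rates):
            if rng.random() < rate:
                p = p - (p * c) ** 2 / (c * c * p + r)
        history.append(p)
    return history


if __name__ == "__main__":
    sensors = [(1.0, 0.5), (1.0, 2.0)]
    for rates in [(1.0, 1.0), (0.6, 0.3), (0.0, 0.0)]:
        final = covariance_trace(1.2, 0.1, sensors, rates, 50)[-1]
        print(rates, final)
```

For a scalar state the trace of the error covariance is just this variance, so running the loop with different arrival rates gives a feel for how dropped measurements affect boundedness when the dynamics are unstable (here `a = 1.2`).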

arXiv:2312.11424 [pdf, other]

3D exploration-based search for multiple targets using a UAV

Authors: Bilal Yousuf, Zsofia Lendek, Lucian Busoniu

Abstract: Consider an unmanned aerial vehicle (UAV) that searches for an unknown number of targets at unknown positions in 3D space. A particle filter uses imperfect measurements about the targets to update an intensity function that represents the expected number of targets. We propose a receding-horizon planner that selects the next UAV position by maximizing a joint exploration and target-refinement objective. Confidently localized targets are saved and removed from consideration. A nonlinear controller with an obstacle-avoidance component is used to reach the desired waypoints. We demonstrate the performance of our approach through a series of simulations, as well as in real-robot experiments with a Parrot Mambo drone that searches for targets from a constant altitude. The proposed planner works better than a lawnmower and a target-refinement-only method.

Submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.11410 [pdf, other]

doi 10.1109/ICSTCC59206.2023.10308458

Active search and coverage using point-cloud reinforcement learning

Authors: Matthias Rosynski, Alexandru Pop, Lucian Busoniu

Abstract: We consider a problem in which the trajectory of a mobile 3D sensor must be optimized so that certain objects are both found in the overall scene and covered by the point cloud, as fast as possible. This problem is called target search and coverage, and the paper provides an end-to-end deep reinforcement learning (RL) solution to solve it. The deep neural network combines four components: deep hierarchical feature learning occurs in the first stage, followed by multi-head transformers in the second, max-pooling and merging with bypassed information to preserve spatial relationships in the third, and a distributional dueling network in the last stage. To evaluate the method, a simulator is developed where cylinders must be found by a Kinect sensor. A network architecture study shows that deep hierarchical feature learning works for RL and that by using farthest point sampling (FPS) we can reduce the number of points and achieve not only a reduction of the network size but also better results. We also show that multi-head attention for point clouds helps the agent learn faster, although it converges to the same outcome. Finally, we compare RL using the best network with a greedy baseline that maximizes immediate rewards and requires for that purpose an oracle that predicts the next observation. We find that RL achieves significantly better and more robust results than the greedy strategy.

Submitted 18 December, 2023; originally announced December 2023.

Journal ref: Proceedings of 27th International Conference on System Theory, Control and Computing, October 11-13, 2023, Timisoara, Romania
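
The abstract above mentions farthest point sampling (FPS) as a way to reduce the number of points. A minimal sketch of the standard greedy FPS procedure is given below; the function name `farthest_point_sampling`, the `start` parameter, and the use of plain tuples are assumptions for illustration, and the paper's actual implementation may differ.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: repeatedly pick the point farthest from those already chosen.

    points -- list of equal-length coordinate tuples
    k      -- number of points to keep
    start  -- index of the first selected point (an arbitrary choice)
    """
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(selected) < min(k, len(points)):
        idx = max(range(len(points)), key=nearest.__getitem__)
        selected.append(idx)
        # Refresh the nearest-selected distances with the new point.
        nearest = [min(d, math.dist(p, points[idx])) for d, p in zip(nearest, points)]
    return selected


if __name__ == "__main__":
    cloud = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (0.5, 0, 0), (5, 0, 0)]
    print(farthest_point_sampling(cloud, 3))
```

Because each new point maximizes its distance to the current selection, the retained subset tends to spread over the whole cloud rather than cluster in dense regions.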

arXiv:2312.11401 [pdf, other]

doi 10.1109/AQTR55203.2022.9802002

Underwater Robot Pose Estimation Using Acoustic Methods and Intermittent Position Measurements at the Surface

Authors: Vicu-Mihalis Maer, Levente Tamas, Lucian Busoniu

Abstract: Global positioning systems can provide sufficient positioning accuracy for large scale robotic tasks in open environments. However, in underwater environments, these systems cannot be directly used, and measuring the position of underwater robots becomes more difficult. In this paper we first evaluate the performance of existing pose estimation techniques for an underwater robot equipped with commonly used sensors for underwater control and pose estimation, in a simulated environment. In our case these sensors are inertial measurement units, Doppler velocity log sensors, and ultra-short baseline sensors. Secondly, for situations in which underwater estimation suffers from drift, we investigate the benefit of intermittently correcting the position using a high-precision surface-based sensor, such as regular GPS or an assisting unmanned aerial vehicle that tracks the underwater robot from above using a camera.

Submitted 18 December, 2023; originally announced December 2023.

Journal ref: Proceedings of the 2022 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR-22), 19-21 May 2022, Cluj-Napoca, Romania

arXiv:2312.11383 [pdf, ps, other]

doi 10.1109/CDC45484.2021.9683546

Path-aware optimistic optimization for a mobile robot

Authors: Tudor Santejudean, Lucian Busoniu

Abstract: We consider problems in which a mobile robot samples an unknown function defined over its operating space, so as to find a global optimum of this function. The path traveled by the robot matters, since it influences energy and time requirements. We consider a branch-and-bound algorithm called deterministic optimistic optimization, and extend it to the path-aware setting, obtaining path-aware optimistic optimization (OOPA). In this new algorithm, the robot decides how to move next via an optimal control problem that maximizes the long-term impact of the robot trajectory on lowering the upper bound, weighted by bound and function values to focus the search on the optima. An online version of value iteration is used to solve an approximate version of this optimal control problem. OOPA is evaluated in extensive experiments in two dimensions, where it does better than path-unaware and local-optimization baselines.

Submitted 18 December, 2023; originally announced December 2023.

Journal ref: Proceedings of IEEE Conference on Decision and Control 2021, 9-13 December 2021, Austin, TX

arXiv:2305.08760 [pdf, ps, other]

Near-optimal control of nonlinear systems with hybrid inputs and dwell-time constraints

Authors: Ioana Lal, Constantin Morarescu, Jamal Daafouz, Lucian Busoniu

Abstract: We propose two new optimistic planning algorithms for nonlinear hybrid-input systems, in which the input has both a continuous and a discrete component, and the discrete component must respect a dwell-time constraint. Both algorithms select sets of input sequences for refinement at each step, along with a continuous or discrete step to refine (split). The dwell-time constraint means that the discrete splits must keep the discrete mode constant if the required dwell-time is not yet reached. Convergence rate guarantees are provided for both algorithms, which show the dependency between the near-optimality of the sequence returned and the computational budget. The rates depend on a novel complexity measure of the dwell-time constrained problem. We present simulation results for two problems, an adaptive-quantization networked control system and a model for the COVID pandemic.

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2210.07142 [pdf, ps, other]

Stability analysis of optimal control problems with time-dependent costs

Authors: Sifeddine Benahmed, Romain Postoyan, Mathieu Granzotto, Lucian Buşoniu, Jamal Daafouz, Dragan Nešić

Abstract: We present stability conditions for deterministic time-varying nonlinear discrete-time systems whose inputs aim to minimize an infinite-horizon time-dependent cost. Global asymptotic and exponential stability properties for general attractors are established. This work covers and generalizes the related results on discounted optimal control problems to more general systems and cost functions.

Submitted 25 August, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

arXiv:2109.11088 [pdf, other]

Exploiting homogeneity for the optimal control of discrete-time systems: application to value iteration

Authors: Mathieu Granzotto, Romain Postoyan, Lucian Buşoniu, Dragan Nešić, Jamal Daafouz

Abstract: To investigate solutions of (near-)optimal control problems, we extend and exploit a notion of homogeneity recently proposed in the literature for discrete-time systems. Assuming the plant dynamics is homogeneous, we first derive a scaling property of its solutions along rays provided the sequence of inputs is suitably modified. We then consider homogeneous cost functions and reveal how the optimal value function scales along rays. This result can be used to construct (near-)optimal inputs on the whole state space by only solving the original problem on a given compact manifold of a smaller dimension. Compared to the related works of the literature, we impose no conditions on the homogeneity degrees. We demonstrate the strength of this new result by presenting a new approximate scheme for value iteration, which is one of the pillars of dynamic programming. The new algorithm provides guaranteed lower and upper estimates of the true value function at any iteration and has several appealing features in terms of reduced computation. A numerical case study is provided to illustrate the proposed algorithm.

Submitted 22 September, 2021; originally announced September 2021.

Comments: Long version (with proofs) of CDC 2021 paper

arXiv:2107.08690 [pdf, other]

doi 10.1109/LRA.2021.3096157

ObserveNet Control: A Vision-Dynamics Learning Approach to Predictive Control in Autonomous Vehicles

Authors: Cosmin Ginerica, Mihai Zaha, Florin Gogianu, Lucian Busoniu, Bogdan Trasnea, Sorin Grigorescu

Abstract: A key component in autonomous driving is the ability of the self-driving car to understand, track and predict the dynamics of the surrounding environment. Although there is significant work in the area of object detection, tracking and observations prediction, there is no prior work demonstrating that raw observations prediction can be used for motion planning and control. In this paper, we propose ObserveNet Control, which is a vision-dynamics approach to the predictive control problem of autonomous vehicles. Our method is composed of: i) a deep neural network able to confidently predict future sensory data on a time horizon of up to 10s and ii) a temporal planner designed to compute a safe vehicle state trajectory based on the predicted sensory data. Given the vehicle's historical state and sensing data in the form of Lidar point clouds, the method aims to learn the dynamics of the observed driving environment in a self-supervised manner, without the need to manually specify training labels. The experiments are performed both in simulation and real-life, using CARLA and RovisLab's AMTU mobile platform as a 1:4 scaled model of a car. We evaluate the capabilities of ObserveNet Control in aggressive driving contexts, such as overtaking maneuvers or side cut-off situations, while comparing the results with a baseline Dynamic Window Approach (DWA) and two state-of-the-art imitation learning systems, that is, Learning by Cheating (LBC) and World on Rails (WOR).

Submitted 19 July, 2021; originally announced July 2021.

Journal ref: IEEE Robotics and Automation Letters, 2021

arXiv:2105.05246 [pdf, other]

Spectral Normalisation for Deep Reinforcement Learning: an Optimisation Perspective

Authors: Florin Gogianu, Tudor Berariu, Mihaela Rosca, Claudia Clopath, Lucian Busoniu, Razvan Pascanu

Abstract: Most of the recent deep reinforcement learning advances take an RL-centric perspective and focus on refinements of the training objective. We diverge from this view and show we can recover the performance of these developments not by changing the objective, but by regularising the value-function estimator. Constraining the Lipschitz constant of a single layer using spectral normalisation is sufficient to elevate the performance of a Categorical-DQN agent to that of a more elaborate Rainbow agent on the challenging Atari domain. We conduct ablation studies to disentangle the various effects normalisation has on the learning dynamics and show that it is sufficient to modulate the parameter updates to recover most of the performance of spectral normalisation. These findings hint towards the need to also focus on the neural component and its learning dynamics to tackle the peculiarities of Deep Reinforcement Learning.

Submitted 11 May, 2021; originally announced May 2021.

Comments: Accepted at ICML2021

arXiv:2011.10167 [pdf, ps, other]

When to stop value iteration: stability and near-optimality versus computation

Authors: Mathieu Granzotto, Romain Postoyan, Dragan Nešić, Lucian Buşoniu, Jamal Daafouz

Abstract: Value iteration (VI) is a ubiquitous algorithm for optimal control, planning, and reinforcement learning schemes. Under the right assumptions, VI is a vital tool to generate inputs with desirable properties for the controlled system, like optimality and Lyapunov stability. As VI usually requires an infinite number of iterations to solve general nonlinear optimal control problems, a key question is when to terminate the algorithm to produce a "good" solution, with a measurable impact on optimality and stability guarantees. By carefully analysing VI under general stabilizability and detectability properties, we provide explicit and novel relationships of the stopping criterion's impact on near-optimality, stability and performance, thus allowing to tune these desirable properties against the induced computational cost. The considered class of stopping criteria encompasses those encountered in the control, dynamic programming and reinforcement learning literature and it allows considering new ones, which may be useful to further reduce the computational cost while endowing and satisfying stability and near-optimality properties. We therefore lay a foundation to endow machine learning schemes based on VI with stability and performance guarantees, while reducing computational complexity.

Submitted 19 November, 2020; originally announced November 2020.

Comments: Submitted for 3rd L4DC
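
To make the idea of a stopping criterion for value iteration concrete, the sketch below runs VI on a small deterministic, finite, discounted problem and terminates once successive value functions differ by less than a tolerance in the sup norm. This is only a toy instance: the paper above treats general nonlinear systems under stabilizability and detectability assumptions, and the function name `value_iteration`, the tabular `costs` and `next_state` inputs, the discount `gamma`, and the specific sup-norm test are all assumptions chosen for this example.

```python
def value_iteration(costs, next_state, gamma, tol):
    """Value iteration with a sup-norm stopping criterion.

    costs[s][u]      -- stage cost of input u in state s
    next_state[s][u] -- successor state of s under input u (deterministic)
    gamma            -- discount factor in (0, 1)
    tol              -- stop when successive iterates differ by less than tol
    """
    n = len(costs)
    v = [0.0] * n
    iterations = 0
    while True:
        # One Bellman backup over all states.
        new_v = [
            min(costs[s][u] + gamma * v[next_state[s][u]] for u in range(len(costs[s])))
            for s in range(n)
        ]
        iterations += 1
        gap = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if gap < tol:
            return v, iterations


if __name__ == "__main__":
    # Three-state chain: state 0 is a zero-cost absorbing goal.
    costs = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
    next_state = [[0, 0], [0, 1], [1, 2]]
    print(value_iteration(costs, next_state, 0.9, 1e-6))
```

Loosening `tol` reduces the number of backups at the price of a less accurate value function, which is the kind of trade-off between computation and near-optimality that the abstract refers to.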

arXiv:2011.09193 [pdf, ps, other]

Learning control for transmission and navigation with a mobile robot under unknown communication rates

Authors: L. Busoniu, V. S. Varma, J. Loheac, A. Codrean, O. Stefan, I. -C. Morarescu, S. Lasaulce

Abstract: In tasks such as surveying or monitoring remote regions, an autonomous robot must move while transmitting data over a wireless network with unknown, position-dependent transmission rates. For such a robot, this paper considers the problem of transmitting a data buffer in minimum time, while possibly also navigating towards a goal position. Two approaches are proposed, each consisting of a machine-learning component that estimates the rate function from samples, and an optimal-control component that moves the robot given the current rate function estimate. Simple obstacle avoidance is performed for the case without a goal position. In extensive simulations, these methods achieve competitive performance compared to known-rate and unknown-rate baselines. A real indoor experiment is provided in which a Parrot AR.Drone 2 successfully learns to transmit the buffer.

Submitted 18 November, 2020; originally announced November 2020.

Comments: Control Engineering Practice, Elsevier, Vol. 100, July 2020, 104460

arXiv:2011.08639 [pdf, other]

Space-time budget allocation policy design for viral marketing

Authors: I. C. Morarescu, V. S. Varma, L. Busoniu, S. Lasaulce

Abstract: We address formally the problem of opinion dynamics when the agents of a social network (e.g., consumers) are not only influenced by their neighbors but also by an external influential entity referred to as a marketer. The influential entity tries to sway the overall opinion as close as possible to a desired opinion by using a specific influence budget. We assume that the exogenous influences of the entity happen during discrete-time advertising campaigns; consequently, the overall closed-loop opinion dynamics becomes a linear-impulsive (hybrid) one. The main technical issue addressed is finding how the marketer should allocate its budget over time (through marketing campaigns) and over space (among the agents) such that the agents' opinion is as close as possible to the desired opinion. Our main results show that the marketer has to prioritize certain agents over others based on their initial condition, their influence power in the social graph and the size of the cluster they belong to. The corresponding space-time allocation problem is formulated and solved for several special cases of practical interest. Valuable insights can be extracted from our analysis. For instance, for most cases, we prove that the marketer has an interest in investing most of its budget at the beginning of the process and that budget should be shared among agents according to the famous water-filling allocation rule. Numerical examples illustrate the analysis.

Submitted 17 November, 2020; originally announced November 2020.

Comments: Journal on Nonlinear Analysis: Hybrid Systems (NAHS), Vol. 37, Aug. 2020
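
The abstract above refers to the water-filling allocation rule. The sketch below shows the classic textbook form of that rule, in which a common "water level" is raised until the amounts poured over all agents use up the budget. How the paper maps agent properties to the per-agent levels is not stated in the abstract, so the `floors` input, the function name `water_filling`, and the bisection search are assumptions for illustration only.

```python
def water_filling(floors, budget, iters=100):
    """Classic water-filling allocation.

    floors -- hypothetical per-agent floor levels (lower floor = served first)
    budget -- total amount to distribute
    Returns allocations x_i = max(0, mu - floor_i) with sum(x_i) close to budget,
    where the water level mu is found by bisection.
    """
    lo, hi = min(floors), max(floors) + budget
    for _ in range(iters):
        mu = (lo + hi) / 2
        used = sum(max(0.0, mu - f) for f in floors)
        if used > budget:
            hi = mu  # level too high: the budget would be exceeded
        else:
            lo = mu  # level feasible: try raising it
    return [max(0.0, lo - f) for f in floors]


if __name__ == "__main__":
    print(water_filling([1.0, 2.0, 5.0], 3.0))
```

In this example the agent with floor 5 lies above the final water level and receives nothing, which mirrors the abstract's point that some agents are prioritized over others.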

arXiv:1908.01404 [pdf, other]

Optimistic planning for the near-optimal control of nonlinear switched discrete-time systems with stability guarantees

Authors: Mathieu Granzotto, Romain Postoyan, Lucian Buşoniu, Dragan Nešić, Jamal Daafouz

Abstract: Originating in the artificial intelligence literature, optimistic planning (OP) is an algorithm that generates near-optimal control inputs for generic nonlinear discrete-time systems whose input set is finite. This technique is therefore relevant for the near-optimal control of nonlinear switched systems, for which the switching signal is the control. However, OP exhibits several limitations, which prevent its application in a standard control context. First, it requires the stage cost to take values in [0,1], an unnatural prerequisite as it excludes, for instance, quadratic stage costs. Second, it requires the cost function to be discounted. Third, it applies to reward maximization, and not cost minimization. In this paper, we modify OP to overcome these limitations, and we call the new algorithm OPmin. We then make stabilizability and detectability assumptions, under which we derive near-optimality guarantees for OPmin and we show that the obtained bound has major advantages compared to the bound originally given by OP. In addition, we prove that a system whose inputs are generated by OPmin in a receding-horizon fashion exhibits stability properties. As a result, OPmin provides a new tool for the near-optimal, stable control of nonlinear switched discrete-time systems for generic cost functions.

Submitted 4 August, 2019; originally announced August 2019.

Comments: 8 pages, 2019 conference in decision and control, longer version submitted for reviewers

arXiv:1602.04146 [pdf, other]

Decoupled Dynamics Distributed Control for Strings of Nonlinear Autonomous Agents

Authors: Serban Sabau, Irinel-Constantin Morarescu, Lucian Busoniu, Ali Jadbabaie

Abstract: We introduce a distributed control architecture for a class of heterogeneous, nonlinear dynamical agents moving in the "string" formation, while guaranteeing trajectory tracking, collision avoidance and the preservation of the formation's topology. Each autonomous agent uses information and relative measurements only with respect to its predecessor in the string. The performance of the scheme is independent of the number of agents in the network and also of the agent's relative position in the network. The scalability is a consequence of the "decoupling" of a certain bounded approximation of the closed-loop equations, which allows the regulation and controller design (at each agent) to be done individually, in a completely decentralized manner. A practical method for compensating communication induced delays is also presented. Numerical examples illustrate the effectiveness and the main features of the proposed approach.

Submitted 17 June, 2018; v1 submitted 12 February, 2016; originally announced February 2016.

Comments: 14 pages, 7 figures, extended and revised version

Showing 1–15 of 15 results for author: Buşoniu, L