-
Optimal Sensing via Multi-armed Bandit Relaxations in Mixed Observability Domains
Authors:
Mikko Lauri,
Risto Ritala
Abstract:
Sequential decision making under uncertainty is studied in a mixed observability domain. The goal is to maximize the amount of information obtained on a partially observable stochastic process under constraints imposed by a fully observable internal state. An upper bound for the optimal value function is derived by relaxing constraints. We identify conditions under which the relaxed problem is a multi-armed bandit whose optimal policy is easily computable. The upper bound is applied to prune the search space in the original problem, and the effect on solution quality is assessed via simulation experiments. Empirical results show effective pruning of the search space in a target monitoring domain.
Submitted 15 March, 2016;
originally announced March 2016.
-
Optimizing Gaze Direction in a Visual Navigation Task
Authors:
Tuomas Välimäki,
Risto Ritala
Abstract:
Navigation in an unknown environment consists of multiple separable subtasks, such as collecting information about the surroundings and navigating to the current goal. In the case of pure visual navigation, all these subtasks need to utilize the same vision system, and therefore a way to optimally control the direction of focus is needed. We present a case study, where we model the active sensing problem of directing the gaze of a mobile robot with three machine vision cameras as a partially observable Markov decision process (POMDP) using a mutual information (MI) based reward function. The key aspect of the solution is that the cameras are dynamically used either in monocular or stereo configuration. The benefits of using the proposed active sensing implementation are demonstrated with simulations and experiments on a real robot.
Submitted 16 February, 2016;
originally announced February 2016.
-
Myopic Policy Bounds for Information Acquisition POMDPs
Authors:
Mikko Lauri,
Nikolay Atanasov,
George J. Pappas,
Risto Ritala
Abstract:
This paper addresses the problem of optimal control of robotic sensing systems aimed at autonomous information gathering in scenarios such as environmental monitoring, search and rescue, and surveillance and reconnaissance. The information gathering problem is formulated as a partially observable Markov decision process (POMDP) with a reward function that captures uncertainty reduction. Unlike the classical POMDP formulation, the resulting reward structure is nonlinear in the belief state and the traditional approaches do not apply directly. Instead of developing a new approximation algorithm, we show that if attention is restricted to a class of problems with certain structural properties, one can derive (often tight) upper and lower bounds on the optimal policy via an efficient myopic computation. These policy bounds can be applied in conjunction with an online branch-and-bound algorithm to accelerate the computation of the optimal policy. We obtain informative lower and upper policy bounds with low computational effort in a target tracking domain. The performance of branch-and-bounding is demonstrated and compared with exact value iteration.
Submitted 27 January, 2016;
originally announced January 2016.
-
Planning for robotic exploration based on forward simulation
Authors:
Mikko Lauri,
Risto Ritala
Abstract:
We address the problem of controlling a mobile robot to explore a partially known environment. The robot's objective is the maximization of the amount of information collected about the environment. We formulate the problem as a partially observable Markov decision process (POMDP) with an information-theoretic objective function, and solve it applying forward simulation algorithms with an open-loop approximation. We present a new sample-based approximation for mutual information useful in mobile robotics. The approximation can be seamlessly integrated with forward simulation planning algorithms. We investigate the usefulness of POMDP based planning for exploration, and to alleviate some of its weaknesses propose a combination with frontier based exploration. Experimental results in simulated and real environments show that, depending on the environment, applying POMDP based planning for exploration can improve performance over frontier exploration.
Submitted 29 June, 2016; v1 submitted 9 February, 2015;
originally announced February 2015.