-
Learning Generalized Policies for Fully Observable Non-Deterministic Planning Domains
Authors:
Till Hofmann,
Hector Geffner
Abstract:
General policies represent reactive strategies for solving large families of planning problems like the infinite collection of solvable instances from a given domain. Methods for learning such policies from a collection of small training instances have been developed successfully for classical domains. In this work, we extend the formulations and the resulting combinatorial methods for learning ge…
▽ More
General policies represent reactive strategies for solving large families of planning problems like the infinite collection of solvable instances from a given domain. Methods for learning such policies from a collection of small training instances have been developed successfully for classical domains. In this work, we extend the formulations and the resulting combinatorial methods for learning general policies over fully observable, non-deterministic (FOND) domains. We also evaluate the resulting approach experimentally over a number of benchmark domains in FOND planning, present the general policies that result in some of these domains, and prove their correctness. The method for learning general policies for FOND planning can actually be seen as an alternative FOND planning method that searches for solutions, not in the given state space but in an abstract space defined by features that must be learned as well.
△ Less
Submitted 13 May, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
On Policy Reuse: An Expressive Language for Representing and Executing General Policies that Call Other Policies
Authors:
Blai Bonet,
Dominik Drexler,
Hector Geffner
Abstract:
Recently, a simple but powerful language for expressing and learning general policies and problem decompositions (sketches) has been introduced in terms of rules defined over a set of Boolean and numerical features. In this work, we consider three extensions of this language aimed at making policies and sketches more flexible and reusable: internal memory states, as in finite state controllers; in…
▽ More
Recently, a simple but powerful language for expressing and learning general policies and problem decompositions (sketches) has been introduced in terms of rules defined over a set of Boolean and numerical features. In this work, we consider three extensions of this language aimed at making policies and sketches more flexible and reusable: internal memory states, as in finite state controllers; indexical features, whose values are a function of the state and a number of internal registers that can be loaded with objects; and modules that wrap up policies and sketches and allow them to call each other by passing parameters. In addition, unlike general policies that select state transitions rather than ground actions, the new language allows for the selection of such actions. The expressive power of the resulting language for policies and sketches is illustrated through a number of examples.
△ Less
Submitted 25 March, 2024;
originally announced March 2024.
-
Combined Task and Motion Planning Via Sketch Decompositions (Extended Version with Supplementary Material)
Authors:
Magí Dalmau-Moreno,
Néstor García,
Vicenç Gómez,
Héctor Geffner
Abstract:
The challenge in combined task and motion planning (TAMP) is the effective integration of a search over a combinatorial space, usually carried out by a task planner, and a search over a continuous configuration space, carried out by a motion planner. Using motion planners for testing the feasibility of task plans and filling out the details is not effective because it makes the geometrical constra…
▽ More
The challenge in combined task and motion planning (TAMP) is the effective integration of a search over a combinatorial space, usually carried out by a task planner, and a search over a continuous configuration space, carried out by a motion planner. Using motion planners for testing the feasibility of task plans and filling out the details is not effective because it makes the geometrical constraints play a passive role. This work introduces a new interleaved approach for integrating the two dimensions of TAMP that makes use of sketches, a recent simple but powerful language for expressing the decomposition of problems into subproblems. A sketch has width 1 if it decomposes the problem into subproblems that can be solved greedily in linear time. In the paper, a general sketch is introduced for several classes of TAMP problems which has width 1 under suitable assumptions. While sketch decompositions have been developed for classical planning, they offer two important benefits in the context of TAMP. First, when a task plan is found to be unfeasible due to the geometric constraints, the combinatorial search resumes in a specific sub-problem. Second, the sampling of object configurations is not done once, globally, at the start of the search, but locally, at the start of each subproblem. Optimizations of this basic setting are also considered and experimental results over existing and new pick-and-place benchmarks are reported.
△ Less
Submitted 24 March, 2024;
originally announced March 2024.
-
Learning General Policies for Classical Planning Domains: Getting Beyond C$_2$
Authors:
Simon Ståhlberg,
Blai Bonet,
Hector Geffner
Abstract:
GNN-based approaches for learning general policies across planning domains are limited by the expressive power of $C_2$, namely; first-order logic with two variables and counting. This limitation can be overcomed by transitioning to $k$-GNNs, for $k=3$, wherein object embeddings are substituted with triplet embeddings. Yet, while $3$-GNNs have the expressive power of $C_3$, unlike $1$- and $2$-GNN…
▽ More
GNN-based approaches for learning general policies across planning domains are limited by the expressive power of $C_2$, namely; first-order logic with two variables and counting. This limitation can be overcomed by transitioning to $k$-GNNs, for $k=3$, wherein object embeddings are substituted with triplet embeddings. Yet, while $3$-GNNs have the expressive power of $C_3$, unlike $1$- and $2$-GNNs that are confined to $C_2$, they require quartic time for message exchange and cubic space for embeddings, rendering them impractical. In this work, we introduce a parameterized version of relational GNNs. When $t$ is infinity, R-GNN[$t$] approximates $3$-GNNs using only quadratic space for embeddings. For lower values of $t$, such as $t=1$ and $t=2$, R-GNN[$t$] achieves a weaker approximation by exchanging fewer messages, yet interestingly, often yield the $C_3$ features required in several planning domains. Furthermore, the new R-GNN[$t$] architecture is the original R-GNN architecture with a suitable transformation applied to the input states only. Experimental results illustrate the clear performance gains of R-GNN[$1$] and R-GNN[$2$] over plain R-GNNs, and also over edge transformers that also approximate $3$-GNNs.
△ Less
Submitted 18 March, 2024;
originally announced March 2024.
-
General Policies, Subgoal Structure, and Planning Width
Authors:
Blai Bonet,
Hector Geffner
Abstract:
It has been observed that many classical planning domains with atomic goals can be solved by means of a simple polynomial exploration procedure, called IW, that runs in time exponential in the problem width, which in these cases is bounded and small. Yet, while the notion of width has become part of state-of-the-art planning algorithms such as BFWS, there is no good explanation for why so many ben…
▽ More
It has been observed that many classical planning domains with atomic goals can be solved by means of a simple polynomial exploration procedure, called IW, that runs in time exponential in the problem width, which in these cases is bounded and small. Yet, while the notion of width has become part of state-of-the-art planning algorithms such as BFWS, there is no good explanation for why so many benchmark domains have bounded width when atomic goals are considered. In this work, we address this question by relating bounded width with the existence of general optimal policies that in each planning instance are represented by tuples of atoms of bounded size. We also define the notions of (explicit) serializations and serialized width that have a broader scope as many domains have a bounded serialized width but no bounded width. Such problems are solved non-optimally in polynomial time by a suitable variant of the Serialized IW algorithm. Finally, the language of general policies and the semantics of serializations are combined to yield a simple, meaningful, and expressive language for specifying serializations in compact form in the form of sketches, which can be used for encoding domain control knowledge by hand or for learning it from small examples. Sketches express general problem decompositions in terms of subgoals, and sketches of bounded width express problem decompositions that can be solved in polynomial time.
△ Less
Submitted 9 November, 2023;
originally announced November 2023.
-
Language-Based Causal Representation Learning
Authors:
Blai Bonet,
Hector Geffner
Abstract:
Consider the finite state graph that results from a simple, discrete, dynamical system in which an agent moves in a rectangular grid picking up and drop** packages. Can the state variables of the problem, namely, the agent location and the package locations, be recovered from the structure of the state graph alone without having access to information about the objects, the structure of the state…
▽ More
Consider the finite state graph that results from a simple, discrete, dynamical system in which an agent moves in a rectangular grid picking up and drop** packages. Can the state variables of the problem, namely, the agent location and the package locations, be recovered from the structure of the state graph alone without having access to information about the objects, the structure of the states, or any background knowledge? We show that this is possible provided that the dynamics is learned over a suitable domain-independent first-order causal language that makes room for objects and relations that are not assumed to be known. The preference for the most compact representation in the language that is compatible with the data provides a strong and meaningful learning bias that makes this possible. The language of structured causal models (SCMs) is the standard language for representing (static) causal models but in dynamic worlds populated by objects, first-order causal languages such as those used in "classical AI planning" are required. While "classical AI" requires handcrafted representations, similar representations can be learned from unstructured data over the same languages. Indeed, it is the languages and the preference for compact representations in those languages that provide structure to the world, uncovering objects, relations, and causes.
△ Less
Submitted 11 July, 2022;
originally announced July 2022.
-
Learning Generalized Policies Without Supervision Using GNNs
Authors:
Simon Ståhlberg,
Blai Bonet,
Hector Geffner
Abstract:
We consider the problem of learning generalized policies for classical planning domains using graph neural networks from small instances represented in lifted STRIPS. The problem has been considered before but the proposed neural architectures are complex and the results are often mixed. In this work, we use a simple and general GNN architecture and aim at obtaining crisp experimental results and…
▽ More
We consider the problem of learning generalized policies for classical planning domains using graph neural networks from small instances represented in lifted STRIPS. The problem has been considered before but the proposed neural architectures are complex and the results are often mixed. In this work, we use a simple and general GNN architecture and aim at obtaining crisp experimental results and a deeper understanding: either the policy greedy in the learned value function achieves close to 100% generalization over instances larger than those used in training, or the failure must be understood, and possibly fixed, logically. For this, we exploit the relation established between the expressive power of GNNs and the $C_{2}$ fragment of first-order logic (namely, FOL with 2 variables and counting quantifiers). We find for example that domains with general policies that require more expressive features can be solved with GNNs once the states are extended with suitable "derived atoms" encoding role compositions and transitive closures that do not fit into $C_{2}$. The work follows the GNN approach for learning optimal general policies in a supervised fashion (Stahlberg, Bonet, Geffner, 2022); but the learned policies are no longer required to be optimal (which expands the scope, as many planning domains do not have general optimal policies) and are learned without supervision. Interestingly, value-based reinforcement learning methods that aim to produce optimal policies, do not always yield policies that generalize, as the goals of optimality and generality are in conflict in domains where optimal planning is NP-hard.
△ Less
Submitted 12 May, 2022;
originally announced May 2022.
-
Learning First-Order Symbolic Planning Representations That Are Grounded
Authors:
Andrés Occhipinti Liberman,
Blai Bonet,
Hector Geffner
Abstract:
Two main approaches have been developed for learning first-order planning (action) models from unstructured data: combinatorial approaches that yield crisp action schemas from the structure of the state space, and deep learning approaches that produce action schemas from states represented by images. A benefit of the former approach is that the learned action schemas are similar to those that can…
▽ More
Two main approaches have been developed for learning first-order planning (action) models from unstructured data: combinatorial approaches that yield crisp action schemas from the structure of the state space, and deep learning approaches that produce action schemas from states represented by images. A benefit of the former approach is that the learned action schemas are similar to those that can be written by hand; a benefit of the latter is that the learned representations (predicates) are grounded on the images, and as a result, new instances can be given in terms of images. In this work, we develop a new formulation for learning crisp first-order planning models that are grounded on parsed images, a step to combine the benefits of the two approaches. Parsed images are assumed to be given in a simple O2D language (objects in 2D) that involves a small number of unary and binary predicates like "left", "above", "shape", etc. After learning, new planning instances can be given in terms of pairs of parsed images, one for the initial situation and the other for the goal. Learning and planning experiments are reported for several domains including Blocks, Sokoban, IPC Grid, and Hanoi.
△ Less
Submitted 30 April, 2022; v1 submitted 25 April, 2022;
originally announced April 2022.
-
Learning Sketches for Decomposing Planning Problems into Subproblems of Bounded Width: Extended Version
Authors:
Dominik Drexler,
Jendrik Seipp,
Hector Geffner
Abstract:
Recently, sketches have been introduced as a general language for representing the subgoal structure of instances drawn from the same domain. Sketches are collections of rules of the form C -> E over a given set of features where C expresses Boolean conditions and E expresses qualitative changes. Each sketch rule defines a subproblem: going from a state that satisfies C to a state that achieves th…
▽ More
Recently, sketches have been introduced as a general language for representing the subgoal structure of instances drawn from the same domain. Sketches are collections of rules of the form C -> E over a given set of features where C expresses Boolean conditions and E expresses qualitative changes. Each sketch rule defines a subproblem: going from a state that satisfies C to a state that achieves the change expressed by E or a goal state. Sketches can encode simple goal serializations, general policies, or decompositions of bounded width that can be solved greedily, in polynomial time, by the SIW_R variant of the SIW algorithm. Previous work has shown the computational value of sketches over benchmark domains that, while tractable, are challenging for domain-independent planners. In this work, we address the problem of learning sketches automatically given a planning domain, some instances of the target class of problems, and the desired bound on the sketch width. We present a logical formulation of the problem, an implementation using the ASP solver Clingo, and experimental results. The sketch learner and the SIW_R planner yield a domain-independent planner that learns and exploits domain structure in a crisp and explicit form.
△ Less
Submitted 28 March, 2022;
originally announced March 2022.
-
Learning General Optimal Policies with Graph Neural Networks: Expressive Power, Transparency, and Limits
Authors:
Simon Ståhlberg,
Blai Bonet,
Hector Geffner
Abstract:
It has been recently shown that general policies for many classical planning domains can be expressed and learned in terms of a pool of features defined from the domain predicates using a description logic grammar. At the same time, most description logics correspond to a fragment of $k$-variable counting logic ($C_k$) for $k=2$, that has been shown to provide a tight characterization of the expre…
▽ More
It has been recently shown that general policies for many classical planning domains can be expressed and learned in terms of a pool of features defined from the domain predicates using a description logic grammar. At the same time, most description logics correspond to a fragment of $k$-variable counting logic ($C_k$) for $k=2$, that has been shown to provide a tight characterization of the expressive power of graph neural networks. In this work, we make use of these results to understand the power and limits of using graph neural networks (GNNs) for learning optimal general policies over a number of tractable planning domains where such policies are known to exist. For this, we train a simple GNN in a supervised manner to approximate the optimal value function $V^{*}(s)$ of a number of sample states $s$. As predicted by the theory, it is observed that general optimal policies are obtained in domains where general optimal value functions can be defined with $C_2$ features but not in those requiring more expressive $C_3$ features. In addition, it is observed that the features learned are in close correspondence with the features needed to express $V^{*}$ in closed form. The theory and the analysis of the domains let us understand the features that are actually learned as well as those that cannot be learned in this way, and let us move in a principled manner from a combinatorial optimization approach to learning general policies to a potentially, more robust and scalable approach based on deep learning.
△ Less
Submitted 6 May, 2022; v1 submitted 21 September, 2021;
originally announced September 2021.
-
Target Languages (vs. Inductive Biases) for Learning to Act and Plan
Authors:
Hector Geffner
Abstract:
Recent breakthroughs in AI have shown the remarkable power of deep learning and deep reinforcement learning. These developments, however, have been tied to specific tasks, and progress in out-of-distribution generalization has been limited. While it is assumed that these limitations can be overcome by incorporating suitable inductive biases, the notion of inductive biases itself is often left vagu…
▽ More
Recent breakthroughs in AI have shown the remarkable power of deep learning and deep reinforcement learning. These developments, however, have been tied to specific tasks, and progress in out-of-distribution generalization has been limited. While it is assumed that these limitations can be overcome by incorporating suitable inductive biases, the notion of inductive biases itself is often left vague and does not provide meaningful guidance. In the paper, I articulate a different learning approach where representations do not emerge from biases in a neural architecture but are learned over a given target language with a known semantics. The basic ideas are implicit in mainstream AI where representations have been encoded in languages ranging from fragments of first-order logic to probabilistic structural causal models. The challenge is to learn from data the representations that have traditionally been crafted by hand. Generalization is then a result of the semantics of the language. The goals of this paper are to make these ideas explicit, to place them in a broader context where the design of the target language is crucial, and to illustrate them in the context of learning to act and plan. For this, after a general discussion, I consider learning representations of actions, general policies, and subgoals ("intrinsic rewards"). In these cases, learning is formulated as a combinatorial problem but nothing prevents the use of deep learning techniques instead. Indeed, learning representations over languages with a known semantics provides an account of what is to be learned, while learning representations with neural nets provides a complementary account of how representations can be learned. The challenge and the opportunity is to bring the two together.
△ Less
Submitted 29 November, 2021; v1 submitted 15 September, 2021;
originally announced September 2021.
-
Learning First-Order Representations for Planning from Black-Box States: New Results
Authors:
Ivan D. Rodriguez,
Blai Bonet,
Javier Romero,
Hector Geffner
Abstract:
Recently Bonet and Geffner have shown that first-order representations for planning domains can be learned from the structure of the state space without any prior knowledge about the action schemas or domain predicates. For this, the learning problem is formulated as the search for a simplest first-order domain description D that along with information about instances I_i (number of objects and in…
▽ More
Recently Bonet and Geffner have shown that first-order representations for planning domains can be learned from the structure of the state space without any prior knowledge about the action schemas or domain predicates. For this, the learning problem is formulated as the search for a simplest first-order domain description D that along with information about instances I_i (number of objects and initial state) determine state space graphs G(P_i) that match the observed state graphs G_i where P_i = (D, I_i). The search is cast and solved approximately by means of a SAT solver that is called over a large family of propositional theories that differ just in the parameters encoding the possible number of action schemas and domain predicates, their arities, and the number of objects. In this work, we push the limits of these learners by moving to an answer set programming (ASP) encoding using the CLINGO system. The new encodings are more transparent and concise, extending the range of possible models while facilitating their exploration. We show that the domains introduced by Bonet and Geffner can be solved more efficiently in the new approach, often optimally, and furthermore, that the approach can be easily extended to handle partial information about the state graphs as well as noise that prevents some states from being distinguished.
△ Less
Submitted 22 May, 2021;
originally announced May 2021.
-
Expressing and Exploiting the Common Subgoal Structure of Classical Planning Domains Using Sketches: Extended Version
Authors:
Dominik Drexler,
Jendrik Seipp,
Hector Geffner
Abstract:
Width-based planning methods deal with conjunctive goals by decomposing problems into subproblems of low width. Algorithms like SIW thus fail when the goal is not easily serializable in this way or when some of the subproblems have a high width. In this work, we address these limitations by using a simple but powerful language for expressing finer problem decompositions introduced recently by Bone…
▽ More
Width-based planning methods deal with conjunctive goals by decomposing problems into subproblems of low width. Algorithms like SIW thus fail when the goal is not easily serializable in this way or when some of the subproblems have a high width. In this work, we address these limitations by using a simple but powerful language for expressing finer problem decompositions introduced recently by Bonet and Geffner, called policy sketches. A policy sketch over a set of Boolean and numerical features is a set of sketch rules that express how the values of these features are supposed to change. Like general policies, policy sketches are domain general, but unlike policies, the changes captured by sketch rules do not need to be achieved in a single step. We show that many planning domains that cannot be solved by SIW are provably solvable in low polynomial time with the SIW_R algorithm, the version of SIW that employs user-provided policy sketches. Policy sketches are thus shown to be a powerful language for expressing domain-specific knowledge in a simple and compact way and a convenient alternative to languages such as HTNs or temporal logics. Furthermore, they make it easy to express general problem decompositions and prove key properties of them like their width and complexity.
△ Less
Submitted 8 July, 2021; v1 submitted 10 May, 2021;
originally announced May 2021.
-
Flexible FOND Planning with Explicit Fairness Assumptions
Authors:
Ivan D. Rodriguez,
Blai Bonet,
Sebastian Sardina,
Hector Geffner
Abstract:
We consider the problem of reaching a propositional goal condition in fully-observable non-deterministic (FOND) planning under a general class of fairness assumptions that are given explicitly. The fairness assumptions are of the form A/B and say that state trajectories that contain infinite occurrences of an action a from A in a state s and finite occurrence of actions from B, must also contain i…
▽ More
We consider the problem of reaching a propositional goal condition in fully-observable non-deterministic (FOND) planning under a general class of fairness assumptions that are given explicitly. The fairness assumptions are of the form A/B and say that state trajectories that contain infinite occurrences of an action a from A in a state s and finite occurrence of actions from B, must also contain infinite occurrences of action a in s followed by each one of its possible outcomes. The infinite trajectories that violate this condition are deemed as unfair, and the solutions are policies for which all the fair trajectories reach a goal state. We show that strong and strong-cyclic FOND planning, as well as QNP planning, a planning model introduced recently for generalized planning, are all special cases of FOND planning with fairness assumptions of this form which can also be combined. FOND+ planning, as this form of planning is called, combines the syntax of FOND planning with some of the versatility of LTL for expressing fairness constraints. A new planner is implemented by reducing FOND+ planning to answer set programs, and the performance of the planner is evaluated in comparison with FOND and QNP planners, and LTL synthesis tools.
△ Less
Submitted 15 March, 2021;
originally announced March 2021.
-
Learning General Policies from Small Examples Without Supervision
Authors:
Guillem Francès,
Blai Bonet,
Hector Geffner
Abstract:
Generalized planning is concerned with the computation of general policies that solve multiple instances of a planning domain all at once. It has been recently shown that these policies can be computed in two steps: first, a suitable abstraction in the form of a qualitative numerical planning problem (QNP) is learned from sample plans, then the general policies are obtained from the learned QNP us…
▽ More
Generalized planning is concerned with the computation of general policies that solve multiple instances of a planning domain all at once. It has been recently shown that these policies can be computed in two steps: first, a suitable abstraction in the form of a qualitative numerical planning problem (QNP) is learned from sample plans, then the general policies are obtained from the learned QNP using a planner. In this work, we introduce an alternative approach for computing more expressive general policies which does not require sample plans or a QNP planner. The new formulation is very simple and can be cast in terms that are more standard in machine learning: a large but finite pool of features is defined from the predicates in the planning examples using a general grammar, and a small subset of features is sought for separating "good" from "bad" state transitions, and goals from non-goals. The problems of finding such a "separating surface" while labeling the transitions as "good" or "bad" are jointly addressed as a single combinatorial optimization problem expressed as a Weighted Max-SAT problem. The advantage of looking for the simplest policy in the given feature space that solves the given examples, possibly non-optimally, is that many domains have no general, compact policies that are optimal. The approach yields general policies for a number of benchmark domains.
△ Less
Submitted 17 February, 2021; v1 submitted 3 January, 2021;
originally announced January 2021.
-
General Policies, Serializations, and Planning Width
Authors:
Blai Bonet,
Hector Geffner
Abstract:
It has been observed that in many of the benchmark planning domains, atomic goals can be reached with a simple polynomial exploration procedure, called IW, that runs in time exponential in the problem width. Such problems have indeed a bounded width: a width that does not grow with the number of problem variables and is often no greater than two. Yet, while the notion of width has become part of t…
▽ More
It has been observed that in many of the benchmark planning domains, atomic goals can be reached with a simple polynomial exploration procedure, called IW, that runs in time exponential in the problem width. Such problems have indeed a bounded width: a width that does not grow with the number of problem variables and is often no greater than two. Yet, while the notion of width has become part of the state-of-the-art planning algorithms like BFWS, there is still no good explanation for why so many benchmark domains have bounded width. In this work, we address this question by relating bounded width and serialized width to ideas of generalized planning, where general policies aim to solve multiple instances of a planning problem all at once. We show that bounded width is a property of planning domains that admit optimal general policies in terms of features that are explicitly or implicitly represented in the domain encoding. The results are extended to much larger class of domains with bounded serialized width where the general policies do not have to be optimal. The study leads also to a new simple, meaningful, and expressive language for specifying domain serializations in the form of policy sketches which can be used for encoding domain control knowledge by hand or for learning it from traces. The use of sketches and the meaning of the theoretical results are all illustrated through a number of examples.
△ Less
Submitted 23 December, 2020; v1 submitted 14 December, 2020;
originally announced December 2020.
-
Qualitative Numeric Planning: Reductions and Complexity
Authors:
Blai Bonet,
Hector Geffner
Abstract:
Qualitative numerical planning is classical planning extended with non-negative real variables that can be increased or decreased "qualitatively", i.e., by positive indeterminate amounts. While deterministic planning with numerical variables is undecidable in general, qualitative numerical planning is decidable and provides a convenient abstract model for generalized planning. The solutions to qua…
▽ More
Qualitative numerical planning is classical planning extended with non-negative real variables that can be increased or decreased "qualitatively", i.e., by positive indeterminate amounts. While deterministic planning with numerical variables is undecidable in general, qualitative numerical planning is decidable and provides a convenient abstract model for generalized planning. The solutions to qualitative numerical problems (QNPs) were shown to correspond to the strong cyclic solutions of an associated fully observable non-deterministic (FOND) problem that terminate. This leads to a generate-and-test algorithm for solving QNPs where solutions to a FOND problem are generated one by one and tested for termination. The computational shortcomings of this approach for solving QNPs, however, are that it is not simple to amend FOND planners to generate all solutions, and that the number of solutions to check can be doubly exponential in the number of variables. In this work we address these limitations while providing additional insights on QNPs. More precisely, we introduce two polynomial-time reductions, one from QNPs to FOND problems and the other from FOND problems to QNPs both of which do not involve termination tests. A result of these reductions is that QNPs are shown to have the same expressive power and the same complexity as FOND problems.
△ Less
Submitted 26 November, 2020; v1 submitted 10 December, 2019;
originally announced December 2019.
-
Factored Probabilistic Belief Tracking
Authors:
Blai Bonet,
Hector Geffner
Abstract:
The problem of belief tracking in the presence of stochastic actions and observations is pervasive and yet computationally intractable. In this work we show however that probabilistic beliefs can be maintained in factored form exactly and efficiently across a number of causally closed beams, when the state variables that appear in more than one beam obey a form of backward determinism. Since compu…
▽ More
The problem of belief tracking in the presence of stochastic actions and observations is pervasive and yet computationally intractable. In this work we show however that probabilistic beliefs can be maintained in factored form exactly and efficiently across a number of causally closed beams, when the state variables that appear in more than one beam obey a form of backward determinism. Since computing marginals from the factors is still computationally intractable in general, and variables appearing in several beams are not always backward-deterministic, the basic formulation is extended with two approximations: forms of belief propagation for computing marginals from factors, and sampling of non-backward-deterministic variables for making such variables backward-deterministic given their sampled history. Unlike, Rao-Blackwellized particle-filtering, the sampling is not used for making inference tractable but for making the factorization sound. The resulting algorithm involves sampling and belief propagation or just one of them as determined by the structure of the model.
△ Less
Submitted 26 September, 2019;
originally announced September 2019.
-
Causal Belief Decomposition for Planning with Sensing: Completeness Results and Practical Approximation
Authors:
Blai Bonet,
Hector Geffner
Abstract:
Belief tracking is a basic problem in planning with sensing. While the problem is intractable, it has been recently shown that for both deterministic and non-deterministic systems expressed in compact form, it can be done in time and space that are exponential in the problem width. The width measures the maximum number of state variables that are all relevant to a given precondition or goal. In th…
▽ More
Belief tracking is a basic problem in planning with sensing. While the problem is intractable, it has been recently shown that for both deterministic and non-deterministic systems expressed in compact form, it can be done in time and space that are exponential in the problem width. The width measures the maximum number of state variables that are all relevant to a given precondition or goal. In this work, we extend this result both theoretically and practically. First, we introduce an alternative decomposition scheme and algorithm with the same time complexity but different completeness guarantees, whose space complexity is much smaller: exponential in the causal width of the problem that measures the number of state variables that are causally relevant to a given precondition, goal, or observable. Second, we introduce a fast, meaningful, and powerful approximation that trades completeness by speed, and is both time and space exponential in the problem causal width. It is then shown empirically that the algorithm combined with simple heuristics yields state-of-the-art real-time performance in domains with high widths but low causal widths such as Minesweeper, Battleship, and Wumpus.
△ Less
Submitted 26 September, 2019;
originally announced September 2019.
-
Generalized Planning: Non-Deterministic Abstractions and Trajectory Constraints
Authors:
Blai Bonet,
Giuseppe De Giacomo,
Hector Geffner,
Sasha Rubin
Abstract:
We study the characterization and computation of general policies for families of problems that share a structure characterized by a common reduction into a single abstract problem. Policies $μ$ that solve the abstract problem P have been shown to solve all problems Q that reduce to P provided that $μ$ terminates in Q. In this work, we shed light on why this termination condition is needed and how…
▽ More
We study the characterization and computation of general policies for families of problems that share a structure characterized by a common reduction into a single abstract problem. Policies $μ$ that solve the abstract problem P have been shown to solve all problems Q that reduce to P provided that $μ$ terminates in Q. In this work, we shed light on why this termination condition is needed and how it can be removed. The key observation is that the abstract problem P captures the common structure among the concrete problems Q that is local (Markovian) but misses common structure that is global. We show how such global structure can be captured by means of trajectory constraints that in many cases can be expressed as LTL formulas, thus reducing generalized planning to LTL synthesis. Moreover, for a broad class of problems that involve integer variables that can be increased or decreased, trajectory constraints can be compiled away, reducing generalized planning to fully observable non-deterministic planning.
△ Less
Submitted 26 September, 2019;
originally announced September 2019.
-
Action Selection for MDPs: Anytime AO* vs. UCT
Authors:
Blai Bonet,
Hector Geffner
Abstract:
In the presence of non-admissible heuristics, A* and other best-first algorithms can be converted into anytime optimal algorithms over OR graphs, by simply continuing the search after the first solution is found. The same trick, however, does not work for best-first algorithms over AND/OR graphs, that must be able to expand leaf nodes of the explicit graph that are not necessarily part of the best…
▽ More
In the presence of non-admissible heuristics, A* and other best-first algorithms can be converted into anytime optimal algorithms over OR graphs, by simply continuing the search after the first solution is found. The same trick, however, does not work for best-first algorithms over AND/OR graphs, that must be able to expand leaf nodes of the explicit graph that are not necessarily part of the best partial solution. Anytime optimal variants of AO* must thus address an exploration-exploitation tradeoff: they cannot just "exploit", they must keep exploring as well. In this work, we develop one such variant of AO* and apply it to finite-horizon MDPs. This Anytime AO* algorithm eventually delivers an optimal policy while using non-admissible random heuristics that can be sampled, as when the heuristic is the cost of a base policy that can be sampled with rollouts. We then test Anytime AO* for action selection over large infinite-horizon MDPs that cannot be solved with existing off-line heuristic search and dynamic programming algorithms, and compare it with UCT.
△ Less
Submitted 26 September, 2019;
originally announced September 2019.
-
Learning First-Order Symbolic Representations for Planning from the Structure of the State Space
Authors:
Blai Bonet,
Hector Geffner
Abstract:
One of the main obstacles for develo** flexible AI systems is the split between data-based learners and model-based solvers. Solvers such as classical planners are very flexible and can deal with a variety of problem instances and goals but require first-order symbolic models. Data-based learners, on the other hand, are robust but do not produce such representations. In this work we address this…
▽ More
One of the main obstacles for develo** flexible AI systems is the split between data-based learners and model-based solvers. Solvers such as classical planners are very flexible and can deal with a variety of problem instances and goals but require first-order symbolic models. Data-based learners, on the other hand, are robust but do not produce such representations. In this work we address this split by showing how the first-order symbolic representations that are used by planners can be learned from non-symbolic inputs that encode the structure of the state space. The representation learning problem is formulated as the problem of inferring planning instances over a common but unknown first-order domain that account for the structure of the observed state space. This means to infer a complete first-order representation (i.e. general action schemas, relational symbols, and objects) that explains the observed state space structures. The inference problem is cast as a two-level combinatorial search where the outer level searches for values of a small set of hyperparameters and the inner level, solved via SAT, searches for a first-order symbolic model. The framework is shown to produce general and correct first-order representations for standard problems like Gripper, Blocksworld, and Hanoi from input graphs that encode the flat state-space structure of a single instance.
△ Less
Submitted 20 February, 2020; v1 submitted 12 September, 2019;
originally announced September 2019.
-
Learning Features and Abstract Actions for Computing Generalized Plans
Authors:
Blai Bonet,
Guillem Francès,
Hector Geffner
Abstract:
Generalized planning is concerned with the computation of plans that solve not one but multiple instances of a planning domain. Recently, it has been shown that generalized plans can be expressed as map**s of feature values into actions, and that they can often be computed with fully observable non-deterministic (FOND) planners. The actions in such plans, however, are not the actions in the inst…
▽ More
Generalized planning is concerned with the computation of plans that solve not one but multiple instances of a planning domain. Recently, it has been shown that generalized plans can be expressed as map**s of feature values into actions, and that they can often be computed with fully observable non-deterministic (FOND) planners. The actions in such plans, however, are not the actions in the instances themselves, which are not necessarily common to other instances, but abstract actions that are defined on a set of common features. The formulation assumes that the features and the abstract actions are given. In this work, we address this limitation by showing how to learn them automatically. The resulting account of generalized planning combines learning and planning in a novel way: a learner, based on a Max SAT formulation, yields the features and abstract actions from sampled state transitions, and a FOND planner uses this information, suitably transformed, to produce the general plans. Correctness guarantees are given and experimental results on several domains are reported.
△ Less
Submitted 17 November, 2018;
originally announced November 2018.
-
Compact Policies for Fully-Observable Non-Deterministic Planning as SAT
Authors:
Tomas Geffner,
Hector Geffner
Abstract:
Fully observable non-deterministic (FOND) planning is becoming increasingly important as an approach for computing proper policies in probabilistic planning, extended temporal plans in LTL planning, and general plans in generalized planning. In this work, we introduce a SAT encoding for FOND planning that is compact and can produce compact strong cyclic policies. Simple variations of the encodings…
▽ More
Fully observable non-deterministic (FOND) planning is becoming increasingly important as an approach for computing proper policies in probabilistic planning, extended temporal plans in LTL planning, and general plans in generalized planning. In this work, we introduce a SAT encoding for FOND planning that is compact and can produce compact strong cyclic policies. Simple variations of the encodings are also introduced for strong planning and for what we call, dual FOND planning, where some non-deterministic actions are assumed to be fair (e.g., probabilistic) and others unfair (e.g., adversarial). The resulting FOND planners are compared empirically with existing planners over existing and new benchmarks. The notion of "probabilistic interesting problems" is also revisited to yield a more comprehensive picture of the strengths and limitations of current FOND planners and the proposed SAT approach.
△ Less
Submitted 25 June, 2018;
originally announced June 2018.
-
Model-free, Model-based, and General Intelligence
Authors:
Hector Geffner
Abstract:
During the 60s and 70s, AI researchers explored intuitions about intelligence by writing programs that displayed intelligent behavior. Many good ideas came out from this work but programs written by hand were not robust or general. After the 80s, research increasingly shifted to the development of learners capable of inferring behavior and functions from experience and data, and solvers capable of…
▽ More
During the 60s and 70s, AI researchers explored intuitions about intelligence by writing programs that displayed intelligent behavior. Many good ideas came out from this work but programs written by hand were not robust or general. After the 80s, research increasingly shifted to the development of learners capable of inferring behavior and functions from experience and data, and solvers capable of tackling well-defined but intractable models like SAT, classical planning, Bayesian networks, and POMDPs. The learning approach has achieved considerable success but results in black boxes that do not have the flexibility, transparency, and generality of their model-based counterparts. Model-based approaches, on the other hand, require models and scalable algorithms. Model-free learners and model-based solvers have close parallels with Systems 1 and 2 in current theories of the human mind: the first, a fast, opaque, and inflexible intuitive mind; the second, a slow, transparent, and flexible analytical mind. In this paper, I review developments in AI and draw on these theories to discuss the gap between model-free learners and model-based solvers, a gap that needs to be bridged in order to have intelligent systems that are robust and general.
△ Less
Submitted 6 June, 2018;
originally announced June 2018.
-
Features, Projections, and Representation Change for Generalized Planning
Authors:
Blai Bonet,
Hector Geffner
Abstract:
Generalized planning is concerned with the characterization and computation of plans that solve many instances at once. In the standard formulation, a generalized plan is a map** from feature or observation histories into actions, assuming that the instances share a common pool of features and actions. This assumption, however, excludes the standard relational planning domains where actions and…
▽ More
Generalized planning is concerned with the characterization and computation of plans that solve many instances at once. In the standard formulation, a generalized plan is a map** from feature or observation histories into actions, assuming that the instances share a common pool of features and actions. This assumption, however, excludes the standard relational planning domains where actions and objects change across instances. In this work, we extend the standard formulation of generalized planning to such domains. This is achieved by projecting the actions over the features, resulting in a common set of abstract actions which can be tested for soundness and completeness, and which can be used for generating general policies such as "if the gripper is empty, pick the clear block above x and place it on the table" that achieve the goal clear(x) in any Blocksworld instance. In this policy, "pick the clear block above x" is an abstract action that may represent the action Unstack(a, b) in one situation and the action Unstack(b, c) in another. Transformations are also introduced for computing such policies by means of fully observable non-deterministic (FOND) planners. The value of generalized representations for learning general policies is also discussed.
△ Less
Submitted 14 June, 2018; v1 submitted 30 January, 2018;
originally announced January 2018.
-
Planning with Pixels in (Almost) Real Time
Authors:
Wilmer Bandres,
Blai Bonet,
Hector Geffner
Abstract:
Recently, width-based planning methods have been shown to yield state-of-the-art results in the Atari 2600 video games. For this, the states were associated with the (RAM) memory states of the simulator. In this work, we consider the same planning problem but using the screen instead. By using the same visual inputs, the planning results can be compared with those of humans and learning methods. W…
▽ More
Recently, width-based planning methods have been shown to yield state-of-the-art results in the Atari 2600 video games. For this, the states were associated with the (RAM) memory states of the simulator. In this work, we consider the same planning problem but using the screen instead. By using the same visual inputs, the planning results can be compared with those of humans and learning methods. We show that the planning approach, out of the box and without training, results in scores that compare well with those obtained by humans and learning methods, and moreover, by develo** an episodic, rollout version of the IW(k) algorithm, we show that such scores can be obtained in almost real time.
△ Less
Submitted 10 January, 2018;
originally announced January 2018.
-
Combined Task and Motion Planning as Classical AI Planning
Authors:
Jonathan Ferrer-Mestres,
Guillem Francès,
Hector Geffner
Abstract:
Planning in robotics is often split into task and motion planning. The high-level, symbolic task planner decides what needs to be done, while the motion planner checks feasibility and fills up geometric detail. It is known however that such a decomposition is not effective in general as the symbolic and geometrical components are not independent. In this work, we show that it is possible to compil…
▽ More
Planning in robotics is often split into task and motion planning. The high-level, symbolic task planner decides what needs to be done, while the motion planner checks feasibility and fills up geometric detail. It is known however that such a decomposition is not effective in general as the symbolic and geometrical components are not independent. In this work, we show that it is possible to compile task and motion planning problems into classical AI planning problems; i.e., planning problems over finite and discrete state spaces with a known initial state, deterministic actions, and goal states to be reached. The compilation is sound, meaning that classical plans are valid robot plans, and probabilistically complete, meaning that valid robot plans are classical plans when a sufficient number of configurations is sampled. In this approach, motion planners and collision checkers are used for the compilation, but not at planning time. The key elements that make the approach effective are 1) expressive classical AI planning languages for representing the compiled problems in compact form, that unlike PDDL make use of functions and state constraints, and 2) general width-based search algorithms capable of finding plans over huge combinatorial spaces using weak heuristics only. Empirical results are presented for a PR2 robot manipulating tens of objects, for which long plans are required.
△ Less
Submitted 21 June, 2017;
originally announced June 2017.
-
Heuristics for Planning, Plan Recognition and Parsing
Authors:
Miquel Ramirez,
Hector Geffner
Abstract:
In a recent paper, we have shown that Plan Recognition over STRIPS can be formulated and solved using Classical Planning heuristics and algorithms. In this work, we show that this formulation subsumes the standard formulation of Plan Recognition over libraries through a compilation of libraries into STRIPS theories. The libraries correspond to AND/OR graphs that may be cyclic and where children of…
▽ More
In a recent paper, we have shown that Plan Recognition over STRIPS can be formulated and solved using Classical Planning heuristics and algorithms. In this work, we show that this formulation subsumes the standard formulation of Plan Recognition over libraries through a compilation of libraries into STRIPS theories. The libraries correspond to AND/OR graphs that may be cyclic and where children of AND nodes may be partially ordered. These libraries include Context-Free Grammars as a special case, where the Plan Recognition problem becomes a parsing with missing tokens problem. Plan Recognition over the standard libraries become Planning problems that can be easily solved by any modern planner, while recognition over more complex libraries, including Context-Free Grammars (CFGs), illustrate limitations of current Planning heuristics and suggest improvements that may be relevant in other Planning problems too.
△ Less
Submitted 22 May, 2016; v1 submitted 19 May, 2016;
originally announced May 2016.
-
Soft Goals Can Be Compiled Away
Authors:
Emil Keyder,
Hector Geffner
Abstract:
Soft goals extend the classical model of planning with a simple model of preferences. The best plans are then not the ones with least cost but the ones with maximum utility, where the utility of a plan is the sum of the utilities of the soft goals achieved minus the plan cost. Finding plans with high utility appears to involve two linked problems: choosing a subset of soft goals to achieve and fin…
▽ More
Soft goals extend the classical model of planning with a simple model of preferences. The best plans are then not the ones with least cost but the ones with maximum utility, where the utility of a plan is the sum of the utilities of the soft goals achieved minus the plan cost. Finding plans with high utility appears to involve two linked problems: choosing a subset of soft goals to achieve and finding a low-cost plan to achieve them. New search algorithms and heuristics have been developed for planning with soft goals, and a new track has been introduced in the International Planning Competition (IPC) to test their performance. In this note, we show however that these extensions are not needed: soft goals do not increase the expressive power of the basic model of planning with action costs, as they can easily be compiled away. We apply this compilation to the problems of the net-benefit track of the most recent IPC, and show that optimal and satisficing cost-based planners do better on the compiled problems than optimal and satisficing net-benefit planners on the original problems with explicit soft goals. Furthermore, we show that penalties, or negative preferences expressing conditions to avoid, can also be compiled away using a similar idea.
△ Less
Submitted 15 January, 2014;
originally announced January 2014.
-
Compiling Uncertainty Away in Conformant Planning Problems with Bounded Width
Authors:
Hector Palacios,
Hector Geffner
Abstract:
Conformant planning is the problem of finding a sequence of actions for achieving a goal in the presence of uncertainty in the initial state or action effects. The problem has been approached as a path-finding problem in belief space where good belief representations and heuristics are critical for scaling up. In this work, a different formulation is introduced for conformant problems with deter…
▽ More
Conformant planning is the problem of finding a sequence of actions for achieving a goal in the presence of uncertainty in the initial state or action effects. The problem has been approached as a path-finding problem in belief space where good belief representations and heuristics are critical for scaling up. In this work, a different formulation is introduced for conformant problems with deterministic actions where they are automatically converted into classical ones and solved by an off-the-shelf classical planner. The translation maps literals L and sets of assumptions t about the initial situation, into new literals KL/t that represent that L must be true if t is initially true. We lay out a general translation scheme that is sound and establish the conditions under which the translation is also complete. We show that the complexity of the complete translation is exponential in a parameter of the problem called the conformant width, which for most benchmarks is bounded. The planner based on this translation exhibits good performance in comparison with existing planners, and is the basis for T0, the best performing planner in the Conformant Track of the 2006 International Planning Competition.
△ Less
Submitted 15 January, 2014;
originally announced January 2014.
-
Arguing for Decisions: A Qualitative Model of Decision Making
Authors:
Blai Bonet,
Hector Geffner
Abstract:
We develop a qualitative model of decision making with two aims: to describe how people make simple decisions and to enable computer programs to do the same. Current approaches based on Planning or Decisions Theory either ignore uncertainty and tradeoffs, or provide languages and algorithms that are too complex for this task. The proposed model provides a language based on rules, a semantics bas…
▽ More
We develop a qualitative model of decision making with two aims: to describe how people make simple decisions and to enable computer programs to do the same. Current approaches based on Planning or Decisions Theory either ignore uncertainty and tradeoffs, or provide languages and algorithms that are too complex for this task. The proposed model provides a language based on rules, a semantics based on high probabilities and lexicographical preferences, and a transparent decision procedure where reasons for and against decisions interact. The model is no substitude for Decision Theory, yet for decisions that people find easy to explain it may provide an appealing alternative.
△ Less
Submitted 13 February, 2013;
originally announced February 2013.
-
PDDL 2.1: Representation vs. Computation
Authors:
H. A. Geffner
Abstract:
I comment on the PDDL 2.1 language and its use in the planning competition, focusing on the choices made for accommodating time and concurrency. I also discuss some methodological issues that have to do with the move toward more expressive planning languages and the balance needed in planning research between semantics and computation.
I comment on the PDDL 2.1 language and its use in the planning competition, focusing on the choices made for accommodating time and concurrency. I also discuss some methodological issues that have to do with the move toward more expressive planning languages and the balance needed in planning research between semantics and computation.
△ Less
Submitted 12 October, 2011;
originally announced October 2011.
-
mGPT: A Probabilistic Planner Based on Heuristic Search
Authors:
B. Bonet,
H. Geffner
Abstract:
We describe the version of the GPT planner used in the probabilistic track of the 4th International Planning Competition (IPC-4). This version, called mGPT, solves Markov Decision Processes specified in the PPDDL language by extracting and using different classes of lower bounds along with various heuristic-search algorithms. The lower bounds are extracted from deterministic relaxations where the…
▽ More
We describe the version of the GPT planner used in the probabilistic track of the 4th International Planning Competition (IPC-4). This version, called mGPT, solves Markov Decision Processes specified in the PPDDL language by extracting and using different classes of lower bounds along with various heuristic-search algorithms. The lower bounds are extracted from deterministic relaxations where the alternative probabilistic effects of an action are mapped into different, independent, deterministic actions. The heuristic-search algorithms use these lower bounds for focusing the updates and delivering a consistent value function over all states reachable from the initial state and the greedy policy.
△ Less
Submitted 9 September, 2011;
originally announced September 2011.