-
Reinforcement Learning in System Identification
Authors:
Jose Antonio Martin H.,
Oscar Fernandez Vicente,
Sergio Perez,
Anas Belfadil,
Cristina Ibanez-Llano,
Freddy Jose Perozo Rondon,
Jose Javier Valle,
Javier Arechalde Pelaz
Abstract:
System identification, also known as learning forward models, transfer functions, system dynamics, etc., has a long tradition in both science and engineering across different fields. In particular, it is a recurring theme in Reinforcement Learning research, where forward models approximate the state transition function of a Markov Decision Process by learning a mapping function from the current state and action to the next state. This problem is commonly posed directly as a Supervised Learning problem. This common approach faces several difficulties due to the inherent complexities of the dynamics to learn, for example, delayed effects, high non-linearity, non-stationarity, partial observability and, more importantly, error accumulation when using bootstrapped predictions (predictions based on past predictions) over large time horizons. Here we explore the use of Reinforcement Learning in this problem. We elaborate on why and how this problem fits naturally and soundly as a Reinforcement Learning problem, and present some experimental results that demonstrate RL is a promising technique for solving this kind of problem.
Submitted 14 December, 2022;
originally announced December 2022.
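The error accumulation mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' experiment: the scalar linear system, the slightly wrong learned coefficient (0.95 instead of 0.9), the constant input, and all function names are assumptions chosen only to contrast one-step predictions (fed the true state) with bootstrapped predictions (fed the model's own previous output).

```python
def true_step(x, u):
    """Hypothetical true dynamics: x' = 0.9 * x + u."""
    return 0.9 * x + u


def model_step(x, u):
    """Hypothetical learned forward model with a small coefficient error."""
    return 0.95 * x + u


def rollout_errors(x0=0.0, u=1.0, horizon=50):
    """Return per-step absolute errors for one-step and bootstrapped prediction."""
    one_step, free_run = [], []
    x_true, x_free = x0, x0
    for _ in range(horizon):
        x_next = true_step(x_true, u)
        # One-step prediction: the model is fed the true current state.
        one_step.append(abs(model_step(x_true, u) - x_next))
        # Bootstrapped prediction: the model is fed its own previous output.
        x_free = model_step(x_free, u)
        free_run.append(abs(x_free - x_next))
        x_true = x_next
    return one_step, free_run


if __name__ == "__main__":
    one_step, free_run = rollout_errors()
    for t in (0, 9, 49):
        print(f"t={t + 1:2d}  one-step={one_step[t]:.3f}  bootstrapped={free_run[t]:.3f}")
```

In this toy setting the one-step error stays bounded by the coefficient error times the state, while the bootstrapped error compounds because each prediction inherits the error of the previous one.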
-
Minimal non-extensible precolorings and implicit-relations
Authors:
José Antonio Martín H
Abstract:
In this paper I study a variant of the general vertex coloring problem called precoloring. Specifically, I study graph precolorings by developing new theory for characterizing the minimal non-extensible precolorings. It is interesting per se that, for graphs of arbitrarily large chromatic number, the minimal number of colored vertices in a non-extensible precoloring remains constant; only two vertices $u,v$ suffice. Here, the relation between such $u,v$ is called an implicit-relation, distinguishing two cases: (i) implicit-edges, where $u,v$ are precolored with the same color, and (ii) implicit-identities, where $u,v$ are precolored with distinct colors.
Submitted 4 April, 2011;
originally announced April 2011.
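The two cases named in the abstract above can be checked on small graphs with a brute-force sketch. This is not the paper's theory or algorithm: the function `extends`, the example graphs, and the vertex numbering are assumptions made for illustration, and the exhaustive search is only practical for tiny instances.

```python
from itertools import product


def extends(n, edges, k, precoloring):
    """Return True if the precoloring extends to a proper k-coloring.

    n: number of vertices (labelled 0..n-1); edges: list of vertex pairs;
    precoloring: dict mapping some vertices to colors in range(k).
    """
    free = [v for v in range(n) if v not in precoloring]
    for assignment in product(range(k), repeat=len(free)):
        coloring = dict(precoloring)
        coloring.update(zip(free, assignment))
        if all(coloring[a] != coloring[b] for a, b in edges):
            return True
    return False


# Diamond (K4 minus the edge 0-3): vertices 0 and 3 are not adjacent, yet
# every proper 3-coloring gives them the same color.
DIAMOND = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
# Diamond plus a pendant vertex 4 attached to 3: vertices 0 and 4 are not
# adjacent, yet every proper 3-coloring gives them different colors.
DIAMOND_PENDANT = DIAMOND + [(3, 4)]

if __name__ == "__main__":
    # Implicit-identity: precoloring 0 and 3 with distinct colors cannot extend.
    print(extends(4, DIAMOND, 3, {0: 0, 3: 1}))
    # Implicit-edge: precoloring 0 and 4 with the same color cannot extend.
    print(extends(5, DIAMOND_PENDANT, 3, {0: 0, 4: 0}))
```

In both examples only two precolored vertices are needed to block every extension, which matches the two-vertex phenomenon the abstract describes.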
-
A polynomial 3-colorability algorithm with automatic generation of NO 3-colorability (i.e. Co-NP) short proofs
Authors:
Jose Antonio Martin H
Abstract:
In this paper, an algorithm for determining 3-colorability, i.e. the decision problem (YES/NO), in planar graphs is presented. The algorithm, although not exact (it could produce false positives), has two very important features: (i) it has polynomial complexity and (ii) for every "NO" answer, a "short" proof is generated, which is of much interest since 3-colorability is an NP-complete problem and thus its complementary problem is in Co-NP. Hence the algorithm is exact when it determines that a given planar graph is not 3-colorable, since this is verifiable via automatically generated short formal proofs (which are also human-readable).
Submitted 31 January, 2011;
originally announced January 2011.
-
Dyna-H: a heuristic planning reinforcement learning algorithm applied to role-playing-game strategy decision systems
Authors:
Matilde Santos,
Jose Antonio Martin H.,
Victoria Lopez,
Guillermo Botella
Abstract:
In a Role-Playing Game, finding optimal trajectories is one of the most important tasks. In fact, the strategy decision system becomes a key component of a game engine. Determining the way in which decisions are taken (online, batch or simulated) and the resources consumed in decision making (e.g. execution time, memory) will influence, to a major degree, the game performance. When classical search algorithms such as A* can be used, they are the very first option. Nevertheless, such methods rely on precise and complete models of the search space, and there are many interesting scenarios where their application is not possible. In such cases, model-free methods for sequential decision making under uncertainty are the best choice. In this paper, we propose a heuristic planning strategy to incorporate the ability of heuristic search in path-finding into a Dyna agent. The proposed Dyna-H algorithm, as A* does, selects branches more likely to produce outcomes than other branches. In addition, it has the advantages of being a model-free online reinforcement learning algorithm. The proposal was evaluated against the one-step Q-Learning and Dyna-Q algorithms, obtaining excellent experimental results: Dyna-H significantly outperforms both methods in all experiments. We also suggest a functional analogy between the proposed sampling-from-worst-trajectories heuristic and the role of dreams (e.g. nightmares) in human behavior.
Submitted 30 July, 2011; v1 submitted 20 January, 2011;
originally announced January 2011.
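The idea of adding heuristic planning to a Dyna agent, as described in the abstract above, can be sketched roughly as follows. This is not the authors' Dyna-H implementation: the 5x5 grid world, the Manhattan-distance heuristic, the hyperparameters, and in particular the reading of "sampling from worst trajectories" as "simulate the known action whose predicted next state looks worst under the heuristic" are all assumptions made for illustration.

```python
import random

SIZE = 5                                        # hypothetical 5x5 grid world
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right


def step(state, action):
    """Deterministic move; walking into a wall leaves the state unchanged."""
    row = min(max(state[0] + MOVES[action][0], 0), SIZE - 1)
    col = min(max(state[1] + MOVES[action][1], 0), SIZE - 1)
    nxt = (row, col)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def heuristic(state):
    """Assumed heuristic: Manhattan distance to the goal (larger is worse)."""
    return abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1])


def q_update(Q, s, a, reward, nxt, alpha, gamma):
    """One-step Q-learning backup; the goal is terminal."""
    best_next = 0.0 if nxt == GOAL else max(Q[nxt])
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])


def train(episodes=60, planning_steps=10, alpha=0.5, gamma=0.9,
          epsilon=0.1, seed=0, max_steps=10_000):
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}
    model = {}                                  # (state, action) -> (next, reward)
    for _ in range(episodes):
        s = START
        for _ in range(max_steps):
            if s == GOAL:
                break
            if rng.random() < epsilon:          # explore
                a = rng.randrange(4)
            else:                               # exploit, random tie-breaking
                best = max(Q[s])
                a = rng.choice([i for i in range(4) if Q[s][i] == best])
            nxt, reward = step(s, a)
            q_update(Q, s, a, reward, nxt, alpha, gamma)    # direct RL
            model[(s, a)] = (nxt, reward)                   # model learning
            sim = s
            for _ in range(planning_steps):                 # heuristic planning
                known = [i for i in range(4) if (sim, i) in model]
                if not known:                   # unseen or terminal: restart
                    sim = rng.choice(list(model))[0]
                    known = [i for i in range(4) if (sim, i) in model]
                # Assumed rule: simulate the action judged worst by the heuristic.
                worst = max(known, key=lambda i: heuristic(model[(sim, i)][0]))
                sim_next, sim_reward = model[(sim, worst)]
                q_update(Q, sim, worst, sim_reward, sim_next, alpha, gamma)
                sim = sim_next
            s = nxt
    return Q


def greedy_steps(Q, limit=1000):
    """Follow the greedy policy from START; return steps to GOAL or None."""
    s = START
    for n in range(limit):
        if s == GOAL:
            return n
        s, _ = step(s, max(range(4), key=lambda i: Q[s][i]))
    return n + 1 if s == GOAL else None


if __name__ == "__main__":
    print("greedy steps to goal:", greedy_steps(train()))
```

The only difference from a plain Dyna-Q loop in this sketch is how planning picks simulated actions: instead of replaying uniformly random remembered transitions, it chains simulated steps chosen by the heuristic. Other selection rules could be swapped in at the marked line.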