
Reinforcement Learning in System Identification

Authors: Jose Antonio Martin H., Oscar Fernandez Vicente, Sergio Perez, Anas Belfadil, Cristina Ibanez-Llano, Freddy Jose Perozo Rondon, Jose Javier Valle, Javier Arechalde Pelaz

Abstract: System identification, also known as learning forward models, transfer functions, system dynamics, etc., has a long tradition in both science and engineering across different fields. In particular, it is a recurring theme in Reinforcement Learning research, where forward models approximate the state transition function of a Markov Decision Process by learning a mapping function from the current state and action to the next state. This problem is commonly framed directly as a Supervised Learning problem. This common approach faces several difficulties due to the inherent complexities of the dynamics to learn, for example, delayed effects, high non-linearity, non-stationarity, partial observability and, more importantly, error accumulation when using bootstrapped predictions (predictions based on past predictions) over large time horizons. Here we explore the use of Reinforcement Learning in this problem. We elaborate on why and how this problem fits naturally and soundly as a Reinforcement Learning problem, and present some experimental results that demonstrate that RL is a promising technique for solving this kind of problem.

Submitted 14 December, 2022; originally announced December 2022.

Comments: Accepted at the NeurIPS Deep Reinforcement Learning Workshop 2022: https://openreview.net/forum?id=fGcbpWQIJZV

arXiv:1104.0510 [pdf, other]

Minimal non-extensible precolorings and implicit-relations

Authors: José Antonio Martín H

Abstract: In this paper I study a variant of the general vertex coloring problem called precoloring. Specifically, I study graph precolorings by developing new theory for characterizing the minimal non-extensible precolorings. It is interesting per se that, for graphs of arbitrarily large chromatic number, the minimal number of colored vertices in a non-extensible precoloring remains constant; only two vertices $u,v$ suffice. Here, the relation between such $u,v$ is called an implicit-relation, distinguishing two cases: (i) implicit-edges, where $u,v$ are precolored with the same color, and (ii) implicit-identities, where $u,v$ are precolored with distinct colors.

Submitted 4 April, 2011; originally announced April 2011.

MSC Class: Primary 05C15; 05C75; Secondary 05C90; 05C69

arXiv:1101.6038 [pdf, other]

A polynomial 3-colorability algorithm with automatic generation of NO 3-colorability (i.e. Co-NP) short proofs

Authors: Jose Antonio Martin H

Abstract: In this paper, an algorithm for determining 3-colorability, i.e. the decision problem (YES/NO), in planar graphs is presented. The algorithm, although not exact (it could produce false positives), has two very important features: (i) it has polynomial complexity and (ii) for every "NO" answer, a "short" proof is generated, which is of much interest since 3-colorability is an NP-complete problem and thus its complementary problem is in Co-NP. Hence the algorithm is exact when it determines that a given planar graph is not 3-colorable, since this is verifiable via the automatic generation of short formal proofs (which are also human-readable).

Submitted 31 January, 2011; originally announced January 2011.

arXiv:1101.4003 [pdf, other]

Dyna-H: a heuristic planning reinforcement learning algorithm applied to role-playing-game strategy decision systems

Authors: Matilde Santos, Jose Antonio Martin H., Victoria Lopez, Guillermo Botella

Abstract: In a Role-Playing Game, finding optimal trajectories is one of the most important tasks. In fact, the strategy decision system becomes a key component of a game engine. The way in which decisions are taken (online, batch or simulated) and the resources consumed in decision making (e.g. execution time, memory) will influence game performance to a major degree. When classical search algorithms such as A* can be used, they are the very first option. Nevertheless, such methods rely on precise and complete models of the search space, and there are many interesting scenarios where their application is not possible. In those cases, model-free methods for sequential decision making under uncertainty are the best choice. In this paper, we propose a heuristic planning strategy to incorporate the ability of heuristic search in path-finding into a Dyna agent. The proposed Dyna-H algorithm, as A* does, selects branches more likely to produce outcomes than other branches. In addition, it has the advantage of being a model-free online reinforcement learning algorithm. The proposal was evaluated against the one-step Q-Learning and Dyna-Q algorithms, obtaining excellent experimental results: Dyna-H significantly outperforms both methods in all experiments. We also suggest a functional analogy between the proposed heuristic of sampling from the worst trajectories and the role of dreams (e.g. nightmares) in human behavior.

Submitted 30 July, 2011; v1 submitted 20 January, 2011; originally announced January 2011.

MSC Class: 68T05; ACM Class: I.2

Showing 1–4 of 4 results for author: H, J A M