-
Safe Exploration for Identifying Linear Systems via Robust Optimization
Authors:
Tyler Lu,
Martin Zinkevich,
Craig Boutilier,
Binz Roy,
Dale Schuurmans
Abstract:
Safely exploring an unknown dynamical system is critical to the deployment of reinforcement learning (RL) in physical systems where failures may have catastrophic consequences. In scenarios where one knows little about the dynamics, diverse transition data covering relevant regions of state-action space is needed to apply either model-based or model-free RL. Motivated by the cooling of Google's data centers, we study how one can safely identify the parameters of a system model with a desired accuracy and confidence level. In particular, we focus on learning an unknown linear system with Gaussian noise assuming only that, initially, a nominal safe action is known. Defining safety as satisfying specific linear constraints on the state space (e.g., requirements on process variables) that must hold over the span of an entire trajectory, and given a Probably Approximately Correct (PAC) style bound on the estimation error of model parameters, we show how to compute safe regions of action space by gradually growing a ball around the nominal safe action. One can apply any exploration strategy where actions are chosen from such safe regions. Experiments on a stylized model of data center cooling dynamics show how computing proper safe regions can increase the sample efficiency of safe exploration.
Submitted 29 November, 2017;
originally announced November 2017.
-
An Efficient Optimal-Equilibrium Algorithm for Two-player Game Trees
Authors:
Michael L. Littman,
Nishkam Ravi,
Arjun Talwar,
Martin Zinkevich
Abstract:
Two-player complete-information game trees are perhaps the simplest possible setting for studying general-sum games and the computational problem of finding equilibria. These games admit a simple bottom-up algorithm for finding subgame perfect Nash equilibria efficiently. However, such an algorithm can fail to identify optimal equilibria, such as those that maximize social welfare. The reason is that, counterintuitively, probabilistic action choices are sometimes needed to achieve maximum payoffs. We provide a novel polynomial-time algorithm for this problem that explicitly reasons about stochastic decisions and demonstrate its use in an example card game.
Submitted 27 June, 2012;
originally announced June 2012.
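The "simple bottom-up algorithm" mentioned in the abstract above can be illustrated with a minimal sketch of pure-strategy backward induction. This is not the paper's polynomial-time algorithm: it never randomizes, so it is exactly the kind of procedure that can miss welfare-maximizing equilibria. The `Node` class, the example tree, and the first-child tie-breaking rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical game-tree node: leaves carry payoffs, others carry a mover."""
    player: Optional[int] = None                    # 0 or 1 at decision nodes
    payoffs: Optional[Tuple[float, float]] = None   # (player 0, player 1) at leaves
    children: List["Node"] = field(default_factory=list)


def backward_induction(node: Node) -> Tuple[Tuple[float, float], List[int]]:
    """Return the payoffs and the path of child indices chosen bottom-up.

    At each decision node the mover picks the child whose already-solved
    value maximizes the mover's own payoff (ties go to the first child).
    """
    if not node.children:
        return node.payoffs, []
    best_value, best_path = None, None
    for i, child in enumerate(node.children):
        value, path = backward_induction(child)
        if best_value is None or value[node.player] > best_value[node.player]:
            best_value, best_path = value, [i] + path
    return best_value, best_path


# Player 0 moves first, then player 1 chooses between two leaves.
tree = Node(player=0, children=[
    Node(player=1, children=[Node(payoffs=(3.0, 1.0)), Node(payoffs=(0.0, 2.0))]),
    Node(player=1, children=[Node(payoffs=(2.0, 2.0)), Node(payoffs=(1.0, 0.0))]),
])
value, path = backward_induction(tree)
```

In this example player 0 takes the second branch, because player 1 would answer the first branch by choosing the leaf that gives player 0 nothing.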
-
On Local Regret
Authors:
Michael Bowling,
Martin Zinkevich
Abstract:
Online learning aims to perform nearly as well as the best hypothesis in hindsight. For some hypothesis classes, though, even finding the best hypothesis offline is challenging. In such offline cases, local search techniques are often employed and only local optimality guaranteed. For online decision-making with such hypothesis classes, we introduce local regret, a generalization of regret that aims to perform nearly as well as only nearby hypotheses. We then present a general algorithm to minimize local regret with arbitrary locality graphs. We also show how the graph structure can be exploited to drastically speed learning. These algorithms are then demonstrated on a diverse set of online problems: online disjunct learning, online Max-SAT, and online decision tree learning.
Submitted 14 June, 2012;
originally announced June 2012.
-
No-Regret Learning in Extensive-Form Games with Imperfect Recall
Authors:
Marc Lanctot,
Richard Gibson,
Neil Burch,
Martin Zinkevich,
Michael Bowling
Abstract:
Counterfactual Regret Minimization (CFR) is an efficient no-regret learning algorithm for decision problems modeled as extensive games. CFR's regret bounds depend on the requirement of perfect recall: players always remember information that was revealed to them and the order in which it was revealed. In games without perfect recall, however, CFR's guarantees do not apply. In this paper, we present the first regret bound for CFR when applied to a general class of games with imperfect recall. In addition, we show that CFR applied to any abstraction belonging to our general class results in a regret bound not just for the abstract game, but for the full game as well. We verify our theory and show how imperfect recall can be used to trade a small increase in regret for a significant reduction in memory in three domains: die-roll poker, phantom tic-tac-toe, and Bluff.
Submitted 3 May, 2012;
originally announced May 2012.
-
Slow Learners are Fast
Authors:
John Langford,
Alexander Smola,
Martin Zinkevich
Abstract:
Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design, which prevents them from taking advantage of modern multi-core architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online learning.
Submitted 3 November, 2009;
originally announced November 2009.
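The idea of "online learning with delayed updates" from the abstract above can be sketched as gradient descent in which each step uses a gradient evaluated at a stale iterate. This is only an illustration under stated assumptions, not the paper's algorithm or analysis: the function name `delayed_gradient_descent`, the constant step size, the fixed delay, and the one-dimensional quadratic objective are all hypothetical choices.

```python
from collections import deque
from typing import Callable


def delayed_gradient_descent(
    grad: Callable[[float], float],
    w0: float,
    step_size: float,
    delay: int,
    num_steps: int,
) -> float:
    """Gradient descent where each update uses a gradient from `delay` steps ago.

    During the first `delay` steps no sufficiently old iterate exists, so the
    gradient is evaluated at the initial point w0 (an assumption of this sketch).
    """
    # Keeps the last delay + 1 iterates; history[0] is the oldest one retained.
    history = deque([w0], maxlen=delay + 1)
    w = w0
    for _ in range(num_steps):
        stale_w = history[0]
        w = w - step_size * grad(stale_w)
        history.append(w)
    return w


# Assumed toy objective f(w) = 0.5 * (w - 3)^2, whose gradient is w - 3.
w_final = delayed_gradient_descent(
    grad=lambda w: w - 3.0, w0=0.0, step_size=0.1, delay=2, num_steps=200
)
```

With a small step size relative to the delay, the iterate still approaches the minimizer at 3; a delay that is too large for the chosen step size would make this simple recurrence oscillate instead.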
-
Crystal growth of MgB2 from Mg-Cu-B melt flux and superconducting properties
Authors:
D. Souptel,
G. Behr,
W. Loser,
W. Kopylov,
M. Zinkevich
Abstract:
A new method for preparation of single crystals of the superconducting intermetallic MgB2 compound from a Mg-Cu-B melt flux is presented. The high vapour pressure of Mg at elevated temperature is a serious challenge for the preparation process. Approximate thermodynamic calculations of the ternary Mg-Cu-B phase diagram show a beneficial effect of Cu, which extends the range of formation of MgB2 to lower temperatures. Within the as-solidified Mg-Cu-B melt flux the MgB2 compound forms plate-like single crystals up to a size of 0.2 x 0.2 x 0.05 mm^3 or, alternatively, rims peritectically grown around MgB4 particles. AC-susceptibility measurements were conducted with specimens selected from different parts of the as-solidified flux containing MgB2 particles. Peritectically formed MgB2 particles display the highest transition temperature of Tc = 39.2 K and a relatively narrow transition width of ΔTc = 1.3 K. Other sections of the sample exhibit various superconducting transitions from Tc = 39 K to 7.2 K. This variation of Tc is attributed to a finite homogeneity range of the MgB2 compound, whereas significant Cu solid solubility in MgB2 can be excluded.
Submitted 15 August, 2002;
originally announced August 2002.