-
Towards Safe Robot Use with Edged or Pointed Objects: A Surrogate Study Assembling a Human Hand Injury Protection Database
Authors:
Robin Jeanne Kirschner,
Carina M. Micheler,
Yangcan Zhou,
Sebastian Siegner,
Mazin Hamad,
Claudio Glowalla,
Jan Neumann,
Nader Rajaei,
Rainer Burgkart,
Sami Haddadin
Abstract:
The use of pointed or edged tools or objects is one of the most challenging aspects of today's application of physical human-robot interaction (pHRI). One reason for this is that the severity of harm caused by such edged or pointed impactors is less well studied than for blunt impactors. Consequently, the standards specify well-reasoned force and pressure thresholds for blunt impactors and advise avoiding any edges and corners in contacts. Nevertheless, pointed or edged impactor geometries cannot be completely ruled out in real pHRI applications. For example, to allow edged or pointed tools such as screwdrivers near human operators, the knowledge of injury severity needs to be extended so that robot integrators can perform well-reasoned, time-efficient risk assessments. In this paper, we provide the initial datasets on injury prevention for the human hand based on drop tests with surrogates for the human hand, namely pig claws and chicken drumsticks. We then demonstrate the ease and efficiency of robot use using the dataset for contact on two examples. Finally, our experiments provide a set of injuries that may also be expected for human subjects under certain robot mass-velocity constellations in collisions. To extend this work, testing on human samples and a collaborative effort from research institutes worldwide is needed to create a comprehensive human injury avoidance database for any pHRI scenario and thus for safe pHRI applications including edged and pointed geometries.
Submitted 5 April, 2024;
originally announced April 2024.
-
Regret Minimization via Saddle Point Optimization
Authors:
Johannes Kirschner,
Seyed Alireza Bakhtiari,
Kushagra Chandak,
Volodymyr Tkachuk,
Csaba Szepesvári
Abstract:
A long line of works characterizes the sample complexity of regret minimization in sequential decision-making by min-max programs. In the corresponding saddle-point game, the min-player optimizes the sampling distribution against an adversarial max-player that chooses confusing models leading to large regret. The most recent instantiation of this idea is the decision-estimation coefficient (DEC), which was shown to provide nearly tight lower and upper bounds on the worst-case expected regret in structured bandits and reinforcement learning. By re-parametrizing the offset DEC with the confidence radius and solving the corresponding min-max program, we derive an anytime variant of the Estimation-To-Decisions (E2D) algorithm. Importantly, the algorithm optimizes the exploration-exploitation trade-off online instead of via the analysis. Our formulation leads to a practical algorithm for finite model classes and linear feedback models. We further point out connections to the information ratio, decoupling coefficient and PAC-DEC, and numerically evaluate the performance of E2D on simple examples.
Submitted 15 March, 2024;
originally announced March 2024.
-
Investigating a domain adaptation approach for integrating different measurement instruments in a longitudinal clinical registry
Authors:
Maren Hackenberg,
Michelle Pfaffenlehner,
Max Behrens,
Astrid Pechmann,
Janbernd Kirschner,
Harald Binder
Abstract:
In a longitudinal clinical registry, different measurement instruments might have been used for assessing individuals at different time points. To combine them, we investigate deep learning techniques for obtaining a joint latent representation, to which the items of different measurement instruments are mapped. This corresponds to domain adaptation, an established concept in computer science for image data. Using the proposed approach as an example, we evaluate the potential of domain adaptation in a longitudinal cohort setting with a rather small number of time points, motivated by an application with different motor function measurement instruments in a registry of spinal muscular atrophy (SMA) patients. There, we model trajectories in the latent representation by ordinary differential equations (ODEs), where person-specific ODE parameters are inferred from baseline characteristics. The goodness of fit and complexity of the ODE solutions then allows to judge the measurement instrument mappings. We subsequently explore how alignment can be improved by incorporating corresponding penalty terms into model fitting. To systematically investigate the effect of differences between measurement instruments, we consider several scenarios based on modified SMA data, including scenarios where a mapping should be feasible in principle and scenarios where no perfect mapping is available. While misalignment increases in more complex scenarios, some structure is still recovered, even if the availability of measurement instruments depends on patient state. A reasonable mapping is feasible also in the more complex real SMA dataset. These results indicate that domain adaptation might be more generally useful in statistical modeling for longitudinal registry data.
Submitted 1 December, 2023;
originally announced December 2023.
-
A statistical approach to latent dynamic modeling with differential equations
Authors:
Maren Hackenberg,
Astrid Pechmann,
Clemens Kreutz,
Janbernd Kirschner,
Harald Binder
Abstract:
Ordinary differential equations (ODEs) can provide mechanistic models of temporally local changes of processes, where parameters are often informed by external knowledge. While ODEs are popular in systems modeling, they are less established for statistical modeling of longitudinal cohort data, e.g., in a clinical setting. Yet, modeling of local changes could also be attractive for assessing the trajectory of an individual in a cohort in the immediate future given its current status, where ODE parameters could be informed by further characteristics of the individual. However, several hurdles so far limit such use of ODEs, as compared to regression-based function fitting approaches. The potentially higher level of noise in cohort data might be detrimental to ODEs, as the shape of the ODE solution heavily depends on the initial value. In addition, larger numbers of variables multiply such problems and might be difficult to handle for ODEs. To address this, we propose to use each observation in the course of time as the initial value to obtain multiple local ODE solutions and build a combined estimator of the underlying dynamics. Neural networks are used for obtaining a low-dimensional latent space for dynamic modeling from a potentially large number of variables, and for obtaining patient-specific ODE parameters from baseline variables. Simultaneous identification of dynamic models and of a latent space is enabled by recently developed differentiable programming techniques. We illustrate the proposed approach in an application with spinal muscular atrophy patients and a corresponding simulation study. In particular, modeling of local changes in health status at any point in time is contrasted to the interpretation of functions obtained from a global regression. This more generally highlights how different application settings might demand different modeling strategies.
Submitted 27 November, 2023;
originally announced November 2023.
-
A Concise Overview of Safety Aspects in Human-Robot Interaction
Authors:
Mazin Hamad,
Simone Nertinger,
Robin J. Kirschner,
Luis Figueredo,
Abdeldjallil Naceri,
Sami Haddadin
Abstract:
As of today, robots exhibit impressive agility but also pose potential hazards to humans using/collaborating with them. Consequently, safety is considered the most paramount factor in human-robot interaction (HRI). This paper presents a multi-layered safety architecture, integrating both physical and cognitive aspects for effective HRI. We outline critical requirements for physical safety layers as service modules that can be arbitrarily queried. Further, we showcase an HRI scheme that addresses human factors and perceived safety as high-level constraints on a validated impact safety paradigm. The aim is to enable safety certification of human-friendly robots across various HRI scenarios.
Submitted 18 September, 2023;
originally announced September 2023.
-
Efficient Planning in Combinatorial Action Spaces with Applications to Cooperative Multi-Agent Reinforcement Learning
Authors:
Volodymyr Tkachuk,
Seyed Alireza Bakhtiari,
Johannes Kirschner,
Matej Jusup,
Ilija Bogunovic,
Csaba Szepesvári
Abstract:
A practical challenge in reinforcement learning are combinatorial action spaces that make planning computationally demanding. For example, in cooperative multi-agent reinforcement learning, a potentially large number of agents jointly optimize a global reward function, which leads to a combinatorial blow-up in the action space by the number of agents. As a minimal requirement, we assume access to an argmax oracle that allows to efficiently compute the greedy policy for any Q-function in the model class. Building on recent work in planning with local access to a simulator and linear function approximation, we propose efficient algorithms for this setting that lead to polynomial compute and query complexity in all relevant problem parameters. For the special case where the feature decomposition is additive, we further improve the bounds and extend the results to the kernelized setting with an efficient algorithm.
Submitted 8 February, 2023;
originally announced February 2023.
-
Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications
Authors:
Johannes Kirschner,
Tor Lattimore,
Andreas Krause
Abstract:
Partial monitoring is an expressive framework for sequential decision-making with an abundance of applications, including graph-structured and dueling bandits, dynamic pricing and transductive feedback models. We survey and extend recent results on the linear formulation of partial monitoring that naturally generalizes the standard linear bandit setting. The main result is that a single algorithm, information-directed sampling (IDS), is (nearly) worst-case rate optimal in all finite-action games. We present a simple and unified analysis of stochastic partial monitoring, and further extend the model to the contextual and kernelized setting.
Submitted 13 November, 2023; v1 submitted 7 February, 2023;
originally announced February 2023.
-
Near-optimal Policy Identification in Active Reinforcement Learning
Authors:
Xiang Li,
Viraj Mehta,
Johannes Kirschner,
Ian Char,
Willie Neiswanger,
Jeff Schneider,
Andreas Krause,
Ilija Bogunovic
Abstract:
Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the transition dynamics can be readily evaluated at specified states (e.g., via a simulator), agents can operate in what is often referred to as planning with a \emph{generative model}. We propose the AE-LSVI algorithm for best-policy identification, a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration (AE). AE-LSVI provably identifies a near-optimal policy \emph{uniformly} over an entire state space and achieves polynomial sample complexity guarantees that are independent of the number of states. When specialized to the recently introduced offline contextual Bayesian optimization setting, our algorithm achieves improved sample complexity bounds. Experimentally, we demonstrate that AE-LSVI outperforms other RL algorithms in a variety of environments when robustness to the initial state is required.
Submitted 19 December, 2022;
originally announced December 2022.
-
Managing Temporal Resolution in Continuous Value Estimation: A Fundamental Trade-off
Authors:
Zichen Zhang,
Johannes Kirschner,
Junxi Zhang,
Francesco Zanini,
Alex Ayoub,
Masood Dehghan,
Dale Schuurmans
Abstract:
A default assumption in reinforcement learning (RL) and optimal control is that observations arrive at discrete time points on a fixed clock cycle. Yet, many applications involve continuous-time systems where the time discretization, in principle, can be managed. The impact of time discretization on RL methods has not been fully characterized in existing theory, but a more detailed analysis of its effect could reveal opportunities for improving data-efficiency. We address this gap by analyzing Monte-Carlo policy evaluation for LQR systems and uncover a fundamental trade-off between approximation and statistical error in value estimation. Importantly, these two errors behave differently to time discretization, leading to an optimal choice of temporal resolution for a given data budget. These findings show that managing the temporal resolution can provably improve policy evaluation efficiency in LQR systems with finite data. Empirically, we demonstrate the trade-off in numerical simulations of LQR instances and standard RL benchmarks for non-linear continuous control.
Submitted 16 January, 2024; v1 submitted 17 December, 2022;
originally announced December 2022.
-
Tuning Particle Accelerators with Safety Constraints using Bayesian Optimization
Authors:
Johannes Kirschner,
Mojmir Mutný,
Andreas Krause,
Jaime Coello de Portugal,
Nicole Hiller,
Jochem Snuverink
Abstract:
Tuning machine parameters of particle accelerators is a repetitive and time-consuming task that is challenging to automate. While many off-the-shelf optimization algorithms are available, in practice their use is limited because most methods do not account for safety-critical constraints in each iteration, such as loss signals or step-size limitations. One notable exception is safe Bayesian optimization, which is a data-driven tuning approach for global optimization with noisy feedback. We propose and evaluate a step-size limited variant of safe Bayesian optimization on two research facilities of the Paul Scherrer Institut (PSI): a) the Swiss Free Electron Laser (SwissFEL) and b) the High-Intensity Proton Accelerator (HIPA). We report promising experimental results on both machines, tuning up to 16 parameters subject to 224 constraints.
Submitted 30 June, 2022; v1 submitted 25 March, 2022;
originally announced March 2022.
-
ISO/TS 15066: How Different Interpretations Affect Risk Assessment
Authors:
Robin Jeanne Kirschner,
Nico Mansfeld,
Saeed Abdolshah,
Sami Haddadin
Abstract:
The current technical specification ISO/TS15066:2016(E) for safe human-robot interaction contains logically conflicting definitions for the contact between human and robot. This may result in different interpretations for the contact classification and thus no unique outcome can be expected, which may even cause a risk to the human. In previous work, we showed a first set of implications. This paper addresses the possible interpretations of a collision scenario as a result of the varying interpretations for a risk assessment. With an experiment including four commercially available robot systems we demonstrate the procedure of the risk assessment following the different interpretations of the TS. The results indicate possible incorrect use of the technical specification, which we believe needs to be resolved in future revisions. For this, we suggest tools in form of a decision tree and constrained collision force maps, which enable a simple, unambiguous risk assessment for HRI.
Submitted 5 March, 2022;
originally announced March 2022.
-
Expectable Motion Unit: Avoiding Hazards From Human Involuntary Motions in Human-Robot Interaction
Authors:
Robin Jeanne Kirschner,
Henning Mayer,
Lisa Burr,
Nico Mansfeld,
Saeed Abdolshah,
Sami Haddadin
Abstract:
In robotics, many control and planning schemes have been developed to ensure human physical safety in human-robot interaction. The human psychological state and the expectation towards the robot, however, are typically neglected. Even if the robot behaviour is regarded as biomechanically safe, humans may still react with a rapid involuntary motion (IM) caused by a startle or surprise. Such sudden, uncontrolled motions can jeopardize safety and should be prevented by any means. In this letter, we propose the Expectable Motion Unit (EMU), which ensures that a certain probability of IM occurrence is not exceeded in a typical HRI setting. Based on a model of IM occurrence generated through an experiment with 29 participants, we establish the mapping between robot velocity, robot-human distance, and the relative frequency of IM occurrence. This mapping is processed towards a real-time capable robot motion generator that limits the robot velocity during task execution if necessary. The EMU is combined in a holistic safety framework that integrates both the physical and psychological safety knowledge. A validation experiment showed that the EMU successfully avoids human IM in five out of six cases.
Submitted 4 April, 2024; v1 submitted 15 September, 2021;
originally announced September 2021.
-
Bias-Robust Bayesian Optimization via Dueling Bandits
Authors:
Johannes Kirschner,
Andreas Krause
Abstract:
We consider Bayesian optimization in settings where observations can be adversarially biased, for example by an uncontrolled hidden confounder. Our first contribution is a reduction of the confounded setting to the dueling bandit model. Then we propose a novel approach for dueling bandits based on information-directed sampling (IDS). Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees. Our analysis further generalizes a previously proposed semi-parametric linear bandit model to non-linear reward functions, and uncovers interesting links to doubly-robust estimation.
Submitted 9 June, 2021; v1 submitted 25 May, 2021;
originally announced May 2021.
-
Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback
Authors:
Marc Jourdan,
Mojmír Mutný,
Johannes Kirschner,
Andreas Krause
Abstract:
Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure such as forming a base of a matroid or a path in a graph. We focus on the pure-exploration problem of identifying the best arm with fixed confidence, as well as a more general setting, where the structure of the answer set differs from the one of the action set. Using the recently popularized game framework, we interpret this problem as a sequential zero-sum game and develop a CombGame meta-algorithm whose instances are asymptotically optimal algorithms with finite time guarantees. In addition to comparing two families of learners to instantiate our meta-algorithm, the main contribution of our work is a specific oracle efficient instance for best-arm identification with combinatorial actions. Based on a projection-free online learning algorithm for convex polytopes, it is the first computationally efficient algorithm which is asymptotically optimal and has competitive empirical performance.
Submitted 21 January, 2021;
originally announced January 2021.
-
Deep dynamic modeling with just two time points: Can we still allow for individual trajectories?
Authors:
Maren Hackenberg,
Philipp Harms,
Michelle Pfaffenlehner,
Astrid Pechmann,
Janbernd Kirschner,
Thorsten Schmidt,
Harald Binder
Abstract:
Longitudinal biomedical data are often characterized by a sparse time grid and individual-specific development patterns. Specifically, in epidemiological cohort studies and clinical registries we are facing the question of what can be learned from the data in an early phase of the study, when only a baseline characterization and one follow-up measurement are available. Inspired by recent advances that allow to combine deep learning with dynamic modeling, we investigate whether such approaches can be useful for uncovering complex structure, in particular for an extreme small data setting with only two observations time points for each individual. Irregular spacing in time could then be used to gain more information on individual dynamics by leveraging similarity of individuals. We provide a brief overview of how variational autoencoders (VAEs), as a deep learning approach, can be linked to ordinary differential equations (ODEs) for dynamic modeling, and then specifically investigate the feasibility of such an approach that infers individual-specific latent trajectories by including regularity assumptions and individuals' similarity. We also provide a description of this deep learning approach as a filtering task to give a statistical perspective. Using simulated data, we show to what extent the approach can recover individual trajectories from ODE systems with two and four unknown parameters and infer groups of individuals with similar trajectories, and where it breaks down. The results show that such dynamic deep learning approaches can be useful even in extreme small data settings, but need to be carefully adapted.
Submitted 20 December, 2021; v1 submitted 1 December, 2020;
originally announced December 2020.
-
Asymptotically Optimal Information-Directed Sampling
Authors:
Johannes Kirschner,
Tor Lattimore,
Claire Vernade,
Csaba Szepesvári
Abstract:
We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Our analysis sheds light on how IDS balances the trade-off between regret and information and uncovers a surprising connection between the recently proposed primal-dual methods and the IDS algorithm. We demonstrate empirically that IDS is competitive with UCB in finite-time, and can be significantly better in the asymptotic regime.
Submitted 2 July, 2021; v1 submitted 11 November, 2020;
originally announced November 2020.
-
A user-centered approach to designing an experimental laboratory data platform
Authors:
Ha-Kyung Kwon,
Chirranjeevi Balaji Gopal,
Jared Kirschner,
Santiago Caicedo,
Brian D. Storey
Abstract:
While automated experiments and high-throughput methods are becoming more mainstream in the age of data, empowering individual researchers to capture, collate, and contextualize their data faster and more reproducibly still remains a challenge in science. Despite the abundance of software products to help digitize and organize scientific information, their broader adoption in the scientific community has been hindered by the lack of a holistic understanding of the diverse needs of researchers and their experimental processes. In this work, we take a user-centered approach to understand what essential elements of design and functionality researchers (in chemical and materials science) want in an experimental data platform to address the problem of data capture in their experimental processes. We found that having the capability to contextualize rich, complex experimental datasets is the primary user requirement. We synthesize this and other key findings into design criteria for a potential solution.
Submitted 28 July, 2020;
originally announced July 2020.
-
Information Directed Sampling for Linear Partial Monitoring
Authors:
Johannes Kirschner,
Tor Lattimore,
Andreas Krause
Abstract:
Partial monitoring is a rich framework for sequential decision making under uncertainty that generalizes many well known bandit models, including linear, combinatorial and dueling bandits. We introduce information directed sampling (IDS) for stochastic partial monitoring with a linear reward and observation structure. IDS achieves adaptive worst-case regret rates that depend on precise observability conditions of the game. Moreover, we prove lower bounds that classify the minimax regret of all finite games into four possible regimes. IDS achieves the optimal rate in all cases up to logarithmic factors, without tuning any hyper-parameters. We further extend our results to the contextual and the kernelized setting, which significantly increases the range of possible applications.
Submitted 25 February, 2020;
originally announced February 2020.
-
Distributionally Robust Bayesian Optimization
Authors:
Johannes Kirschner,
Ilija Bogunovic,
Stefanie Jegelka,
Andreas Krause
Abstract:
Robustness to distributional shift is one of the key challenges of contemporary machine learning. Attaining such robustness is the goal of distributionally robust optimization, which seeks a solution to an optimization problem that is worst-case robust under a specified distributional shift of an uncontrolled covariate. In this paper, we study such a problem when the distributional shift is measured via the maximum mean discrepancy (MMD). For the setting of zeroth-order, noisy optimization, we present a novel distributionally robust Bayesian optimization algorithm (DRBO). Our algorithm provably obtains sub-linear robust regret in various settings that differ in how the uncertain covariate is observed. We demonstrate the robust performance of our method on both synthetic and real-world benchmarks.
Submitted 22 March, 2020; v1 submitted 20 February, 2020;
originally announced February 2020.
-
Stochastic Bandits with Context Distributions
Authors:
Johannes Kirschner,
Andreas Krause
Abstract:
We introduce a stochastic contextual bandit model where at each time step the environment chooses a distribution over a context set and samples the context from this distribution. The learner observes only the context distribution while the exact context realization remains hidden. This allows for a broad range of applications where the context is stochastic or when the learner needs to predict the context. We adapt the UCB algorithm to this setting and show that it achieves an order-optimal high-probability bound on the cumulative regret for linear and kernelized reward functions. Our results strictly generalize previous work in the sense that both our model and the algorithm reduce to the standard setting when the environment chooses only Dirac delta distributions and therefore provides the exact context to the learner. We further analyze a variant where the learner observes the realized context after choosing the action. Finally, we demonstrate the proposed method on synthetic and real-world datasets.
Submitted 14 November, 2019; v1 submitted 6 June, 2019;
originally announced June 2019.
-
Adaptive and Safe Bayesian Optimization in High Dimensions via One-Dimensional Subspaces
Authors:
Johannes Kirschner,
Mojmír Mutný,
Nicole Hiller,
Rasmus Ischebeck,
Andreas Krause
Abstract:
Bayesian optimization is known to be difficult to scale to high dimensions, because the acquisition step requires solving a non-convex optimization problem in the same search space. In order to scale the method and keep its benefits, we propose an algorithm (LineBO) that restricts the problem to a sequence of iteratively chosen one-dimensional sub-problems that can be solved efficiently. We show that our algorithm converges globally and obtains a fast local rate when the function is strongly convex. Further, if the objective has an invariant subspace, our method automatically adapts to the effective dimension without changing the algorithm. When combined with the SafeOpt algorithm to solve the sub-problems, we obtain the first safe Bayesian optimization algorithm with theoretical guarantees applicable in high-dimensional settings. We evaluate our method on multiple synthetic benchmarks, where we obtain competitive performance. Further, we deploy our algorithm to optimize the beam intensity of the Swiss Free Electron Laser with up to 40 parameters while satisfying safe operation constraints.
Submitted 28 May, 2019; v1 submitted 8 February, 2019;
originally announced February 2019.
-
Information-Directed Exploration for Deep Reinforcement Learning
Authors:
Nikolay Nikolov,
Johannes Kirschner,
Felix Berkenkamp,
Andreas Krause
Abstract:
Efficient exploration remains a major challenge for reinforcement learning. One reason is that the variability of the returns often depends on the current state and action, and is therefore heteroscedastic. Classical exploration strategies such as upper confidence bound algorithms and Thompson sampling fail to appropriately account for heteroscedasticity, even in the bandit setting. Motivated by recent findings that address this issue in bandits, we propose to use Information-Directed Sampling (IDS) for exploration in reinforcement learning. As our main contribution, we build on recent advances in distributional reinforcement learning and propose a novel, tractable approximation of IDS for deep Q-learning. The resulting exploration strategy explicitly accounts for both parametric uncertainty and heteroscedastic observation noise. We evaluate our method on Atari games and demonstrate a significant improvement over alternative approaches.
Submitted 24 March, 2019; v1 submitted 18 December, 2018;
originally announced December 2018.