Showing 1–11 of 11 results for author: Simão, T D

  1. arXiv:2312.11434  [pdf, ps, other]

    cs.AI cs.MA

    Factored Online Planning in Many-Agent POMDPs

    Authors: Maris F. L. Galesloot, Thiago D. Simão, Sebastian Junges, Nils Jansen

    Abstract: In centralized multi-agent systems, often modeled as multi-agent partially observable Markov decision processes (MPOMDPs), the action and observation spaces grow exponentially with the number of agents, making the value and belief estimation of single-agent online planning ineffective. Prior work partially tackles value estimation by exploiting the inherent structure of multi-agent settings via so…

    Submitted 23 February, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: Extended version (includes the Appendix) of the paper accepted at AAAI-24

  2. arXiv:2312.11227  [pdf, other]

    cs.LG cs.AI

    Robust Active Measuring under Model Uncertainty

    Authors: Merlijn Krale, Thiago D. Simão, Jana Tumova, Nils Jansen

    Abstract: Partial observability and uncertainty are common problems in sequential decision-making that particularly impede the use of formal models such as Markov decision processes (MDPs). However, in practice, agents may be able to employ costly sensors to measure their environment and resolve partial observability by gathering information. Moreover, imprecise transition functions can capture model uncert…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI 2024

  3. Reinforcement Learning by Guided Safe Exploration

    Authors: Qisong Yang, Thiago D. Simão, Nils Jansen, Simon H. Tindemans, Matthijs T. J. Spaan

    Abstract: Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward to adapt quickly once the reward is revealed. We consider the constrained rew…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted at ECAI 2023

  4. arXiv:2305.07958  [pdf, other]

    cs.LG cs.AI

    More for Less: Safe Policy Improvement With Stronger Performance Guarantees

    Authors: Patrick Wienhöft, Marnix Suilen, Thiago D. Simão, Clemens Dubslaff, Christel Baier, Nils Jansen

    Abstract: In an offline reinforcement learning setting, the safe policy improvement (SPI) problem aims to improve the performance of a behavior policy according to which sample data has been generated. State-of-the-art approaches to SPI require a high number of samples to provide practical probabilistic guarantees on the improved policy's performance. We present a novel approach to the SPI problem that prov…

    Submitted 13 May, 2023; originally announced May 2023.

    Comments: Accepted at IJCAI 2023

  5. arXiv:2303.08271  [pdf, other]

    cs.AI cs.LG

    Act-Then-Measure: Reinforcement Learning for Partially Observable Environments with Active Measuring

    Authors: Merlijn Krale, Thiago D. Simão, Nils Jansen

    Abstract: We study Markov decision processes (MDPs), where agents have direct control over when and how they gather information, as formalized by action-contingent noiselessly observable MDPs (ACNO-MDPs). In these models, actions consist of two components: a control action that affects the environment, and a measurement action that affects what the agent can observe. To solve ACNO-MDPs, we introduce the act…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted at ICAPS 2023

  6. arXiv:2303.05848  [pdf, ps, other]

    cs.AI cs.LG cs.RO eess.SY

    Decision-Making Under Uncertainty: Beyond Probabilities

    Authors: Thom Badings, Thiago D. Simão, Marnix Suilen, Nils Jansen

    Abstract: This position paper reflects on the state-of-the-art in decision-making under uncertainty. A classical assumption is that probabilities can sufficiently capture all uncertainty in a system. In this paper, the focus is on the uncertainty that goes beyond this classical interpretation, particularly by employing a clear distinction between aleatoric and epistemic uncertainty. The paper features an ov…

    Submitted 10 March, 2023; originally announced March 2023.

  7. arXiv:2301.04939  [pdf, other]

    cs.AI cs.LG

    Safe Policy Improvement for POMDPs via Finite-State Controllers

    Authors: Thiago D. Simão, Marnix Suilen, Nils Jansen

    Abstract: We study safe policy improvement (SPI) for partially observable Markov decision processes (POMDPs). SPI is an offline reinforcement learning (RL) problem that assumes access to (1) historical data about an environment, and (2) the so-called behavior policy that previously generated this data by interacting with the environment. SPI methods neither require access to a model nor the environment itse…

    Submitted 12 January, 2023; originally announced January 2023.

    Comments: Accepted at AAAI 2023

  8. arXiv:2212.05337  [pdf, ps, other]

    cs.LG

    Targeted Adversarial Attacks on Deep Reinforcement Learning Policies via Model Checking

    Authors: Dennis Gross, Thiago D. Simao, Nils Jansen, Guillermo A. Perez

    Abstract: Deep Reinforcement Learning (RL) agents are susceptible to adversarial noise in their observations that can mislead their policies and decrease their performance. However, an adversary may be interested not only in decreasing the reward, but also in modifying specific temporal logic properties of the policy. This paper presents a metric that measures the exact impact of adversarial attacks against…

    Submitted 10 December, 2022; originally announced December 2022.

    Comments: ICAART 2023 Paper (Technical Report)

  9. arXiv:2210.01801  [pdf, other]

    cs.LG cs.AI

    Safe Reinforcement Learning From Pixels Using a Stochastic Latent Representation

    Authors: Yannick Hogewind, Thiago D. Simao, Tal Kachman, Nils Jansen

    Abstract: We address the problem of safe reinforcement learning from pixel observations. Inherent challenges in such settings are (1) a trade-off between reward optimization and adhering to safety constraints, (2) partial observability, and (3) high-dimensional observations. We formalize the problem in a constrained, partially observable Markov decision process framework, where an agent obtains distinct rew…

    Submitted 2 October, 2022; originally announced October 2022.

  10. arXiv:2205.15827  [pdf, other]

    cs.AI cs.LG

    Robust Anytime Learning of Markov Decision Processes

    Authors: Marnix Suilen, Thiago D. Simão, David Parker, Nils Jansen

    Abstract: Markov decision processes (MDPs) are formal models commonly used in sequential decision-making. MDPs capture the stochasticity that may arise, for instance, from imprecise actuators via probabilities in the transition function. However, in data-driven applications, deriving precise probabilities from (limited) data introduces statistical errors that may lead to unexpected or undesirable outcomes.…

    Submitted 19 June, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

    Comments: Accepted at NeurIPS 2022

  11. arXiv:1909.05236  [pdf, other]

    cs.LG cs.AI stat.ML

    Safe Policy Improvement with an Estimated Baseline Policy

    Authors: Thiago D. Simão, Romain Laroche, Rémi Tachet des Combes

    Abstract: Previous work has shown the unreliability of existing algorithms in the batch Reinforcement Learning setting, and proposed the theoretically-grounded Safe Policy Improvement with Baseline Bootstrapping (SPIBB) fix: reproduce the baseline policy in the uncertain state-action pairs, in order to control the variance on the trained policy performance. However, in many real-world applications such as d…

    Submitted 28 December, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

    Comments: Published at AAMAS 2020