Showing 1–18 of 18 results for author: da Silva, B C

  1. arXiv:2406.16241

    cs.LG stat.ME

    Position: Benchmarking is Limited in Reinforcement Learning Research

    Authors: Scott M. Jordan, Adam White, Bruno Castro da Silva, Martha White, Philip S. Thomas

    Abstract: Novel reinforcement learning algorithms, or improvements on existing ones, are commonly justified by evaluating their performance on benchmark environments and are compared to an ever-changing set of standard algorithms. However, despite numerous calls for improvements, experimental practices continue to produce misleading or unsupported claims. One reason for the ongoing substandard practices is…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 19 pages, 13 figures, The Forty-first International Conference on Machine Learning (ICML 2024)

  2. arXiv:2404.08555

    cs.LG cs.AI cs.CL

    RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs

    Authors: Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva

    Abstract: State-of-the-art large language models (LLMs) have become indispensable tools for various tasks. However, training LLMs to serve as effective assistants for humans requires careful consideration. A promising approach is reinforcement learning from human feedback (RLHF), which leverages human feedback to update the model in accordance with human preferences and mitigate issues like toxicity and hal…

    Submitted 15 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  3. arXiv:2312.12972

    cs.LG

    From Past to Future: Rethinking Eligibility Traces

    Authors: Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva

    Abstract: In this paper, we introduce a fresh perspective on the challenges of credit assignment and policy evaluation. First, we delve into the nuances of eligibility traces and explore instances where their updates may result in unexpected credit assignment to preceding states. From this investigation emerges the concept of a novel value function, which we refer to as the bidirectional value functio…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted in The 38th Annual AAAI Conference on Artificial Intelligence

  4. arXiv:2310.19007

    cs.LG

    Behavior Alignment via Reward Function Optimization

    Authors: Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva

    Abstract: Designing reward functions for efficiently guiding reinforcement learning (RL) agents toward specific behaviors is a complex task. This is challenging since it requires the identification of reward structures that are not sparse and that avoid inadvertently inducing undesirable behaviors. Naively modifying the reward structure to offer denser and more frequent feedback can lead to unintended outco…

    Submitted 31 October, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

    Comments: (Spotlight) Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)

  5. arXiv:2305.09838

    cs.LG cs.AI

    Coagent Networks: Generalized and Scaled

    Authors: James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas

    Abstract: Coagent networks for reinforcement learning (RL) [Thomas and Barto, 2011] provide a powerful and flexible framework for deriving principled learning rules for arbitrary stochastic neural networks. The coagent framework offers an alternative to backpropagation-based deep learning (BDL) that overcomes some of backpropagation's main limitations. For example, coagent networks can compute different par…

    Submitted 16 May, 2023; originally announced May 2023.

  6. arXiv:2301.10330

    cs.LG cs.AI

    Off-Policy Evaluation for Action-Dependent Non-Stationary Environments

    Authors: Yash Chandak, Shiv Shankar, Nathaniel D. Bastian, Bruno Castro da Silva, Emma Brunskill, Philip S. Thomas

    Abstract: Methods for sequential decision-making are often built upon a foundational assumption that the underlying decision process is stationary. This limits the application of such methods because real-world problems are often subject to changes due to external factors (passive non-stationarity), changes induced by interactions with the system itself (active non-stationarity), or both (hybrid non-station…

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: Accepted at Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

  7. Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization

    Authors: Lucas N. Alegre, Ana L. C. Bazzan, Diederik M. Roijers, Ann Nowé, Bruno C. da Silva

    Abstract: Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences over (possibly conflicting) reward functions. Such algorithms often learn a set of policies (each optimized for a particular agent preference) that can later be used to solve problems with novel preferences. We introduce a novel algorithm that uses Generalized Po…

    Submitted 23 March, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

    Comments: Accepted to AAMAS 2023

  8. arXiv:2208.14501

    cs.LG cs.AI cs.RO eess.SY

    Model-Based Reinforcement Learning with SINDy

    Authors: Rushiv Arora, Bruno Castro da Silva, Eliot Moss

    Abstract: We draw on the latest advancements in the physics community to propose a novel method for discovering the governing non-linear dynamics of physical systems in reinforcement learning (RL). We establish that this method is capable of discovering the underlying dynamics using significantly fewer trajectories (as little as one rollout with $\leq 30$ time steps) than state of the art model learning alg…

    Submitted 30 August, 2022; originally announced August 2022.

    Comments: 8 pages, 1 figure, 1 table, 1 algorithm, presented at the Decision Awareness in Reinforcement Learning workshop held at the International Conference on Machine Learning, 22 July 2022, Baltimore MD, USA

  9. arXiv:2208.11744

    cs.LG cs.AI cs.CY

    Enforcing Delayed-Impact Fairness Guarantees

    Authors: Aline Weber, Blossom Metevier, Yuriy Brun, Philip S. Thomas, Bruno Castro da Silva

    Abstract: Recent research has shown that seemingly fair machine learning models, when used to inform decisions that have an impact on people's lives or well-being (e.g., applications involving education, employment, and lending), can inadvertently increase social inequality in the long term. This is because prior fairness-aware algorithms only consider static fairness constraints, such as equal opportunity…

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: 24 pages, 5 figures

  10. arXiv:2206.11326

    cs.LG cs.AI

    Optimistic Linear Support and Successor Features as a Basis for Optimal Policy Transfer

    Authors: Lucas N. Alegre, Ana L. C. Bazzan, Bruno C. da Silva

    Abstract: In many real-world applications, reinforcement learning (RL) agents might have to solve multiple tasks, each one typically modeled via a reward function. If reward functions are expressed linearly, and the agent has previously learned a set of policies for different tasks, successor features (SFs) can be exploited to combine such policies and identify reasonable solutions for new problems. However…

    Submitted 22 June, 2022; originally announced June 2022.

    Comments: Proceedings of the 39th International Conference on Machine Learning (ICML'22)

  11. arXiv:2105.09452

    cs.LG cs.AI

    Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection

    Authors: Lucas N. Alegre, Ana L. C. Bazzan, Bruno C. da Silva

    Abstract: Non-stationary environments are challenging for reinforcement learning algorithms. If the state transition and/or reward functions change based on latent factors, the agent is effectively tasked with optimizing a behavior that maximizes performance over a possibly infinite random sequence of Markov Decision Processes (MDPs), each of which drawn from some unknown distribution. We call each such MDP…

    Submitted 19 May, 2021; originally announced May 2021.

    Comments: Published at Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021)

    MSC Class: 68T05

    Journal ref: Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems. 2021. 97-105

  12. arXiv:2104.12820

    cs.LG

    Universal Off-Policy Evaluation

    Authors: Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik Learned-Miller, Emma Brunskill, Philip S. Thomas

    Abstract: When faced with sequential decision-making problems, it is often useful to be able to predict what would happen if decisions were made using a new policy. Those predictions must often be based on data collected under some previously used decision-making rule. Many previous methods enable such off-policy (or counterfactual) estimation of the expected value of a performance measure called the return…

    Submitted 2 November, 2021; v1 submitted 26 April, 2021; originally announced April 2021.

    Comments: Accepted at Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021)

  13. arXiv:2011.13847

    cs.RO cs.LG

    Autonomous learning of multiple, context-dependent tasks

    Authors: Vieri Giuliano Santucci, Davide Montella, Bruno Castro da Silva, Gianluca Baldassarre

    Abstract: When facing the problem of autonomously learning multiple tasks with reinforcement learning systems, researchers typically focus on solutions where just one parametrised policy per task is sufficient to solve them. However, in complex environments presenting different contexts, the same task might need a set of different skills to be solved. These situations pose two challenges: (a) to recognise t…

    Submitted 27 November, 2020; originally announced November 2020.

  14. arXiv:2004.04778

    cs.AI cs.LG cs.MA

    Quantifying the Impact of Non-Stationarity in Reinforcement Learning-Based Traffic Signal Control

    Authors: Lucas N. Alegre, Ana L. C. Bazzan, Bruno C. da Silva

    Abstract: In reinforcement learning (RL), dealing with non-stationarity is a challenging issue. However, some domains such as traffic optimization are inherently non-stationary. Causes for and effects of this are manifold. In particular, when dealing with traffic signal controls, addressing non-stationarity is key since traffic conditions change over time and as a function of traffic control decisions taken…

    Submitted 9 April, 2020; originally announced April 2020.

    Comments: 13 pages

    Report number: 7:e575

    Journal ref: PeerJ Computer Science 2021

  15. arXiv:2001.01620

    cs.LG cs.AI stat.ML

    Optimal Options for Multi-Task Reinforcement Learning Under Time Constraints

    Authors: Manuel Del Verme, Bruno Castro da Silva, Gianluca Baldassarre

    Abstract: Reinforcement learning can greatly benefit from the use of options as a way of encoding recurring behaviours and to foster exploration. An important open problem is how can an agent autonomously learn useful options when solving particular distributions of related tasks. We investigate some of the conditions that influence optimality of options, in settings where agents have a limited time budget…

    Submitted 6 January, 2020; originally announced January 2020.

  16. arXiv:1905.02690

    cs.AI cs.RO

    Autonomous Open-Ended Learning of Interdependent Tasks

    Authors: Vieri Giuliano Santucci, Emilio Cartoni, Bruno Castro da Silva, Gianluca Baldassarre

    Abstract: Autonomy is fundamental for artificial agents acting in complex real-world scenarios. The acquisition of many different skills is pivotal to foster versatile autonomous behaviour and thus a main objective for robotics and machine learning. Intrinsic motivations have proven to properly generate a task-agnostic signal to drive the autonomous acquisition of multiple policies in settings requiring the…

    Submitted 7 May, 2019; originally announced May 2019.

  17. arXiv:1711.09048

    cs.AI cs.RO eess.SY

    A Compression-Inspired Framework for Macro Discovery

    Authors: Francisco M. Garcia, Bruno C. da Silva, Philip S. Thomas

    Abstract: In this paper we consider the problem of how a reinforcement learning agent tasked with solving a set of related Markov decision processes can use knowledge acquired early in its lifetime to improve its ability to more rapidly solve novel, but related, tasks. One way of exploiting this experience is by identifying recurrent patterns in trajectories obtained from well-performing policies. We propos…

    Submitted 22 February, 2019; v1 submitted 24 November, 2017; originally announced November 2017.

    Comments: Accepted as Extended Abstract, AAMAS, 2019

  18. arXiv:1708.05448

    cs.AI

    On Ensuring that Intelligent Machines Are Well-Behaved

    Authors: Philip S. Thomas, Bruno Castro da Silva, Andrew G. Barto, Emma Brunskill

    Abstract: Machine learning algorithms are everywhere, ranging from simple data analysis and pattern recognition tools used across the sciences to complex systems that achieve super-human performance on various tasks. Ensuring that they are well-behaved---that they do not, for example, cause harm to humans or act in a racist or sexist way---is therefore not a hypothetical problem to be dealt with in the futu…

    Submitted 17 August, 2017; originally announced August 2017.