Showing 1–29 of 29 results for author: Radanovic, G

  1. arXiv:2403.07933  [pdf, other]

    cs.GT cs.LG

    Corruption-Robust Offline Two-Player Zero-Sum Markov Games

    Authors: Andi Nika, Debmalya Mandal, Adish Singla, Goran Radanović

    Abstract: We study data corruption robustness in offline two-player zero-sum Markov games. Given a dataset of realized trajectories of two players, an adversary is allowed to modify an $ε$-fraction of it. The learner's goal is to identify an approximate Nash Equilibrium policy pair from the corrupted data. We consider this problem in linear Markov games under different degrees of data coverage and corruptio…

    Submitted 4 March, 2024; originally announced March 2024.

  2. arXiv:2403.01857  [pdf, ps, other]

    cs.LG

    Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

    Authors: Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla

    Abstract: In this paper, we take a step towards a deeper understanding of learning from human preferences by systematically comparing the paradigm of reinforcement learning from human feedback (RLHF) with the recently proposed paradigm of direct preference optimization (DPO). We focus our attention on the class of loglinear policy parametrization and linear reward functions. In order to compare the two para…

    Submitted 5 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  3. arXiv:2402.15826  [pdf, other]

    cs.LG cs.AI

    Reward Design for Justifiable Sequential Decision-Making

    Authors: Aleksa Sukovic, Goran Radanovic

    Abstract: Equipping agents with the capacity to justify made decisions using supporting evidence represents a cornerstone of accountable decision-making. Furthermore, ensuring that justifications are in line with human expectations and societal norms is vital, especially in high-stakes situations such as healthcare. In this work, we propose the use of a debate-based reward model for reinforcement learning a…

    Submitted 24 February, 2024; originally announced February 2024.

  4. arXiv:2402.09838  [pdf, other]

    cs.LG

    Performative Reinforcement Learning in Gradually Shifting Environments

    Authors: Ben Rank, Stelios Triantafyllou, Debmalya Mandal, Goran Radanovic

    Abstract: When Reinforcement Learning (RL) agents are deployed in practice, they might impact their environment and change its dynamics. We propose a new framework to model this phenomenon, where the current environment depends on the deployed policy as well as its previous dynamics. This is a generalization of Performative RL (PRL) [Mandal et al., 2023]. Unlike PRL, our framework allows to model scenarios…

    Submitted 31 May, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  5. arXiv:2402.06734  [pdf, ps, other]

    cs.LG cs.AI

    Corruption Robust Offline Reinforcement Learning with Human Feedback

    Authors: Debmalya Mandal, Andi Nika, Parameswaran Kamalaruban, Adish Singla, Goran Radanović

    Abstract: We study data corruption robustness for reinforcement learning with human feedback (RLHF) in an offline setting. Given an offline dataset of pairs of trajectories along with feedback about human preferences, an $\varepsilon$-fraction of the pairs is corrupted (e.g., feedback flipped or trajectory features manipulated), capturing an adversarial attack or noisy human preferences. We aim to design al…

    Submitted 9 February, 2024; originally announced February 2024.

  6. arXiv:2310.11334  [pdf, other]

    cs.AI

    Agent-Specific Effects: A Causal Effect Propagation Analysis in Multi-Agent MDPs

    Authors: Stelios Triantafyllou, Aleksa Sukovic, Debmalya Mandal, Goran Radanovic

    Abstract: Establishing causal relationships between actions and outcomes is fundamental for accountable multi-agent decision-making. However, interpreting and quantifying agents' contributions to such relationships pose significant challenges. These challenges are particularly prominent in the context of multi-agent sequential decision-making, where the causal effect of an agent's action on the outcome depe…

    Submitted 10 June, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: ICML 2024

  7. Markov Decision Processes with Time-Varying Geometric Discounting

    Authors: Jiarui Gan, Annika Hennes, Rupak Majumdar, Debmalya Mandal, Goran Radanovic

    Abstract: Canonical models of Markov decision processes (MDPs) usually consider geometric discounting based on a constant discount factor. While this standard modeling approach has led to many elegant results, some recent studies indicate the necessity of modeling time-varying discounting in certain applications. This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take…

    Submitted 19 July, 2023; originally announced July 2023.

    Comments: 24 pages, 3 figures

    Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence 37(10) (2023) 11980-11988

  8. arXiv:2306.03832  [pdf, ps, other]

    cs.GT cs.LG cs.MA

    Sequential Principal-Agent Problems with Communication: Efficient Computation and Learning

    Authors: Jiarui Gan, Rupak Majumdar, Debmalya Mandal, Goran Radanovic

    Abstract: We study a sequential decision making problem between a principal and an agent with incomplete information on both sides. In this model, the principal and the agent interact in a stochastic environment, and each is privy to observations about the state not available to the other. The principal has the power of commitment, both to elicit information from the agent and to provide signals about her o…

    Submitted 17 December, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

  9. arXiv:2306.03311  [pdf, other]

    cs.LG cs.AI

    Learning Embeddings for Sequential Tasks Using Population of Agents

    Authors: Mridul Mahajan, Georgios Tzannetos, Goran Radanovic, Adish Singla

    Abstract: We present an information-theoretic framework to learn fixed-dimensional embeddings for tasks in reinforcement learning. We leverage the idea that two tasks are similar if observing an agent's performance on one task reduces our uncertainty about its performance on the other. This intuition is captured by our information-theoretic criterion which uses a diverse agent population as an approximation…

    Submitted 8 May, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: IJCAI'24 paper (longer version)

  10. arXiv:2302.13851  [pdf, other]

    cs.LG cs.AI cs.CR cs.MA

    Implicit Poisoning Attacks in Two-Agent Reinforcement Learning: Adversarial Policies for Training-Time Attacks

    Authors: Mohammad Mohammadi, Jonathan Nöther, Debmalya Mandal, Adish Singla, Goran Radanovic

    Abstract: In targeted poisoning attacks, an attacker manipulates an agent-environment interaction to force the agent into adopting a policy of interest, called target policy. Prior work has primarily focused on attacks that modify standard MDP primitives, such as rewards or transitions. In this paper, we study targeted poisoning attacks in a two-agent setting where an attacker implicitly poisons the effecti…

    Submitted 27 February, 2023; originally announced February 2023.

  11. arXiv:2302.12676  [pdf, other]

    cs.AI

    Towards Computationally Efficient Responsibility Attribution in Decentralized Partially Observable MDPs

    Authors: Stelios Triantafyllou, Goran Radanovic

    Abstract: Responsibility attribution is a key concept of accountable multi-agent decision making. Given a sequence of actions, responsibility attribution mechanisms quantify the impact of each participating agent to the final outcome. One such popular mechanism is based on actual causality, and it assigns (causal) responsibility based on the actions that were found to be pivotal for the considered outcome.…

    Submitted 24 February, 2023; originally announced February 2023.

    Comments: AAMAS 2023

  12. arXiv:2302.03608  [pdf, other]

    cs.LG

    Online Reinforcement Learning with Uncertain Episode Lengths

    Authors: Debmalya Mandal, Goran Radanovic, Jiarui Gan, Adish Singla, Rupak Majumdar

    Abstract: Existing episodic reinforcement algorithms assume that the length of an episode is fixed across time and known a priori. In this paper, we consider a general framework of episodic reinforcement learning when the length of each episode is drawn from a distribution. We first establish that this problem is equivalent to online reinforcement learning with general discounting where the learner is tryin…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: To appear at AAAI-2023

  13. arXiv:2207.00046  [pdf, other]

    cs.LG cs.GT

    Performative Reinforcement Learning

    Authors: Debmalya Mandal, Stelios Triantafyllou, Goran Radanovic

    Abstract: We introduce the framework of performative reinforcement learning where the policy chosen by the learner affects the underlying reward and transition dynamics of the environment. Following the recent literature on performative prediction [Perdomo et al., 2020], we introduce the concept of performatively stable policy. We then consider a regularized version of the reinforcement learning probl…

    Submitted 7 February, 2023; v1 submitted 30 June, 2022; originally announced July 2022.

  14. arXiv:2204.00302  [pdf, other]

    cs.AI

    Actual Causality and Responsibility Attribution in Decentralized Partially Observable Markov Decision Processes

    Authors: Stelios Triantafyllou, Adish Singla, Goran Radanovic

    Abstract: Actual causality and a closely related concept of responsibility attribution are central to accountable decision making. Actual causality focuses on specific outcomes and aims to identify decisions (actions) that were critical in realizing an outcome of interest. Responsibility attribution is complementary and aims to identify the extent to which decision makers (agents) are responsible for this o…

    Submitted 9 August, 2022; v1 submitted 1 April, 2022; originally announced April 2022.

    Comments: In Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society (AIES22)

  15. arXiv:2201.02185  [pdf, other]

    cs.LG cs.AI

    Admissible Policy Teaching through Reward Design

    Authors: Kiarash Banihashem, Adish Singla, Jiarui Gan, Goran Radanovic

    Abstract: We study reward design strategies for incentivizing a reinforcement learning agent to adopt a policy from a set of admissible policies. The goal of the reward designer is to modify the underlying reward function cost-efficiently while ensuring that any approximately optimal deterministic policy under the new reward function is admissible and performs well under the original reward function. This p…

    Submitted 6 January, 2022; originally announced January 2022.

  16. arXiv:2107.11927  [pdf, other]

    cs.AI

    On Blame Attribution for Accountable Multi-Agent Sequential Decision Making

    Authors: Stelios Triantafyllou, Adish Singla, Goran Radanovic

    Abstract: Blame attribution is one of the key aspects of accountable decision making, as it provides means to quantify the responsibility of an agent for a decision making outcome. In this paper, we study blame attribution in the context of cooperative multi-agent sequential decision making. As a particular setting of interest, we focus on cooperative decision making formalized by Multi-Agent Markov Decisio…

    Submitted 25 January, 2022; v1 submitted 25 July, 2021; originally announced July 2021.

    Comments: NeurIPS 2021

  17. arXiv:2107.08828  [pdf, ps, other]

    cs.LG cs.AI

    Reinforcement Learning for Education: Opportunities and Challenges

    Authors: Adish Singla, Anna N. Rafferty, Goran Radanovic, Neil T. Heffernan

    Abstract: This survey article has grown out of the RL4ED workshop organized by the authors at the Educational Data Mining (EDM) 2021 conference. We organized this workshop as part of a community-building effort to bring together researchers and practitioners interested in the broad areas of reinforcement learning (RL) and education (ED). This article aims to provide an overview of the workshop activities an…

    Submitted 15 July, 2021; originally announced July 2021.

  18. arXiv:2106.05137  [pdf, ps, other]

    cs.GT

    Bayesian Persuasion in Sequential Decision-Making

    Authors: Jiarui Gan, Rupak Majumdar, Goran Radanovic, Adish Singla

    Abstract: We study a dynamic model of Bayesian persuasion in sequential decision-making settings. An informed principal observes an external parameter of the world and advises an uninformed agent about actions to take over time. The agent takes actions in each time step based on the current state, the principal's advice/signal, and beliefs about the external parameter. The action of the agent updates the st…

    Submitted 24 May, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

  19. arXiv:2102.05776  [pdf, other]

    cs.LG cs.AI

    Defense Against Reward Poisoning Attacks in Reinforcement Learning

    Authors: Kiarash Banihashem, Adish Singla, Goran Radanovic

    Abstract: We study defense strategies against reward poisoning attacks in reinforcement learning. As a threat model, we consider attacks that minimally alter rewards to make the attacker's target policy uniquely optimal under the poisoned rewards, with the optimality gap specified by an attack parameter. Our goal is to design agents that are robust against such attacks in terms of the worst-case utility w.r…

    Submitted 20 June, 2021; v1 submitted 10 February, 2021; originally announced February 2021.

  20. arXiv:2011.10824  [pdf, other]

    cs.LG cs.AI cs.CR

    Policy Teaching in Reinforcement Learning via Environment Poisoning Attacks

    Authors: Amin Rakhsha, Goran Radanovic, Rati Devidze, Xiaojin Zhu, Adish Singla

    Abstract: We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy chosen by the attacker. As a victim, we consider RL agents whose objective is to find a policy that maximizes reward in infinite-horizon problem settings. The attacker can manipulate the rewards and the transition dynamics in the learning environ…

    Submitted 21 November, 2020; originally announced November 2020.

    Comments: Journal version of ICML'20 paper. New theoretical results for jointly poisoning rewards and transitions

  21. Diversity in News Recommendations

    Authors: Abraham Bernstein, Claes de Vreese, Natali Helberger, Wolfgang Schulz, Katharina Zweig, Christian Baden, Michael A. Beam, Marc P. Hauer, Lucien Heitz, Pascal Jürgens, Christian Katzenbach, Benjamin Kille, Beate Klimkiewicz, Wiebke Loosen, Judith Moeller, Goran Radanovic, Guy Shani, Nava Tintarev, Suzanne Tolmeijer, Wouter van Atteveldt, Sanne Vrijenhoek, Theresa Zueger

    Abstract: News diversity in the media has for a long time been a foundational and uncontested basis for ensuring that the communicative needs of individuals and society at large are met. Today, people increasingly rely on online content and recommender systems to consume information challenging the traditional concept of news diversity. In addition, the very concept of diversity, which differs between disci…

    Submitted 25 May, 2021; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: Published as Manifesto from Dagstuhl Perspectives Workshop 19482

    ACM Class: H.3.3

    Journal ref: Dagstuhl Perspectives Workshop: Diversity, Fairness, and Data-Driven Personalization in (News) Recommender Systems, Dagstuhl Manifestos (2021), Vol. 9, Issue 1, pp. 43-61

  22. arXiv:2003.12909  [pdf, other]

    cs.LG cs.AI cs.CR stat.ML

    Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning

    Authors: Amin Rakhsha, Goran Radanovic, Rati Devidze, Xiaojin Zhu, Adish Singla

    Abstract: We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy chosen by the attacker. As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings. The attacker can manipulate the rewards or the transition dynamics in…

    Submitted 18 August, 2020; v1 submitted 28 March, 2020; originally announced March 2020.

    Comments: ICML 2020

  23. arXiv:1901.08029  [pdf, ps, other]

    cs.LG stat.ML

    Learning to Collaborate in Markov Decision Processes

    Authors: Goran Radanovic, Rati Devidze, David C. Parkes, Adish Singla

    Abstract: We consider a two-agent MDP framework where agents repeatedly solve a task in a collaborative setting. We study the problem of designing a learning algorithm for the first agent (A1) that facilitates a successful collaboration even in cases when the second agent (A2) is adapting its policy in an unknown way. The key challenge in our setting is that the first agent faces non-stationarity in rewards…

    Submitted 19 June, 2019; v1 submitted 23 January, 2019; originally announced January 2019.

  24. arXiv:1811.03654  [pdf, other]

    cs.AI cs.CY

    How Do Fairness Definitions Fare? Examining Public Attitudes Towards Algorithmic Definitions of Fairness

    Authors: Nripsuta Saxena, Karen Huang, Evan DeFilippis, Goran Radanovic, David Parkes, Yang Liu

    Abstract: What is the best way to define algorithmic fairness? While many definitions of fairness have been proposed in the computer science literature, there is no clear agreement over a particular definition. In this work, we investigate ordinary people's perceptions of three of these fairness definitions. Across two online experiments, we test which definitions people perceive to be the fairest in the co…

    Submitted 27 January, 2019; v1 submitted 8 November, 2018; originally announced November 2018.

    Comments: To appear at AI Ethics and Society (AIES) 2019

  25. arXiv:1711.06740  [pdf, other]

    cs.GT

    Information Gathering with Peers: Submodular Optimization with Peer-Prediction Constraints

    Authors: Goran Radanovic, Adish Singla, Andreas Krause, Boi Faltings

    Abstract: We study a problem of optimal information gathering from multiple data providers that need to be incentivized to provide accurate information. This problem arises in many real world applications that rely on crowdsourced data sets, but where the process of obtaining data is costly. A notable example of such a scenario is crowd sensing. To this end, we formulate the problem of optimal information g…

    Submitted 24 November, 2017; v1 submitted 17 November, 2017; originally announced November 2017.

    Comments: Longer version of AAAI'18 paper

  26. arXiv:1711.06614  [pdf, other]

    cs.GT

    Partial Truthfulness in Minimal Peer Prediction Mechanisms with Limited Knowledge

    Authors: Goran Radanovic, Boi Faltings

    Abstract: We study minimal single-task peer prediction mechanisms that have limited knowledge about agents' beliefs. Without knowing what agents' beliefs are or eliciting additional information, it is not possible to design a truthful mechanism in a Bayesian-Nash sense. We go beyond truthfulness and explore equilibrium strategy profiles that are only partially truthful. Using the results from the multi-arme…

    Submitted 26 November, 2017; v1 submitted 17 November, 2017; originally announced November 2017.

  27. arXiv:1707.01875  [pdf, ps, other]

    cs.LG

    Calibrated Fairness in Bandits

    Authors: Yang Liu, Goran Radanovic, Christos Dimitrakakis, Debmalya Mandal, David C. Parkes

    Abstract: We study fairness within the stochastic, multi-armed bandit (MAB) decision making framework. We adapt the fairness framework of "treating similar individuals similarly" to this setting. Here, an 'individual' corresponds to an arm and two arms are 'similar' if they have a similar quality distribution. First, we adopt a smoothness constraint that if two arms have a similar quality distr…

    Submitted 6 July, 2017; originally announced July 2017.

    Comments: To be presented at the FAT-ML'17 workshop

  28. arXiv:1706.00119  [pdf, ps, other]

    cs.LG stat.ML

    Bayesian fairness

    Authors: Christos Dimitrakakis, Yang Liu, David Parkes, Goran Radanovic

    Abstract: We consider the problem of how decision making can be fair when the underlying probabilistic model of the world is not known with certainty. We argue that recent notions of fairness in machine learning need to explicitly incorporate parameter uncertainty, hence we introduce the notion of Bayesian fairness as a suitable candidate for fair decision rules. Using balance, a definition of fairnes…

    Submitted 4 November, 2018; v1 submitted 31 May, 2017; originally announced June 2017.

    Comments: 13 pages, 8 figures, to appear at AAAI 2019

  29. arXiv:1704.05269  [pdf, other]

    cs.GT

    Peer Truth Serum: Incentives for Crowdsourcing Measurements and Opinions

    Authors: Boi Faltings, Radu Jurca, Goran Radanovic

    Abstract: Modern decision making tools are based on statistical analysis of abundant data, which is often collected by querying multiple individuals. We consider data collection through crowdsourcing, where independent and self-interested agents, non-experts, report measurements, such as sensor readings, opinions, such as product reviews, or answers to human intelligence tasks. Since the accuracy of informa…

    Submitted 18 April, 2017; originally announced April 2017.