
Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning

Authors: Andreas Schlaginhaufen, Maryam Kamgarpour

Abstract: Inverse reinforcement learning (IRL) aims to infer a reward from expert demonstrations, motivated by the idea that the reward, rather than the policy, is the most succinct and transferable description of a task [Ng et al., 2000]. However, the reward corresponding to an optimal policy is not unique, making it unclear if an IRL-learned reward is transferable to new transition laws in the sense that its optimal policy aligns with the optimal policy corresponding to the expert's true reward. Past work has addressed this problem only under the assumption of full access to the expert's policy, guaranteeing transferability when learning from two experts with the same reward but different transition laws that satisfy a specific rank condition [Rolland et al., 2022]. In this work, we show that the conditions developed under full access to the expert's policy cannot guarantee transferability in the more practical scenario where we have access only to demonstrations of the expert. Instead of a binary rank condition, we propose principal angles as a more refined measure of similarity and dissimilarity between transition laws. Based on this, we then establish two key results: 1) a sufficient condition for transferability to any transition laws when learning from at least two experts with sufficiently different transition laws, and 2) a sufficient condition for transferability to local changes in the transition law when learning from a single expert.
Furthermore, we also provide a probably approximately correct (PAC) algorithm and an end-to-end analysis for learning transferable rewards from demonstrations of multiple experts.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.15497 [pdf, ps, other]

Finite-time convergence to an $ε$-efficient Nash equilibrium in potential games

Authors: Anna Maddux, Reda Ouhamma, Maryam Kamgarpour

Abstract: This paper investigates the convergence time of log-linear learning to an $ε$-efficient Nash equilibrium (NE) in potential games. In such games, an efficient NE is defined as the maximizer of the potential function. Existing results are limited to potential games with stringent structural assumptions and entail exponential convergence times in $1/ε$. Unaddressed so far, we tackle general potential games and prove the first finite-time convergence to an $ε$-efficient NE. In particular, by using a problem-dependent analysis, our bound depends polynomially on $1/ε$. Furthermore, we provide two extensions of our convergence result: first, we show that a variant of log-linear learning that requires a factor $A$ less feedback on the utility per round enjoys a similar convergence time; second, we demonstrate the robustness of our convergence guarantee if log-linear learning is subject to small perturbations such as alterations in the learning rule or noise-corrupted utilities.

Submitted 17 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 9 main pages, 25 pages, 1 Table

arXiv:2404.03314 [pdf, other]

Learning to Bid in Forward Electricity Markets Using a No-Regret Algorithm

Authors: Arega Getaneh Abate, Dorsa Majdi, Jalal Kazempour, Maryam Kamgarpour

Abstract: It is a common practice in the current literature of electricity markets to use game-theoretic approaches for strategic price bidding. However, they generally rely on the assumption that the strategic bidders have prior knowledge of rival bids, either perfectly or with some uncertainty. This is not necessarily a realistic assumption. This paper takes a different approach by relaxing such an assumption and exploits a no-regret learning algorithm for repeated games. In particular, by using the a posteriori information about rivals' bids, a learner can implement a no-regret algorithm to optimize her/his decision making. Given this information, we utilize a multiplicative weight-update algorithm, adapting bidding strategies over multiple rounds of an auction to minimize her/his regret. Our numerical results show that, when the proposed learning approach is used, the social cost and the market-clearing prices can be higher than those corresponding to the classical game-theoretic approaches. The takeaway for market regulators is that electricity markets might be exposed to greater market power of suppliers than what classical analysis shows.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.16829 [pdf, ps, other]

Convergence of a model-free entropy-regularized inverse reinforcement learning algorithm

Authors: Titouan Renard, Andreas Schlaginhaufen, Tingting Ni, Maryam Kamgarpour

Abstract: Given a dataset of expert demonstrations, inverse reinforcement learning (IRL) aims to recover a reward for which the expert is optimal. This work proposes a model-free algorithm to solve the entropy-regularized IRL problem. In particular, we employ a stochastic gradient descent update for the reward and a stochastic soft policy iteration update for the policy. Assuming access to a generative model, we prove that our algorithm is guaranteed to recover a reward for which the expert is $\varepsilon$-optimal using $\mathcal{O}(1/\varepsilon^{2})$ samples of the Markov decision process (MDP). Furthermore, with $\mathcal{O}(1/\varepsilon^{4})$ samples we prove that the optimal policy corresponding to the recovered reward is $\varepsilon$-close to the expert policy in total variation distance.

Submitted 23 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2312.08008 [pdf, ps, other]

Learning Nash Equilibria in Zero-Sum Markov Games: A Single Time-scale Algorithm Under Weak Reachability

Authors: Reda Ouhamma, Maryam Kamgarpour

Abstract: We consider decentralized learning for zero-sum games, where players only see their payoff information and are agnostic to actions and payoffs of the opponent. Previous works demonstrated convergence to a Nash equilibrium in this setting using double time-scale algorithms under strong reachability assumptions. We address the open problem of achieving an approximate Nash equilibrium efficiently with an uncoupled and single time-scale algorithm under weaker conditions. Our contribution is a rational and convergent algorithm, utilizing Tsallis-entropy regularization in a value-iteration-based approach. The algorithm learns an approximate Nash equilibrium in polynomial time, requiring only the existence of a policy pair that induces an irreducible and aperiodic Markov chain, thus considerably weakening past assumptions. Our analysis leverages negative drift inequalities and introduces novel properties of Tsallis entropy that are of independent interest.

Submitted 24 May, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: arXiv admin note: text overlap with arXiv:2303.03100 by other authors

arXiv:2312.00561 [pdf, other]

A safe exploration approach to constrained Markov decision processes

Authors: Tingting Ni, Maryam Kamgarpour

Abstract: We consider discounted infinite horizon constrained Markov decision processes (CMDPs) where the goal is to find an optimal policy that maximizes the expected cumulative reward subject to expected cumulative constraints. Motivated by the application of CMDPs in online learning of safety-critical systems, we focus on developing a model-free and simulator-free algorithm that ensures constraint satisfaction during learning. To this end, we develop an interior point approach based on the log barrier function of the CMDP. Under the commonly assumed conditions of Fisher non-degeneracy and bounded transfer error of the policy parameterization, we establish the theoretical properties of the algorithm. In particular, in contrast to existing CMDP approaches that ensure policy feasibility only upon convergence, our algorithm guarantees the feasibility of the policies during the learning process and converges to the $\varepsilon$-optimal policy with a sample complexity of $\tilde{\mathcal{O}}(\varepsilon^{-6})$. In comparison to the state-of-the-art policy gradient-based algorithm, C-NPG-PDA, our algorithm requires an additional $\mathcal{O}(\varepsilon^{-2})$ samples to ensure policy feasibility during learning with the same Fisher non-degenerate parameterization.

Submitted 23 May, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: 37 pages, 3 figures

arXiv:2310.14685 [pdf, other]

Multi-Agent Learning in Contextual Games under Unknown Constraints

Authors: Anna M. Maddux, Maryam Kamgarpour

Abstract: We consider the problem of learning to play a repeated contextual game with unknown reward and unknown constraint functions. Such games arise in applications where each agent's action needs to belong to a feasible set, but the feasible set is a priori unknown. For example, in constrained multi-agent reinforcement learning, the constraints on the agents' policies are a function of the unknown dynamics and hence are themselves unknown. Under kernel-based regularity assumptions on the unknown functions, we develop a no-regret, no-violation approach which exploits similarities among different reward and constraint outcomes. The no-violation property ensures that the time-averaged sum of constraint violations converges to zero as the game is repeated. We show that our algorithm, referred to as c.z.AdaNormalGP, obtains kernel-dependent regret bounds and that the cumulative constraint violations have sublinear kernel-dependent upper bounds. In addition, we introduce the notion of constrained contextual coarse correlated equilibria (c.z.CCE) and show that $ε$-c.z.CCEs can be approached whenever players follow a no-regret no-violation strategy. Finally, we experimentally demonstrate the effectiveness of c.z.AdaNormalGP on an instance of multi-agent reinforcement learning.

Submitted 14 January, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

Journal ref: International Conference on Artificial Intelligence and Statistics 2024

arXiv:2306.00629 [pdf, other]

Identifiability and Generalizability in Constrained Inverse Reinforcement Learning

Authors: Andreas Schlaginhaufen, Maryam Kamgarpour

Abstract: Two main challenges in Reinforcement Learning (RL) are designing appropriate reward functions and ensuring the safety of the learned policy. To address these challenges, we present a theoretical framework for Inverse Reinforcement Learning (IRL) in constrained Markov decision processes. From a convex-analytic perspective, we extend prior results on reward identifiability and generalizability to both the constrained setting and a more general class of regularizations. In particular, we show that identifiability up to potential shaping (Cao et al., 2021) is a consequence of entropy regularization and may generally no longer hold for other regularizations or in the presence of safety constraints. We also show that to ensure generalizability to new transition laws and constraints, the true reward must be identified up to a constant. Additionally, we derive a finite sample guarantee for the suboptimality of the learned rewards, and validate our results in a gridworld environment.

Submitted 1 June, 2023; originally announced June 2023.

Comments: Published at ICML 2023

arXiv:2212.12724 [pdf, other]

doi 10.1109/LRA.2023.3286815

Certification of Bottleneck Task Assignment with Shortest Path Criteria

Authors: Tony A. Wood, Maryam Kamgarpour

Abstract: Minimising the longest travel distance for a group of mobile robots with interchangeable goals requires knowledge of the shortest length paths between all robots and goal destinations. Determining the exact length of the shortest paths in an environment with obstacles is NP-hard, however. In this paper, we investigate when polynomial-time approximations of the shortest path search are sufficient to determine the optimal assignment of robots to goals. In particular, we propose an algorithm in which the accuracy of the path planning is iteratively increased. The approach provides a certificate when the uncertainties on estimates of the shortest paths become small enough to guarantee the optimality of the goal assignment. To this end, we apply results from assignment sensitivity assuming upper and lower bounds on the length of the shortest paths. We then provide polynomial-time methods to find such bounds by applying sampling-based path planning. The upper bounds are given by feasible paths; the lower bounds are obtained by expanding the sample set and leveraging the knowledge of the sample dispersion. We demonstrate the application of the proposed method with a multi-robot path-planning case study.

Submitted 8 June, 2023; v1 submitted 24 December, 2022; originally announced December 2022.

arXiv:2207.10415 [pdf, other]

Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning

Authors: Ilnura Usmanova, Yarden As, Maryam Kamgarpour, Andreas Krause

Abstract: Optimizing noisy functions online, when evaluating the objective requires experiments on a deployed system, is a crucial task arising in manufacturing, robotics, and many other domains. Often, constraints on safe inputs are unknown ahead of time, and we only obtain noisy information indicating how close we are to violating the constraints. Yet, safety must be guaranteed at all times, not only for the final output of the algorithm. We introduce a general approach for seeking a stationary point in high dimensional non-linear stochastic optimization problems in which maintaining safety during learning is crucial. Our approach, called LB-SGD, is based on applying stochastic gradient descent (SGD) with a carefully chosen adaptive step size to a logarithmic barrier approximation of the original problem. We provide a complete convergence analysis of non-convex, convex, and strongly-convex smooth constrained problems, with first-order and zeroth-order feedback. Our approach yields efficient updates and scales better with dimensionality compared to existing approaches. We empirically compare the sample complexity and the computational cost of our method with existing safe learning approaches. Beyond synthetic benchmarks, we demonstrate the effectiveness of our approach on minimizing constraint violation in policy search tasks in safe reinforcement learning (RL).

Submitted 2 June, 2023; v1 submitted 21 July, 2022; originally announced July 2022.

Comments: 36 pages, 9 pages of appendix

arXiv:2203.07322 [pdf, other]

Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation

Authors: Pier Giuseppe Sessa, Maryam Kamgarpour, Andreas Krause

Abstract: We consider model-based multi-agent reinforcement learning, where the environment transition model is unknown and can only be learned via expensive interactions with the environment. We propose H-MARL (Hallucinated Multi-Agent Reinforcement Learning), a novel sample-efficient algorithm that can efficiently balance exploration, i.e., learning about the environment, and exploitation, i.e., achieving good equilibrium performance in the underlying general-sum Markov game. H-MARL builds high-probability confidence intervals around the unknown transition model and sequentially updates them based on newly observed data. Using these, it constructs an optimistic hallucinated game for the agents for which equilibrium policies are computed at each round. We consider general statistical models (e.g., Gaussian processes, deep ensembles, etc.) and policy classes (e.g., deep neural networks), and theoretically analyze our approach by bounding the agents' dynamic regret. Moreover, we provide a convergence rate to the equilibria of the underlying Markov game. We demonstrate our approach experimentally on an autonomous driving simulation benchmark. H-MARL learns successful equilibrium policies after a few interactions with the environment and can significantly improve the performance compared to non-optimistic exploration methods.

Submitted 10 July, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

arXiv:2202.11147 [pdf, ps, other]

On the Rate of Convergence of Payoff-based Algorithms to Nash Equilibrium in Strongly Monotone Games

Authors: Tatiana Tatarenko, Maryam Kamgarpour

Abstract: We derive the rate of convergence to Nash equilibria for the payoff-based algorithm proposed in \cite{tat_kam_TAC}. These rates are achieved under the standard assumption of convexity of the game, strong monotonicity and differentiability of the pseudo-gradient. In particular, we show the algorithm achieves $O(\frac{1}{T})$ in the two-point function evaluation setting and $O(\frac{1}{\sqrt{T}})$ in the one-point function evaluation setting under the additional requirement of Lipschitz continuity of the pseudo-gradient. These rates are to our knowledge the best known rates for the corresponding problem classes.

Submitted 22 February, 2022; originally announced February 2022.

arXiv:2108.02753 [pdf, other]

doi 10.1109/LCSYS.2021.3089641

Safe Motion Planning against Multimodal Distributions based on a Scenario Approach

Authors: Heejin Ahn, Colin Chen, Ian M. Mitchell, Maryam Kamgarpour

Abstract: We present the design of a motion planning algorithm that ensures safety for an autonomous vehicle. In particular, we consider a multimodal distribution over uncertainties; for example, the uncertain predictions of future trajectories of surrounding vehicles reflect discrete decisions, such as turning or going straight at intersections. We develop a computationally efficient, scenario-based approach that solves the motion planning problem with high confidence given a quantifiable number of samples from the multimodal distribution. Our approach is based on two preprocessing steps, which 1) separate the samples into distinct clusters and 2) compute a bounding polytope for each cluster. Then, we rewrite the motion planning problem approximately as a mixed-integer problem using the polytopes. We demonstrate via simulation on the nuScenes dataset that our approach ensures safety with high probability in the presence of multimodal uncertainties, and is computationally more efficient and less conservative than a conventional scenario approach.

Submitted 5 August, 2021; originally announced August 2021.

Comments: Published in IEEE Control Systems Letters

Journal ref: in IEEE Control Systems Letters, vol. 6, pp. 1142-1147, 2022

arXiv:2107.06327 [pdf, other]

Contextual Games: Multi-Agent Learning with Side Information

Authors: Pier Giuseppe Sessa, Ilija Bogunovic, Andreas Krause, Maryam Kamgarpour

Abstract: We formulate the novel class of contextual games, a type of repeated games driven by contextual information at each round. By means of kernel-based regularity assumptions, we model the correlation between different contexts and game outcomes and propose a novel online (meta) algorithm that exploits such correlations to minimize the contextual regret of individual players. We define game-theoretic notions of contextual Coarse Correlated Equilibria (c-CCE) and optimal contextual welfare for this new class of games and show that c-CCEs and optimal welfare can be approached whenever players' contextual regrets vanish. Finally, we empirically validate our results in a traffic routing experiment, where our algorithm leads to better performance and higher welfare compared to baselines that do not exploit the available contextual information or the correlations present in the game.

Submitted 13 July, 2021; originally announced July 2021.

Journal ref: Proc. of Neural Information Processing Systems (NeurIPS), 2020

arXiv:2103.01840 [pdf, other]

Multi-robot task allocation for safe planning against stochastic hazard dynamics

Authors: Daniel Tihanyi, Yimeng Lu, Orcun Karaca, Maryam Kamgarpour

Abstract: We address multi-robot safe mission planning in uncertain dynamic environments. This problem arises in several applications including safety-critical exploration, surveillance, and emergency rescue missions. Computation of a multi-robot optimal control policy is challenging not only because of the complexity of incorporating dynamic uncertainties while planning, but also because of the exponential growth in problem size as a function of number of robots. Leveraging recent works obtaining a tractable safety maximizing plan for a single robot, we propose a scalable two-stage framework to solve the problem at hand. Specifically, the problem is split into a low-level single-agent control problem and a high-level task allocation problem. The low-level problem uses an efficient approximation of stochastic reachability for a Markov decision process to derive the optimal control policy under dynamic uncertainty. The task allocation is solved using polynomial-time forward and reverse greedy heuristics and in a distributed auction-based manner. By leveraging the properties of our safety objective function, we provide provable performance bounds on the safety of the approximate solutions proposed by these two heuristics. We evaluate the theory with extensive numerical case studies.

Submitted 13 November, 2022; v1 submitted 2 March, 2021; originally announced March 2021.

arXiv:2102.08690 [pdf, ps, other]

doi 10.1016/j.orl.2021.05.009

A market-based approach for enabling inter-area reserve exchange

Authors: Orcun Karaca, Stefanos Delikaraoglou, Maryam Kamgarpour

Abstract: Considering the sequential clearing of energy and reserves in Europe, enabling inter-area reserve exchange requires optimally allocating inter-area transmission capacities between these two markets. To achieve this, we provide a market-based allocation framework and derive payments with desirable properties. The proposed min-max least core selecting payments achieve individual rationality, budget balance, and approximate incentive compatibility and coalitional stability. The results extend the works on private discrete items to a network of continuous public choices.

Submitted 17 February, 2021; originally announced February 2021.

Journal ref: Operations Research Letters, 49(4), 501-506, 2021

arXiv:2007.05271 [pdf, other]

Learning to Play Sequential Games versus Unknown Opponents

Authors: Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

Abstract: We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action. We seek to design strategies for the learner to successfully interact with the opponent. While most previous approaches consider known opponent models, we focus on the setting in which the opponent's model is unknown. To this end, we use kernel-based regularity assumptions to capture and exploit the structure in the opponent's response. We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents. The algorithm combines ideas from bilevel optimization and online learning to effectively balance between exploration (learning about the opponent's model) and exploitation (selecting highly rewarding actions for the learner). Our results include the algorithm's regret guarantees that depend on the regularity of the opponent's response and scale sublinearly with the number of game rounds. Moreover, we specialize our approach to repeated Stackelberg games, and empirically demonstrate its effectiveness in a traffic routing and wildlife conservation task.

Submitted 10 July, 2020; originally announced July 2020.

arXiv:2003.02913 [pdf, other]

Safe Mission Planning under Dynamical Uncertainties

Authors: Yimeng Lu, Maryam Kamgarpour

Abstract: This paper considers safe robot mission planning in uncertain dynamical environments. This problem arises in applications such as surveillance, emergency rescue, and autonomous driving. It is a challenging problem due to modeling and integrating dynamical uncertainties into a safe planning framework, and finding a solution in a computationally tractable way. In this work, we first develop a probabilistic model for dynamical uncertainties. Then, we provide a framework to generate a path that maximizes safety for complex missions by incorporating the uncertainty model. We also devise a Monte Carlo method to obtain a safe path efficiently. Finally, we evaluate the performance of our approach and compare it to potential alternatives in several case studies.

Submitted 5 March, 2020; originally announced March 2020.

Comments: This paper appears in ICRA 2020

arXiv:2002.12613 [pdf, other]

Mixed Strategies for Robust Optimization of Unknown Objectives

Authors: Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

Abstract: We consider robust optimization problems, where the goal is to optimize an unknown objective function against the worst-case realization of an uncertain parameter. For this setting, we design a novel sample-efficient algorithm GP-MRO, which sequentially learns about the unknown objective from noisy point evaluations. GP-MRO seeks to discover a robust and randomized mixed strategy that maximizes the worst-case expected objective value. To achieve this, it combines techniques from online learning with nonparametric confidence bounds from Gaussian processes. Our theoretical results characterize the number of samples required by GP-MRO to discover a robust near-optimal mixed strategy for different GP kernels of interest. We experimentally demonstrate the performance of our algorithm on synthetic datasets and on human-assisted trajectory planning tasks for autonomous vehicles. In our simulations, we show that robust deterministic strategies can be overly conservative, while the mixed strategies found by GP-MRO significantly improve the overall performance.

Submitted 2 March, 2020; v1 submitted 28 February, 2020; originally announced February 2020.

arXiv:1912.09905 [pdf, other]

doi 10.1016/j.ifacol.2020.12.029

No-Regret Learning from Partially Observed Data in Repeated Auctions

Authors: Orcun Karaca, Pier Giuseppe Sessa, Anna Leidi, Maryam Kamgarpour

Abstract: We study a general class of repeated auctions, such as the ones found in electricity markets, as multi-agent games between the bidders. In such a repeated setting, bidders can adapt their strategies online based on the data observed in the previous auction rounds. Moreover, if no-regret algorithms are employed by the bidders to update their strategies, the game is known to converge to a coarse-correlated equilibrium, which generalizes the notion of Nash equilibrium to a probabilistic view of the auction state. Well-studied no-regret algorithms depend on the feedback information available at every round, and can be mainly distinguished as bandit (or payoff-based), and full-information. However, the information structure found in auctions lies in between these two models, since participants can often obtain partial observations of their utilities under different strategies. To this end, we modify existing bandit algorithms to exploit such additional information. Specifically, we utilize the feedback information that bidders can obtain when their bids are not accepted, and build a more accurate estimator of the utility vector. This results in improved regret guarantees compared to standard bandit algorithms. Moreover, we propose a heuristic method for auction settings where the proposed algorithm is not directly applicable. Finally, we demonstrate our findings on case studies based on realistic electricity market models.

Submitted 20 December, 2019; originally announced December 2019.

Journal ref: IFAC-PapersOnLine, 53(2), 14-19, 2020

arXiv:1909.08540 [pdf, other]

No-Regret Learning in Unknown Games with Correlated Payoffs

Authors: Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas Krause

Abstract: We consider the problem of learning to play a repeated multi-agent game with an unknown reward function. Single player online learning algorithms attain strong regret bounds when provided with full information feedback, which unfortunately is unavailable in many real-world scenarios. Bandit feedback alone, i.e., observing outcomes only for the selected action, yields substantially worse performance. In this paper, we consider a natural model where, besides a noisy measurement of the obtained reward, the player can also observe the opponents' actions. This feedback model, together with a regularity assumption on the reward function, allows us to exploit the correlations among different game outcomes by means of Gaussian processes (GPs). We propose a novel confidence-bound based bandit algorithm GP-MW, which utilizes the GP model for the reward function and runs a multiplicative weight (MW) method. We obtain novel kernel-dependent regret bounds that are comparable to the known bounds in the full information setting, while substantially improving upon the existing bandit results. We experimentally demonstrate the effectiveness of GP-MW in random matrix games, as well as real-world problems of traffic routing and movie recommendation. In our experiments, GP-MW consistently outperforms several baselines, while its performance is often comparable to methods that have access to full information feedback.

Submitted 28 October, 2019; v1 submitted 18 September, 2019; originally announced September 2019.

arXiv:1904.01882 [pdf, ps, other]

Learning Nash Equilibria in Monotone Games

Authors: Tatiana Tatarenko, Maryam Kamgarpour

Abstract: We consider multi-agent decision making where each agent's cost function depends on all agents' strategies. We propose a distributed algorithm to learn a Nash equilibrium, whereby each agent uses only obtained values of her cost function at each joint played action, lacking any information of the functional form of her cost or other agents' costs or strategy sets. In contrast to past work where convergent algorithms required strong monotonicity, we prove algorithm convergence under a mere monotonicity assumption. This significantly widens the algorithm's applicability, such as to games with linear coupling constraints.

Submitted 3 April, 2019; originally announced April 2019.

arXiv:1903.00950 [pdf, ps, other]

Bounding Inefficiency of Equilibria in Continuous Actions Games using Submodularity and Curvature

Authors: Pier Giuseppe Sessa, Maryam Kamgarpour, Andreas Krause

Abstract: Games with continuous strategy sets arise in several machine learning problems (e.g. adversarial learning). For such games, simple no-regret learning algorithms exist in several cases and ensure convergence to coarse correlated equilibria (CCE). The efficiency of such equilibria with respect to a social function, however, is not well understood. In this paper, we define the class of valid utility games with continuous strategies and provide efficiency bounds for their CCEs. Our bounds rely on the social function being a monotone DR-submodular function. We further refine our bounds based on the curvature of the social function. Furthermore, we extend our efficiency bounds to a class of non-submodular functions that satisfy approximate submodularity properties. Finally, we show that valid utility games with continuous strategies can be designed to maximize monotone DR-submodular functions subject to disjoint constraints with approximation guarantees. The approximation guarantees we derive are based on the efficiency of the equilibria of such games and can improve the existing ones in the literature. We illustrate and validate our results on a budget allocation game and a sensor coverage problem.

Submitted 3 March, 2019; originally announced March 2019.

arXiv:1811.09646 [pdf, other]

doi 10.1109/TSG.2019.2958710

Core-Selecting Mechanisms in Electricity Markets

Authors: Orcun Karaca, Maryam Kamgarpour

Abstract: Due to its theoretical virtues, several recent works propose the use of the incentive-compatible Vickrey-Clarke-Groves (VCG) mechanism for electricity markets. Coalitions of participants, however, can influence the VCG outcome to obtain higher collective profit. To address this issue, we propose core-selecting mechanisms for their coalition-proofness. We show that core-selecting mechanisms generalize the economic rationale of the locational marginal pricing (LMP) mechanism. Namely, these mechanisms are the exact class of mechanisms that ensure the existence of a competitive equilibrium in linear/nonlinear prices. This implies that the LMP mechanism is also core-selecting, and hence coalition-proof. In contrast to the LMP mechanism, core-selecting mechanisms exist for a broad class of electricity markets, such as ones involving nonconvex costs and nonconvex constraint sets. In addition, they can approximate truthfulness without the price-taking assumption of the LMP mechanism. Finally, we show that they are also budget-balanced. Our results are verified with case studies based on optimal power flow test systems and the Swiss reserve market.

Submitted 23 November, 2018; originally announced November 2018.

Journal ref: IEEE Transactions on Smart Grid, 11(3), 2604 - 2614, 2020

arXiv:1806.05069 [pdf, ps, other]

Minimizing Regret of Bandit Online Optimization in Unconstrained Action Spaces

Authors: Tatiana Tatarenko, Maryam Kamgarpour

Abstract: We consider online convex optimization with zero-order oracle feedback. In particular, the decision maker does not know the explicit representation of the time-varying cost functions, or their gradients. At each time step, she observes the value of the corresponding cost function evaluated at her chosen action (zero-order oracle). The objective is to minimize the regret, that is, the difference between the sum of the costs she accumulates and that of a static optimal action had she known the sequence of cost functions a priori. We present a novel algorithm to minimize regret in unconstrained action spaces. Our algorithm hinges on a classical idea of one-point estimation of the gradients of the cost functions based on their observed values. The algorithm is independent of problem parameters. Letting $T$ denote the number of queries of the zero-order oracle and $n$ the problem dimension, the regret rate achieved is $O(n^{2/3}T^{2/3})$. Moreover, we adapt the presented algorithm to the setting with two-point feedback and demonstrate that the adapted procedure achieves the theoretical lower bound on the regret of $(n^{1/2}T^{1/2})$.

Submitted 2 May, 2020; v1 submitted 13 June, 2018; originally announced June 2018.

arXiv:1803.11030 [pdf, other]

Exploiting Weak Supermodularity for Coalition-Proof Mechanisms

Authors: Orcun Karaca, Maryam Kamgarpour

Abstract: Under the incentive-compatible Vickrey-Clarke-Groves mechanism, coalitions of participants can influence the auction outcome to obtain higher collective profit. These manipulations were proven to be eliminated if and only if the market objective is supermodular. Nevertheless, several auctions do not satisfy the stringent conditions for supermodularity. These auctions include electricity markets, which are the main motivation of our study. To characterize nonsupermodular functions, we introduce the supermodularity ratio and weak supermodularity. We show that these concepts provide us with tight bounds on the profitability of collusion and shill bidding. We then derive an analytical lower bound on the supermodularity ratio. Our results are verified with case studies based on the IEEE test systems.

Submitted 23 November, 2018; v1 submitted 29 March, 2018; originally announced March 2018.

arXiv:1711.06774 [pdf, other]

doi 10.1109/TAC.2019.2908717

Designing Coalition-Proof Reverse Auctions over Continuous Goods

Authors: Orcun Karaca, Pier Giuseppe Sessa, Neil Walton, Maryam Kamgarpour

Abstract: This paper investigates reverse auctions that involve continuous values of different types of goods, general nonconvex constraints, and second stage costs. We seek to design the payment rules and conditions under which coalitions of participants cannot influence the auction outcome in order to obtain higher collective utility. Under the incentive-compatible Vickrey-Clarke-Groves mechanism, we show that coalition-proof outcomes are achieved if the submitted bids are convex and the constraint sets are of a polymatroid-type. These conditions, however, do not capture the complexity of the general class of reverse auctions under consideration. By relaxing the property of incentive-compatibility, we investigate further payment rules that are coalition-proof without any extra conditions on the submitted bids and the constraint sets. Since calculating the payments directly for these mechanisms is computationally difficult for auctions involving many participants, we present two computationally efficient methods. Our results are verified with several case studies based on electricity market data.

Submitted 31 December, 2018; v1 submitted 17 November, 2017; originally announced November 2017.

Journal ref: IEEE Transactions on Automatic Control, 64(11), 4803-4810, 2019

arXiv:1702.08789 [pdf, other]

doi 10.1109/TAC.2018.2849946

Nash and Wardrop equilibria in aggregative games with coupling constraints

Authors: Dario Paccagnan, Basilio Gentile, Francesca Parise, Maryam Kamgarpour, John Lygeros

Abstract: We consider the framework of aggregative games, in which the cost function of each agent depends on his own strategy and on the average population strategy. As a first contribution, we investigate the relations between the concepts of Nash and Wardrop equilibrium. By exploiting a characterization of the two equilibria as solutions of variational inequalities, we bound their distance with a decreasing function of the population size. As a second contribution, we propose two decentralized algorithms that converge to such equilibria and are capable of coping with constraints coupling the strategies of different agents. Finally, we study the applications of charging of electric vehicles and of route choice on a road network.

Submitted 30 April, 2018; v1 submitted 28 February, 2017; originally announced February 2017.

Comments: IEEE Trans. on Automatic Control (Accepted without changes). The first three authors contributed equally

arXiv:1611.03044 [pdf, other]

Exploring Vickrey-Clarke-Groves Mechanism for Electricity Markets

Authors: Pier Giuseppe Sessa, Neil Walton, Maryam Kamgarpour

Abstract: Control reserves are power generation or consumption entities that ensure balance of supply and demand of electricity in real-time. In many countries, they are operated through a market mechanism in which entities provide bids. The system operator determines the accepted bids based on an optimization algorithm. We develop the Vickrey-Clarke-Groves (VCG) mechanism for these electricity markets. We show that all advantages of the VCG mechanism including incentive compatibility of the equilibria and efficiency of the outcome can be guaranteed in these markets. Furthermore, we derive conditions to ensure collusion and shill bidding are not profitable. Our results are verified with numerical examples.

Submitted 21 November, 2016; v1 submitted 9 November, 2016; originally announced November 2016.
