-
On (Random-order) Online Contention Resolution Schemes for the Matching Polytope of (Bipartite) Graphs
Authors:
Calum MacRury,
Will Ma,
Nathaniel Grammel
Abstract:
Online Contention Resolution Schemes (OCRS's) represent a modern tool for selecting a subset of elements, subject to resource constraints, when the elements are presented to the algorithm sequentially. OCRS's have led to some of the best-known competitive ratio guarantees for online resource allocation problems, with the added benefit of treating different online decisions -- accept/reject, probing, pricing -- in a unified manner. This paper analyzes OCRS's for resource constraints defined by matchings in graphs, a fundamental structure in combinatorial optimization. We consider two dimensions of variants: the elements being presented in adversarial or random order; and the graph being bipartite or general. We improve the state of the art for all combinations of variants, both in terms of algorithmic guarantees and impossibility results. Some of our algorithmic guarantees are best-known even compared to Contention Resolution Schemes that can choose the order of arrival or are offline. All in all, our results for OCRS directly improve the best-known competitive ratios for online accept/reject, probing, and pricing problems on graphs in a unified manner.
Submitted 1 April, 2024; v1 submitted 15 September, 2022;
originally announced September 2022.
-
The Stochastic Boolean Function Evaluation Problem for Symmetric Boolean Functions
Authors:
Dimitrios Gkenosis,
Nathaniel Grammel,
Lisa Hellerstein,
Devorah Kletenik
Abstract:
We give two approximation algorithms solving the Stochastic Boolean Function Evaluation (SBFE) problem for symmetric Boolean functions. The first is an $O(\log n)$-approximation algorithm, based on the submodular goal-value approach of Deshpande, Hellerstein and Kletenik. Our second algorithm, which is simple, is based on the algorithm solving the SBFE problem for $k$-of-$n$ functions, due to Salloum, Breuer, and Ben-Dov. It achieves a $(B-1)$ approximation factor, where $B$ is the number of blocks of 0's and 1's in the standard vector representation of the symmetric Boolean function. As part of the design of the first algorithm, we prove that the goal value of any symmetric Boolean function is less than $n(n+1)/2$. Finally, we give an example showing that for symmetric Boolean functions, minimum expected verification cost and minimum expected evaluation cost are not necessarily equal. This contrasts with a previous result, given by Das, Jafarpour, Orlitsky, Pan and Suresh, which showed that equality holds in the unit-cost case.
Submitted 4 January, 2022; v1 submitted 16 November, 2021;
originally announced November 2021.
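Illustration (not from the paper): the quantity $B$ in the abstract above can be computed directly from a symmetric function's value vector. The sketch below assumes the "standard vector representation" is the length-$(n+1)$ vector whose entry $j$ is the function's output when exactly $j$ inputs are 1; the helper name `count_blocks` is hypothetical.

```python
from itertools import groupby


def count_blocks(value_vector):
    """Count maximal runs of equal entries in a symmetric function's value vector.

    Assumption: value_vector[j] is the function's output when exactly j of the
    n inputs are 1, so the vector has length n + 1.
    """
    # groupby yields one group per maximal run of consecutive equal values.
    return sum(1 for _ in groupby(value_vector))


# 2-of-4 function: outputs 1 exactly when at least 2 of the 4 inputs are 1.
two_of_four = [0, 0, 1, 1, 1]
# Parity on 3 inputs: the output alternates with the number of ones.
parity_three = [0, 1, 0, 1]

print(count_blocks(two_of_four))   # 2 blocks
print(count_blocks(parity_three))  # 4 blocks
```

Under this assumption a $k$-of-$n$ function has $B = 2$, so the $(B-1)$ factor equals 1 there, while parity on $n$ inputs has $B = n + 1$.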
-
Improved Guarantees for Offline Stochastic Matching via new Ordered Contention Resolution Schemes
Authors:
Brian Brubach,
Nathaniel Grammel,
Will Ma,
Aravind Srinivasan
Abstract:
Matching is one of the most fundamental and broadly applicable problems across many domains. In these diverse real-world applications, there is often a degree of uncertainty in the input which has led to the study of stochastic matching models. Here, each edge in the graph has a known, independent probability of existing derived from some prediction. Algorithms must probe edges to determine existence and match them irrevocably if they exist. Further, each vertex may have a patience constraint denoting how many of its neighboring edges can be probed. We present new ordered contention resolution schemes yielding improved approximation guarantees for some of the foundational problems studied in this area. For stochastic matching with patience constraints in general graphs, we provide a 0.382-approximate algorithm, significantly improving over the previous best 0.31-approximation of Baveja et al. (2018). When the vertices do not have patience constraints, we describe a 0.432-approximate random order probing algorithm with several corollaries such as an improved guarantee for the Prophet Secretary problem under Edge Arrivals. Finally, for the special case of bipartite graphs with unit patience constraints on one of the partitions, we show a 0.632-approximate algorithm that improves on the recent $1/3$-guarantee of Hikima et al. (2021).
Submitted 8 August, 2022; v1 submitted 12 June, 2021;
originally announced June 2021.
-
PettingZoo: Gym for Multi-Agent Reinforcement Learning
Authors:
J. K. Terry,
Benjamin Black,
Nathaniel Grammel,
Mario Jayakumar,
Ananth Hari,
Ryan Sullivan,
Luis Santos,
Rodrigo Perez,
Caroline Horsch,
Clemens Dieffendahl,
Niall L. Williams,
Yashas Lokesh,
Praveen Ravi
Abstract:
This paper introduces the PettingZoo library and the accompanying Agent Environment Cycle ("AEC") games model. PettingZoo is a library of diverse sets of multi-agent environments with a universal, elegant Python API. PettingZoo was developed with the goal of accelerating research in Multi-Agent Reinforcement Learning ("MARL"), by making work more interchangeable, accessible and reproducible akin to what OpenAI's Gym library did for single-agent reinforcement learning. PettingZoo's API, while inheriting many features of Gym, is unique amongst MARL APIs in that it's based around the novel AEC games model. We argue, in part through case studies on major problems in popular MARL environments, that the popular game models are poor conceptual models of games commonly used in MARL and accordingly can promote confusing bugs that are hard to detect, and that the AEC games model addresses these problems.
Submitted 26 October, 2021; v1 submitted 30 September, 2020;
originally announced September 2020.
-
Agent Environment Cycle Games
Authors:
J K Terry,
Nathaniel Grammel,
Benjamin Black,
Ananth Hari,
Caroline Horsch,
Luis Santos
Abstract:
Partially Observable Stochastic Games (POSGs) are the most general and common model of games used in Multi-Agent Reinforcement Learning (MARL). We argue that the POSG model is conceptually ill suited to software MARL environments, and offer case studies from the literature where this mismatch has led to severely unexpected behavior. In response to this, we introduce the Agent Environment Cycle Games (AEC Games) model, which is more representative of software implementation. We then prove that it is an equivalent model to POSGs. The AEC games model is also uniquely useful in that it can elegantly represent all forms of MARL environments, whereas, for example, POSGs cannot elegantly represent strictly turn-based games like chess.
Submitted 1 May, 2021; v1 submitted 28 September, 2020;
originally announced September 2020.
-
Stochastic Optimization and Learning for Two-Stage Supplier Problems
Authors:
Brian Brubach,
Nathaniel Grammel,
David G. Harris,
Aravind Srinivasan,
Leonidas Tsepenekas,
Anil Vullikanti
Abstract:
The main focus of this paper is radius-based (supplier) clustering in the two-stage stochastic setting with recourse, where the inherent stochasticity of the model comes in the form of a budget constraint. In addition to the standard (homogeneous) setting where all clients must be within a distance $R$ of the nearest facility, we provide results for the more general problem where the radius demands may be inhomogeneous (i.e., different for each client). We also explore a number of variants where additional constraints are imposed on the first-stage decisions, specifically matroid and multi-knapsack constraints, and provide results for these settings.
We derive results for the most general distributional setting, where there is only black-box access to the underlying distribution. To accomplish this, we first develop algorithms for the polynomial scenarios setting; we then employ a novel scenario-discarding variant of the standard Sample Average Approximation (SAA) method, which crucially exploits properties of the restricted-case algorithms. We note that the scenario-discarding modification to the SAA method is necessary in order to optimize over the radius.
Submitted 7 April, 2024; v1 submitted 7 August, 2020;
originally announced August 2020.
-
Multi-Agent Informational Learning Processes
Authors:
J. K. Terry,
Nathaniel Grammel
Abstract:
We introduce a new mathematical model of multi-agent reinforcement learning, the Multi-Agent Informational Learning Processor ("MAILP") model. The model is based on the notion that agents have policies for a certain amount of information, and it models how this information iteratively evolves and propagates through many agents. This model is very general, and the only meaningful assumption made is that learning for individual agents progressively slows over time.
Submitted 25 February, 2021; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Revisiting Parameter Sharing in Multi-Agent Deep Reinforcement Learning
Authors:
J. K. Terry,
Nathaniel Grammel,
Sanghyun Son,
Benjamin Black,
Aakriti Agrawal
Abstract:
Parameter sharing, where each agent independently learns a policy with fully shared parameters between all policies, is a popular baseline method for multi-agent deep reinforcement learning. Unfortunately, since all agents share the same policy network, they cannot learn different policies or tasks. This issue has been circumvented experimentally by adding an agent-specific indicator signal to observations, which we term "agent indication". Agent indication is limited, however, in that without modification it does not allow parameter sharing to be applied to environments where the action spaces and/or observation spaces are heterogeneous. This work formalizes the notion of agent indication and proves that it enables convergence to optimal policies for the first time. Next, we formally introduce methods to extend parameter sharing to learning in heterogeneous observation and action spaces, and prove that these methods allow for convergence to optimal policies. Finally, we experimentally confirm that the methods we introduce function empirically, and conduct a wide array of experiments studying the empirical efficacy of many different agent indication schemes for image based observation spaces.
Submitted 31 October, 2023; v1 submitted 27 May, 2020;
originally announced May 2020.
-
Online Matching Frameworks under Stochastic Rewards, Product Ranking, and Unknown Patience
Authors:
Brian Brubach,
Nathaniel Grammel,
Will Ma,
Aravind Srinivasan
Abstract:
We study generalizations of online bipartite matching in which each arriving vertex (customer) views a ranked list of offline vertices (products) and matches to (purchases) the first one they deem acceptable. The number of products that the customer has patience to view can be stochastic and dependent on the products seen. We develop a framework that views the interaction with each customer as an abstract resource consumption process, and derive new results for these online matching problems under the adversarial, non-stationary, and IID arrival models, assuming we can (approximately) solve the product ranking problem for each single customer. To that end, we show new results for product ranking under two cascade-click models: an optimal algorithm when each item has its own hazard rate for making the customer depart, and a 1/2-approximate algorithm when the customer has a general item-independent patience distribution. We also present a constant-factor 0.027-approximate algorithm in a new model where items are not initially available and arrive over time. We complement these positive results by presenting three additional negative results relating to these problems.
Submitted 26 June, 2023; v1 submitted 8 July, 2019;
originally announced July 2019.
-
The Stochastic Score Classification Problem
Authors:
Dimitrios Gkenosis,
Nathaniel Grammel,
Lisa Hellerstein,
Devorah Kletenik
Abstract:
Consider the following Stochastic Score Classification Problem. A doctor is assessing a patient's risk of developing a certain disease, and can perform $n$ tests on the patient. Each test has a binary outcome, positive or negative. A positive test result is an indication of risk, and a patient's score is the total number of positive test results. The doctor needs to classify the patient into one of $B$ risk classes, depending on the score (e.g., LOW, MEDIUM, and HIGH risk). Each of these classes corresponds to a contiguous range of scores. Test $i$ has probability $p_i$ of being positive, and it costs $c_i$ to perform the test. To reduce costs, instead of performing all tests, the doctor will perform them sequentially and stop testing when it is possible to determine the risk category for the patient. The problem is to determine the order in which the doctor should perform the tests, so as to minimize the expected testing cost. We provide approximation algorithms for adaptive and non-adaptive versions of this problem, and pose a number of open questions.
Submitted 27 June, 2018;
originally announced June 2018.
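Illustration (not from the paper): the objective described in the abstract above can be made concrete by computing the expected cost of one fixed, non-adaptive test order. This sketch only evaluates a given order; it is not one of the paper's approximation algorithms. It assumes independent test outcomes, and the function name `expected_cost` and the `cuts` encoding of the class boundaries are hypothetical choices.

```python
from bisect import bisect_right


def expected_cost(order, p, c, cuts):
    """Expected cost of running tests in `order`, stopping once the class is known.

    p[i], c[i]: probability that test i is positive, and its cost.
    cuts: sorted score thresholds (an assumed encoding); a score s belongs to
          class bisect_right(cuts, s), so each class is a contiguous range.
    Assumes test outcomes are independent.
    """
    def risk_class(score):
        return bisect_right(cuts, score)

    n = len(order)
    # alive[s] = probability of s positives so far AND testing has not stopped.
    alive = {0: 1.0}
    total = 0.0
    for k, i in enumerate(order):
        remaining = n - k  # tests not yet performed, including test i
        nxt = {}
        for s, prob in alive.items():
            # The final score lies in [s, s + remaining]; if both ends fall in
            # the same class, the class is determined and testing stops.
            if risk_class(s) == risk_class(s + remaining):
                continue
            total += prob * c[i]  # test i is performed on this branch
            nxt[s + 1] = nxt.get(s + 1, 0.0) + prob * p[i]
            nxt[s] = nxt.get(s, 0.0) + prob * (1.0 - p[i])
        alive = nxt
    return total


# Two unit-cost tests, two classes: score 0 versus score >= 1.
p, c, cuts = [0.9, 0.1], [1.0, 1.0], [1]
print(expected_cost([0, 1], p, c, cuts))  # likely-positive test first
print(expected_cost([1, 0], p, c, cuts))  # unlikely-positive test first
```

In this toy instance, running the test that is likely to be positive first is cheaper in expectation, because a single positive result already settles the class.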
-
Scenario Submodular Cover
Authors:
Nathaniel Grammel,
Lisa Hellerstein,
Devorah Kletenik,
Patrick Lin
Abstract:
Many problems in Machine Learning can be modeled as submodular optimization problems. Recent work has focused on stochastic or adaptive versions of these problems. We consider the Scenario Submodular Cover problem, which is a counterpart to the Stochastic Submodular Cover problem studied by Golovin and Krause. In Scenario Submodular Cover, the goal is to produce a cover with minimum expected cost, where the expectation is with respect to an empirical joint distribution, given as input by a weighted sample of realizations. In contrast, in Stochastic Submodular Cover, the variables of the input distribution are assumed to be independent, and the distribution of each variable is given as input. Building on algorithms developed by Cicalese et al. and Golovin and Krause for related problems, we give two approximation algorithms for Scenario Submodular Cover over discrete distributions. The first achieves an approximation factor of O(log Qm), where m is the size of the sample and Q is the goal utility. The second, simpler algorithm achieves an approximation bound of O(log QW), where Q is the goal utility and W is the sum of the integer weights. (Both bounds assume an integer-valued utility function.) Our results yield approximation bounds for other problems involving non-independent distributions that are explicitly specified by their support.
Submitted 10 March, 2016;
originally announced March 2016.