Showing 1–50 of 62 results for author: Slivkins, A

Searching in archive cs.
  1. arXiv:2403.15371  [pdf, other]

    cs.LG cs.AI cs.CL

    Can large language models explore in-context?

    Authors: Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins

    Abstract: We investigate the extent to which contemporary Large Language Models (LLMs) can engage in exploration, a core capability in reinforcement learning and decision making. We focus on native performance of existing LLMs, without training interventions. We deploy LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context…

    Submitted 22 March, 2024; originally announced March 2024.

  2. arXiv:2403.00188  [pdf, ps, other]

    cs.LG cs.GT

    Impact of Decentralized Learning on Player Utilities in Stackelberg Games

    Authors: Kate Donahue, Nicole Immorlica, Meena Jagadeesan, Brendan Lucier, Aleksandrs Slivkins

    Abstract: When deployed in the world, a learning agent such as a recommender system or a chatbot often repeatedly interacts with another learning agent (such as a user) over time. In many such two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned. To better understand such cases, we examine the learning dynamics of the two-agent system and the implicatio…

    Submitted 21 June, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: To appear at ICML 2024; this is the full version

  3. arXiv:2402.13338  [pdf, ps, other]

    cs.LG econ.TH

    Incentivized Exploration via Filtered Posterior Sampling

    Authors: Anand Kalvit, Aleksandrs Slivkins, Yonatan Gur

    Abstract: We study "incentivized exploration" (IE) in social learning problems where the principal (a recommendation algorithm) can leverage information asymmetry to incentivize sequentially-arriving agents to take exploratory actions. We identify posterior sampling, an algorithmic approach that is well known in the multi-armed bandits literature, as a general-purpose solution for IE. In particular, we expa…

    Submitted 20 February, 2024; originally announced February 2024.

  4. arXiv:2312.07929  [pdf, other]

    cs.GT cs.LG

    Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits with Strategic Agents

    Authors: Seyed A. Esmaeili, Suho Shin, Aleksandrs Slivkins

    Abstract: We consider a variant of the stochastic multi-armed bandit problem. Specifically, the arms are strategic agents who can improve their rewards or absorb them. The utility of an agent increases if she is pulled more or absorbs more of her rewards but decreases if she spends more effort improving her rewards. Agents have heterogeneous properties, specifically having different means and able to improv…

    Submitted 13 December, 2023; originally announced December 2023.

  5. arXiv:2311.18138  [pdf, other]

    cs.GT cs.AI econ.TH

    Algorithmic Persuasion Through Simulation

    Authors: Keegan Harris, Nicole Immorlica, Brendan Lucier, Aleksandrs Slivkins

    Abstract: We study a Bayesian persuasion game where a sender wants to persuade a receiver to take a binary action, such as purchasing a product. The sender is informed about the (binary) state of the world, such as whether the quality of the product is high or low, but only has limited information about the receiver's beliefs and utilities. Motivated by customer surveys, user studies, and recent advances in…

    Submitted 11 June, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

  6. arXiv:2307.07374  [pdf, ps, other]

    cs.GT econ.TH

    Strategic Budget Selection in a Competitive Autobidding World

    Authors: Yiding Feng, Brendan Lucier, Aleksandrs Slivkins

    Abstract: We study a game played between advertisers in an online ad platform. The platform sells ad impressions by first-price auction and provides autobidding algorithms that optimize bids on each advertiser's behalf, subject to advertiser constraints such as budgets. Crucially, these constraints are strategically chosen by the advertisers. The chosen constraints define an "inner" budget-pacing game for…

    Submitted 13 November, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

  7. arXiv:2306.07923  [pdf, other]

    cs.LG

    Oracle-Efficient Pessimism: Offline Policy Optimization in Contextual Bandits

    Authors: Lequn Wang, Akshay Krishnamurthy, Aleksandrs Slivkins

    Abstract: We consider offline policy optimization (OPO) in contextual bandits, where one is given a fixed dataset of logged interactions. While pessimistic regularizers are typically used to mitigate distribution shift, prior implementations thereof are either specialized or computationally inefficient. We present the first general oracle-efficient algorithm for pessimistic OPO: it reduces to supervised lea…

    Submitted 25 October, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

  8. arXiv:2302.07425  [pdf, other]

    cs.GT cs.DS cs.LG

    Bandit Social Learning: Exploration under Myopic Behavior

    Authors: Kiarash Banihashem, MohammadTaghi Hajiaghayi, Suho Shin, Aleksandrs Slivkins

    Abstract: We study social learning dynamics motivated by reviews on online platforms. The agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration. We allow a wide range of myopic behaviors that are consistent with (parameterized) confidence intervals for the arms' expected rewards. We derive stark learning failures for any such behavior…

    Submitted 3 November, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: Extended version of NeurIPS 2023 paper titled "Bandit Social Learning under Myopic Behavior"

  9. arXiv:2301.13306  [pdf, other]

    cs.GT cs.LG

    Autobidders with Budget and ROI Constraints: Efficiency, Regret, and Pacing Dynamics

    Authors: Brendan Lucier, Sarath Pattathil, Aleksandrs Slivkins, Mengxiao Zhang

    Abstract: We study a game between autobidding algorithms that compete in an online advertising platform. Each autobidder is tasked with maximizing its advertiser's total value over multiple rounds of a repeated auction, subject to budget and return-on-investment constraints. We propose a gradient-based learning algorithm that is guaranteed to satisfy all constraints and achieves vanishing individual regret.…

    Submitted 11 June, 2024; v1 submitted 30 January, 2023; originally announced January 2023.

  10. arXiv:2211.07484  [pdf, ps, other]

    cs.LG stat.ML

    Contextual Bandits with Packing and Covering Constraints: A Modular Lagrangian Approach via Regression

    Authors: Aleksandrs Slivkins, Xingyu Zhou, Karthik Abinav Sankararaman, Dylan J. Foster

    Abstract: We consider contextual bandits with linear constraints (CBwLC), a variant of contextual bandits in which the algorithm consumes multiple resources subject to linear constraints on total consumption. This problem generalizes contextual bandits with knapsacks (CBwK), allowing for packing and covering constraints, as well as positive and negative resource consumption. We provide the first algorithm f…

    Submitted 29 June, 2024; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: A preliminary version of this paper, authored by A. Slivkins, K.A. Sankararaman and D.J. Foster, has been published at COLT 2023. The present version features an important improvement, due to Xingyu Zhou. Specifically, the $\sqrt{T}$-regret result in Theorem 3.6(a) holds under a much weaker assumption, and is now positioned as the main guarantee

  11. arXiv:2206.00494  [pdf, ps, other]

    cs.LG

    Incentivizing Combinatorial Bandit Exploration

    Authors: Xinyan Hu, Dung Daniel Ngo, Aleksandrs Slivkins, Zhiwei Steven Wu

    Abstract: Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system. The users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations. While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users. All published work on this problem,…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: 9 pages of main text, 21 pages in total

  12. arXiv:2205.08674  [pdf, ps, other]

    cs.GT

    Budget Pacing in Repeated Auctions: Regret and Efficiency without Convergence

    Authors: Jason Gaitonde, Yingkai Li, Bar Light, Brendan Lucier, Aleksandrs Slivkins

    Abstract: We study the aggregate welfare and individual regret guarantees of dynamic pacing algorithms in the context of repeated auctions with budgets. Such algorithms are commonly used as bidding agents in Internet advertising platforms. We show that when agents simultaneously apply a natural form of gradient-based pacing, the liquid welfare obtained over the course of the learning dynamics is…

    Submitted 8 September, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

  13. arXiv:2203.01213  [pdf, ps, other]

    cs.GT cs.DS

    Truthful Online Scheduling of Cloud Workloads under Uncertainty

    Authors: Moshe Babaioff, Ronny Lempel, Brendan Lucier, Ishai Menache, Aleksandrs Slivkins, Sam Chiu-Wai Wong

    Abstract: Cloud computing customers often submit repeating jobs and computation pipelines on approximately regular schedules, with arrival and running times that exhibit variance. This pattern, typical of training tasks in machine learning, allows customers to partially predict future job requirements. We develop a model of cloud computing platforms that receive statements of work (SoWs) in an online…

    Submitted 2 March, 2022; originally announced March 2022.

    Comments: To appear in TheWebConf 2022

  14. arXiv:2202.06191  [pdf, other]

    cs.GT econ.TH

    Exploration and Incentivizing Participation in Clinical Trials

    Authors: Yingkai Li, Aleksandrs Slivkins

    Abstract: Participation incentives are a well-known issue inhibiting randomized clinical trials (RCTs). We frame this issue as a non-standard exploration-exploitation tradeoff: an RCT would like to explore as uniformly as possible, whereas each patient prefers "exploitation", i.e., treatments that seem best. We incentivize participation by leveraging information asymmetry between the trial and the patients. We…

    Submitted 20 May, 2024; v1 submitted 12 February, 2022; originally announced February 2022.

  15. arXiv:2110.14874  [pdf, other]

    cs.LG stat.ML

    Sayer: Using Implicit Feedback to Optimize System Policies

    Authors: Mathias Lécuyer, Sang Hoon Kim, Mihir Nanavati, Junchen Jiang, Siddhartha Sen, Amit Sharma, Aleksandrs Slivkins

    Abstract: We observe that many system policies that make threshold decisions involving a resource (e.g., time, memory, cores) naturally reveal additional, or implicit, feedback. For example, if a system waits X min for an event to occur, then it automatically learns what would have happened if it waited <X min, because time has a cumulative property. This feedback tells us about alternative decisions, and ca…

    Submitted 28 October, 2021; originally announced October 2021.

  16. arXiv:2103.00360  [pdf, other]

    cs.LG cs.GT

    Exploration and Incentives in Reinforcement Learning

    Authors: Max Simchowitz, Aleksandrs Slivkins

    Abstract: How do you incentivize self-interested agents to $\textit{explore}$ when they prefer to $\textit{exploit}$? We consider complex exploration problems, where each agent faces the same (but unknown) MDP. In contrast with traditional formulations of reinforcement learning, agents control the choice of policies, whereas an algorithm can only issue recommendations. However, the algorithm controls the fl…

    Submitted 18 February, 2023; v1 submitted 27 February, 2021; originally announced March 2021.

  17. arXiv:2007.12653  [pdf, other]

    cs.GT cs.DS

    Beating Greedy For Approximating Reserve Prices in Multi-Unit VCG Auctions

    Authors: Mahsa Derakhshan, David M. Pennock, Aleksandrs Slivkins

    Abstract: We study the problem of finding personalized reserve prices for unit-demand buyers in multi-unit eager VCG auctions with correlated buyers. The input to this problem is a dataset of submitted bids of $n$ buyers in a set of auctions. The goal is to find a vector of reserve prices, one for each buyer, that maximizes the total revenue across all auctions. Roughgarden and Wang (2016) showed that thi…

    Submitted 24 July, 2020; originally announced July 2020.

  18. arXiv:2007.10144  [pdf, other]

    cs.GT cs.LG econ.TH

    Competing Bandits: The Perils of Exploration Under Competition

    Authors: Guy Aridor, Yishay Mansour, Aleksandrs Slivkins, Zhiwei Steven Wu

    Abstract: Most online platforms strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information. We study the interplay between exploration and competition: how such platforms balance the exploration for learning and the competition for users. Here users play three distinct roles: they are customers that generate r…

    Submitted 4 December, 2022; v1 submitted 20 July, 2020; originally announced July 2020.

    Comments: merged and extended version of arXiv:1702.08533 and arXiv:1902.05590

  19. arXiv:2006.12367  [pdf, other]

    cs.LG cs.DS cs.GT stat.ML

    Adaptive Discretization for Adversarial Lipschitz Bandits

    Authors: Chara Podimata, Aleksandrs Slivkins

    Abstract: Lipschitz bandits is a prominent version of multi-armed bandits that studies large, structured action spaces such as the [0,1] interval, where similar actions are guaranteed to have similar rewards. A central theme here is the adaptive discretization of the action space, which gradually "zooms in" on the more promising regions thereof. The goal is to take advantage of "nicer" problem instances…

    Submitted 12 August, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: A short version of this paper appears in COLT21

  20. arXiv:2006.06040  [pdf, other]

    cs.LG stat.ML

    Efficient Contextual Bandits with Continuous Actions

    Authors: Maryam Majzoubi, Chicheng Zhang, Rajan Chari, Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins

    Abstract: We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure. Our reduction-style algorithm composes with most supervised learning representations. We prove that it works in a general sense and verify the new functionality with large-scale experiments.

    Submitted 3 December, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: To appear at NeurIPS 2020

  21. arXiv:2006.05051  [pdf, other]

    cs.LG cs.AI cs.DS stat.ML

    Constrained episodic reinforcement learning in concave-convex and knapsack settings

    Authors: Kianté Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

    Abstract: We propose an algorithm for tabular episodic reinforcement learning with constraints. We provide a modular analysis with strong theoretical guarantees for settings with concave rewards and convex constraints, and for settings with hard constraints (knapsacks). Most of the previous work in constrained reinforcement learning is limited to linear constraints, and the remaining work focuses on either…

    Submitted 5 June, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: The NeurIPS 2020 version of this paper includes a small bug, leading to an incorrect dependence on H in Theorem 3.4. This version fixes it by adjusting Eq. (9), Theorem 3.4 and the relevant proofs. Changes in the main text are noted in red. Changes in the appendix are limited to Appendices B.1, B.5, and B.6 and the statement of Lemma F.3

  22. arXiv:2005.10624  [pdf, ps, other]

    cs.LG stat.ML

    Greedy Algorithm almost Dominates in Smoothed Contextual Bandits

    Authors: Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaughan, Zhiwei Steven Wu

    Abstract: Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users in order to gain information that will lead to better decisions in the future. While necessary in the worst case, explicit exploration has a number of disadvantages compared to the greedy algorithm that alway…

    Submitted 27 December, 2021; v1 submitted 19 May, 2020; originally announced May 2020.

    Comments: Results in this paper, without any proofs, have been announced in an extended abstract (Raghavan et al., 2018a), and fleshed out in the technical report (Raghavan et al., 2018b [arXiv:1806.00543]). This manuscript covers a subset of results from Raghavan et al. (2018a,b), focusing on the greedy algorithm, and is streamlined accordingly

  23. arXiv:2002.00558  [pdf, ps, other]

    cs.GT cs.DS cs.LG

    The Price of Incentivizing Exploration: A Characterization via Thompson Sampling and Sample Complexity

    Authors: Mark Sellke, Aleksandrs Slivkins

    Abstract: We consider incentivized exploration: a version of multi-armed bandits where the choice of arms is controlled by self-interested agents, and the algorithm can only issue recommendations. The algorithm controls the flow of information, and the information asymmetry can incentivize the agents to explore. Prior work achieves optimal regret rates up to multiplicative factors that become arbitrarily la…

    Submitted 12 June, 2022; v1 submitted 2 February, 2020; originally announced February 2020.

  24. arXiv:2002.00253  [pdf, other]

    cs.LG cs.DS stat.ML

    Bandits with Knapsacks beyond the Worst-Case

    Authors: Karthik Abinav Sankararaman, Aleksandrs Slivkins

    Abstract: Bandits with Knapsacks (BwK) is a general model for multi-armed bandits under supply/budget constraints. While worst-case regret bounds for BwK are well-understood, we present three results that go beyond the worst-case perspective. First, we provide upper and lower bounds which amount to a full characterization for logarithmic, instance-dependent regret rates. Second, we consider "simple regret"…

    Submitted 28 December, 2021; v1 submitted 1 February, 2020; originally announced February 2020.

    Comments: The initial version, titled "Advances in Bandits with Knapsacks", was published on arxiv.org in Jan'20. The present version improves both upper and lower bounds, deriving Theorem 3.2(ii) and Theorem 4.2. Moreover, it simplifies the algorithm and analysis in the main result, and fixes several issues in the lower bounds

  25. arXiv:1911.08689  [pdf, ps, other]

    cs.LG cs.AI cs.DS stat.ML

    Corruption-robust exploration in episodic reinforcement learning

    Authors: Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

    Abstract: We initiate the study of multi-stage episodic reinforcement learning under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of stochastic bandits. We provide a framework which modifies the aggressive exploration enjoyed by existing reinforcement learning approaches based on "optimism in the face of u…

    Submitted 31 October, 2023; v1 submitted 19 November, 2019; originally announced November 2019.

    Comments: Accepted in Mathematics of Operations Research. Preliminary version was accepted for presentation at COLT'21

  26. arXiv:1904.07272  [pdf, other]

    cs.LG cs.AI cs.DS stat.ML

    Introduction to Multi-Armed Bandits

    Authors: Aleksandrs Slivkins

    Abstract: Multi-armed bandits are a simple but very powerful framework for algorithms that make decisions over time under uncertainty. An enormous body of work has accumulated over the years, covered in several books and surveys. This book provides a more introductory, textbook-like treatment of the subject. Each chapter tackles a particular line of work, providing a self-contained, teachable technical introduc…

    Submitted 3 April, 2024; v1 submitted 15 April, 2019; originally announced April 2019.

    Comments: Published with Foundations and Trends(R) in Machine Learning, November 2019. The present version is a revision of the "Foundations and Trends" publication. It contains numerous edits for presentation and accuracy (based in part on readers' feedback), updated and expanded literature reviews, and some new exercises

  27. arXiv:1902.07119  [pdf, ps, other]

    cs.GT cs.LG

    Bayesian Exploration with Heterogeneous Agents

    Authors: Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins, Zhiwei Steven Wu

    Abstract: It is common in recommendation systems that users both consume and produce information as they make strategic choices under uncertainty. While a social planner would balance "exploration" and "exploitation" using a multi-armed bandit algorithm, users' incentives may tilt this balance in favor of exploitation. We consider Bayesian Exploration: a simple model in which the recommendation system (the…

    Submitted 19 February, 2019; originally announced February 2019.

  28. arXiv:1902.05590  [pdf, other]

    cs.GT cs.LG

    The Perils of Exploration under Competition: A Computational Modeling Approach

    Authors: Guy Aridor, Kevin Liu, Aleksandrs Slivkins, Zhiwei Steven Wu

    Abstract: We empirically study the interplay between exploration and competition. Systems that learn from interactions with users often engage in exploration: making potentially suboptimal decisions in order to acquire new information for future decisions. However, when multiple systems are competing for the same market of users, exploration may hurt a system's reputation in the near term, with adverse comp…

    Submitted 1 May, 2019; v1 submitted 14 February, 2019; originally announced February 2019.

    Comments: This is a preprint of an article accepted for EC 2019

  29. arXiv:1902.01520  [pdf, other]

    stat.ML cs.LG

    Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting

    Authors: Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang

    Abstract: We study contextual bandit learning with an abstract policy class and continuous action space. We obtain two qualitatively different regret bounds: one competes with a smoothed version of the policy class under no continuity assumptions, while the other requires standard Lipschitz assumptions. Both bounds exhibit data-dependent "zooming" behavior and, with no tuning, yield improved guarantees for…

    Submitted 20 June, 2020; v1 submitted 4 February, 2019; originally announced February 2019.

    Comments: 41 pages, 1 figure, preliminary version in COLT 2019

  30. arXiv:1811.11881  [pdf, other]

    cs.DS cs.LG stat.ML

    Adversarial Bandits with Knapsacks

    Authors: Nicole Immorlica, Karthik Abinav Sankararaman, Robert Schapire, Aleksandrs Slivkins

    Abstract: We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a well-known knapsack problem: find an optimal packing of items into a limited-size knapsack. The BwK problem is a common generalization of numerous motivating examples, which range from dynamic pricing to repeated auctions…

    Submitted 6 March, 2023; v1 submitted 28 November, 2018; originally announced November 2018.

    Comments: The extended abstract appeared in FOCS 2019. The definitive version was published in JACM '22. V8 is the latest version with all technical changes. Subsequent versions fix minor LaTeX presentation issues

  31. arXiv:1811.06026  [pdf, other]

    cs.GT cs.DS cs.LG

    Incentivizing Exploration with Selective Data Disclosure

    Authors: Nicole Immorlica, Jieming Mao, Aleksandrs Slivkins, Zhiwei Steven Wu

    Abstract: We propose and design recommendation systems that incentivize efficient exploration. Agents arrive sequentially, choose actions and receive rewards, drawn from fixed but unknown action-specific distributions. The recommendation system presents each agent with actions and rewards from a subsequence of past agents, chosen ex ante. Thus, the agents engage in sequential social learning, moderated by t…

    Submitted 25 February, 2023; v1 submitted 14 November, 2018; originally announced November 2018.

  32. arXiv:1806.00543  [pdf, ps, other]

    cs.LG cs.CY stat.ML

    The Externalities of Exploration and How Data Diversity Helps Exploitation

    Authors: Manish Raghavan, Aleksandrs Slivkins, Jennifer Wortman Vaughan, Zhiwei Steven Wu

    Abstract: Online learning algorithms, widely used to power search and content optimization on the web, must balance exploration and exploitation, potentially sacrificing the experience of current users for information that will lead to better decisions in the future. Recently, concerns have been raised about whether the process of exploration could be viewed as unfair, placing too much burden on certain ind…

    Submitted 2 July, 2018; v1 submitted 1 June, 2018; originally announced June 2018.

  33. arXiv:1706.05711  [pdf, other]

    cs.GT

    A Polynomial Time Algorithm for Spatio-Temporal Security Games

    Authors: Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Aleksandrs Slivkins

    Abstract: An ever-important issue is protecting infrastructure and other valuable targets from a range of threats, from vandalism to theft to piracy to terrorism. The "defender" can rarely afford the resources needed for 100% protection. Thus, the key question is how to provide the best protection using the limited available resources. We study a practically important class of security games that is playe…

    Submitted 18 June, 2017; originally announced June 2017.

  34. arXiv:1705.08110  [pdf, other]

    cs.LG

    Combinatorial Semi-Bandits with Knapsacks

    Authors: Karthik Abinav Sankararaman, Aleksandrs Slivkins

    Abstract: We unify two prominent lines of work on multi-armed bandits: bandits with knapsacks (BwK) and combinatorial semi-bandits. The former concerns limited "resources" consumed by the algorithm, e.g., limited supply in dynamic pricing. The latter allows a huge number of actions but assumes combinatorial structure and additional feedback to make the problem tractable. We define a common generalization, s…

    Submitted 20 February, 2018; v1 submitted 23 May, 2017; originally announced May 2017.

  35. arXiv:1702.08533  [pdf, ps, other]

    cs.GT cs.LG

    Competing Bandits: Learning under Competition

    Authors: Yishay Mansour, Aleksandrs Slivkins, Zhiwei Steven Wu

    Abstract: Most modern systems strive to learn from interactions with users, and many engage in exploration: making potentially suboptimal choices for the sake of acquiring new information. We initiate a study of the interplay between exploration and competition--how such systems balance the exploration for learning and the competition for users. Here the users play three distinct roles: they are customers t…

    Submitted 19 November, 2017; v1 submitted 27 February, 2017; originally announced February 2017.

  36. arXiv:1607.05397  [pdf, ps, other]

    cs.DS cs.GT cs.LG

    Multidimensional Dynamic Pricing for Welfare Maximization

    Authors: Aaron Roth, Aleksandrs Slivkins, Jonathan Ullman, Zhiwei Steven Wu

    Abstract: We study the problem of a seller dynamically pricing $d$ distinct types of indivisible goods, when faced with the online arrival of unit-demand buyers drawn independently from an unknown distribution. The goods are not in limited supply, but can only be produced at a limited rate and are costly to produce. The seller observes only the bundle of goods purchased on each day, but nothing else about t…

    Submitted 10 June, 2017; v1 submitted 19 July, 2016; originally announced July 2016.

  37. arXiv:1606.03966  [pdf, other]

    cs.LG cs.DC

    Making Contextual Decisions with Low Technical Debt

    Authors: Alekh Agarwal, Sarah Bird, Markus Cozowicz, Luong Hoang, John Langford, Stephen Lee, Jiaji Li, Dan Melamed, Gal Oshri, Oswaldo Ribas, Siddhartha Sen, Alex Slivkins

    Abstract: Applications and systems are constantly faced with decisions that require picking from a set of actions based on contextual information. Reinforcement-based learning algorithms such as contextual bandits can be very effective in these settings, but applying them in practice is fraught with technical debt, and no general system exists that supports them completely. We address this and create the fi…

    Submitted 9 May, 2017; v1 submitted 13 June, 2016; originally announced June 2016.

  38. arXiv:1602.07570  [pdf, ps, other]

    cs.GT cs.DS cs.LG

    Bayesian Exploration: Incentivizing Exploration in Bayesian Games

    Authors: Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis, Zhiwei Steven Wu

    Abstract: We consider a ubiquitous scenario in the Internet economy when individual decision-makers (henceforth, agents) both produce and consume information as they make strategic choices in an uncertain environment. This creates a three-way tradeoff between exploration (trying out insufficiently explored alternatives to help others in the future), exploitation (making optimal decisions given the informati…

    Submitted 7 April, 2021; v1 submitted 24 February, 2016; originally announced February 2016.

    Comments: All revisions focused on presentation; all results (except Appendix C) have been present since the initial version

  39. Incentivizing High Quality Crowdwork

    Authors: Chien-Ju Ho, Aleksandrs Slivkins, Siddharth Suri, Jennifer Wortman Vaughan

    Abstract: We study the causal effects of financial incentives on the quality of crowdwork. We focus on performance-based payments (PBPs), bonus payments awarded to workers for producing high quality work. We design and run randomized behavioral experiments on the popular crowdsourcing platform Amazon Mechanical Turk with the goal of understanding when, where, and why PBPs help, identifying properties of the…

    Submitted 19 March, 2015; originally announced March 2015.

    Comments: This is a preprint of an article accepted for publication in WWW © 2015 International World Wide Web Conference Committee

  40. arXiv:1502.06362  [pdf, other]

    cs.LG

    Contextual Dueling Bandits

    Authors: Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins, Masrour Zoghi

    Abstract: We consider the problem of learning to choose actions using contextual information when provided with limited feedback in the form of relative pairwise comparisons. We study this problem in the dueling-bandits framework of Yue et al. (2009), which we extend to incorporate context. Roughly, the learner's goal is to find the best policy, or way of behaving, in some space of policies, although "best"…

    Submitted 13 June, 2015; v1 submitted 23 February, 2015; originally announced February 2015.

    Comments: 25 pages, 4 figures, Published at COLT 2015

  41. arXiv:1502.04147  [pdf, ps, other]

    cs.GT

    Bayesian Incentive-Compatible Bandit Exploration

    Authors: Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis

    Abstract: Individual decision-makers consume information revealed by the previous decision-makers, and produce information that may help in future decisions. This phenomenon is common in a wide range of scenarios in the Internet economy, as well as in other domains such as medical decisions. Each decision-maker would individually prefer to "exploit": select an action with the highest expected reward given h…

    Submitted 2 May, 2019; v1 submitted 13 February, 2015; originally announced February 2015.

    Comments: An extended abstract of this paper has been published in ACM EC 2015. This version contains complete proofs, revamped introductory sections (incl. a discussion of potential applications to medical trials), and thoroughly revised and streamlined presentation of the technical material. Two major extensions are fleshed out, whereas they were only informally described in the conference version

  42. arXiv:1411.0149  [pdf, other

    cs.AI cs.DS

    How Many Workers to Ask? Adaptive Exploration for Collecting High Quality Labels

    Authors: Ittai Abraham, Omar Alonso, Vasilis Kandylas, Rajesh Patel, Steven Shelford, Aleksandrs Slivkins

    Abstract: Crowdsourcing has been part of the IR toolbox as a cheap and fast mechanism to obtain labels for system development and evaluation. Successful deployment of crowdsourcing at scale involves adjusting many variables, a very important one being the number of workers needed per human intelligence task (HIT). We consider the crowdsourcing task of learning the answer to simple multiple-choice HITs, whic…

    Submitted 19 May, 2016; v1 submitted 1 November, 2014; originally announced November 2014.

    Comments: SIGIR 2016

  43. arXiv:1405.2875  [pdf, ps, other

    cs.DS cs.GT cs.LG

    Adaptive Contract Design for Crowdsourcing Markets: Bandit Algorithms for Repeated Principal-Agent Problems

    Authors: Chien-Ju Ho, Aleksandrs Slivkins, Jennifer Wortman Vaughan

    Abstract: Crowdsourcing markets have emerged as a popular platform for matching available workers with tasks to complete. The payment for a particular task is typically set by the task's requester, and may be adjusted based on the quality of the completed work, for example, through the use of "bonus" payments. In this paper, we study the requester's problem of dynamically adjusting quality-contingent paymen…

    Submitted 2 September, 2015; v1 submitted 12 May, 2014; originally announced May 2014.

    Comments: This is the full version of a paper in the ACM Conference on Economics and Computation (ACM-EC), 2014

  44. arXiv:1402.6779  [pdf, ps, other

    cs.LG cs.DS cs.GT

    Resourceful Contextual Bandits

    Authors: Ashwinkumar Badanidiyuru, John Langford, Aleksandrs Slivkins

    Abstract: We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items. We design the first algorithm for solving these problems that handles constrained resources other than time, and improves over a trivial reduction to the non-contextual case. We consider very general settings for both contextual bandits…

    Submitted 31 July, 2015; v1 submitted 26 February, 2014; originally announced February 2014.

    Comments: This is the full version of a paper in COLT 2014. Version history: (v2) Added some details to one of the proofs, (v3) a big revision following comments from COLT reviewers (but no new results), (v4) edits in related work, minor edits elsewhere. (v6) A correction for Theorem 3, corollary for contextual dynamic pricing with discretization; updated follow-up work & open questions

  45. arXiv:1312.1277  [pdf, ps, other

    cs.DS cs.LG

    Bandits and Experts in Metric Spaces

    Authors: Robert Kleinberg, Aleksandrs Slivkins, Eli Upfal

    Abstract: In a multi-armed bandit problem, an online algorithm chooses from a set of strategies in a sequence of trials so as to maximize the total payoff of the chosen strategies. While the performance of bandit algorithms with a small finite strategy set is quite well understood, bandit problems with large strategy sets are still a topic of very active investigation, motivated by practical applications su…

    Submitted 15 April, 2019; v1 submitted 4 December, 2013; originally announced December 2013.

    Comments: This manuscript is a merged and definitive version of (R. Kleinberg, Slivkins, Upfal: STOC 2008) and (R. Kleinberg, Slivkins: SODA 2010), with a significantly revised presentation

  46. arXiv:1308.1746  [pdf, ps, other

    cs.SI cs.CY cs.HC

    Online Decision Making in Crowdsourcing Markets: Theoretical Challenges (Position Paper)

    Authors: Aleksandrs Slivkins, Jennifer Wortman Vaughan

    Abstract: Over the past decade, crowdsourcing has emerged as a cheap and efficient method of obtaining solutions to simple tasks that are difficult for computers to solve but possible for humans. The popularity and promise of crowdsourcing markets has led to both empirical and theoretical research on the design of algorithms to optimize various aspects of these markets, such as the pricing and assignment of…

    Submitted 26 November, 2013; v1 submitted 7 August, 2013; originally announced August 2013.

  47. arXiv:1306.0155  [pdf, ps, other

    cs.LG cs.DS

    Dynamic Ad Allocation: Bandits with Budgets

    Authors: Aleksandrs Slivkins

    Abstract: We consider an application of multi-armed bandits to internet advertising (specifically, to dynamic ad allocation in the pay-per-click model, with uncertainty on the click probabilities). We focus on an important practical issue that advertisers are constrained in how much money they can spend on their ad campaigns. This issue has not been considered in the prior work on bandit-based approaches fo…

    Submitted 1 June, 2013; originally announced June 2013.

  48. Bandits with Knapsacks

    Authors: Ashwinkumar Badanidiyuru, Robert Kleinberg, Aleksandrs Slivkins

    Abstract: Multi-armed bandit problems are the predominant theoretical model of exploration-exploitation tradeoffs in learning, and they have countless applications ranging from medical trials, to communication networks, to Web search and advertising. In many of these application domains the learner may be constrained by one or more supply (or budget) limits, in addition to the customary limitation on the ti…

    Submitted 5 September, 2017; v1 submitted 11 May, 2013; originally announced May 2013.

    Comments: An extended abstract of this work has appeared in the 54th IEEE Symposium on Foundations of Computer Science (FOCS 2013). 55 pages. Compared to the initial "full version" from May'13, this version has a significantly revised presentation and reflects the current status of the follow-up work. Also, this version contains a stronger regret bound in one of the main results

  49. arXiv:1304.7468  [pdf, ps, other

    cs.GT cs.SI physics.soc-ph

    Selection and Influence in Cultural Dynamics

    Authors: David Kempe, Jon Kleinberg, Sigal Oren, Aleksandrs Slivkins

    Abstract: One of the fundamental principles driving diversity or homogeneity in domains such as cultural differentiation, political affiliation, and product adoption is the tension between two forces: influence (the tendency of people to become similar to others they interact with) and selection (the tendency to be affected most by the behavior of others who are already similar). Influence tends to promote…

    Submitted 27 October, 2015; v1 submitted 28 April, 2013; originally announced April 2013.

    Comments: A one-page abstract of this work has appeared in ACM EC 2013

  50. arXiv:1302.4138  [pdf, ps, other

    cs.GT

    Multi-parameter Mechanisms with Implicit Payment Computation

    Authors: Moshe Babaioff, Robert Kleinberg, Aleksandrs Slivkins

    Abstract: In this paper we show that payment computation essentially does not present any obstacle in designing truthful mechanisms, even for multi-parameter domains, and even when we can only call the allocation rule once. We present a general reduction that takes any allocation rule which satisfies "cyclic monotonicity" (a known necessary and sufficient condition for truthfulness) and converts it to a tru…

    Submitted 12 May, 2013; v1 submitted 17 February, 2013; originally announced February 2013.

    Comments: This is a full version of a paper in ACM EC 2013

    ACM Class: J.4; K.4.4; F.2.2