
Showing 1–11 of 11 results for author: Avadhanula, V

  1. arXiv:2301.03099  [pdf, ps, other]

    cs.AI cs.LG

    Fully Dynamic Online Selection through Online Contention Resolution Schemes

    Authors: Vashist Avadhanula, Andrea Celli, Riccardo Colini-Baldeschi, Stefano Leonardi, Matteo Russo

    Abstract: We study fully dynamic online selection problems in an adversarial/stochastic setting that includes Bayesian online selection, prophet inequalities, posted price mechanisms, and stochastic probing problems subject to combinatorial constraints. In the classical "incremental" version of the problem, selected elements remain active until the end of the input sequence. On the other hand, in the full…

    Submitted 8 January, 2023; originally announced January 2023.

  2. arXiv:2211.06516  [pdf, other]

    cs.LG

    Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms

    Authors: Vashist Avadhanula, Omar Abdul Baki, Hamsa Bastani, Osbert Bastani, Caner Gocmen, Daniel Haimovich, Darren Hwang, Dima Karamshuk, Thomas Leeper, Jiayuan Ma, Gregory Macnamara, Jake Mullett, Christopher Palow, Sung Park, Varun S Rajagopal, Kevin Schaeffer, Parikshit Shah, Deeksha Sinha, Nicolas Stier-Moses, Peng Xu

    Abstract: We describe the current content moderation strategy employed by Meta to remove policy-violating content from its platforms. Meta relies on both handcrafted and learned risk models to flag potentially violating content for human review. Our approach aggregates these risk models into a single ranking score, calibrating them to prioritize more reliable risk models. A key challenge is that violation t…

    Submitted 11 November, 2022; originally announced November 2022.

  3. arXiv:2202.05194  [pdf, other]

    cs.GT

    Robust and fair work allocation

    Authors: Amine Allouah, Christian Kroer, Xuan Zhang, Vashist Avadhanula, Anil Dania, Caner Gocmen, Sergey Pupyrev, Parikshit Shah, Nicolas Stier

    Abstract: In today's digital world, interaction with online platforms is ubiquitous, and thus content moderation is important for protecting users from content that does not comply with pre-established community guidelines. Having a robust content moderation system throughout every stage of planning is particularly important. We study the short-term planning problem of allocating human content reviewers to di…

    Submitted 14 February, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

  4. arXiv:2112.06517  [pdf, other]

    cs.LG stat.ML

    Top $K$ Ranking for Multi-Armed Bandit with Noisy Evaluations

    Authors: Evrard Garcelon, Vashist Avadhanula, Alessandro Lazaric, Matteo Pirotta

    Abstract: We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy, independent, and possibly biased evaluations of the true reward of each arm, and selects $K$ arms with the objective of accumulating as much reward as possible over $T$ rounds. Under the assumption that at each round the true reward of each arm is drawn from a fixed distribution, we…

    Submitted 12 April, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

  5. arXiv:2103.16816  [pdf, other]

    cs.LG

    QUEST: Queue Simulation for Content Moderation at Scale

    Authors: Rahul Makhijani, Parikshit Shah, Vashist Avadhanula, Caner Gocmen, Nicolás E. Stier-Moses, Julián Mestre

    Abstract: Moderating content on social media platforms is a formidable challenge due to the unprecedented scale of such systems, which typically handle billions of posts per day. Some of the largest platforms, such as Facebook, blend machine learning with manual review of platform content by thousands of reviewers. Operating a large-scale human review system poses interesting and challenging methodological qu…

    Submitted 31 March, 2021; originally announced March 2021.

  6. arXiv:2103.10246  [pdf, other]

    cs.GT cs.LG

    Stochastic Bandits for Multi-platform Budget Optimization in Online Advertising

    Authors: Vashist Avadhanula, Riccardo Colini-Baldeschi, Stefano Leonardi, Karthik Abinav Sankararaman, Okke Schrijvers

    Abstract: We study the problem of an online advertising system that wants to optimally spend an advertiser's given budget for a campaign across multiple platforms, without knowing the value for showing an ad to the users on those platforms. We model this challenging practical application as a Stochastic Bandits with Knapsacks problem over $T$ rounds of bidding with the set of arms given by the set of distin…

    Submitted 25 March, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

  7. arXiv:2011.14033  [pdf, other]

    cs.LG stat.ML

    A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit

    Authors: Priyank Agrawal, Theja Tulabandhula, Vashist Avadhanula

    Abstract: In this paper, we consider the contextual variant of the MNL-Bandit problem. More specifically, we consider a dynamic set optimization problem, where a decision-maker offers a subset (assortment) of products to a consumer and observes the response in every round. Consumers purchase products to maximize their utility. We assume that a set of attributes describes the products, and the mean utility of…

    Submitted 14 April, 2024; v1 submitted 27 November, 2020; originally announced November 2020.

    Comments: Bug fixed

  8. arXiv:2011.01488  [pdf, other]

    cs.LG cs.AI

    Multi-armed Bandits with Cost Subsidy

    Authors: Deeksha Sinha, Karthik Abinav Sankararaman, Abbas Kazerouni, Vashist Avadhanula

    Abstract: In this paper, we consider a novel variant of the multi-armed bandit (MAB) problem, MAB with cost subsidy, which models many real-life applications where the learning agent has to pay to select an arm and is concerned about optimizing cumulative costs and rewards. We present two applications, the intelligent SMS routing problem and the ad audience optimization problem, faced by several businesses (especial…

    Submitted 15 March, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

  9. arXiv:1911.00638  [pdf, other]

    cs.LG cs.AI stat.ML

    Thompson Sampling for Contextual Bandit Problems with Auxiliary Safety Constraints

    Authors: Samuel Daulton, Shaun Singh, Vashist Avadhanula, Drew Dimmery, Eytan Bakshy

    Abstract: Recent advances in contextual bandit optimization and reinforcement learning have garnered interest in applying these methods to real-world sequential decision making problems. Real-world applications frequently have constraints with respect to a currently deployed policy. Many of the existing constraint-aware algorithms consider problems with a single objective (the reward) and a constraint on th…

    Submitted 1 November, 2019; originally announced November 2019.

    Comments: To appear at NeurIPS 2019, Workshop on Safety and Robustness in Decision Making. 11 pages (including references and appendix)

  10. arXiv:1706.03880  [pdf, other]

    cs.LG

    MNL-Bandit: A Dynamic Learning Approach to Assortment Selection

    Authors: Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi

    Abstract: We consider a dynamic assortment selection problem, where in every round the retailer offers a subset (assortment) of $N$ substitutable products to a consumer, who selects one of these products according to a multinomial logit (MNL) choice model. The retailer observes this choice and the objective is to dynamically learn the model parameters, while optimizing cumulative revenues over a selling hor…

    Submitted 29 June, 2018; v1 submitted 12 June, 2017; originally announced June 2017.

  11. arXiv:1706.00977  [pdf, other]

    cs.LG

    Thompson Sampling for the MNL-Bandit

    Authors: Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi

    Abstract: We consider a sequential subset selection problem under parameter uncertainty, where at each time step, the decision maker selects a subset of cardinality $K$ from $N$ possible items (arms), and observes (bandit) feedback in the form of the index of one of the items in said subset, or none. Each item in the index set is ascribed a certain value (reward), and the feedback is governed by a Multino…

    Submitted 3 January, 2019; v1 submitted 3 June, 2017; originally announced June 2017.

    Comments: Accepted for presentation at Conference on Learning Theory (COLT) 2017
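Entries 10 and 11 (and, in contextual form, entry 7) study assortment selection under the multinomial logit (MNL) choice model. The abstracts are truncated before the model is stated, so the sketch below assumes the textbook MNL parameterization: each item i has a known preference weight v_i > 0, the no-purchase option has weight 1, and an offered item i in assortment S is chosen with probability v_i / (1 + sum of v_j over j in S). The names `mnl_choice_probabilities`, `expected_revenue`, `best_assortment_brute_force`, and `NO_PURCHASE` are hypothetical, and the exhaustive search is only an illustration of the per-round objective. It is not the papers' method: the cited works learn the weights online, whereas this sketch assumes they are known.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

NO_PURCHASE = -1  # hypothetical key for the outside (no-purchase) option


def mnl_choice_probabilities(assortment: Sequence[int], v: Sequence[float]) -> Dict[int, float]:
    """Choice probabilities under MNL, assuming the no-purchase weight is 1."""
    denom = 1.0 + sum(v[i] for i in assortment)
    probs = {i: v[i] / denom for i in assortment}
    probs[NO_PURCHASE] = 1.0 / denom
    return probs


def expected_revenue(assortment: Sequence[int], v: Sequence[float], r: Sequence[float]) -> float:
    """Expected revenue of one offer: sum of r_i times the probability item i is chosen."""
    probs = mnl_choice_probabilities(assortment, v)
    return sum(r[i] * probs[i] for i in assortment)


def best_assortment_brute_force(
    v: Sequence[float], r: Sequence[float], k: int
) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively search assortments of size 1..k (only sensible for small N)."""
    best: Tuple[Tuple[int, ...], float] = ((), 0.0)  # empty offer earns nothing
    for size in range(1, k + 1):
        for subset in combinations(range(len(v)), size):
            rev = expected_revenue(subset, v, r)
            if rev > best[1]:
                best = (subset, rev)
    return best


if __name__ == "__main__":
    v = [1.0, 0.5, 0.5]  # made-up preference weights
    r = [1.0, 2.0, 3.0]  # made-up per-item revenues
    print(best_assortment_brute_force(v, r, k=2))  # prints ((1, 2), 1.25)
```

With these made-up numbers, offering the two highest-revenue items beats including item 0. Item 0 is the most attractive, so adding it pulls purchase probability away from the pricier items. This substitution effect is what makes assortment selection under MNL nontrivial. Exhaustive search grows combinatorially with N and K, so it only serves to make the objective concrete.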