
Showing 1–19 of 19 results for author: Leslie, D S

Searching in archive cs.
  1. arXiv:2205.10017  [pdf, other]

    cs.GT

    A stochastic game framework for patrolling a border

    Authors: Matthew Darlington, Kevin D. Glazebrook, David S. Leslie, Rob Shone, Roberto Szechtman

    Abstract: In this paper we consider a stochastic game for modelling the interactions between smugglers and a patroller along a border. The problem we examine involves a group of cooperating smugglers making regular attempts to bring small amounts of illicit goods across a border. A single patroller has the goal of preventing the smugglers from doing so, but must pay a cost to travel from one location to ano…

    Submitted 20 May, 2022; originally announced May 2022.

  2. arXiv:2111.03340  [pdf, other]

    cs.IR cs.LG stat.ML

    FINN.no Slates Dataset: A new Sequential Dataset Logging Interactions, all Viewed Items and Click Responses/No-Click for Recommender Systems Research

    Authors: Simen Eide, Arnoldo Frigessi, Helge Jenssen, David S. Leslie, Joakim Rishaug, Sofie Verrewaere

    Abstract: We present a novel recommender systems dataset that records the sequential interactions between users and an online marketplace. The users are sequentially presented with both recommendations and search results in the form of ranked lists of items, called slates, from the marketplace. The dataset includes the presented slates at each round, whether the user clicked on any of these items and which…

    Submitted 5 November, 2021; originally announced November 2021.

    Comments: 5 pages, Fifteenth ACM Conference on Recommender Systems (RecSys '21), 2021, Amsterdam, Netherlands

  3. arXiv:2109.14412  [pdf, other]

    cs.LG

    Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification

    Authors: James A. Grant, David S. Leslie

    Abstract: We consider a variant of online binary classification where a learner sequentially assigns labels ($0$ or $1$) to items with unknown true class. If, but only if, the learner chooses label $1$ they immediately observe the true label of the item. The learner faces a trade-off between short-term classification accuracy and long-term information gain. This problem has previously been studied under the…

    Submitted 22 April, 2024; v1 submitted 29 September, 2021; originally announced September 2021.

    Comments: Update to Theorem 1 and experimental work

  4. arXiv:2106.02748  [pdf, other]

    cs.GT cs.LG cs.MA math.DS

    Decentralized Q-Learning in Zero-sum Markov Games

    Authors: Muhammed O. Sayin, Kaiqing Zhang, David S. Leslie, Tamer Basar, Asuman Ozdaglar

    Abstract: We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum Markov games. We focus on the practical but challenging setting of decentralized MARL, where agents make decisions without coordination by a centralized controller, but only based on their own payoffs and local actions executed. The agents need not observe the opponent's actions or payoffs, possibly being ev…

    Submitted 12 December, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: To appear at NeurIPS 2021. Strengthened the results in Theorem 1 and Corollary 1

  5. arXiv:2104.15046  [pdf, other]

    stat.ML cs.LG

    Dynamic Slate Recommendation with Gated Recurrent Units and Thompson Sampling

    Authors: Simen Eide, David S. Leslie, Arnoldo Frigessi

    Abstract: We consider the problem of recommending relevant content to users of an internet platform in the form of lists of items, called slates. We introduce a variational Bayesian Recurrent Neural Net recommender system that acts on time series of interactions between the internet platform and the user, and which scales to real world industrial situations. The recommender system is tested both online on r…

    Submitted 30 April, 2021; originally announced April 2021.

    Comments: The code and the data used in the article are available in the following repository: https://github.com/finn-no/recsys-slates-dataset

  6. arXiv:2102.03324  [pdf, other]

    cs.LG stat.ML

    GIBBON: General-purpose Information-Based Bayesian OptimisatioN

    Authors: Henry B. Moss, David S. Leslie, Javier Gonzalez, Paul Rayson

    Abstract: This paper describes a general-purpose extension of max-value entropy search, a popular approach for Bayesian Optimisation (BO). A novel approximation is proposed for the information gain -- an information-theoretic quantity central to solving a range of BO problems, including noisy, multi-fidelity and batch optimisations across both continuous and highly-structured discrete spaces. Previously, th…

    Submitted 26 October, 2021; v1 submitted 5 February, 2021; originally announced February 2021.

    Journal ref: Journal of Machine Learning Research 2021

  7. arXiv:2010.00979  [pdf, other]

    cs.LG cs.AI stat.ML

    BOSS: Bayesian Optimization over String Spaces

    Authors: Henry B. Moss, Daniel Beck, Javier Gonzalez, David S. Leslie, Paul Rayson

    Abstract: This article develops a Bayesian optimization (BO) method which acts directly over raw strings, proposing the first uses of string kernels and genetic algorithms within BO loops. Recent applications of BO over strings have been hindered by the need to map inputs into a smooth and unconstrained latent space. Learning this projection is computationally and data-intensive. Our approach instead builds…

    Submitted 2 October, 2020; originally announced October 2020.

  8. arXiv:2009.03207  [pdf, other]

    cs.LG cs.IR stat.ML

    Learning to Rank under Multinomial Logit Choice

    Authors: James A. Grant, David S. Leslie

    Abstract: Learning the optimal ordering of content is an important challenge in website design. The learning to rank (LTR) framework models this problem as a sequential problem of selecting lists of content and observing where users decide to click. Most previous work on LTR assumes that the user considers each item in the list in isolation, and makes binary choices to click or not on each. We introduce a m…

    Submitted 11 May, 2023; v1 submitted 7 September, 2020; originally announced September 2020.

    Comments: updated with new material including regret bound for unknown position bias setting

  9. arXiv:2007.00939  [pdf, other]

    cs.LG stat.ML

    BOSH: Bayesian Optimization by Sampling Hierarchically

    Authors: Henry B. Moss, David S. Leslie, Paul Rayson

    Abstract: Deployments of Bayesian Optimization (BO) for functions with stochastic evaluations, such as parameter tuning via cross validation and simulation optimization, typically optimize an average of a fixed set of noisy realizations of the objective function. However, disregarding the true objective function in this manner finds a high-precision optimum of the wrong function. To solve this problem, we p…

    Submitted 2 July, 2020; originally announced July 2020.

  10. arXiv:2006.12093  [pdf, other]

    cs.LG stat.ML

    MUMBO: MUlti-task Max-value Bayesian Optimization

    Authors: Henry B. Moss, David S. Leslie, Paul Rayson

    Abstract: We propose MUMBO, the first high-performing yet computationally efficient acquisition function for multi-task Bayesian optimization. Here, the challenge is to perform efficient optimization by evaluating low-cost functions somehow related to our true target function. This is a broad class of problems including the popular task of multi-fidelity optimization. However, while information-theoretic ac…

    Submitted 22 June, 2020; originally announced June 2020.

  11. arXiv:2001.02323  [pdf, other]

    cs.LG stat.ML

    On Thompson Sampling for Smoother-than-Lipschitz Bandits

    Authors: James A. Grant, David S. Leslie

    Abstract: Thompson Sampling is a well established approach to bandit and reinforcement learning problems. However its use in continuum armed bandit problems has received relatively little attention. We provide the first bounds on the regret of Thompson Sampling for continuum armed bandits under weak conditions on the function class containing the true function and sub-exponential observation noise. Our boun…

    Submitted 26 February, 2020; v1 submitted 7 January, 2020; originally announced January 2020.

    Comments: Accepted to AISTATS 2020. 26 pages, 2 figures

  12. arXiv:1906.12230  [pdf, other]

    cs.LG cs.CL stat.ML

    FIESTA: Fast IdEntification of State-of-The-Art models using adaptive bandit algorithms

    Authors: Henry B. Moss, Andrew Moore, David S. Leslie, Paul Rayson

    Abstract: We present FIESTA, a model selection approach that significantly reduces the computational resources required to reliably identify state-of-the-art performance from large collections of candidate models. Despite being known to produce unreliable comparisons, it is still common practice to compare model evaluations based on single choices of random seeds. We show that reliable model selection also…

    Submitted 28 June, 2019; originally announced June 2019.

    Comments: ACL 2019. Code available at: https://github.com/apmoore1/fiesta

  13. arXiv:1905.06821  [pdf, other]

    stat.ML cs.LG

    Adaptive Sensor Placement for Continuous Spaces

    Authors: James A Grant, Alexis Boukouvalas, Ryan-Rhys Griffiths, David S Leslie, Sattar Vakili, Enrique Munoz de Cote

    Abstract: We consider the problem of adaptively placing sensors along an interval to detect stochastically-generated events. We present a new formulation of the problem as a continuum-armed bandit problem with feedback in the form of partial observations of realisations of an inhomogeneous Poisson process. We design a solution method by combining Thompson sampling with nonparametric inference via increasing…

    Submitted 16 May, 2019; originally announced May 2019.

    Comments: 13 pages, accepted to ICML 2019

  14. arXiv:1810.02176  [pdf, other]

    cs.LG stat.ML

    Adaptive Policies for Perimeter Surveillance Problems

    Authors: James A. Grant, David S. Leslie, Kevin Glazebrook, Roberto Szechtman, Adam N. Letchford

    Abstract: Maximising the detection of intrusions is a fundamental and often critical aim of perimeter surveillance. Commonly, this requires a decision-maker to optimally allocate multiple searchers to segments of the perimeter. We consider a scenario where the decision-maker may sequentially update the searchers' allocation, learning from the observed data to improve decisions over time. In this work we pro…

    Submitted 11 November, 2019; v1 submitted 4 October, 2018; originally announced October 2018.

  15. arXiv:1810.01925  [pdf, ps, other]

    cs.GT cs.LG math.OC

    Bandit learning in concave $N$-person games

    Authors: Mario Bravo, David S. Leslie, Panayotis Mertikopoulos

    Abstract: This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players'…

    Submitted 3 October, 2018; originally announced October 2018.

    Comments: 24 pages, 1 figure

    MSC Class: Primary 91A10; 91A26; secondary 68Q32; 68T02

  16. arXiv:1806.07139  [pdf, other]

    cs.CL stat.ML

    Using J-K fold Cross Validation to Reduce Variance When Tuning NLP Models

    Authors: Henry B. Moss, David S. Leslie, Paul Rayson

    Abstract: K-fold cross validation (CV) is a popular method for estimating the true performance of machine learning models, allowing model selection and parameter tuning. However, the very process of CV requires random partitioning of the data and so our performance estimates are in fact stochastic, with variability that can be substantial for natural language processing tasks. We demonstrate that these unst…

    Submitted 19 June, 2018; originally announced June 2018.

    Comments: COLING 2018. Code available at: https://github.com/henrymoss/COLING2018

  17. arXiv:1705.09605  [pdf, ps, other]

    cs.LG stat.ML

    Combinatorial Multi-Armed Bandits with Filtered Feedback

    Authors: James A. Grant, David S. Leslie, Kevin Glazebrook, Roberto Szechtman

    Abstract: Motivated by problems in search and detection we present a solution to a Combinatorial Multi-Armed Bandit (CMAB) problem with both heavy-tailed reward distributions and a new class of feedback, filtered semibandit feedback. In a CMAB problem an agent pulls a combination of arms from a set $\{1,...,k\}$ in each round, generating random outcomes from probability distributions associated with these a…

    Submitted 26 May, 2017; originally announced May 2017.

    Comments: 16 pages

  18. arXiv:1412.0543  [pdf, ps, other]

    math.OC cs.GT cs.MA stat.ML

    Game-theoretical control with continuous action sets

    Authors: Steven Perkins, Panayotis Mertikopoulos, David S. Leslie

    Abstract: Motivated by the recent applications of game-theoretical learning techniques to the design of distributed control systems, we study a class of control problems that can be formulated as potential games with continuous action sets, and we propose an actor-critic reinforcement learning algorithm that provably converges to equilibrium in this class of problems. The method employed is to analyse the l…

    Submitted 1 December, 2014; originally announced December 2014.

    Comments: 19 pages

  19. arXiv:1112.2315  [pdf, other]

    stat.ML cs.LG cs.MA

    Adaptive Forgetting Factor Fictitious Play

    Authors: Michalis Smyrnakis, David S. Leslie

    Abstract: It is now well known that decentralised optimisation can be formulated as a potential game, and game-theoretical learning algorithms can be used to find an optimum. One of the most common learning techniques in game theory is fictitious play. However fictitious play is founded on an implicit assumption that opponents' strategies are stationary. We present a novel variation of fictitious play that…

    Submitted 10 December, 2011; originally announced December 2011.