-
On the Importance of Uncertainty in Decision-Making with Large Language Models
Authors:
Nicolò Felicioni,
Lucas Maystre,
Sina Ghiassian,
Kamil Ciosek
Abstract:
We investigate the role of uncertainty in decision-making problems with natural language as input. For such tasks, using Large Language Models as agents has become the norm. However, none of the recent approaches employ any additional phase for estimating the uncertainty the agent has about the world during the decision-making task. We focus on a fundamental decision-making framework with natural…
▽ More
We investigate the role of uncertainty in decision-making problems with natural language as input. For such tasks, using Large Language Models as agents has become the norm. However, none of the recent approaches employ any additional phase for estimating the uncertainty the agent has about the world during the decision-making task. We focus on a fundamental decision-making framework with natural language as input, which is the one of contextual bandits, where the context information consists of text. As a representative of the approaches with no uncertainty estimation, we consider an LLM bandit with a greedy policy, which picks the action corresponding to the largest predicted reward. We compare this baseline to LLM bandits that make active use of uncertainty estimation by integrating the uncertainty in a Thompson Sampling policy. We employ different techniques for uncertainty estimation, such as Laplace Approximation, Dropout, and Epinets. We empirically show on real-world data that the greedy policy performs worse than the Thompson Sampling policies. These findings suggest that, while overlooked in the LLM literature, uncertainty plays a fundamental role in bandit tasks with LLMs.
△ Less
Submitted 3 April, 2024;
originally announced April 2024.
-
Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay
Authors:
Thomas M. McDonald,
Lucas Maystre,
Mounia Lalmas,
Daniel Russo,
Kamil Ciosek
Abstract:
Recommender systems are a ubiquitous feature of online platforms. Increasingly, they are explicitly tasked with increasing users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a multi-armed bandit problem with delayed rewards. We observe that there is an apparent trade-off in choosing the learning signal: Waiting for the full reward to become a…
▽ More
Recommender systems are a ubiquitous feature of online platforms. Increasingly, they are explicitly tasked with increasing users' long-term satisfaction. In this context, we study a content exploration task, which we formalize as a multi-armed bandit problem with delayed rewards. We observe that there is an apparent trade-off in choosing the learning signal: Waiting for the full reward to become available might take several weeks, hurting the rate at which learning happens, whereas measuring short-term proxy rewards reflects the actual long-term goal only imperfectly. We address this challenge in two steps. First, we develop a predictive model of delayed rewards that incorporates all information obtained to date. Full observations as well as partial (short or medium-term) outcomes are combined through a Bayesian filter to obtain a probabilistic belief. Second, we devise a bandit algorithm that takes advantage of this new predictive model. The algorithm quickly learns to identify content aligned with long-term success by carefully balancing exploration and exploitation. We apply our approach to a podcast recommendation problem, where we seek to identify shows that users engage with repeatedly over two months. We empirically validate that our approach results in substantially better performance compared to approaches that either optimize for short-term proxies, or wait for the long-term outcome to be fully realized.
△ Less
Submitted 20 July, 2023; v1 submitted 19 July, 2023;
originally announced July 2023.
-
Fast Interactive Search with a Scale-Free Comparison Oracle
Authors:
Daniyar Chumbalov,
Lars Klein,
Lucas Maystre,
Matthias Grossglauser
Abstract:
A comparison-based search algorithm lets a user find a target item $t$ in a database by answering queries of the form, ``Which of items $i$ and $j$ is closer to $t$?'' Instead of formulating an explicit query (such as one or several keywords), the user navigates towards the target via a sequence of such (typically noisy) queries.
We propose a scale-free probabilistic oracle model called $γ$-CKL…
▽ More
A comparison-based search algorithm lets a user find a target item $t$ in a database by answering queries of the form, ``Which of items $i$ and $j$ is closer to $t$?'' Instead of formulating an explicit query (such as one or several keywords), the user navigates towards the target via a sequence of such (typically noisy) queries.
We propose a scale-free probabilistic oracle model called $γ$-CKL for such similarity triplets $(i,j;t)$, which generalizes the CKL triplet model proposed in the literature. The generalization affords independent control over the discriminating power of the oracle and the dimension of the feature space containing the items.
We develop a search algorithm with provably exponential rate of convergence under the $γ$-CKL oracle, thanks to a backtracking strategy that deals with the unavoidable errors in updating the belief region around the target.
We evaluate the performance of the algorithm both over the posited oracle and over several real-world triplet datasets. We also report on a comprehensive user study, where human subjects navigate a database of face portraits.
△ Less
Submitted 2 June, 2023;
originally announced June 2023.
-
Estimating long-term causal effects from short-term experiments and long-term observational data with unobserved confounding
Authors:
Graham Van Goffrier,
Lucas Maystre,
Ciarán Gilligan-Lee
Abstract:
Understanding and quantifying cause and effect is an important problem in many domains. The generally-agreed solution to this problem is to perform a randomised controlled trial. However, even when randomised controlled trials can be performed, they usually have relatively short duration's due to cost considerations. This makes learning long-term causal effects a very challenging task in practice,…
▽ More
Understanding and quantifying cause and effect is an important problem in many domains. The generally-agreed solution to this problem is to perform a randomised controlled trial. However, even when randomised controlled trials can be performed, they usually have relatively short duration's due to cost considerations. This makes learning long-term causal effects a very challenging task in practice, since the long-term outcome is only observed after a long delay. In this paper, we study the identification and estimation of long-term treatment effects when both experimental and observational data are available. Previous work provided an estimation strategy to determine long-term causal effects from such data regimes. However, this strategy only works if one assumes there are no unobserved confounders in the observational data. In this paper, we specifically address the challenging case where unmeasured confounders are present in the observational data. Our long-term causal effect estimator is obtained by combining regression residuals with short-term experimental outcomes in a specific manner to create an instrumental variable, which is then used to quantify the long-term causal effect through instrumental variable regression. We prove this estimator is unbiased, and analytically study its variance. In the context of the front-door causal structure, this provides a new causal estimator, which may be of independent interest. Finally, we empirically test our approach on synthetic-data, as well as real-data from the International Stroke Trial.
△ Less
Submitted 21 February, 2023;
originally announced February 2023.
-
Optimizing Audio Recommendations for the Long-Term: A Reinforcement Learning Perspective
Authors:
Lucas Maystre,
Daniel Russo,
Yu Zhao
Abstract:
We study the problem of optimizing a recommender system for outcomes that occur over several weeks or months. We begin by drawing on reinforcement learning to formulate a comprehensive model of users' recurring relationships with a recommender system. Measurement, attribution, and coordination challenges complicate algorithm design. We describe careful modeling -- including a new representation of…
▽ More
We study the problem of optimizing a recommender system for outcomes that occur over several weeks or months. We begin by drawing on reinforcement learning to formulate a comprehensive model of users' recurring relationships with a recommender system. Measurement, attribution, and coordination challenges complicate algorithm design. We describe careful modeling -- including a new representation of user state and key conditional independence assumptions -- which overcomes these challenges and leads to simple, testable recommender system prototypes. We apply our approach to a podcast recommender system that makes personalized recommendations to hundreds of millions of listeners. A/B tests demonstrate that purposefully optimizing for long-term outcomes leads to large performance gains over conventional approaches that optimize for short-term proxies.
△ Less
Submitted 28 February, 2023; v1 submitted 7 February, 2023;
originally announced February 2023.
-
A Strong Baseline for Batch Imitation Learning
Authors:
Matthew Smith,
Lucas Maystre,
Zhenwen Dai,
Kamil Ciosek
Abstract:
Imitation of expert behaviour is a highly desirable and safe approach to the problem of sequential decision making. We provide an easy-to-implement, novel algorithm for imitation learning under a strict data paradigm, in which the agent must learn solely from data collected a priori. This paradigm allows our algorithm to be used for environments in which safety or cost are of critical concern. Our…
▽ More
Imitation of expert behaviour is a highly desirable and safe approach to the problem of sequential decision making. We provide an easy-to-implement, novel algorithm for imitation learning under a strict data paradigm, in which the agent must learn solely from data collected a priori. This paradigm allows our algorithm to be used for environments in which safety or cost are of critical concern. Our algorithm requires no additional hyper-parameter tuning beyond any standard batch reinforcement learning (RL) algorithm, making it an ideal baseline for such data-strict regimes. Furthermore, we provide formal sample complexity guarantees for the algorithm in finite Markov Decision Problems. In doing so, we formally demonstrate an unproven claim from Kearns & Singh (1998). On the empirical side, our contribution is twofold. First, we develop a practical, robust and principled evaluation protocol for offline RL methods, making use of only the dataset provided for model selection. This stands in contrast to the vast majority of previous works in offline RL, which tune hyperparameters on the evaluation environment, limiting the practical applicability when deployed in new, cost-critical environments. As such, we establish precedent for the development and fair evaluation of offline RL algorithms. Second, we evaluate our own algorithm on challenging continuous control benchmarks, demonstrating its practical applicability and competitiveness with state-of-the-art performance, despite being a simpler algorithm.
△ Less
Submitted 6 February, 2023;
originally announced February 2023.
-
A User Study of Perceived Carbon Footprint
Authors:
Victor Kristof,
Valentin Quelquejay-Leclère,
Robin Zbinden,
Lucas Maystre,
Matthias Grossglauser,
Patrick Thiran
Abstract:
We propose a statistical model to understand people's perception of their carbon footprint. Driven by the observation that few people think of CO2 impact in absolute terms, we design a system to probe people's perception from simple pairwise comparisons of the relative carbon footprint of their actions. The formulation of the model enables us to take an active-learning approach to selecting the pa…
▽ More
We propose a statistical model to understand people's perception of their carbon footprint. Driven by the observation that few people think of CO2 impact in absolute terms, we design a system to probe people's perception from simple pairwise comparisons of the relative carbon footprint of their actions. The formulation of the model enables us to take an active-learning approach to selecting the pairs of actions that are maximally informative about the model parameters. We define a set of 18 actions and collect a dataset of 2183 comparisons from 176 users on a university campus. The early results reveal promising directions to improve climate communication and enhance climate mitigation.
△ Less
Submitted 4 December, 2019; v1 submitted 26 November, 2019;
originally announced November 2019.
-
Scalable and Efficient Comparison-based Search without Features
Authors:
Daniyar Chumbalov,
Lucas Maystre,
Matthias Grossglauser
Abstract:
We consider the problem of finding a target object $t$ using pairwise comparisons, by asking an oracle questions of the form \emph{"Which object from the pair $(i,j)$ is more similar to $t$?"}. Objects live in a space of latent features, from which the oracle generates noisy answers. First, we consider the {\em non-blind} setting where these features are accessible. We propose a new Bayesian compa…
▽ More
We consider the problem of finding a target object $t$ using pairwise comparisons, by asking an oracle questions of the form \emph{"Which object from the pair $(i,j)$ is more similar to $t$?"}. Objects live in a space of latent features, from which the oracle generates noisy answers. First, we consider the {\em non-blind} setting where these features are accessible. We propose a new Bayesian comparison-based search algorithm with noisy answers; it has low computational complexity yet is efficient in the number of queries. We provide theoretical guarantees, deriving the form of the optimal query and proving almost sure convergence to the target $t$. Second, we consider the \emph{blind} setting, where the object features are hidden from the search algorithm. In this setting, we combine our search method and a new distributional triplet embedding algorithm into one scalable learning framework called \textsc{Learn2Search}. We show that the query complexity of our approach on two real-world datasets is on par with the non-blind setting, which is not achievable using any of the current state-of-the-art embedding methods. Finally, we demonstrate the efficacy of our framework by conducting an experiment with users searching for movie actors.
△ Less
Submitted 3 September, 2020; v1 submitted 13 May, 2019;
originally announced May 2019.
-
Pairwise Comparisons with Flexible Time-Dynamics
Authors:
Lucas Maystre,
Victor Kristof,
Matthias Grossglauser
Abstract:
Inspired by applications in sports where the skill of players or teams competing against each other varies over time, we propose a probabilistic model of pairwise-comparison outcomes that can capture a wide range of time dynamics. We achieve this by replacing the static parameters of a class of popular pairwise-comparison models by continuous-time Gaussian processes; the covariance function of the…
▽ More
Inspired by applications in sports where the skill of players or teams competing against each other varies over time, we propose a probabilistic model of pairwise-comparison outcomes that can capture a wide range of time dynamics. We achieve this by replacing the static parameters of a class of popular pairwise-comparison models by continuous-time Gaussian processes; the covariance function of these processes enables expressive dynamics. We develop an efficient inference algorithm that computes an approximate Bayesian posterior distribution. Despite the flexbility of our model, our inference algorithm requires only a few linear-time iterations over the data and can take advantage of modern multiprocessor computer architectures. We apply our model to several historical databases of sports outcomes and find that our approach outperforms competing approaches in terms of predictive performance, scales to millions of observations, and generates compelling visualizations that help in understanding and interpreting the data.
△ Less
Submitted 17 May, 2019; v1 submitted 18 March, 2019;
originally announced March 2019.
-
Can Who-Edits-What Predict Edit Survival?
Authors:
Ali Batuhan Yardım,
Victor Kristof,
Lucas Maystre,
Matthias Grossglauser
Abstract:
As the number of contributors to online peer-production systems grows, it becomes increasingly important to predict whether the edits that users make will eventually be beneficial to the project. Existing solutions either rely on a user reputation system or consist of a highly specialized predictor that is tailored to a specific peer-production system. In this work, we explore a different point in…
▽ More
As the number of contributors to online peer-production systems grows, it becomes increasingly important to predict whether the edits that users make will eventually be beneficial to the project. Existing solutions either rely on a user reputation system or consist of a highly specialized predictor that is tailored to a specific peer-production system. In this work, we explore a different point in the solution space that goes beyond user reputation but does not involve any content-based feature of the edits. We view each edit as a game between the editor and the component of the project. We posit that the probability that an edit is accepted is a function of the editor's skill, of the difficulty of editing the component and of a user-component interaction term. Our model is broadly applicable, as it only requires observing data about who makes an edit, what the edit affects and whether the edit survives or not. We apply our model on Wikipedia and the Linux kernel, two examples of large-scale peer-production systems, and we seek to understand whether it can effectively predict edit survival: in both cases, we provide a positive answer. Our approach significantly outperforms those based solely on user reputation and bridges the gap with specialized predictors that use content-based features. It is simple to implement, computationally inexpensive, and in addition it enables us to discover interesting structure in the data.
△ Less
Submitted 5 July, 2018; v1 submitted 12 January, 2018;
originally announced January 2018.
-
ChoiceRank: Identifying Preferences from Node Traffic in Networks
Authors:
Lucas Maystre,
Matthias Grossglauser
Abstract:
Understanding how users navigate in a network is of high interest in many applications. We consider a setting where only aggregate node-level traffic is observed and tackle the task of learning edge transition probabilities. We cast it as a preference learning problem, and we study a model where choices follow Luce's axiom. In this case, the $O(n)$ marginal counts of node visits are a sufficient s…
▽ More
Understanding how users navigate in a network is of high interest in many applications. We consider a setting where only aggregate node-level traffic is observed and tackle the task of learning edge transition probabilities. We cast it as a preference learning problem, and we study a model where choices follow Luce's axiom. In this case, the $O(n)$ marginal counts of node visits are a sufficient statistic for the $O(n^2)$ transition probabilities. We show how to make the inference problem well-posed regardless of the network's structure, and we present ChoiceRank, an iterative algorithm that scales to networks that contains billions of nodes and edges. We apply the model to two clickstream datasets and show that it successfully recovers the transition probabilities using only the network structure and marginal (node-level) traffic data. Finally, we also consider an application to mobility networks and apply the model to one year of rides on New York City's bicycle-sharing system.
△ Less
Submitted 15 June, 2017; v1 submitted 20 October, 2016;
originally announced October 2016.
-
The Player Kernel: Learning Team Strengths Based on Implicit Player Contributions
Authors:
Lucas Maystre,
Victor Kristof,
Antonio J. González Ferrer,
Matthias Grossglauser
Abstract:
In this work, we draw attention to a connection between skill-based models of game outcomes and Gaussian process classification models. The Gaussian process perspective enables a) a principled way of dealing with uncertainty and b) rich models, specified through kernel functions. Using this connection, we tackle the problem of predicting outcomes of football matches between national teams. We deve…
▽ More
In this work, we draw attention to a connection between skill-based models of game outcomes and Gaussian process classification models. The Gaussian process perspective enables a) a principled way of dealing with uncertainty and b) rich models, specified through kernel functions. Using this connection, we tackle the problem of predicting outcomes of football matches between national teams. We develop a player kernel that relates any two football matches through the players lined up on the field. This makes it possible to share knowledge gained from observing matches between clubs (available in large quantities) and matches between national teams (available only in limited quantities). We evaluate our approach on the Euro 2008, 2012 and 2016 final tournaments.
△ Less
Submitted 5 September, 2016;
originally announced September 2016.
-
Just Sort It! A Simple and Effective Approach to Active Preference Learning
Authors:
Lucas Maystre,
Matthias Grossglauser
Abstract:
We address the problem of learning a ranking by using adaptively chosen pairwise comparisons. Our goal is to recover the ranking accurately but to sample the comparisons sparingly. If all comparison outcomes are consistent with the ranking, the optimal solution is to use an efficient sorting algorithm, such as Quicksort. But how do sorting algorithms behave if some comparison outcomes are inconsis…
▽ More
We address the problem of learning a ranking by using adaptively chosen pairwise comparisons. Our goal is to recover the ranking accurately but to sample the comparisons sparingly. If all comparison outcomes are consistent with the ranking, the optimal solution is to use an efficient sorting algorithm, such as Quicksort. But how do sorting algorithms behave if some comparison outcomes are inconsistent with the ranking? We give favorable guarantees for Quicksort for the popular Bradley-Terry model, under natural assumptions on the parameters. Furthermore, we empirically demonstrate that sorting algorithms lead to a very simple and effective active learning strategy: repeatedly sort the items. This strategy performs as well as state-of-the-art methods (and much better than random sampling) at a minuscule fraction of the computational cost.
△ Less
Submitted 15 June, 2017; v1 submitted 19 February, 2015;
originally announced February 2015.
-
Mitigating Epidemics through Mobile Micro-measures
Authors:
Mohamed Kafsi,
Ehsan Kazemi,
Lucas Maystre,
Lyudmila Yartseva,
Matthias Grossglauser,
Patrick Thiran
Abstract:
Epidemics of infectious diseases are among the largest threats to the quality of life and the economic and social well-being of develo** countries. The arsenal of measures against such epidemics is well-established, but costly and insufficient to mitigate their impact. In this paper, we argue that mobile technology adds a powerful weapon to this arsenal, because (a) mobile devices endow us with…
▽ More
Epidemics of infectious diseases are among the largest threats to the quality of life and the economic and social well-being of develo** countries. The arsenal of measures against such epidemics is well-established, but costly and insufficient to mitigate their impact. In this paper, we argue that mobile technology adds a powerful weapon to this arsenal, because (a) mobile devices endow us with the unprecedented ability to measure and model the detailed behavioral patterns of the affected population, and (b) they enable the delivery of personalized behavioral recommendations to individuals in real time. We combine these two ideas and propose several strategies to generate such recommendations from mobility patterns. The goal of each strategy is a large reduction in infections, with a small impact on the normal course of daily life. We evaluate these strategies over the Orange D4D dataset and show the benefit of mobile micro-measures, even if only a fraction of the population participates. These preliminary results demonstrate the potential of mobile technology to complement other measures like vaccination and quarantines against disease epidemics.
△ Less
Submitted 8 July, 2013;
originally announced July 2013.