-
Unbiased Offline Evaluation for Learning to Rank with Business Rules
Authors:
Matej Jakimov,
Alexander Buchholz,
Yannik Stein,
Thorsten Joachims
Abstract:
For industrial learning-to-rank (LTR) systems, it is common that the output of a ranking model is modified, either as a result of post-processing logic that enforces business requirements, or as a result of unforeseen design flaws or bugs present in real-world production systems. This poses a challenge for deploying off-policy learning and evaluation methods, as these often rely on the assumption that the rankings implied by the model's scores coincide with the items displayed to users. Further requirements for reliable offline evaluation are proper randomization and correct estimation of the propensities of displaying each item in any given position of the ranking, which are also impacted by the aforementioned post-processing. We investigate empirically how these scenarios impair off-policy evaluation for learning-to-rank models. We then propose a novel correction method based on the Birkhoff-von-Neumann decomposition that is robust to this type of post-processing. We obtain more accurate off-policy estimates in offline experiments, overcoming the problem of post-processed rankings. To the best of our knowledge, this is the first study on the impact of real-world business rules on offline evaluation of LTR models.
Submitted 3 November, 2023;
originally announced November 2023.
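The Birkhoff-von Neumann decomposition named in the abstract above writes a doubly stochastic matrix as a convex combination of permutation matrices. The sketch below illustrates only that decomposition step, not the authors' correction method. The function names `find_perfect_matching` and `birkhoff_decomposition` are hypothetical, and the reading of the input as "probability that item i is shown at position j" is an assumption made for illustration.

```python
from fractions import Fraction


def find_perfect_matching(support):
    """Kuhn's augmenting-path matching; support[i] lists the columns row i may use."""
    n = len(support)
    match_col = [-1] * n  # match_col[j] = row currently assigned to column j

    def try_row(i, seen):
        for j in support[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_col[j] == -1 or try_row(match_col[j], seen):
                match_col[j] = i
                return True
        return False

    for i in range(n):
        if not try_row(i, set()):
            return None
    perm = [0] * n
    for j, i in enumerate(match_col):
        perm[i] = j
    return perm  # perm[i] = column assigned to row i


def birkhoff_decomposition(matrix):
    """Return [(weight, perm), ...] with sum(weight * P_perm) equal to matrix."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    terms = []
    while any(x > 0 for row in m for x in row):
        support = [[j for j in range(n) if m[i][j] > 0] for i in range(n)]
        perm = find_perfect_matching(support)
        if perm is None:
            raise ValueError("input is not doubly stochastic")
        # Peel off the largest multiple of this permutation that keeps entries >= 0.
        weight = min(m[i][perm[i]] for i in range(n))
        for i in range(n):
            m[i][perm[i]] -= weight
        terms.append((weight, perm))
    return terms


half = Fraction(1, 2)
marginals = [[half, half, 0], [half, 0, half], [0, half, half]]
for weight, perm in birkhoff_decomposition(marginals):
    print(weight, perm)
```

Exact rational arithmetic is used so that each step zeroes at least one entry and the loop terminates without floating-point tolerance checks.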
-
Fair Effect Attribution in Parallel Online Experiments
Authors:
Alexander Buchholz,
Vito Bellini,
Giuseppe Di Benedetto,
Yannik Stein,
Matteo Ruffini,
Fabian Moerchen
Abstract:
A/B tests serve the purpose of reliably identifying the effect of changes introduced in online services. It is common for online platforms to run a large number of simultaneous experiments by splitting incoming user traffic randomly into treatment and control groups. Despite a perfect randomization between different groups, simultaneous experiments can interact with each other and create a negative impact on average population outcomes such as engagement metrics. These are measured globally and monitored to protect overall user experience. Therefore, it is crucial to measure these interaction effects and attribute their overall impact in a fair way to the respective experimenters. We suggest an approach to measure and disentangle the effect of simultaneous experiments by providing a cost sharing approach based on Shapley values. We also provide a counterfactual perspective that predicts shared impact based on conditional average treatment effects, making use of causal inference techniques. We illustrate our approach in real-world and synthetic data experiments.
Submitted 15 October, 2022;
originally announced October 2022.
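As a rough illustration of the Shapley-value cost sharing mentioned in the abstract above, the sketch below computes exact Shapley values by averaging marginal contributions over all join orders. The experiment names and cost figures are invented for the example, and `shapley_values` is a hypothetical helper, not code from the paper. Enumerating all orders is only feasible for a handful of experiments.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Average each player's marginal contribution over every join order."""
    totals = {p: Fraction(0) for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: total / len(orders) for p, total in totals.items()}


# Hypothetical engagement loss caused by each set of concurrently running experiments.
# Running A and B together costs more (10) than the sum of running each alone (2 + 4).
loss = {
    frozenset(): 0,
    frozenset({"A"}): 2,
    frozenset({"B"}): 4,
    frozenset({"A", "B"}): 10,
}

shares = shapley_values(["A", "B"], loss.__getitem__)
print(shares)  # A is charged 4 and B is charged 6; the shares sum to the joint loss of 10
```

The interaction surplus of 4 is split evenly between the two experiments, which is the symmetry property that makes the attribution "fair" in the Shapley sense.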
-
Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling
Authors:
Alexander Buchholz,
Jan Malte Lichtenberg,
Giuseppe Di Benedetto,
Yannik Stein,
Vito Bellini,
Matteo Ruffini
Abstract:
The Plackett-Luce (PL) model is ubiquitous in learning-to-rank (LTR) because it provides a useful and intuitive probabilistic model for sampling ranked lists. Counterfactual offline evaluation and optimization of ranking metrics are pivotal for using LTR methods in production. When adopting the PL model as a ranking policy, both tasks require the computation of expectations with respect to the model. These are usually approximated via Monte-Carlo (MC) sampling, since the combinatorial scaling in the number of items to be ranked makes their analytical computation intractable. Despite recent advances in improving the computational efficiency of the sampling process via the Gumbel top-k trick, the MC estimates can suffer from high variance. We develop a novel approach to producing more sample-efficient estimators of expectations in the PL model by combining the Gumbel top-k trick with quasi-Monte Carlo (QMC) sampling, a well-established technique for variance reduction. We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.
Submitted 12 May, 2022;
originally announced May 2022.
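The Gumbel top-k trick referred to in the abstract above samples a Plackett-Luce ranking by adding independent Gumbel noise to each item's log-score and sorting. The sketch below uses plain pseudo-random Monte Carlo draws. The quasi-Monte Carlo variant proposed in the paper is not shown, and the function names are hypothetical.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one standard Gumbel variate via inverse transform sampling."""
    # Clamp away from 0 and 1 so both logarithms are finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def sample_pl_top_k(log_scores, k, rng):
    """Sample the top-k prefix of a Plackett-Luce ranking with the Gumbel top-k trick."""
    perturbed = [(s + gumbel_noise(rng), item) for item, s in enumerate(log_scores)]
    perturbed.sort(reverse=True)  # largest perturbed score is ranked first
    return [item for _, item in perturbed[:k]]


rng = random.Random(0)
log_scores = [math.log(3.0), math.log(1.0), math.log(1.0)]
print(sample_pl_top_k(log_scores, k=2, rng=rng))
```

Under this model the first position goes to item i with probability proportional to exp(log_scores[i]), so with scores 3, 1, 1 item 0 should lead about 60% of sampled rankings.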
-
Ranker-agnostic Contextual Position Bias Estimation
Authors:
Oriol Barbany Mayor,
Vito Bellini,
Alexander Buchholz,
Giuseppe Di Benedetto,
Diego Marco Granziol,
Matteo Ruffini,
Yannik Stein
Abstract:
Learning-to-rank (LTR) algorithms are ubiquitous and necessary to explore the extensive catalogs of media providers. To spare users from examining all the results, their preferences are used to provide a subset of relatively small size. The user preferences can be inferred from the interactions with the presented content if explicit ratings are unavailable. However, directly using implicit feedback can lead to learning wrong relevance models, a problem known as biased LTR. The mismatch between implicit feedback and true relevances is due to various nuisances, with position bias being one of the most relevant. Position bias models consider that the lack of interaction with a presented item may occur not only because the item is irrelevant but also because the item was not examined. This paper introduces a method for modeling the probability of an item being seen in different contexts, e.g., for different users, with a single estimator. Our suggested method, denoted as contextual (EM)-based regression, is ranker-agnostic and able to correctly learn the latent examination probabilities while only using implicit feedback. Our empirical results indicate that the method introduced in this paper outperforms other existing position bias estimators in terms of relative error when the examination probability varies across queries. Moreover, the estimated values provide a ranking performance boost when used to debias the implicit ranking data even if there is no context dependency on the examination probabilities.
Submitted 28 July, 2021;
originally announced July 2021.
-
Learning to Rank in the Position Based Model with Bandit Feedback
Authors:
Beyza Ermis,
Patrick Ernst,
Yannik Stein,
Giovanni Zappella
Abstract:
Personalization is a crucial aspect of many online experiences. In particular, content ranking is often a key component in delivering sophisticated personalization results. Commonly, supervised learning-to-rank methods are applied, which suffer from bias introduced during data collection by production systems in charge of producing the ranking. To compensate for this problem, we leverage contextual multi-armed bandits. We propose novel extensions of two well-known algorithms, namely LinUCB and Linear Thompson Sampling, to the ranking use case. To account for the biases in a production environment, we employ the position-based click model. Finally, we show the validity of the proposed algorithms by conducting extensive offline experiments on synthetic datasets as well as customer-facing online A/B experiments.
Submitted 27 April, 2020;
originally announced April 2020.
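The position-based click model mentioned in the abstract above assumes that a click happens only if the position is examined and the item is relevant, with the two events independent. The sketch below simulates such clicks and recovers item relevance by inverse propensity weighting. It does not implement the paper's bandit algorithms. The examination and relevance values are made up, and the examination probabilities are assumed to be known.

```python
import random


def simulate_clicks(ranking, relevance, examination, rng):
    """Position-based model: P(click) = examination[position] * relevance[item]."""
    clicks = []
    for pos, item in enumerate(ranking):
        examined = rng.random() < examination[pos]
        clicks.append(examined and rng.random() < relevance[item])
    return clicks


def ips_relevance_estimate(logs, examination, n_items):
    """Debias clicks by weighting each one with 1 / examination[position]."""
    weighted = [0.0] * n_items
    shown = [0] * n_items
    for ranking, clicks in logs:
        for pos, (item, clicked) in enumerate(zip(ranking, clicks)):
            shown[item] += 1
            if clicked:
                weighted[item] += 1.0 / examination[pos]
    return [w / s if s else 0.0 for w, s in zip(weighted, shown)]


rng = random.Random(7)
relevance = [0.8, 0.4, 0.2]      # hypothetical true relevance per item
examination = [1.0, 0.5, 0.25]   # hypothetical examination probability per position
logs = []
for _ in range(20000):
    ranking = [0, 1, 2]
    rng.shuffle(ranking)         # uniformly random logging policy
    logs.append((ranking, simulate_clicks(ranking, relevance, examination, rng)))
print(ips_relevance_estimate(logs, examination, n_items=3))
```

Raw click-through rates would underestimate relevance for items shown in low positions. Dividing by the examination probability removes that bias in expectation, at the cost of higher variance.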
-
Reconstruction of the Path Graph
Authors:
Chaya Keller,
Yael Stein
Abstract:
Let $P$ be a set of $n \geq 5$ points in convex position in the plane. The path graph $G(P)$ of $P$ is an abstract graph whose vertices are non-crossing spanning paths of $P$, such that two paths are adjacent if one can be obtained from the other by deleting an edge and adding another edge.
We prove that the automorphism group of $G(P)$ is isomorphic to $D_{n}$, the dihedral group of order $2n$. The heart of the proof is an algorithm that first identifies the vertices of $G(P)$ that correspond to boundary paths of $P$, where the identification is unique up to an automorphism of $K(P)$ as a geometric graph, and then identifies (uniquely) all edges of each path represented by a vertex of $G(P)$. The complexity of the algorithm is $O(N \log N)$ where $N$ is the number of vertices of $G(P)$.
Submitted 31 December, 2017;
originally announced January 2018.
-
Improved Time-Space Trade-offs for Computing Voronoi Diagrams
Authors:
Bahareh Banyassady,
Matias Korman,
Wolfgang Mulzer,
André van Renssen,
Marcel Roeloffzen,
Paul Seiferth,
Yannik Stein
Abstract:
Let $P$ be a planar set of $n$ sites in general position. For $k\in\{1,\dots,n-1\}$, the Voronoi diagram of order $k$ for $P$ is obtained by subdividing the plane into cells such that points in the same cell have the same set of nearest $k$ neighbors in $P$. The (nearest site) Voronoi diagram (NVD) and the farthest site Voronoi diagram (FVD) are the particular cases of $k=1$ and $k=n-1$, respectively. For any given $K\in\{1,\dots,n-1\}$, the family of all higher-order Voronoi diagrams of order $k=1,\dots,K$ for $P$ can be computed in total time $O(nK^2+ n\log n)$ using $O(K^2(n-K))$ space [Aggarwal et al., DCG'89; Lee, TC'82]. Moreover, NVD and FVD for $P$ can be computed in $O(n\log n)$ time using $O(n)$ space [Preparata, Shamos, Springer'85].
For $s\in\{1,\dots,n\}$, an $s$-workspace algorithm has random access to a read-only array with the sites of $P$ in arbitrary order. Additionally, the algorithm may use $O(s)$ words, of $Θ(\log n)$ bits each, for reading and writing intermediate data. The output can be written only once and cannot be accessed or modified afterwards.
We describe a deterministic $s$-workspace algorithm for computing NVD and FVD for $P$ that runs in $O((n^2/s)\log s)$ time. Moreover, we generalize our $s$-workspace algorithm so that for any given $K\in O(\sqrt{s})$, we compute the family of all higher-order Voronoi diagrams of order $k=1,\dots,K$ for $P$ in total expected time $O (\frac{n^2 K^5}{s}(\log s+K2^{O(\log^* K)}))$ or in total deterministic time $O(\frac{n^2 K^5}{s}(\log s+K\log K))$. Previously, for Voronoi diagrams, the only known $s$-workspace algorithm runs in expected time $O\bigl((n^2/s)\log s+n\log s\log^* s\bigr)$ [Korman et al., WADS'15] and only works for NVD (i.e., $k=1$). Unlike the previous algorithm, our new method is very simple and does not rely on advanced data structures or random sampling techniques.
Submitted 1 October, 2018; v1 submitted 2 August, 2017;
originally announced August 2017.
-
Routing in Polygonal Domains
Authors:
Bahareh Banyassady,
Man-Kwun Chiu,
Matias Korman,
Wolfgang Mulzer,
André van Renssen,
Marcel Roeloffzen,
Paul Seiferth,
Yannik Stein,
Birgit Vogtenhuber,
Max Willert
Abstract:
We consider the problem of routing a data packet through the visibility graph of a polygonal domain $P$ with $n$ vertices and $h$ holes. We may preprocess $P$ to obtain a label and a routing table for each vertex of $P$. Then, we must be able to route a data packet between any two vertices $p$ and $q$ of $P$, where each step must use only the label of the target node $q$ and the routing table of the current node.
For any fixed $\varepsilon > 0$, we present a routing scheme that always achieves a routing path whose length exceeds the shortest path by a factor of at most $1 + \varepsilon$. The labels have $O(\log n)$ bits, and the routing tables are of size $O((\varepsilon^{-1}+h)\log n)$. The preprocessing time is $O(n^2\log n)$. It can be improved to $O(n^2)$ for simple polygons.
Submitted 2 August, 2018; v1 submitted 28 March, 2017;
originally announced March 2017.
-
The Rainbow at the End of the Line --- A PPAD Formulation of the Colorful Carathéodory Theorem with Applications
Authors:
Frédéric Meunier,
Wolfgang Mulzer,
Pauline Sarrabezolles,
Yannik Stein
Abstract:
Let $C_1,\dots,C_{d+1}$ be $d+1$ point sets in $\mathbb{R}^d$, each containing the origin in its convex hull. A subset $C$ of $\bigcup_{i=1}^{d+1} C_i$ is called a colorful choice (or rainbow) for $C_1, \dots, C_{d+1}$, if it contains exactly one point from each set $C_i$. The colorful Carathéodory theorem states that there always exists a colorful choice for $C_1,\dots,C_{d+1}$ that has the origin in its convex hull. This theorem is very general and can be used to prove several other existence theorems in high-dimensional discrete geometry, such as the centerpoint theorem or Tverberg's theorem. The colorful Carathéodory problem (CCP) is the computational problem of finding such a colorful choice. Despite several efforts in the past, the computational complexity of CCP in arbitrary dimension is still open.
We show that CCP lies in the intersection of the complexity classes PPAD and PLS. This makes it one of the few geometric problems in PPAD and PLS that are not known to be solvable in polynomial time. Moreover, it implies that the problem of computing centerpoints, computing Tverberg partitions, and computing points with large simplicial depth is contained in $\text{PPAD} \cap \text{PLS}$. This is the first nontrivial upper bound on the complexity of these problems. Finally, we show that our PPAD formulation leads to a polynomial-time algorithm for a special case of CCP in which we have only two color classes $C_1$ and $C_2$ in $d$ dimensions, each with the origin in its convex hull, and we would like to find a set with half the points from each color class that contains the origin in its convex hull.
Submitted 5 August, 2016;
originally announced August 2016.
-
Approximating the Simplicial Depth
Authors:
Peyman Afshani,
Donald R. Sheehy,
Yannik Stein
Abstract:
Let $P$ be a set of $n$ points in $d$ dimensions. The simplicial depth $σ_P(q)$ of a point $q$ is the number of $d$-simplices with vertices in $P$ that contain $q$ in their convex hulls. The simplicial depth is a notion of data depth with many applications in robust statistics and computational geometry. Computing the simplicial depth of a point is known to be a challenging problem. The trivial solution requires $O(n^{d+1})$ time whereas it is generally believed that one cannot do better than $O(n^{d-1})$. In this paper, we consider approximation algorithms for computing the simplicial depth of a point. For $d=2$, we present a new data structure that can approximate the simplicial depth in polylogarithmic time, using polylogarithmic query time. In 3D, we can approximate the simplicial depth of a given point in near-linear time, which is clearly optimal up to polylogarithmic factors. For higher dimensions, we consider two approximation algorithms with different worst-case scenarios. By combining these approaches, we compute a $(1+\varepsilon)$-approximation of the simplicial depth in time $\tilde{O}(n^{d/2 + 1})$, ignoring polylogarithmic factors. All of these algorithms are Monte Carlo algorithms. Furthermore, we present a simple strategy to compute the simplicial depth exactly in $O(n^d \log n)$ time, which provides the first improvement over the trivial $O(n^{d+1})$ time algorithm for $d>4$. Finally, we show that computing the simplicial depth exactly is #P-complete and W[1]-hard if the dimension is part of the input.
Submitted 27 December, 2015; v1 submitted 15 December, 2015;
originally announced December 2015.
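For concreteness, the trivial $O(n^{d+1})$ solution mentioned in the abstract above can be sketched for $d=2$ as an $O(n^3)$ enumeration of all triangles. This is only the brute-force baseline, not any of the paper's approximation algorithms. The function names are hypothetical, triangles are treated as closed, and the points together with the query are assumed to be in general position (no three collinear).

```python
from itertools import combinations


def orient(a, b, c):
    """Twice the signed area of triangle abc; positive if a, b, c turn counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def triangle_contains(a, b, c, q):
    """True if q lies in the closed triangle abc (assumes abc is non-degenerate)."""
    d1, d2, d3 = orient(a, b, q), orient(b, c, q), orient(c, a, q)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_negative and has_positive)


def simplicial_depth_2d(points, q):
    """Count the triangles with vertices in points that contain q."""
    return sum(triangle_contains(a, b, c, q) for a, b, c in combinations(points, 3))


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(simplicial_depth_2d(square, (1, 2)))  # prints 2
```

With integer coordinates the orientation test is exact, so no floating-point tolerance is needed in this example.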
-
Time-Space Trade-offs for Triangulations and Voronoi Diagrams
Authors:
Matias Korman,
Wolfgang Mulzer,
Andre van Renssen,
Marcel Roeloffzen,
Paul Seiferth,
Yannik Stein
Abstract:
Let $S$ be a planar $n$-point set. A triangulation for $S$ is a maximal plane straight-line graph with vertex set $S$. The Voronoi diagram for $S$ is the subdivision of the plane into cells such that all points in a cell have the same nearest neighbor in $S$. Classically, both structures can be computed in $O(n \log n)$ time and $O(n)$ space. We study the situation when the available workspace is limited: given a parameter $s \in \{1, \dots, n\}$, an $s$-workspace algorithm has read-only access to an input array with the points from $S$ in arbitrary order, and it may use only $O(s)$ additional words of $Θ(\log n)$ bits for reading and writing intermediate data. The output should then be written to a write-only structure. We describe a deterministic $s$-workspace algorithm for computing an arbitrary triangulation of $S$ in time $O(n^2/s + n \log n \log s )$ and a randomized $s$-workspace algorithm for finding the Voronoi diagram of $S$ in expected time $O((n^2/s) \log s + n \log s \log^*s)$.
Submitted 2 October, 2020; v1 submitted 13 July, 2015;
originally announced July 2015.
-
Computational Aspects of the Colorful Carathéodory Theorem
Authors:
Wolfgang Mulzer,
Yannik Stein
Abstract:
Let $C_1,\dots,C_{d+1}\subset \mathbb{R}^d$ be $d+1$ point sets, each containing the origin in its convex hull. We call these sets color classes, and we call a sequence $p_1, \dots, p_{d+1}$ with $p_i \in C_i$, for $i = 1, \dots, d+1$, a colorful choice. The colorful Carathéodory theorem guarantees the existence of a colorful choice that also contains the origin in its convex hull. The computational complexity of finding such a colorful choice (CCP) is unknown. This is particularly interesting in the light of polynomial-time reductions from several related problems, such as computing centerpoints, to CCP.
We define a novel notion of approximation that is compatible with the polynomial-time reductions to CCP: a sequence that contains at most $k$ points from each color class is called a $k$-colorful choice. We present an algorithm that, for any fixed $\varepsilon > 0$, outputs a $\lceil \varepsilon d\rceil$-colorful choice containing the origin in its convex hull in polynomial time.
Furthermore, we consider a related problem of CCP: in the nearest colorful polytope problem (NCP), we are given sets $C_1,\dots,C_n\subset\mathbb{R}^d$ that do not necessarily contain the origin in their convex hulls. The goal is to find a colorful choice whose convex hull minimizes the distance to the origin. We show that computing a local optimum for NCP is PLS-complete, while computing a global optimum is NP-hard.
Submitted 5 February, 2018; v1 submitted 10 December, 2014;
originally announced December 2014.
-
Approximate k-flat Nearest Neighbor Search
Authors:
Wolfgang Mulzer,
Huy L. Nguyen,
Paul Seiferth,
Yannik Stein
Abstract:
Let $k$ be a nonnegative integer. In the approximate $k$-flat nearest neighbor ($k$-ANN) problem, we are given a set $P \subset \mathbb{R}^d$ of $n$ points in $d$-dimensional space and a fixed approximation factor $c > 1$. Our goal is to preprocess $P$ so that we can efficiently answer approximate $k$-flat nearest neighbor queries: given a $k$-flat $F$, find a point in $P$ whose distance to $F$ is within a factor $c$ of the distance between $F$ and the closest point in $P$. The case $k = 0$ corresponds to the well-studied approximate nearest neighbor problem, for which a plethora of results are known, both in low and high dimensions. The case $k = 1$ is called approximate line nearest neighbor. In this case, we are aware of only one provably efficient data structure, due to Andoni, Indyk, Krauthgamer, and Nguyen. For $k \geq 2$, we know of no previous results.
We present the first efficient data structure that can handle approximate nearest neighbor queries for arbitrary $k$. We use a data structure for $0$-ANN-queries as a black box, and the performance depends on the parameters of the $0$-ANN solution: suppose we have a $0$-ANN structure with query time $O(n^ρ)$ and space requirement $O(n^{1+σ})$, for $ρ, σ> 0$. Then we can answer $k$-ANN queries in time $O(n^{k/(k + 1 - ρ) + t})$ and space $O(n^{1+σk/(k + 1 - ρ)} + n\log^{O(1/t)} n)$. Here, $t > 0$ is an arbitrary constant and the $O$-notation hides exponential factors in $k$, $1/t$, and $c$ and polynomials in $d$. Our new data structures also give an improvement in the space requirement over the previous result for $1$-ANN: we can achieve near-linear space and sublinear query time, a further step towards practical applications where space constitutes the bottleneck.
Submitted 6 November, 2014;
originally announced November 2014.
-
Algorithms for Tolerant Tverberg Partitions
Authors:
Wolfgang Mulzer,
Yannik Stein
Abstract:
Let $P$ be a $d$-dimensional $n$-point set. A partition $T$ of $P$ is called a Tverberg partition if the convex hulls of all sets in $T$ intersect in at least one point. We say $T$ is $t$-tolerant if it remains a Tverberg partition after deleting any $t$ points from $P$. Soberón and Strausz proved that there is always a $t$-tolerant Tverberg partition with $\lceil n / (d+1)(t+1) \rceil$ sets. However, so far no nontrivial algorithms for computing or approximating such partitions have been presented.
For $d \leq 2$, we show that the Soberón-Strausz bound can be improved, and we show how the corresponding partitions can be found in polynomial time. For $d \geq 3$, we give the first polynomial-time approximation algorithm by presenting a reduction to the Tverberg problem with no tolerance. Finally, we show that it is coNP-complete to determine whether a given Tverberg partition is $t$-tolerant.
Submitted 17 September, 2014; v1 submitted 14 June, 2013;
originally announced June 2013.