-
Ranking Median Regression: Learning to Order through Local Consensus
Authors:
Stephan Clémençon,
Anna Korba,
Eric Sibony
Abstract:
This article is devoted to the problem of predicting the value taken by a random permutation $Σ$, describing the preferences of an individual over a set of numbered items $\{1,\; \ldots,\; n\}$ say, based on the observation of an input/explanatory r.v. $X$ (e.g. characteristics of the individual), when error is measured by the Kendall $τ$ distance. In the probabilistic formulation of the 'Learning to Order' problem we propose, which extends the framework for statistical Kemeny ranking aggregation developed in \citet{CKS17}, this boils down to recovering conditional Kemeny medians of $Σ$ given $X$ from i.i.d. training examples $(X_1, Σ_1),\; \ldots,\; (X_N, Σ_N)$. For this reason, this statistical learning problem is referred to as \textit{ranking median regression} here. Our contribution is twofold. We first propose a probabilistic theory of ranking median regression: the set of optimal elements is characterized, the performance of empirical risk minimizers is investigated in this context, and situations where fast learning rates can be achieved are exhibited. Next, we introduce the concept of local consensus/median in order to derive efficient methods for ranking median regression. The major advantage of this local learning approach lies in its close connection with the widely studied Kemeny aggregation problem. From an algorithmic perspective, this makes it possible to build predictive rules for ranking median regression by implementing efficient techniques for (approximate) Kemeny median computations at a local level in a tractable manner. In particular, versions of $k$-nearest neighbor and tree-based methods, tailored to ranking median regression, are investigated. The accuracy of piecewise constant ranking median regression rules is studied under a specific smoothness assumption for $Σ$'s conditional distribution given $X$.
Submitted 18 December, 2017; v1 submitted 31 October, 2017;
originally announced November 2017.
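The local-consensus idea described in this abstract can be illustrated with a minimal sketch: a $k$-nearest-neighbor rule that predicts, at a query point, a Kemeny median of the rankings attached to its neighbors. This is not the authors' implementation. The function names, the Euclidean metric on $X$, the encoding of a ranking as a tuple of ranks, and the brute-force median search (exact, but only feasible for very small $n$) are all assumptions made for illustration.

```python
from itertools import combinations, permutations
import math


def kendall_tau(sigma, pi):
    """Kendall tau distance: number of item pairs the two rankings order differently.

    A ranking is encoded (assumption) as a tuple where sigma[i] is the rank of item i.
    """
    return sum(
        1
        for i, j in combinations(range(len(sigma)), 2)
        if (sigma[i] - sigma[j]) * (pi[i] - pi[j]) < 0
    )


def kemeny_median(rankings):
    """Exact Kemeny median by exhaustive search over all n! candidates.

    Illustrative only: practical methods use approximate aggregation.
    """
    n = len(rankings[0])
    return min(
        permutations(range(n)),
        key=lambda cand: sum(kendall_tau(cand, r) for r in rankings),
    )


def knn_ranking_median(x, X_train, S_train, k=3):
    """Predict a ranking at x as the Kemeny median of its k nearest neighbors' rankings."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))
    return kemeny_median([S_train[i] for i in order[:k]])


# Hypothetical toy data: two clusters of individuals with opposite preferences.
X_train = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
S_train = [(0, 1, 2), (0, 1, 2), (0, 2, 1), (2, 1, 0), (2, 1, 0), (1, 2, 0)]

print(knn_ranking_median((0.05,), X_train, S_train, k=3))
print(knn_ranking_median((5.1,), X_train, S_train, k=3))
```

The resulting rule is piecewise constant in $x$, which is the kind of predictor the abstract refers to. Replacing the exhaustive search with an approximate aggregation step would be needed for any realistic number of items.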
-
A Multiresolution Analysis Framework for the Statistical Analysis of Incomplete Rankings
Authors:
Eric Sibony,
Stéphan Clémençon,
Jérémie Jakubowicz
Abstract:
Though the statistical analysis of ranking data has been a subject of interest over the past centuries, especially in economics, psychology or social choice theory, it has been revitalized in the past 15 years by recent applications such as recommender or search engines and is now receiving increasing interest in the machine learning literature. Numerous modern systems indeed generate ranking data, representing for instance ordered results to a query or user preferences. Each such ranking usually involves only a small but varying subset of the whole catalog of items. The study of the variability of these data, i.e. the statistical analysis of incomplete rankings, is however a great statistical and computational challenge, because of their heterogeneity and the related combinatorial complexity of the problem. Whereas many statistical methods for analyzing full rankings (orderings of all the items in the catalog), partial rankings (full rankings with ties) or pairwise comparisons are documented in the dedicated literature, only a few approaches are available today to deal with incomplete rankings, each relying on a strong specific assumption. It is the purpose of this article to introduce a novel general framework for the statistical analysis of incomplete rankings. It is based on a representation tailored to these specific data, whose construction is also explained here, which fits with the natural multi-scale structure of incomplete rankings and provides a new decomposition of rank information with a multiresolution analysis (MRA) interpretation. We show that the MRA representation naturally makes it possible to overcome both the statistical and computational challenges without any structural assumption on the data. It therefore provides a general and flexible framework to solve a wide variety of statistical problems where data take the form of incomplete rankings.
Submitted 4 January, 2016;
originally announced January 2016.
-
Multiresolution Analysis of Incomplete Rankings
Authors:
Stéphan Clémençon,
Jérémie Jakubowicz,
Eric Sibony
Abstract:
Incomplete rankings on a set of items $\{1,\; \ldots,\; n\}$ are orderings of the form $a_{1}\prec\dots\prec a_{k}$, with $\{a_{1},\dots, a_{k}\}\subset\{1,\dots,n\}$ and $k < n$. Though they arise in many modern applications, only a few methods have been introduced to manipulate them, most of them consisting in representing any incomplete ranking by the set of all its possible linear extensions on $\{1,\; \ldots,\; n\}$. It is the major purpose of this paper to introduce a completely novel approach, which makes it possible to treat incomplete rankings directly, representing them as injective words over $\{1,\; \ldots,\; n\}$. Unexpectedly, operations on incomplete rankings have very simple equivalents in this setting and the topological structure of the complex of injective words can be interpreted in a simple fashion from the perspective of ranking. We exploit this connection here and use recent results from algebraic topology to construct a multiresolution analysis and develop a wavelet framework for incomplete rankings. Though purely combinatorial, this construction relies on the same ideas underlying multiresolution analysis on a Euclidean space, and makes it possible to localize the information related to rankings on each subset of items. It can be viewed as a crucial step toward nonlinear approximation of distributions of incomplete rankings and paves the way for many statistical applications, including preference data analysis and the design of recommender systems.
Submitted 8 March, 2014;
originally announced March 2014.
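The two representations contrasted in this abstract can be compared in a small sketch: an incomplete ranking stored directly as an injective word (a tuple of distinct items, most preferred first) versus the set of all its linear extensions on $\{1,\; \ldots,\; n\}$. The function names are hypothetical, and the choice of "restriction to a subset of items" as the example operation is an assumption made for illustration; the abstract does not specify which operations the paper considers.

```python
from itertools import permutations


def is_injective_word(word, n):
    """True if `word` uses items from {1, ..., n} with no repetition."""
    return len(set(word)) == len(word) and all(1 <= a <= n for a in word)


def restrict(word, subset):
    """Ranking induced on `subset`: delete the letters that are not in it."""
    return tuple(a for a in word if a in subset)


def linear_extensions(word, n):
    """All full rankings of {1, ..., n} that order the word's items as the word does."""
    items = set(word)
    return [
        p for p in permutations(range(1, n + 1))
        if restrict(p, items) == tuple(word)
    ]


# Hypothetical example: the incomplete ranking 3 < 1 < 4 on n = 5 items.
word = (3, 1, 4)
print(is_injective_word(word, 5))
print(restrict(word, {1, 4}))
print(len(linear_extensions(word, 5)))  # n!/k! full rankings for a word of length k
```

Under these assumptions, restricting the word is a linear-time deletion of letters, whereas the linear-extension representation of a single ranking of $k$ items already contains $n!/k!$ full rankings.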