-
Improved Feature Importance Computations for Tree Models: Shapley vs. Banzhaf
Authors:
Adam Karczmarz,
Anish Mukherjee,
Piotr Sankowski,
Piotr Wygocki
Abstract:
Shapley values are one of the main tools used to explain predictions of tree ensemble models. The main alternative to Shapley values is Banzhaf values, which have not been understood equally well. In this paper we take a step towards filling this gap, providing both an experimental and a theoretical comparison of these model explanation methods. Surprisingly, we show that Banzhaf values offer several advantages over Shapley values while providing essentially the same explanations. We verify that Banzhaf values: (1) have a more intuitive interpretation, (2) allow for more efficient algorithms, and (3) are much more numerically robust. We provide an experimental evaluation of these theses. In particular, we show that on real world instances.
Additionally, from a theoretical perspective, we provide a new and improved algorithm that computes the same Shapley-value-based explanations as the algorithm of Lundberg et al. [Nat. Mach. Intell. 2020]. Our algorithm runs in $O(TLD+n)$ time, whereas the previous algorithm had an $O(TLD^2+n)$ running time bound. Here, $T$ is the number of trees, $L$ is the maximum number of leaves in a tree, and $D$ denotes the maximum depth of a tree in the ensemble. Using the computational techniques developed for Shapley values, we deliver an optimal $O(TL+n)$ time algorithm for computing Banzhaf-value-based explanations. In our experiments, these algorithms reduce running times by up to an order of magnitude.
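To make the two attribution notions compared in this abstract concrete, here is a minimal brute-force sketch of their textbook definitions for a generic cooperative game. It enumerates all coalitions, so it takes exponential time and is not the tree-specific algorithm from the paper; the names `shapley_value`, `banzhaf_value`, and `toy_game` are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def marginal_contributions(value, players, i):
    """Yield (|S|, v(S + i) - v(S)) for every coalition S that excludes player i."""
    others = [p for p in players if p != i]
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            s = frozenset(subset)
            yield size, value(s | {i}) - value(s)


def shapley_value(value, players, i):
    """Shapley value: marginal gains weighted by |S|! (n - |S| - 1)! / n!."""
    n = len(players)
    total = Fraction(0)
    for size, gain in marginal_contributions(value, players, i):
        weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
        total += weight * gain
    return total


def banzhaf_value(value, players, i):
    """Banzhaf value: plain average of marginal gains over all 2^(n-1) coalitions."""
    n = len(players)
    total = sum(gain for _, gain in marginal_contributions(value, players, i))
    return Fraction(total) / 2 ** (n - 1)


def toy_game(coalition):
    """Hypothetical game: a coalition wins iff it contains player 0 and one more player."""
    return 1 if 0 in coalition and len(coalition) >= 2 else 0


if __name__ == "__main__":
    players = [0, 1, 2]
    print([shapley_value(toy_game, players, i) for i in players])
    print([banzhaf_value(toy_game, players, i) for i in players])
```

On this toy game the Shapley values are 2/3, 1/6, 1/6 (they sum to the value of the full coalition), while the Banzhaf values are 3/4, 1/4, 1/4. The only difference between the two functions is the weighting: Banzhaf uses a uniform average over coalitions, which is one way to read the "more intuitive interpretation" claimed above.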
Submitted 9 August, 2021;
originally announced August 2021.
-
Improved approximate near neighbor search without false negatives for $l_2$
Authors:
Piotr Wygocki
Abstract:
We present a new algorithm for the $c$-approximate nearest neighbor search without false negatives for $l_2^d$. We enhance the dimension reduction method presented in \cite{wygos_red} and combine it with the standard results of Indyk and Motwani \cite{motwani}. We present an efficient algorithm with Las Vegas guarantees for any $c>1$. This improves over the previous results, which require $c=\omega(\log\log{n})$ \cite{wygos_red}, where $n$ is the number of input points. Moreover, we improve both the query time and the pre-processing time.
Our algorithm is tunable, which allows for different trade-offs between the query and the pre-processing times. To illustrate this flexibility, we present two variants of the algorithm. The "efficient query" variant has query time $O(d^2)$ and polynomial pre-processing time. The "efficient pre-processing" variant has pre-processing time $O(d^{\omega-1} n)$ and query time sub-linear in $n$, where $\omega$ is the exponent of fast matrix multiplication.
In addition, we introduce batch versions of these algorithms, in which the queries come in batches of size $d$. In this case, the amortized query time of the "efficient query" algorithm is reduced to $O(d^{\omega-1})$.
Submitted 28 September, 2017;
originally announced September 2017.
-
Approximate nearest neighbors search without false negatives for $l_2$ for $c>\sqrt{\log\log{n}}$
Authors:
Piotr Sankowski,
Piotr Wygocki
Abstract:
In this paper, we report progress on answering the open problem presented by Pagh [14], who considered the nearest neighbor search without false negatives for the Hamming distance. We show new data structures for solving the $c$-approximate nearest neighbors problem without false negatives for the high-dimensional Euclidean space $\mathbb{R}^d$. These data structures work for any $c = \omega(\sqrt{\log{\log{n}}})$, where $n$ is the number of points in the input set, with poly-logarithmic query time and polynomial preprocessing time. This improves over the known algorithms, which require $c$ to be $\Omega(\sqrt{d})$.
This improvement is obtained by applying a sequence of reductions, which are interesting in their own right. First, we reduce the problem to $d$ instances of dimension logarithmic in $n$. Next, these instances are reduced to a number of $c$-approximate nearest neighbor search instances in the space $\big(\mathbb{R}^k\big)^L$ equipped with the metric $m(x,y) = \max_{1 \le i \le L}(\lVert x_i - y_i\rVert_2)$.
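The metric $m$ on $\big(\mathbb{R}^k\big)^L$ from the last reduction step can be written down directly. The sketch below only evaluates the formula stated in the abstract; the function name and the representation of a point as a sequence of $L$ blocks of $k$ coordinates are assumptions made for illustration.

```python
import math


def max_block_l2_distance(x, y):
    """m(x, y) = max over blocks i of the Euclidean distance between x_i and y_i.

    x and y are sequences of L blocks; each block is a sequence of k floats.
    """
    if len(x) != len(y):
        raise ValueError("both points must have the same number of blocks")
    return max(math.dist(xi, yi) for xi, yi in zip(x, y))


if __name__ == "__main__":
    x = [(0.0, 0.0), (1.0, 1.0)]
    y = [(3.0, 4.0), (1.0, 1.0)]
    print(max_block_l2_distance(x, y))
```

Two points are close under $m$ only if every pair of corresponding blocks is close in $l_2$, so the metric behaves like $l_\infty$ across blocks and like $l_2$ within each block.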
Submitted 13 September, 2017; v1 submitted 21 August, 2017;
originally announced August 2017.
-
On fast bounded locality sensitive hashing
Authors:
Piotr Wygocki
Abstract:
In this paper, we examine hash functions expressed as scalar products, i.e., $f(x)=\langle v,x\rangle$, for some bounded random vector $v$. Such hash functions have numerous applications, but there is often a need to optimize the choice of the distribution of $v$. In the present work, we focus on so-called anti-concentration bounds, i.e., upper bounds on $\mathbb{P}\left[|\langle v,x\rangle| < \alpha\right]$. In many applications, $v$ is a vector of independent random variables with the standard normal distribution. In such a case, the distribution of $\langle v,x\rangle$ is also normal and it is easy to approximate $\mathbb{P}\left[|\langle v,x\rangle| < \alpha\right]$. Here, we consider two bounded distributions in the context of anti-concentration bounds. In particular, we analyze $v$ being a random vector from the unit ball in $l_{\infty}$ and $v$ being a random vector from the unit sphere in $l_{2}$. We show anti-concentration measures for functions $f(x)=\langle v,x\rangle$ that are optimal up to a constant.
As a consequence of our research, we obtain new best results for $c$-approximate nearest neighbors without false negatives for $l_p$ in high-dimensional space for all $p\in[1,\infty]$, for $c=\Omega(\max\{\sqrt{d},d^{1/p}\})$. These results improve over those presented in [16]. Finally, our paper reports progress on answering the open problem by Pagh [17], who considered the nearest neighbor search without false negatives for the Hamming distance.
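The anti-concentration probability $\mathbb{P}\left[|\langle v,x\rangle| < \alpha\right]$ discussed above can be estimated empirically. The following Monte Carlo sketch is only an illustration, not the paper's analysis: it assumes "random vector from the unit ball in $l_\infty$" means uniform on $[-1,1]^d$ and "random vector from the unit sphere in $l_2$" means uniform on the sphere (sampled by normalizing a Gaussian vector). All function names are hypothetical.

```python
import math
import random


def sample_linf_ball(d, rng):
    """Assumed distribution: each coordinate uniform on [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(d)]


def sample_l2_sphere(d, rng):
    """Uniform direction on the unit l2 sphere via a normalized Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]


def estimate_small_projection_probability(x, alpha, sampler, trials, rng):
    """Monte Carlo estimate of P[|<v, x>| < alpha] for v drawn by `sampler`."""
    hits = 0
    for _ in range(trials):
        v = sampler(len(x), rng)
        if abs(sum(a * b for a, b in zip(v, x))) < alpha:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, 2.0, -1.0, 0.5]
    for sampler in (sample_linf_ball, sample_l2_sphere):
        p = estimate_small_projection_probability(x, 0.25, sampler, 10_000, rng)
        print(sampler.__name__, p)
```

A smaller estimate for a given $x$ and $\alpha$ means the projection is less likely to land near zero, which is the quantity an anti-concentration bound controls from above.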
Submitted 19 April, 2017;
originally announced April 2017.
-
Why Do Cascade Sizes Follow a Power-Law?
Authors:
Karol Węgrzycki,
Piotr Sankowski,
Andrzej Pacuk,
Piotr Wygocki
Abstract:
We introduce a random directed acyclic graph and use it to model the information diffusion network. Subsequently, we analyze the cascade generation model (CGM) introduced by Leskovec et al. [19]. Until now, this model has been studied only empirically. In this paper, we present the first theoretical proof that the sizes of cascades generated by the CGM follow a power-law distribution, which is consistent with multiple empirical analyses of large social networks. We compared the assumptions of our model with the Twitter social network and tested the goodness of approximation.
Submitted 20 February, 2017;
originally announced February 2017.
-
RecSys Challenge 2016: job recommendations based on preselection of offers and gradient boosting
Authors:
Andrzej Pacuk,
Piotr Sankowski,
Karol Węgrzycki,
Adam Witkowski,
Piotr Wygocki
Abstract:
We present the Mim-Solution's approach to the RecSys Challenge 2016, which ranked 2nd. The goal of the competition was to prepare job recommendations for the users of the website Xing.com.
Our two-phase algorithm consists of candidate selection followed by candidate ranking. We ranked the candidates by the predicted probability that the user will positively interact with the job offer. We used Gradient Boosting Decision Trees as the regression tool.
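The two-phase structure described above (preselect candidates, then rank them by predicted interaction probability) can be sketched as follows. This is a schematic under stated assumptions, not the team's pipeline: the tag-overlap preselection rule, the record layout, and `jaccard_score` are hypothetical, and `jaccard_score` merely stands in for a trained gradient-boosted model.

```python
def select_candidates(user, offers):
    """Phase 1 (hypothetical rule): keep offers sharing at least one tag with the user."""
    return [offer for offer in offers if user["tags"] & offer["tags"]]


def recommend(user, offers, predict_interaction_probability, top_k):
    """Phase 2: rank the preselected offers by the predicted probability, best first."""
    candidates = select_candidates(user, offers)
    ranked = sorted(
        candidates,
        key=lambda offer: predict_interaction_probability(user, offer),
        reverse=True,
    )
    return [offer["id"] for offer in ranked[:top_k]]


def jaccard_score(user, offer):
    """Stand-in scorer; a real system would call a trained model here."""
    union = user["tags"] | offer["tags"]
    return len(user["tags"] & offer["tags"]) / len(union) if union else 0.0


if __name__ == "__main__":
    user = {"tags": {"python", "ml"}}
    offers = [
        {"id": "a", "tags": {"python", "ml"}},
        {"id": "b", "tags": {"python", "java"}},
        {"id": "c", "tags": {"sales"}},
    ]
    print(recommend(user, offers, jaccard_score, top_k=2))
```

Separating the phases keeps the expensive scorer away from the full offer catalogue: only the preselected candidates are passed to the model.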
Submitted 3 December, 2016;
originally announced December 2016.
-
There is Something Beyond the Twitter Network
Authors:
Andrzej Pacuk,
Piotr Sankowski,
Karol Wegrzycki,
Piotr Wygocki
Abstract:
How does information spread through a social network? Can we assume that the information is spread only through a given social network graph? What is the correct way to compare the models of information flow? These are the basic questions we address in this work.
We focus on a meticulous comparison of various well-known models of rumor propagation in social networks. We introduce a model incorporating mass media and the effects of absent nodes. In this model, the information appears spontaneously in the graph. Using the most conservative metric, we show that the distribution of cascade sizes generated by this model fits the real data much better than the previously considered models.
Submitted 28 November, 2016;
originally announced November 2016.
-
Locality-Sensitive Hashing without False Negatives for l_p
Authors:
Andrzej Pacuk,
Piotr Sankowski,
Karol Wegrzycki,
Piotr Wygocki
Abstract:
In this paper, we show a construction of locality-sensitive hash functions without false negatives, i.e., which ensure collision for every pair of points within a given radius $R$ in $d$-dimensional space equipped with the $l_p$ norm, where $p \in [1,\infty]$. Furthermore, we show how to use these hash functions to solve the $c$-approximate nearest neighbor search problem without false negatives. Namely, if there is a point at distance $R$, we will certainly report it, and points at distance greater than $cR$ will not be reported, for $c=\Omega(\max\{\sqrt{d},d^{1-\frac{1}{p}}\})$. The constructed algorithms work either with preprocessing time $\mathcal{O}(n \log(n))$ and sublinear expected query time, or with preprocessing time $\mathcal{O}(\mathrm{poly}(n))$ and expected query time $\mathcal{O}(\log(n))$. Our paper reports progress on answering the open problem presented by Pagh [8], who considered the nearest neighbor search without false negatives for the Hamming distance.
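The "no false negatives" guarantee can be illustrated for $l_2$ with a simple projection-and-bucket scheme. This is a sketch of the general idea only and is not claimed to be the construction from the paper: for a unit vector $v$, the Cauchy-Schwarz inequality gives $|\langle v,x\rangle - \langle v,y\rangle| \le \lVert x-y\rVert_2$, so two points within distance $R$ always fall into the same or adjacent buckets of width $R$. All function names are assumptions.

```python
import math
import random


def random_unit_vector(d, rng):
    """Random direction on the unit l2 sphere (normalized Gaussian vector)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]


def bucket(x, v, radius):
    """Index of the width-`radius` bucket containing the projection <v, x>."""
    return math.floor(sum(a * b for a, b in zip(v, x)) / radius)


def may_be_close(x, y, projections, radius):
    """True if x and y land in the same or adjacent buckets for every projection.

    Never returns False for points with ||x - y||_2 <= radius (no false
    negatives); it may still return True for far points (false positives).
    """
    return all(
        abs(bucket(x, v, radius) - bucket(y, v, radius)) <= 1 for v in projections
    )


if __name__ == "__main__":
    rng = random.Random(0)
    d, radius = 8, 1.0
    projections = [random_unit_vector(d, rng) for _ in range(5)]
    x = [rng.uniform(-10.0, 10.0) for _ in range(d)]
    y = [c + 0.1 for c in x]  # ||x - y||_2 = 0.1 * sqrt(8) < radius
    print(may_be_close(x, y, projections, radius))
```

Adding more projections can only rule out more far pairs; it never introduces a false negative, because the adjacency condition holds deterministically for every unit vector.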
Submitted 28 November, 2016;
originally announced November 2016.
-
Approximation Algorithms for Steiner Tree Problems Based on Universal Solution Frameworks
Authors:
Krzysztof Ciebiera,
Piotr Godlewski,
Piotr Sankowski,
Piotr Wygocki
Abstract:
This paper summarizes the work on implementing several solutions for the Steiner Tree problem, which we undertook in the PAAL project. The main focus of the project is the development of generic implementations of approximation algorithms together with universal solution frameworks. In particular, we have implemented the Zelikovsky 11/6-approximation using a local search framework, and the 1.39-approximation by Byrka et al. using an iterative rounding framework. These two algorithms are experimentally compared with a greedy 2-approximation, with the exact but exponential-time Dreyfus-Wagner algorithm, as well as with results given by state-of-the-art local search techniques by Uchoa and Werneck. The results of this paper are twofold. On the one hand, we demonstrate that high-level algorithmic concepts can be designed and efficiently used in C++. On the other hand, we show that the above algorithms with good theoretical guarantees give decent results in practice, but are inferior to state-of-the-art heuristic approaches.
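As a point of reference for the 2-approximation baseline mentioned above, here is a minimal sketch of the classic metric-closure approach: compute shortest paths between terminals, build a minimum spanning tree over the terminals in that closure, and expand its edges back into paths. It is written in Python for brevity and is not PAAL's C++ implementation; whether PAAL's greedy variant matches this scheme is an assumption, as are all names. The sketch assumes a connected, simple, undirected graph with non-negative weights.

```python
import heapq


def dijkstra(adj, source):
    """Shortest-path distances and parent pointers from `source`."""
    dist, parent = {source: 0}, {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, parent


def steiner_2_approx(edges, terminals):
    """Return (edge set, its weight, closure-MST weight) connecting all terminals."""
    adj, weight = {}, {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
        weight[frozenset((u, v))] = w
    paths = {t: dijkstra(adj, t) for t in terminals}

    # Prim's algorithm on the metric closure restricted to the terminals.
    in_tree, chosen, closure_weight = {terminals[0]}, set(), 0
    while len(in_tree) < len(terminals):
        d, s, t = min(
            (paths[s][0][t], s, t)
            for s in in_tree
            for t in terminals
            if t not in in_tree
        )
        closure_weight += d
        in_tree.add(t)
        # Expand the closure edge (s, t) into the underlying shortest path.
        parent, node = paths[s][1], t
        while parent[node] is not None:
            chosen.add(frozenset((node, parent[node])))
            node = parent[node]
    return chosen, sum(weight[e] for e in chosen), closure_weight


if __name__ == "__main__":
    star = [("c", "a", 1), ("c", "b", 1), ("c", "d", 1)]
    print(steiner_2_approx(star, ["a", "b", "d"]))
```

The closure-MST weight is at most twice the optimal Steiner tree weight, and the expanded edge set never weighs more than that. The expanded set may contain cycles in general, in which case a spanning tree of it would be taken as the final answer.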
Submitted 28 October, 2014;
originally announced October 2014.