Showing 1–2 of 2 results for author: Tse, D N

Search v0.5.6 released 2020-02-24

arXiv:1906.04356 [pdf, other]

cs.LG cs.DS cs.IT stat.ML

Ultra Fast Medoid Identification via Correlated Sequential Halving

Authors: Tavor Z. Baharav, David N. Tse

Abstract: The medoid of a set of n points is the point in the set that minimizes the sum of distances to other points. It can be determined exactly in O(n^2) time by computing the distances between all pairs of points. Previous works show that one can significantly reduce the number of distance computations needed by adaptively querying distances. The resulting randomized algorithm is obtained by a direct conversion of the computation problem to a multi-armed bandit statistical inference problem. In this work, we show that we can better exploit the structure of the underlying computation problem by modifying the traditional bandit sampling strategy and using it in conjunction with a suitably chosen multi-armed bandit algorithm. Four to five orders of magnitude gains over exact computation are obtained on real data, in terms of both number of distance computations needed and wall clock time. Theoretical results are obtained to quantify such gains in terms of data parameters. Our code is publicly available online at https://github.com/TavorB/Correlated-Sequential-Halving.

Submitted 4 November, 2019; v1 submitted 10 June, 2019; originally announced June 2019.

Comments: NeurIPS 2019
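
The abstract contrasts exact O(n^2) medoid computation with adaptive sampling. The Python sketch below shows both sides: `exact_medoid` evaluates every pairwise distance, while `correlated_sequential_halving` follows the idea named in the title, scoring all surviving candidates in each round against one shared random set of reference points and discarding the worse half. The function names, the even split of the budget over rounds and arms, and sampling references without replacement are assumptions of this sketch. It is not the authors' code, which is at the GitHub link above.

```python
import math
import random


def exact_medoid(points, dist):
    """Index of the point minimizing the summed distance to all points.

    Evaluates dist for every ordered pair: O(n^2) distance computations.
    Assumes dist(x, x) == 0, so including j == i does not change the sum.
    """
    n = len(points)
    return min(range(n),
               key=lambda i: sum(dist(points[i], points[j]) for j in range(n)))


def correlated_sequential_halving(points, dist, budget, seed=0):
    """Simplified sketch of medoid search by correlated sequential halving.

    Arm i is candidate medoid i; its mean is the average distance from
    points[i] to the data. Each round scores every surviving arm on the SAME
    random reference set (the "correlated" sampling) and keeps the better half.
    """
    rng = random.Random(seed)
    n = len(points)
    arms = list(range(n))
    rounds = max(1, math.ceil(math.log2(n)))
    while len(arms) > 1:
        # Assumed budget rule: split evenly over rounds and surviving arms,
        # using at least 1 and at most n reference points per arm.
        t = min(n, max(1, budget // (len(arms) * rounds)))
        refs = rng.sample(range(n), t)  # shared references, no replacement
        est = {i: sum(dist(points[i], points[j]) for j in refs) / t for i in arms}
        if t == n:
            # Every point is a reference, so the estimates are exact.
            return min(arms, key=est.__getitem__)
        arms.sort(key=est.__getitem__)
        arms = arms[: math.ceil(len(arms) / 2)]
    return arms[0]


# Usage on a toy 1-D dataset with absolute-difference distance.
def absdist(a, b):
    return abs(a - b)


pts = [0.0, 1.0, 2.0, 3.0, 100.0]
print(exact_medoid(pts, absdist))                                  # 2
print(correlated_sequential_halving(pts, absdist, budget=10_000))  # 2
```

The shared reference set is the modification to the sampling strategy that the title refers to. The intuition, as we understand it, is that comparing every candidate on the same points lets much of the sampling noise cancel in the comparisons, so the halving decisions need fewer distance evaluations than independent per-arm sampling would.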

arXiv:1805.08321 [pdf, other]

cs.LG cs.DS cs.IT stat.CO stat.ML

Bandit-Based Monte Carlo Optimization for Nearest Neighbors

Authors: Vivek Bagaria, Tavor Z. Baharav, Govinda M. Kamath, David N. Tse

Abstract: The celebrated Monte Carlo method estimates an expensive-to-compute quantity by random sampling. Bandit-based Monte Carlo optimization is a general technique for computing the minimum of many such expensive-to-compute quantities by adaptive random sampling. The technique converts an optimization problem into a statistical estimation problem which is then solved via multi-armed bandits. We apply this technique to solve the problem of high-dimensional $k$-nearest neighbors, developing an algorithm which we prove is able to identify exact nearest neighbors with high probability. We show that under regularity assumptions on a dataset of $n$ points in $d$-dimensional space, the complexity of our algorithm scales logarithmically with the dimension of the data as $O\left((n+d)\log^2 \left(\frac{nd}{\delta}\right)\right)$ for error probability $\delta$, rather than linearly as in exact computation requiring $O(nd)$. We corroborate our theoretical results with numerical simulations, showing that our algorithm outperforms both exact computation and state-of-the-art algorithms such as kGraph, NGT, and LSH on real datasets.

Submitted 28 April, 2021; v1 submitted 21 May, 2018; originally announced May 2018.

Comments: Accepted to the IEEE Journal on Selected Areas in Information Theory (JSAIT) - Special Issue on Sequential, Active, and Reinforcement Learning
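
The abstract turns nearest-neighbor search into a multi-armed bandit: each candidate point is an arm, and a pull samples one coordinate of its squared distance to the query. The Python sketch below illustrates that reduction for k = 1 with a plain successive-elimination rule and Hoeffding-style confidence widths; the paper's own bandit algorithm, confidence intervals, and constants may differ. The names `exact_nn` and `bandit_nn`, the batch size, and the parameter `sigma` (an assumed sub-Gaussian scale for one pull, valid as 0.5 when every coordinate lies in [0, 1]) are hypothetical. Once an arm would need d samples, the sketch computes its distance exactly, since sampling could not be cheaper than that.

```python
import math
import random


def exact_nn(query, data):
    """Exact 1-NN by full squared Euclidean distance: O(n*d) coordinate reads."""
    return min(range(len(data)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(query, data[i])))


def bandit_nn(query, data, sigma=0.5, delta=0.01, batch=8, seed=0):
    """Successive-elimination sketch of bandit-based 1-NN search.

    Arm i has mean ||query - data[i]||^2 / d; one pull reads one random
    coordinate. sigma is an ASSUMED sub-Gaussian scale for a single pull
    (0.5 is valid when every coordinate lies in [0, 1]).
    """
    rng = random.Random(seed)
    n, d = len(data), len(query)
    sums, pulls = [0.0] * n, [0] * n
    exact = [None] * n  # exact mean, once sampling would cost >= d reads

    def bounds(i):
        if exact[i] is not None:
            return exact[i], exact[i]
        mean = sums[i] / pulls[i]
        # Hoeffding-style width with a union bound over arms and pull counts.
        width = sigma * math.sqrt(2 * math.log(4 * n * pulls[i] ** 2 / delta) / pulls[i])
        return mean - width, mean + width

    alive = set(range(n))
    while len(alive) > 1:
        for i in alive:
            if exact[i] is not None:
                continue
            if pulls[i] + batch >= d:
                # Sampling further would cost as much as the exact distance.
                exact[i] = sum((a - b) ** 2 for a, b in zip(query, data[i])) / d
                continue
            for _ in range(batch):
                k = rng.randrange(d)
                sums[i] += (query[k] - data[i][k]) ** 2
            pulls[i] += batch
        if all(exact[i] is not None for i in alive):
            return min(alive, key=lambda i: exact[i])
        b = {i: bounds(i) for i in alive}
        best_ucb = min(hi for _, hi in b.values())
        # Keep only arms whose lower bound could still beat the best upper bound.
        alive = {i for i in alive if b[i][0] <= best_ucb}
    return alive.pop()


# Usage: random points in [0, 1]^d with a planted near neighbor at index 3.
rng = random.Random(1)
dim = 10_000
query = [rng.random() for _ in range(dim)]
data = [[rng.random() for _ in range(dim)] for _ in range(30)]
data[3] = [0.99 * q for q in query]
print(exact_nn(query, data), bandit_nn(query, data))  # 3 3
```

In this sketch, for large d most arms are typically discarded after far fewer than d sampled coordinates, which is the source of the savings over the O(nd) exact scan; with small d every arm simply falls back to exact computation.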