AMS-DIMACS Book Chapter (to appear) and Proc. 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) 2000

Locally Lifting the Curse of Dimensionality for Nearest Neighbor Search

Peter N. Yianilos

Abstract: Our work gives a positive result for nearest neighbor search in high dimensions. It establishes that radius-limited search is, under particular circumstances, free of the {\em curse of dimensionality}. It further illuminates the nature of the curse, and may therefore someday contribute to improved general purpose algorithms for high dimensions and for general metric spaces.

We consider the problem of nearest neighbor search in the Euclidean hypercube $[-1,+1]^d$ with uniform distributions, and the additional natural assumption that the nearest neighbor is located within a constant fraction $R$ of the maximum interpoint distance in this space, i.e. within distance $2 R \sqrt{d}$ of the query.

We introduce the idea of {\em aggressive pruning} and give a family of practical algorithms, an idealized analysis, and describe experiments. Our main result is that search complexity measured in terms of $d$-dimensional inner product operations, is i) strongly sublinear with respect to the data set size $n$ for moderate $R$, ii) asymptotically, and as a practical matter, independent of dimension.

Given a random data set, a random query within distance $2 R \sqrt{d}$ of some database element, and a randomly constructed data structure, the search succeeds with a specified probability, which is a parameter of the search algorithm. On average a search performs $\approx n^\rho$ distance computations where $n$ is the number of of points in the database, and $\rho < 1$ is calculated in our analysis. Linear and near-linear space structures are described, and our algorithms and analysis are free of large hidden constants, i.e. the algorithms perform far less work than exhaustive search -- both in theory and practice.

Keywords: Nearest neighbor search, Vantage point tree (vp-tree), Kd-tree, Computational geometry, Metric space.