NEC Research Institute Technical Report 1998

Excluded Middle Vantage Point Forests for Nearest Neighbor Search

Peter N. Yianilos

Abstract: The excluded middle vantage point forest is a practical new data structure that supports worst case sublinear time searches in a metric space for nearest neighbors within a fixed radius $\tau$ of arbitrary queries. Worst case performance depends on the dataset but is not affected by the distribution of queries.

Our analysis predicts vp-forest performance in simple settings such as $L_p$ spaces with uniform random datasets --- and experiments confirm these predictions. Another contribution of the analysis is a new perspective on the curse of dimensionality . In our idealized setting the dataset is organized into a forest of $O(N^{1-\rho})$ trees, each of depth $O(\log{N})$. Here $\rho$ may be viewed as depending on $\tau$, the distance function, and on the dataset. The radius of interest $\tau$ is an input to the organization process and the result is a linear space data structure specialized to answer queries within this distance. Searches then require $O(N^{1-\rho} \log{N})$ time, or $O(\log{N})$ time given $O(N^{1-\rho})$ processors.

We also discuss design variations including an approach to trading space for reductions in search time.

Keywords: Nearest neighbor search, Vantage point tree (vp-tree), Kd-tree, Computational geometry, Metric space.


This work was largely done during 1996-1997 and was presented as part of the CRP (Computing Research at Princeton) talk series during the Summer of 1997.