NEC Research Institute Technical Report | 1998 |
Abstract: The excluded middle vantage point forest is a practical new data structure that supports worst case sublinear time searches in a metric space for nearest neighbors within a fixed radius $\tau$ of arbitrary queries. Worst case performance depends on the dataset but is not affected by the distribution of queries.
Our analysis predicts vp-forest performance in simple settings such as $L_p$ spaces with uniform random datasets --- and experiments confirm these predictions. Another contribution of the analysis is a new perspective on the curse of dimensionality . In our idealized setting the dataset is organized into a forest of $O(N^{1-\rho})$ trees, each of depth $O(\log{N})$. Here $\rho$ may be viewed as depending on $\tau$, the distance function, and on the dataset. The radius of interest $\tau$ is an input to the organization process and the result is a linear space data structure specialized to answer queries within this distance. Searches then require $O(N^{1-\rho} \log{N})$ time, or $O(\log{N})$ time given $O(N^{1-\rho})$ processors.
We also discuss design variations including an approach to trading space for reductions in search time.
Keywords: Nearest neighbor search, Vantage point tree (vp-tree), Kd-tree, Computational geometry, Metric space.