NEC Research Institute Technical Report 1995

Metric Learning via Normal Mixtures

Peter N. Yianilos

Abstract: Natural learners rarely have access to perfectly labeled data, motivating the study of unsupervised learning as an attempt to assign labels. An alternative viewpoint, which avoids the issue of labels entirely, takes as the learner's goal the discovery of an effective metric with which similarity judgments can be made. We refer to this paradigm as "metric learning". Effective classification, for example, then becomes a consequence rather than the direct purpose of learning.

Consider the following setting: a database containing exactly one observation of each of many different objects. This paper shows that, under admittedly strong assumptions, there exists a natural prescription for metric learning in this data-starved case.

Our outlook is stochastic, and the metric we learn is represented by a joint probability density estimated from the observed data. We derive a closed-form expression for the value of this density, starting from an explanation of the data as a Gaussian mixture. Our framework places two known classification techniques of statistical pattern recognition at opposite ends of a spectrum - and describes new intermediate possibilities. The notion of a stochastic equivalence predicate is introduced, and striking differences between its behavior and that of conventional metrics are illuminated. As a result, one of the basic tenets of nearest-neighbor-based classification is challenged.
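To make the idea concrete, here is a minimal illustrative sketch - not the paper's exact derivation - of one natural way a Gaussian mixture can induce a stochastic similarity: score a pair (x, y) by the joint density Σ_k π_k N(x|k) N(y|k), i.e., the probability that both observations are independent draws from the same latent mixture component. The two-component mixture, its parameters, and the function names below are all hypothetical choices for illustration.

```python
import numpy as np

def gauss_pdf(x, mu, var):
    # density of an isotropic d-dimensional Gaussian N(mu, var * I) at x
    d = len(mu)
    diff = x - mu
    return np.exp(-0.5 * diff @ diff / var) / (2.0 * np.pi * var) ** (d / 2)

def joint_density(x, y, weights, means, variances):
    # p(x, y) = sum_k pi_k N(x | k) N(y | k): the density of observing
    # x and y as two independent draws from a shared (latent) component
    return sum(w * gauss_pdf(x, m, v) * gauss_pdf(y, m, v)
               for w, m, v in zip(weights, means, variances))

# hypothetical two-component mixture in the plane
weights = [0.5, 0.5]
means = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
variances = [1.0, 1.0]

a = np.array([0.1, -0.2])   # near the first component
b = np.array([0.3, 0.1])    # near the first component
c = np.array([5.2, 4.9])    # near the second component

same = joint_density(a, b, weights, means, variances)
diff = joint_density(a, c, weights, means, variances)
# a and b, explained by one shared component, score higher than a and c
```

In this reading, "similarity" is not a distance at all but a probability of common origin, which hints at why such a predicate can behave quite differently from a conventional metric (e.g., it need not satisfy a triangle-inequality-like property).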

Keywords: Nearest Neighbor Search, Metric Learning, Normal/Gaussian Mixture Densities, Unsupervised Learning, Neural Network, Encoder Network.


This manuscript was completed during October 1995.