NEC Research Institute Technical Report, 1995

**Abstract:**
Natural learners rarely have access to perfectly labeled data,
motivating the study of unsupervised learning in an attempt to
assign labels. An alternative viewpoint, which avoids the
issue of labels entirely, has as the learner's goal the
discovery of an effective metric with which similarity
judgments can be made. We refer to this paradigm as *metric
learning*. Effective classification, for example, then
becomes a consequence rather than the direct purpose of
learning.

Consider the following setting: a database made up of exactly one observation of each of many different objects. This paper shows that, under admittedly strong assumptions, there exists a natural prescription for metric learning in this data-starved case.

Our outlook is stochastic, and the metric we learn is
represented by a joint probability density estimated
from the observed data. We derive a closed-form expression
for the value of this density starting from an explanation of
the data as a Gaussian mixture. Our framework places two
known classification techniques of statistical pattern
recognition at opposite ends of a spectrum and describes
new intermediate possibilities.
The notion of a *stochastic equivalence predicate* is introduced and
striking differences between its behavior and that of
conventional metrics are illuminated. As a result, one of the
basic tenets of nearest-neighbor-based classification is
challenged.
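To convey the flavor of such a stochastic similarity, here is a minimal sketch, not the paper's derivation, of one natural construction: score two observations by the joint density of both arising from a single, unknown component of an already-fitted one-dimensional Gaussian mixture. The function names and the two-component mixture parameters are illustrative assumptions.

```python
import math

def gauss(x, mu, var):
    """One-dimensional normal density N(x; mu, var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def same_component_score(x, y, mixture):
    """
    Joint density that x and y were both drawn from one (unknown)
    component of the mixture: sum_j w_j * N(x; mu_j, var_j) * N(y; mu_j, var_j).
    Larger values mean the pair is more plausibly "equivalent".
    The score is symmetric in x and y, like a metric, but need not
    obey the triangle inequality.
    """
    return sum(w * gauss(x, mu, var) * gauss(y, mu, var)
               for w, mu, var in mixture)

# Hypothetical fitted mixture: (weight, mean, variance) per component.
mixture = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]

# Points near the same component score far higher than points
# straddling two components.
print(same_component_score(0.2, -0.1, mixture)
      > same_component_score(0.2, 4.8, mixture))  # → True
```

Under this reading, no labels are ever needed: the fitted mixture alone induces the similarity judgment, and classification behavior emerges as a by-product.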

**Keywords:**
Nearest Neighbor Search, Metric Learning, Normal/Gaussian Mixture
Densities, Unsupervised Learning, Neural Network, Encoder Network.

- View the complete paper (converted to HTML)
- The complete paper (PDF)
- The complete paper (PostScript)
- BibTeX entry
- My homepage (Peter N. Yianilos)

This manuscript was completed during October 1995.