IEEE Symposium on Information Theory | 1998
Abstract: Expectation maximization (EM) provides a simple and elegant approach to the problem of optimizing the parameters of a normal mixture on an unlabeled dataset. To accomplish this, EM iteratively reweights the elements of the dataset until a locally optimal normal mixture is obtained. This paper explores the intriguing question of whether such an EM-style algorithm exists for the related and apparently more difficult problem of finding a normal mixture that maximizes the a posteriori class probabilities of a labeled dataset.
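For reference, the reweighting the abstract alludes to is the standard EM responsibility computation for a normal mixture. The following is a minimal sketch (not the paper's algorithm) for a one-dimensional mixture, in which the E-step assigns each data point a per-component weight and the M-step refits each component to the dataset reweighted accordingly; all names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def normal_pdf(x, mu, var):
    # Density of a univariate normal with mean mu and variance var.
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def em_normal_mixture(x, k=2, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)                    # mixing weights
    mu = rng.choice(x, size=k, replace=False)   # component means
    var = np.full(k, x.var())                   # component variances
    for _ in range(iters):
        # E-step: responsibilities r[n, j] = P(component j | x_n).
        dens = pi * normal_pdf(x[:, None], mu, var)   # shape (n, k)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: refit each component to the dataset reweighted by r[:, j].
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var

# Example: recover a two-component mixture from synthetic unlabeled data.
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(3.0, 0.5, 500)])
print(em_normal_mixture(data, k=2))
```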
We expose a fundamental degeneracy in the relationship between a normal mixture and the a posteriori class probability functions that it induces, and use this degeneracy to prove that reweighting a dataset can almost always give rise to a normal mixture that exhibits any desired a posteriori class-probability behavior. This establishes that EM-style approaches are sufficiently expressive for a posteriori optimization problems and opens the way to the design of new algorithms for them.
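For concreteness, the class probability functions induced by a normal mixture are presumably those obtained by Bayes' rule; in standard notation (ours, not necessarily the paper's), with component weights $\pi_k$, means $\mu_k$, and covariances $\Sigma_k$,

$$
P(c \mid x) \;=\; \frac{\pi_c \, \mathcal{N}(x;\,\mu_c,\Sigma_c)}{\sum_{k} \pi_k \, \mathcal{N}(x;\,\mu_k,\Sigma_k)} .
$$

The degeneracy referred to above concerns the fact that this map from mixture parameters to posterior functions is many-to-one: distinct mixtures can induce the same class probability functions.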
Keywords: Expectation Maximization (EM), Normal Mixtures, Gaussian Mixtures, Supervised Learning, Maximum Mutual Information (MMI).