Princeton University Ph.D. Thesis | June 1997
Abstract: Motivated by the goal of establishing stochastic and information-theoretic foundations for the study of intelligence and the synthesis of intelligent machines, this thesis probes several topics relating to hidden-state stochastic models.
Finite Growth Models (FGMs) are introduced. These are nonnegative functionals that arise from parametrically weighted directed acyclic graphs and a tuple observation that affects these weights. Using FGMs, the parameters of a highly general form of stochastic transducer can be learned from examples, and the particular case of stochastic string edit distance is developed. Experiments are described that illustrate the application of learned string edit distance to the problem of recognizing a spoken word given a phonetic transcription of the acoustic signal. With FGMs one may direct learning by criteria beyond simple maximum likelihood. The MAP (maximum a posteriori estimate) and MDL (minimum description length) criteria are discussed, along with their application to causal-context probability models and unnormalized noncausal models. The FGM framework, algorithms, and data structures describe hidden Markov models, stochastic context-free grammars, and many other conventional models, while providing a unified and natural way for computer scientists to learn and reason about them and their many variations. A software system and scripting language are proposed to serve as an assembly language of sorts for many higher-level model types.
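For concreteness, here is a minimal sketch of the evaluation step for a learned string edit distance, assuming the memoryless-transducer case: the joint probability of a string pair is accumulated by a forward pass over the edit lattice, and its negative logarithm plays the role of the distance. The function and parameter names (edit_likelihood, p_del, p_ins, p_sub, p_end) are illustrative, not the thesis's notation, and the EM reestimation loop that learns the weights from example pairs is omitted.

    # Forward evaluation of a memoryless stochastic edit transducer.
    # alpha[i][j] accumulates the probability of all edit sequences that
    # generate the prefix pair (x[:i], y[:j]); the three incoming terms
    # correspond to deletion, insertion, and substitution edits.
    def edit_likelihood(x, y, p_del, p_ins, p_sub, p_end):
        n, m = len(x), len(y)
        alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
        alpha[0][0] = 1.0
        for i in range(n + 1):
            for j in range(m + 1):
                if i > 0:
                    alpha[i][j] += p_del.get(x[i - 1], 0.0) * alpha[i - 1][j]
                if j > 0:
                    alpha[i][j] += p_ins.get(y[j - 1], 0.0) * alpha[i][j - 1]
                if i > 0 and j > 0:
                    alpha[i][j] += p_sub.get((x[i - 1], y[j - 1]), 0.0) * alpha[i - 1][j - 1]
        return alpha[n][m] * p_end

    # Toy parameters over the alphabet {a, b}; all edit-event probabilities
    # plus the stop probability p_end sum to one, so this is a proper
    # joint distribution over string pairs.
    p_sub = {(a, b): 0.20 if a == b else 0.02 for a in "ab" for b in "ab"}
    p_ins = {c: 0.05 for c in "ab"}
    p_del = {c: 0.05 for c in "ab"}
    print(edit_likelihood("abb", "ab", p_del, p_ins, p_sub, p_end=0.36))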
This thesis also illuminates fundamental aspects of normal (Gaussian) mixtures and the reparameterization of related optimization problems. The use of conditional normal mixtures is proposed as a tool for image modeling, and issues relating to the estimation of their parameters are discussed.
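To make the estimation issues concrete, the following is a minimal EM sketch for an unconditional one-dimensional normal mixture; the function name, initialization, and variance floor are assumptions of this illustration, and the conditional mixtures proposed for image modeling generalize this case.

    import math
    import random

    def em_normal_mixture(data, k=2, iters=50, seed=0):
        """Maximum-likelihood fit of a k-component 1-D normal mixture by EM."""
        rng = random.Random(seed)
        mu = rng.sample(data, k)          # initialize means at random data points (assumption)
        var = [1.0] * k                   # unit initial variances
        w = [1.0 / k] * k                 # uniform initial mixing weights
        for _ in range(iters):
            # E-step: posterior responsibility of each component for each point.
            resp = []
            for x in data:
                dens = [w[j] / math.sqrt(2 * math.pi * var[j])
                        * math.exp(-(x - mu[j]) ** 2 / (2 * var[j]))
                        for j in range(k)]
                z = sum(dens)
                resp.append([d / z for d in dens])
            # M-step: closed-form reestimates from the responsibilities.
            for j in range(k):
                nj = sum(r[j] for r in resp)
                w[j] = nj / len(data)
                mu[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
                var[j] = max(sum(r[j] * (x - mu[j]) ** 2
                                 for r, x in zip(resp, data)) / nj,
                             1e-6)        # floor keeps a component from collapsing
        return w, mu, var

    # Two well-separated clusters; EM should recover means near 0 and 5.
    rng = random.Random(1)
    data = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(5, 1) for _ in range(200)]
    print(em_normal_mixture(data))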
Keywords: Expectation Maximization (EM), Hidden Markov Model (HMM), Baum-Welch Reestimation (Forward-Backward Algorithm), Stochastic Model, Stochastic Grammar, Stochastic Transducer, Stochastic Transduction, Metric Learning, Maximum Likelihood (ML), Maximum a Posteriori Estimate (MAP), Minimum Description Length (MDL), String Edit Distance, String Similarity, Normal Mixtures, Gaussian Mixtures, Supervised Learning, Maximum Mutual Information (MMI), Levenshtein Distance, Pattern Recognition, Prototype Dictionary, Spelling Correction, String Correction, String Classification, Speech Recognition, Pronunciation Modeling, Switchboard Corpus.