Emory University M.S. Thesis (mathematics) June 1978

The Definition, Computation, and Application of Symbol String Similarity Functions

Peter N. Yianilos

Abstract: In Computer Science the term "pattern match" has come to mean some measure of the similarity between two things. This is in general a difficult concept to treat, but in certain restrictive settings, one can produce algorithms that agree well with human intuition while remaining mathematically rigorous and well defined. This paper presents a general theoretical framework which is then used to address the problem of "matching" simple strings of symbols. Several other problems become accessible once the concept of match between strings is understood.

One of the results presented is a simple efficient algorithm for matching strings. The characteristics of this algorithm are investigated and the algorithm is evaluated in terms of computational efficiency.

The algorithm is easily computed, compares well with intuition and approximates well the other such algorithm found in the literature, but has a number of major advantages that are discussed in the paper.

Acknowledgments: I would like to thank Dr. Kenneth Mandelberg, Dr. John Kevin Doyle and Bob Harbort for the help and suggestions. I am also indebted to the Emory University Computing Center for the use of their excellent facilities including their UNIVAC 90/80 computer.