How Close are Two Strings? Part I

Suppose you have two strings. How can you write a function which returns a number which is a measure of how different the strings are? This distance function has a wide range of uses in software development and testing. One possible answer is to use what is called the Hamming distance. This is a standard topic in a Discrete Mathematics course, which is a part of a standard computer science degree program at most universities. The Hamming distance between two binary strings of equal length is just the number of positions in which the bits differ. For example, suppose:

A = 011010

B = 001011

The Hamming distance between these two binary strings is 2 because they differ at the 2nd and last positions. Interestingly, in the case of binary strings, the Hamming distance is just the "weight" (number of "1" bits) of the result of XORing the two strings. For the two binary strings above, A XOR B = 010001 and the weight of that result is 2. This gives a convenient way to compute the Hamming distance for binary strings. For character strings, we can generalize and just compute the number of positions where characters differ. So, if:

A = computer

B = colleges

the Hamming distance is 5. Well, this is great but unfortunately, the Hamming distance applies only to strings with equal length. I’ll describe how to deal with strings of unequal length in the next blog entry.

This entry was posted in Software Test Automation. Bookmark the permalink.