I was working on some data mining related code last week and ran into an old problem I’ve bumped up against many times in the past, namely, how to generate all combinations of a set of strings. Mathematical combinatorics was my primary area of study during my undergraduate days and I have my own library of code that I’ve created over the years. For example see http://msdn.microsoft.com/en-us/magazine/cc163957.aspx. However I decided to search the Internet to see what I’d turn up. I was somewhat surprised at a.) the limited amount of information available for such a common problem, and b.) the amount of information available which is just incorrect. One thing I immediately noticed is that many Internet posts confuse combinations and permutations. A permutation of a set of items is a rearrangement. For example, if a set contains {"Adam", "Bill", "Carl"} then one permutation is {"Bill", "Adam", "Carl}. There are a total of n! different permutations of a set of size n. Combinations on the other hand are subsets of size k of the original set where order does not matter. For example, with the above set, if k = 2, then all three possible combinations are {"Adam", "Bill"} {"Adam", "Carl"}, and {"Bill", "Carl"}. Anyway, generating all possible combinations would yield these 7 combinations:

(k = 1): {"Adam"}, {"Bill"}, {"Carl"}

(k = 2): {"Adam", "Bill"} {"Adam", "Carl"}, and {"Bill", "Carl"}

(k = 3): {"Adam", "Bill", "Carl"}

(k = 2): {"Adam", "Bill"} {"Adam", "Carl"}, and {"Bill", "Carl"}

(k = 3): {"Adam", "Bill", "Carl"}

This problem turns out to be somewhat surprisingly tricky. I’ll give some code in a future blog post. Combinatorics, which includes combinations and permutations, is one of the most fundamentally important areas of software testing.

Advertisements