Rule Set Extraction

Recently I’ve been looking at the problem of rule set extraction. Actually, to be more accurate, I’ve been looking at a particular class of problems and I’m not exactly sure what these problems are called. Suppose you have a set of categorical data like this:
(Red, Small, Hot) -> c0
(Red, Small, Cold) -> c0
(Blue, Medium, Hot) -> c1
(Green, Large, Cold) -> c1
(Yellow, Large, Warm) -> c2
(Blue, Small, Hot) -> c2
The first tuple, or itemset, means that there is something which has the attributes color = red, size = small, temperature = hot, and which is assigned to category, or cluster, c0. The problem I’m looking at is how to programmatically extract a set of rules from this data. For example, a human might conclude that:
if (color = Red) then cluster = c0
else if (size = Medium) then cluster = c1
else if (color = Green) then cluster = c1
else cluster = c2
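One naive way to approach this programmatically is a greedy "covering" loop: repeatedly pick the single attribute = value test that covers the most remaining tuples while matching only one cluster, emit it as a rule, remove the covered tuples, and stop when only one cluster is left (which becomes the else branch). The sketch below is just my own illustration of that idea, not an established algorithm from the literature; the names ATTRIBUTES, DATA, extract_rules, and classify are all made up for this example.

```python
from collections import Counter

# Hypothetical names for the three tuple positions in the example data.
ATTRIBUTES = ("color", "size", "temperature")

DATA = [
    (("Red", "Small", "Hot"), "c0"),
    (("Red", "Small", "Cold"), "c0"),
    (("Blue", "Medium", "Hot"), "c1"),
    (("Green", "Large", "Cold"), "c1"),
    (("Yellow", "Large", "Warm"), "c2"),
    (("Blue", "Small", "Hot"), "c2"),
]


def extract_rules(rows):
    """Greedily build an ordered list of (attribute, value, cluster) rules.

    Returns (rules, default), where default is the cluster for the final
    else branch (None if rows is empty).
    """
    remaining = list(rows)
    rules = []
    # Keep going while the uncovered tuples still span more than one cluster.
    while len({label for _, label in remaining}) > 1:
        best = None  # (coverage, attribute index, value, cluster)
        for i in range(len(ATTRIBUTES)):
            for value in sorted({items[i] for items, _ in remaining}):
                matched = [label for items, label in remaining if items[i] == value]
                # Only accept tests whose matching tuples share one cluster.
                if len(set(matched)) == 1 and (best is None or len(matched) > best[0]):
                    best = (len(matched), i, value, matched[0])
        if best is None:
            break  # no pure single-attribute test exists (conflicting data)
        _, i, value, label = best
        rules.append((ATTRIBUTES[i], value, label))
        remaining = [(items, lab) for items, lab in remaining if items[i] != value]
    counts = Counter(label for _, label in remaining)
    default = counts.most_common(1)[0][0] if counts else None
    return rules, default


def classify(items, rules, default):
    """Apply the rules in order; fall through to the default cluster."""
    record = dict(zip(ATTRIBUTES, items))
    for attribute, value, label in rules:
        if record[attribute] == value:
            return label
    return default


rules, default = extract_rules(DATA)
for attribute, value, label in rules:
    print(f"if ({attribute} = {value}) then cluster = {label}")
print(f"else cluster = {default}")
```

Because ties are broken arbitrarily and each rule tests only one attribute, this sketch can produce a longer rule list than the hand-written one above, and it gives up if the data contains tuples that no single test can separate.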
I’ve been looking for existing work in this area, but haven’t found anything that matches this particular problem. The closest area I’ve found was pointed out to me by a colleague in Microsoft Research. That area is generally called Association Rules. However, Association Rules are slightly different because they look at all the associations among the attribute values within a tuple, rather than the associations between the first n-1 attribute values and some cluster or category. Anyway, it’s a very interesting problem.