I wrote a paper this week for the 18th Annual Software Engineering and Data Engineering Conference. The title is "An Empirical Study of the Effectiveness of Partial Antirandom Testing." Most testers are familiar with random testing, where you generate random data and send it to the system under test (SUT). In most situations, because there is no explicit expected value or state, the goal of random testing is to try to produce a hang, crash, or failure in the SUT. The field of hardware testing came up with the concept of antirandom testing, a variant of random testing in which each new test input added to the test set is the input most different from those already in the set. The concept is best explained by example. Suppose that a system has a parameter which accepts 3-bit values and the test effort wishes to generate a test set containing four antirandom inputs. The complete input domain is:

0: 000

1: 001

2: 010

3: 011

4: 100

5: 101

6: 110

7: 111
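
Enumerating a small domain like this takes only a few lines; here is a Python sketch using itertools:

```python
from itertools import product

# All 3-bit inputs, 000 through 111, in ascending order.
domain = list(product((0, 1), repeat=3))
for index, bits in enumerate(domain):
    print(index, "".join(str(bit) for bit in bits))
```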

The initial antirandom test set is empty, and the value {0,0,0} can be arbitrarily selected from the input domain and added to the test set. The next input added to the antirandom test set is the value from the input domain which is most different from the inputs currently in the test set. In this case {1,1,1} is the most different from {0,0,0} by virtually any rational definition of "different", so the antirandom test set now contains {0,0,0} and {1,1,1}. Similarly, assume that the value {0,0,1} is selected from the input domain using the algorithm described below and added to the antirandom test set, so the set contains {0,0,0}, {1,1,1}, and {0,0,1}. Any of the remaining five values in the input domain is a candidate for the fourth and final member of the test set; however, exactly which value is chosen depends upon the difference function used. One obvious candidate function is the Hamming distance. Another possibility is the Cartesian (Euclidean) distance; suppose this is used as the difference function. The sums of the Cartesian distances between each candidate and the three current members of the antirandom test set are:

010: sqrt(1) + sqrt(2) + sqrt(2) ≈ 3.83

011: sqrt(2) + sqrt(1) + sqrt(1) ≈ 3.41

100: sqrt(1) + sqrt(2) + sqrt(2) ≈ 3.83

101: sqrt(2) + sqrt(1) + sqrt(1) ≈ 3.41

110: sqrt(2) + sqrt(1) + sqrt(3) ≈ 4.15
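
These sums are easy to verify with a few lines of Python (a sketch; `math.dist` requires Python 3.8+):

```python
from math import dist  # Euclidean (Cartesian) distance between points

test_set = [(0, 0, 0), (1, 1, 1), (0, 0, 1)]
candidates = [(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0)]

# Sum of the Cartesian distances from a candidate to every current member.
def total_distance(candidate):
    return sum(dist(candidate, member) for member in test_set)

for c in candidates:
    print("".join(map(str, c)), round(total_distance(c), 2))

# The candidate with the largest total distance is chosen next: (1, 1, 0).
best = max(candidates, key=total_distance)
```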

Because {1,1,0} is the most different from the existing members of the antirandom test set, it is added as the fourth and final member. The fundamental premise of antirandom testing is that similar inputs are likely to yield similar information about the system under test, and therefore inputs which are as different from one another as possible will yield more information than arbitrary random inputs. The example above also illustrates that pure antirandom test input generation requires a complete enumeration of the entire input domain. This may be feasible for certain types of hardware testing with a relatively narrow input width, or for software systems with a small input domain. However, pure antirandom testing is rarely possible for complex software systems. Anyway, I came up with an approach I call partial antirandom testing which can be used to test software systems.
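
The pure antirandom procedure walked through above can be sketched as a greedy loop. This is only an illustration of the technique, not the paper's implementation: it assumes Euclidean distance, seeds the set with the arbitrary all-zeros input, and breaks ties by enumeration order.

```python
from itertools import product
from math import dist

def antirandom_set(bit_width, size):
    """Greedily build an antirandom test set over the full input domain.

    Enumerating the whole domain is only feasible for small bit widths,
    which is exactly the limitation that motivates a partial approach.
    """
    domain = list(product((0, 1), repeat=bit_width))
    test_set = [domain.pop(0)]  # arbitrary first input: all zeros
    while len(test_set) < size and domain:
        # Choose the remaining input whose summed Cartesian distance
        # to the current test set members is greatest.
        best = max(domain, key=lambda c: sum(dist(c, m) for m in test_set))
        domain.remove(best)
        test_set.append(best)
    return test_set
```

For the 3-bit example, `antirandom_set(3, 4)` reproduces the set built above: {0,0,0}, {1,1,1}, {0,0,1}, {1,1,0}.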