Analyzing a Sequence for Randomness

Suppose you see the following sequence of males and females entering a store:

F F M M M F F F F F F F

Is the pattern random or not? A standard classical technique to analyze a pattern for randomness is to use the Wald-Wolfowitz Runs test. In this example, the observed number of runs in the sequence is 3 and the theoretical number of runs you’d see is 5.5 if the pattern is in fact random. However, the WW Runs test is limited to problems with just two types of items. So, you couldn’t analyze a pattern like:

A A B C C A B B B A

When I need to analyze a pattern for randomness, I usually write a simulation program that calculates how likely an observed number of runs is if the pattern is random.

There are two main steps. The first step takes the sequence to analyze, finds the number of each symbol in the sequence and uses that information to find the expected number of runs for a random pattern.

The second step repeatedly generates random sequences with the specified number of symbols, calculates the number of runs, and counts how many times the calculated number of runs is less than or equal to the observed number of runs in the target sequence.

The moral is that classical math techniques are often beautiful and elegant, but sometimes brute force via a simulation is a good approach.

sequenceanalysis

This entry was posted in Machine Learning. Bookmark the permalink.