Recently I needed to repeatedly take many sets of data and count how many data points fall into each of several buckets. For example, if the input is a set of test scores like { 90.0, 70.0, 75.0, 73.0, 85.0, 85.0, 65.0 } and I want frequency counts for the ranges 0-60, 60-70, 70-80, 80-90, and 90-100, the output should be [ 0, 1, 3, 2, 1 ] because there are 0 scores less than 60.0, 1 score between 60.0 and 70.0, and so on. Notice that dealing with the endpoints is not trivial: a score of exactly 70.0 has to land in exactly one bucket, and here it counts toward 70-80, not 60-70. Anyway, I wrote a routine called CreaterFrequencyArray() in C# which can be called like so:

double[] values =
  new double[] { 90.0, 70.0, 75.0, 73.0, 85.0, 85.0, 65.0 };
string[] labels = null;
int[] freq =
  CreaterFrequencyArray(values, 60.0, 90.0, 10.0, out labels);
for (int i = 0; i < freq.Length; ++i)
{
  Console.WriteLine(labels[i] + " count = " + freq[i]);
}

The resulting output looks like:

< 60.00 count = 0
>= 60.00 && < 70.00 count = 1
>= 70.00 && < 80.00 count = 3
>= 80.00 && < 90.00 count = 2
>= 90.00 count = 1

The CreaterFrequencyArray() method accepts a source data array, a left endpoint (60.0 in the example), a right endpoint (90.0 in the example), and a width for each frequency bucket (10.0 in the example). It returns one count for everything below the left endpoint, one for each bucket between the endpoints, and one for everything at or above the right endpoint. The routine also creates a set of labels which describe each bucket (“< 60.00” and so on). The method was a bit trickier than I expected. Here is the code:

static int[] CreaterFrequencyArray(double[] sourceValues,
  double leftEndPoint,
  double rightEndPoint,
  double bucketWidth,
  out string[] labels)
{
  // assumes width evenly divides |left - right|

  // 1. create an array of end points
  double x = Math.Abs(leftEndPoint - rightEndPoint);
  double y = x / bucketWidth;
  // round before converting so a quotient like 2.9999999 is not truncated to 2
  int numberEndPoints = (int)Math.Round(y) + 1;
  double[] endPoints = new double[numberEndPoints];
  endPoints[0] = leftEndPoint; // seed first end point
  for (int i = 1; i < numberEndPoints; ++i) {
    endPoints[i] = endPoints[i - 1] + bucketWidth;
  }

  // 2. create an array of labels that correspond to the endpoints
  labels = new string[numberEndPoints + 1];
  labels[0] = "< " + endPoints[0].ToString("F2");
  for (int i = 1; i < labels.Length - 1; ++i) {
    labels[i] = ">= " + endPoints[i - 1].ToString("F2") +
      " && < " + endPoints[i].ToString("F2");
  }
  labels[labels.Length - 1] = ">= " +
    endPoints[endPoints.Length - 1].ToString("F2");

  // 3. bucketize
  int[] frequencies = new int[numberEndPoints + 1];
  for (int i = 0; i < sourceValues.Length; ++i)
  {
    double currValue = sourceValues[i];
    if (currValue < endPoints[0]) // first bucket
    {
      ++frequencies[0];
    }
    else if (currValue >= endPoints[endPoints.Length - 1]) // last bucket
    {
      ++frequencies[frequencies.Length - 1];
    }
    else // one of the middle buckets
    {
      for (int j = 0; j < endPoints.Length - 1; ++j)
      {
        if (currValue >= endPoints[j] && currValue < endPoints[j + 1])
        {
          ++frequencies[j + 1];
          break; // a value belongs to exactly one bucket
        }
      }
    }
  } // scan of sourceValues
  return frequencies;
} // CreaterFrequencyArray()

This method is crude in the sense that it does no error-checking and requires the calling code to supply correct left endpoint, right endpoint, and bucket width parameters, but it’s working pretty well for me.
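For anyone who wants a little error-checking, here is a minimal sketch of what a guard might look like. The class name FrequencyArgs, the method name Validate, and the 1e-9 tolerance are all made up for illustration and are not part of the routine above; the checks simply mirror the assumptions the routine relies on (positive width, left endpoint below right endpoint, width evenly dividing the range).

```csharp
using System;

// Hypothetical helper, not part of the original routine.
static class FrequencyArgs
{
    // Throws if the arguments would break the assumptions that
    // CreaterFrequencyArray() depends on; returns normally otherwise.
    public static void Validate(double[] sourceValues, double leftEndPoint,
                                double rightEndPoint, double bucketWidth)
    {
        if (sourceValues == null)
            throw new ArgumentNullException(nameof(sourceValues));

        // Written as !(a > b) so that NaN is rejected as well.
        if (!(bucketWidth > 0.0))
            throw new ArgumentOutOfRangeException(nameof(bucketWidth),
                "Bucket width must be positive.");

        if (!(leftEndPoint < rightEndPoint))
            throw new ArgumentException(
                "Left endpoint must be less than right endpoint.");

        // The width should divide the range evenly. The tolerance is an
        // arbitrary choice that absorbs ordinary floating-point rounding.
        double bucketCount = (rightEndPoint - leftEndPoint) / bucketWidth;
        if (double.IsNaN(bucketCount) || double.IsInfinity(bucketCount) ||
            Math.Abs(bucketCount - Math.Round(bucketCount)) > 1e-9)
        {
            throw new ArgumentException(
                "Bucket width must evenly divide the range.");
        }
    }
}
```

Calling FrequencyArgs.Validate(sourceValues, leftEndPoint, rightEndPoint, bucketWidth) as the first statement of the routine would reject, for example, a width of 7.0 over the 60.0 to 90.0 range before any endpoints are built. Whether to throw or to quietly adjust the parameters is a design choice; throwing is just the simplest option to sketch.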

This kind of frequency is also missed in boundary testing.

Normally, for an edit box that (supposedly) accepts values from 1 to 100, most testers try -1, 0, 1, 99, 100, and 101.

Some also try other negative values, non-alphanumeric characters, decimal points, etc.

And of course some try a value like 1 million to try to crash the application.

But no one really tries more values in between, like 11, 22, 33, …

It would be better to explain this with an actual context.

Sebi

Quite right. This was just an example without error-checking and one which targets only a very specific problem I was working on.