I ran into a problem that, although I think I solved it, I’m not sure. The problem was to generate data between some min value and some max value in such a way that the data roughly follows an exponential probability distribution. Huh? Imagine you want to generate data that follows the Gaussian bell-shaped curve. This is a well-known problem and you can do this easily by using the Box-Muller algorithm combined with a simple transform. But I searched the Internet and could find no examples of generating data that follows an exponential probability distribution curve. For example, if min = 100 and max = 900, and you generate 1,000 data items, most of the 1,000 items (about 500 of them) should be between 100 and 200, fewer (about 250) would be between 200 and 300, and so on until there would only be a handful of, or zero, data items between 800 and 900.

So I wrote some C# code that does this. I think. It seems to work but I’m really not too confident that it is theoretically sound.

Here’s some non-OOP style C# code that generates 1,000 exponentially distributed values between 100.0 and 900.0:

double minVal = 100.0; double maxVal = 900.0; Random r = new Random(0); int numValues = 1000; int numGenerated = 0; double lambda = 1.0; while (numGenerated < numValues) { double u = r.NextDouble(); double t = -Math.Log(u) / lambda; double increment = (maxVal - minVal) / 6.0; double result = minVal + (t * increment); if (result < maxVal) // winner { Console.WriteLine("next data item = " + result.ToString("F3")); ++numGenerated; } }

I use the built-in .NET random number generator to get a uniform random value between 0.0 and 1.0. Next I use the log() function and divide by lambda = 1.0 to get a random value heavily biased to smaller values. The choice of lambda was guesswork on my part. With lambda = 1.0, an exponential distribution will have mean = 1 / lambda = 1.0 and standard deviation also equal to 1.0. So, 99.9% plus of the values will be between 0 and 6. The idea is to take the minVal and then add some increment in a way that most of the time the increment is small. The increment I get by dividing by the range of desired values by 6.0 — because most of the data will be less than 6 standard deviations away from the mean. Because the result can be arbitrarily large (but with low probability), I check to see if the generated value is greater than maxVal, and if so, just try again.

Here’s essentially the same C# code but in OOP style:

public class ExponentialData { private Random r; private double minValue; private double maxValue; public ExponentialData(double minValue, double maxValue, int seed) { r = new Random(seed); this.minValue = minValue; this.maxValue = maxValue; } public double NextData() { double lambda = 1.0; double result = this.maxValue; do { double u = r.NextDouble(); double t = -Math.Log(u) / lambda; double increment = (maxValue - minValue) / 6.0; result = minValue + (t * increment); } while (result >= this.maxValue); return result; } }

This code could be called along the lines of:

double minVal = 100.0; double maxVal = 900.0; ExponentialData ed = new ExponentialData(minVal, maxVal, 0); int numValues = 1000; for (int i = 0; i < numValues; ++i) { double d = ed.NextData(); // count of values between 100-200, etc. } // etc.

The strange part about all of this is that I couldn’t find any examples of this problem — either the problem or a code solution — anywhere. This leads me to believe that generating data that follows an exponential probability distribution just isn’t done very often or at all. My hunch is that the exponential distribution is used just in classical Poisson type scenarios where there are things arriving (cars to a drive-through, users to a Web site, etc.) and the minimum value (time between arrivals) is always 0. Conceptually confusing to me.