## Estimating the Probability Your Project Will Finish on Time

I think most software developers, testers, and managers should have a basic understanding of how to estimate the probability that a project will finish on time (or finish behind schedule). The technique is fairly simple.

First, you break your project down into manageably sized chunks. At a coarse level of granularity these chunks can be milestones (typically measured in weeks or months); at a fine level of granularity they can be work packages (typically lasting from 4 to 40 hours) derived from a Work Breakdown Structure. Next, for each chunk you estimate how long it will take, using an optimistic guess, a pessimistic guess, and a most likely guess. Of course this is the hard part, and you have to rely on historical data from similar projects, expert judgment, or some other method.

Now for each chunk you compute the mean and variance of its duration. How you do this depends on which probability distribution you use, but the beta distribution (along with the triangular) is the most common. The mean for a beta distribution is (optimistic + 4 * most likely + pessimistic) / 6. The variance for beta is simply the square of ((pessimistic - optimistic) / 6). Then you compute the sum of the means and the sum of the variances. With these you can compute a Z score as (X - M) / sqrt(sum of variances), where X is the duration you are asking about and M is the sum of the means, and use the Normal distribution to compute your probabilities.

This sounds a lot worse than it is. Here's a highly simplified example. You have three chunks, A, B, and C. The means are 4.0, 5.0, and 8.0 respectively (in days for this example). The variances are 4.0, 9.0, and 36.0 respectively. The sum of the means is 17.0. The square root of the sum of the variances is sqrt(4 + 9 + 36) = sqrt(49) = 7.0. You want to know the probability that your project will take between 17.0 days (the mean) and 27.5 days. Z = (27.5 - 17.0) / 7.0 = 10.5 / 7.0 = 1.50.
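Looking up this value in a Standard Normal Distribution table, you get a probability of 0.4332, or about 43%. Because the Normal distribution is symmetric, there is also a 0.5 probability of finishing before the mean, so the probability of finishing within 27.5 days is about 0.5 + 0.4332 = 0.9332, or 93%. As with any quantitative technique, a.) your result is just a crude estimate, b.) but in most cases even a crude estimate is better than no estimate, c.) your final estimate is only as good as your input data, and d.) the most important value from such an analysis comes from setting the problem up, not from the final answer.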
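
The arithmetic above can be reproduced with a short Python sketch. This is only an illustration, not a definitive tool: the function names (`pert_mean`, `pert_variance`, `prob_between_mean_and`) are my own, and the three-point estimates in `estimates` are hypothetical values picked solely so that they reproduce the means and variances used in the example. The Normal table lookup is replaced by `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
import math
from statistics import NormalDist


def pert_mean(optimistic, most_likely, pessimistic):
    """Beta (PERT) mean: (O + 4*M + P) / 6."""
    return (optimistic + 4 * most_likely + pessimistic) / 6


def pert_variance(optimistic, pessimistic):
    """Beta (PERT) variance: ((P - O) / 6) squared."""
    return ((pessimistic - optimistic) / 6) ** 2


def prob_between_mean_and(target, means, variances):
    """Return (Z, probability that total duration lies between the mean and target)."""
    total_mean = sum(means)
    total_sd = math.sqrt(sum(variances))
    z = (target - total_mean) / total_sd
    # cdf(z) is the area to the left of z; subtracting 0.5 leaves the
    # area between the mean (z = 0) and z, which is what a Z table lists.
    return z, NormalDist().cdf(z) - 0.5


# Hypothetical (optimistic, most likely, pessimistic) guesses for chunks A, B, C,
# chosen only to match the means and variances in the worked example.
estimates = [(1.0, 2.5, 13.0), (1.0, 2.5, 19.0), (1.0, 2.5, 37.0)]

means = [pert_mean(o, m, p) for o, m, p in estimates]
variances = [pert_variance(o, p) for o, _, p in estimates]

z, p = prob_between_mean_and(27.5, means, variances)
print(f"means = {means}, variances = {variances}")
print(f"Z = {z:.2f}, P(17.0 <= duration <= 27.5) = {p:.4f}")
```

Swapping in your own three-point estimates and target duration is the only change needed to try this on a real schedule. To get the probability of finishing within the target, rather than between the mean and the target, use `NormalDist().cdf(z)` directly.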