The Difference Between a Probability Density Function Value and Area

I was giving a lecture at the tech company I work for and there was a question from one of the attendees about the probability density function (PDF) for a Gaussian (aka Normal, bell-shaped) distribution. Briefly, the area under the PDF between two x values is the probability that a randomly generated x will be between those two values. For example, for a Gaussian with mean = 0 and standard deviation = 1, the probability that a randomly generated x is between 0.0 and 1.0 is the area under the curve between 0.0 and 1.0 which is approximately 0.3413.

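The PDF value at x = 1.0 is approximately 0.2420. A PDF value can be used to compare the relative likelihoods of two different x values. For example, the PDF at x = 2.0 is about 0.0540, so a randomly generated x is more likely to land near 1.0 than near 2.0. PDF values are not probabilities: for a continuous distribution, the probability of getting any single exact x value is zero, and only an area over a range of x values is a probability.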
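To make the comparison concrete, here is a minimal sketch that evaluates the standard Gaussian PDF at the two x values from the paragraph above using scipy norm.pdf(). The variable names are just for illustration. The ratio of the two PDF values is the sense in which one x is "more likely" than the other.

```python
from scipy.stats import norm

# PDF values for N(0,1) at the two x values discussed in the text
p1 = norm.pdf(1.0, loc=0.0, scale=1.0)  # about 0.2420
p2 = norm.pdf(2.0, loc=0.0, scale=1.0)  # about 0.0540

print("pdf(1.0) = %0.4f" % p1)
print("pdf(2.0) = %0.4f" % p2)

# Relative likelihood: how much denser the distribution is near 1.0 than near 2.0
ratio = p1 / p2
print("ratio    = %0.4f" % ratio)
```

Neither p1 nor p2 is a probability by itself. Only their ratio, or an area computed from the PDF, has a direct probabilistic interpretation.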

The total area under a Gaussian distribution is 1.0 but a PDF value can be greater than 1.0 if the distribution is squished, meaning it has a very small standard deviation.
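A quick way to see this is to pick a small standard deviation and check both the peak PDF value and the total area. The sketch below assumes sd = 0.1 (an arbitrary choice for illustration) and approximates the area with a simple midpoint sum over plus or minus 10 standard deviations, using only the Python standard library. The function name gaussian_pdf is hypothetical, not from any library.

```python
import math

def gaussian_pdf(x, mean, sd):
  # Gaussian PDF at x for the given mean and standard deviation
  z = (x - mean) / sd
  return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

sd = 0.1  # assumed "squished" standard deviation
peak = gaussian_pdf(0.0, 0.0, sd)  # PDF value at the mean
print("peak pdf = %0.4f" % peak)   # greater than 1.0

# Approximate the area under the curve with a midpoint sum
n = 20000
lo, hi = -10.0 * sd, 10.0 * sd
width = (hi - lo) / n
area = 0.0
for i in range(n):
  mid = lo + (i + 0.5) * width
  area += gaussian_pdf(mid, 0.0, sd) * width
print("area     = %0.4f" % area)   # still approximately 1.0
```

The curve is tall but narrow, so the height exceeds 1.0 while the area under it stays at 1.0.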

In machine learning, probably the most common task related to probability distributions is to generate x values from a Gaussian distribution. Computing a PDF value is less common and can be easily done using a program-defined function or the scipy norm.pdf() function. To compute the area under the curve between two values (that is, the probability x is between two values), you can use the scipy norm.cdf() function (cumulative distribution function): the CDF at x is the area under the PDF to the left of x, so the area between two values is the difference of the two CDF values.
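As a minimal sketch, the area between 0.0 and 1.0 for a Gaussian with mean = 0 and standard deviation = 1 can be computed by subtracting two norm.cdf() values. The variable names lo, hi and area are illustrative only.

```python
from scipy.stats import norm

lo = 0.0
hi = 1.0

# Area under the N(0,1) PDF between lo and hi
area = norm.cdf(hi, loc=0.0, scale=1.0) - norm.cdf(lo, loc=0.0, scale=1.0)
print("P(%0.1f < x < %0.1f) = %0.4f" % (lo, hi, area))
```

The result matches the approximate value of 0.3413 quoted earlier.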

The Gaussian distribution is also known as the Normal distribution because, well, it’s mathematically normal. Two un-normal math photos. Left: Teaching students about angles at a U.S. high school. Explains a lot. Right: The concept of infinity that’s not so infinite.

Demo code:


import numpy as np
from scipy.stats import norm

def my_pdf(x, u, sd):
  # Gaussian PDF at x for mean u and standard deviation sd
  a = np.exp(-0.5 * ((x - u) / sd)**2)
  b = sd * np.sqrt(2 * np.pi)
  return a / b

print("\nBegin Gaussian pdf() demo ")

print("\nSampling 5 values from N(0,1) ")
for i in range(5):
  x = np.random.normal(loc=0.0, scale=1.0)
  print("x = %8.4f " % x)

print("\nComputing pdf() for x = 1.0 ")
y = norm.pdf(x=1.0, loc=0.0, scale=1.0)
print("%8.4f " % y)

y = my_pdf(x=1.0, u=0.0, sd=1.0)
print("%8.4f " % y)

print("\nEnd demo ")