Here’s an old problem. A mathematician is in a harem with 10 rooms. Each room has a different number of girls in it. The mathematician is blindfolded but has a magic parrot that knows probability. The mathematician needs to estimate the number of girls in each room. He can do so by using this algorithm:

start in any room loop max_iterations parrot counts number girls in room mathematician picks a random next room parrot peeks into selected next room if next room has more girls: parrot says move to next room else if next room has fewer girls: move to next room with small p else stay spend night in curr room and increment room count end-loop estimated dist of girls = room counts / iterations

In the demo program below, a harem is set up with 10 rooms where the first room has one girl, the second room has two girls, and so on. There are a total of 1 + 2 + 3 + . . + 10 = 55 girls in the harem. Therefore the true probability distribution to estimate is [1/55, 2/55, . . 10/55] = [0.0182, 0.0364, . . 0.1818].

The mathematician is estimating the probability distribution of girls in each room. There’s a ton of things going on here but the key idea is that it’s possible to estimate a probability distribution using comparisons of two states at a time. This is fascinating math related to Gibbs Sampling and Metropolis Sampling and Monte Carlo methods.

The clever part of the algorithm is when the parrot decides to tell the mathematician to move to a worse room (fewer girls). That happens with probability next_girls / curr_girls. For example, if the current room has 6 girls and the next room has 2 girls, the magic parrot will tell the mathematician to move to the worse room with p = 2 / 6 = 0.3333.

# harem_parrot.py # estimate a probability distribution import numpy as np def print_vec(v): for i in range(len(v)): print("%0.4f " % v[i], end="") print("") def main(): print("\nBegin sultan's harmem and magic parrot demo \n") np.random.seed(0) # set number girls in each room harem = np.array([1,2,3,4,5,6,7,8,9,10], dtype=np.float32) harem /= np.sum(harem) # a p-distribution counts = np.zeros(10, dtype=np.int) # nights spent each room max_iter = 10000 curr_room = 5 # start here for i in range(max_iter): curr_girls = harem[curr_room] # num girls curr room next_room = np.random.randint(low=0, high=10) next_girls = harem[next_room] if next_girls > curr_girls: # next room better; move curr_room = next_room else: # next room is worse but move sometimes t = next_girls / curr_girls # threshhold p = np.random.random() # [0.0, 1.0) if p < t: # accept and move to worse room curr_room = next_room else: # reject and stay in same room curr_room = curr_room # for readability counts[curr_room] += 1 # spend night in room est_dist = counts / max_iter # estimated distribution print("True and estimateds distribution of girls: \n") print_vec(harem) # the true distribution print_vec(est_dist) # the estimated print("\nEnd demo ") if __name__ == "__main__": main()

*I don’t know much about harems, but I doubt that Hollywood depictions resemble reality. From left, “Son of Sinbad” (1955) with Vincent Price and the interesting Lili St. Cyr, “Don Juan DeMarco” (1994) with Johnny Depp, “Lost in a Harem” (1944) with Abbott and Costello, and “Harem Scarem” (1965) with Elvis Presley.*