The Sultan’s Harem, Magic Parrot, and the Mathematician

Here’s an old problem. A mathematician is in a harem with 10 rooms. Each room has a different number of girls in it. The mathematician is blindfolded but has a magic parrot that knows probability. The mathematician needs to estimate the number of girls in each room. He can do so by using this algorithm:

start in any room
loop max_iterations
  parrot counts number girls in room
  mathematician picks a random next room
  parrot peeks into selected next room

  if next room has more girls:
    parrot says move to next room
  else if next room has fewer girls:
    move to next room with p = next_girls / curr_girls, else stay

  spend night in curr room and increment room count

estimated dist of girls = room counts / iterations

In the demo program below, a harem is set up with 10 rooms where the first room has one girl, the second room has two girls, and so on. There are a total of 1 + 2 + 3 + ... + 10 = 55 girls in the harem. Therefore the true probability distribution to estimate is [1/55, 2/55, ... 10/55] = [0.0182, 0.0364, ... 0.1818].

The mathematician is estimating the probability distribution of girls in each room. There’s a ton of things going on here but the key idea is that it’s possible to estimate a probability distribution using comparisons of two states at a time. This is fascinating math related to Gibbs Sampling and Metropolis Sampling and Monte Carlo methods.

The clever part of the algorithm is when the parrot decides to tell the mathematician to move to a worse room (fewer girls). That happens with probability next_girls / curr_girls. For example, if the current room has 6 girls and the next room has 2 girls, the magic parrot will tell the mathematician to move to the worse room with p = 2 / 6 = 0.3333.

# estimate a probability distribution

import numpy as np

def print_vec(v):
  for i in range(len(v)):
    print("%0.4f " % v[i], end="")
  print("")

def main():
  print("\nBegin sultan's harem and magic parrot demo \n")

  # set number of girls in each room
  harem = np.array([1,2,3,4,5,6,7,8,9,10], dtype=np.float32)
  harem /= np.sum(harem)  # a p-distribution

  counts = np.zeros(10)  # nights spent in each room
  max_iter = 10000
  curr_room = 5  # start here

  for i in range(max_iter):
    curr_girls = harem[curr_room]  # num girls curr room
    next_room = np.random.randint(low=0, high=10)
    next_girls = harem[next_room]

    if next_girls > curr_girls:  # next room better; move
      curr_room = next_room
    else:   # next room is worse but move sometimes
      t = next_girls / curr_girls   # threshold
      p = np.random.random() # [0.0, 1.0)
      if p < t:  # accept and move to worse room
        curr_room = next_room
      else:  # reject and stay in same room
        curr_room = curr_room  # for readability

    counts[curr_room] += 1  # spend night in room

  est_dist = counts / max_iter  # estimated distribution

  print("True and estimated distributions of girls: \n")
  print_vec(harem)        # the true distribution
  print_vec(est_dist)     # the estimated

  print("\nEnd demo ")

if __name__ == "__main__":
  main()

I don’t know much about harems, but I doubt that Hollywood depictions resemble reality. From left, “Son of Sinbad” (1955) with Vincent Price and the interesting Lili St. Cyr, “Don Juan DeMarco” (1994) with Johnny Depp, “Lost in a Harem” (1944) with Abbott and Costello, and “Harem Scarem” (1965) with Elvis Presley.

Posted in Miscellaneous | Leave a comment

Displaying an MNIST Digit Yet Once Again

It seems like there are some mini-problems that I do over and over. Displaying an MNIST digit is one such mini-problem. I'm in the process of writing an article for Visual Studio Magazine, and the topic is MNIST image classification.

An article about MNIST needs to explain what MNIST is, and that means showing an MNIST digit. MNIST (“modified National Institute of Standards and Technology”) is a collection of 70,000 images of handwritten digits. Each digit is 28×28 pixels, and each pixel value is between 0 and 255.

As is usually the case, a picture is worth a thousand words, so I wrote a little Python program that loads the Keras library pre-installed MNIST images into memory and then displays the first training image. I show the image both in terms of raw values (expressed in hex to make the output nicer) and in terms of a grayscale image.

# load built-in Keras MNIST dataset
# display (first training) image

import numpy as np
import keras as K
import matplotlib.pyplot as plt

def main():
  print("\nBegin show MNIST image \n")

  (train_x, train_y), \
  (test_x, test_y) = K.datasets.mnist.load_data()

  # print(train_x.shape)  # (60000, 28, 28)
  # print(train_y.shape)  # (60000,)

  d = train_x[0]
  d = d.reshape(28,28)
  for row in range(0,28):
    for col in range(0,28):
      print("%02X " % d[row][col], end="")
    print("")  # newline after each row of pixels

  lbl = train_y[0]
  print("\ndigit = ", lbl)

  plt.imshow(d, cmap=plt.get_cmap('gray_r'))

  print("\nEnd \n")

if __name__ == "__main__":
  main()

I don’t think there’s a moral to the story. Displaying an MNIST digit is a fun little challenge but after the first 20 or so times the novelty wears off a bit. However, repetition is important for anyone who writes code — if you don’t practice continuously, you’ll lose your coding skills quickly.

Posted in Keras, Machine Learning | Leave a comment

My Top Ten Favorite Science Fiction Movies Set on an Island

I enjoy science fiction movies that are set in an isolated environment, such as an Arctic outpost, because the confined location forces writers and directors and actors to be creative. Here are my favorite science fiction films that take place on an island.

1. Attack of the Crab Monsters (1957) – A scientific expedition goes to a remote Pacific island to find out what happened to a previous expedition. Giant crabs, but with the twist that when they eat someone they absorb their intelligence. A low-budget but surprisingly effective movie.

2. King Kong (1933) – The original 1933 version is the best. You know the story of course. The special effects by Willis O’Brien were absolutely brilliant for the time in which the movie was made.

3. Island of Terror (1966) – Genetic experiments on a remote British island. What could go wrong? Giant man-eating amoebas based on silicon, that’s what could go wrong. The great actor Peter Cushing starred.

4. No Escape (1994) – In the year 2022, the penal system is run by corporations. Unjustly imprisoned ex-Marine John Robbins (played by Ray Liotta) ends up on an island for the most dangerous inmates. Lots of action.

5. Dagon (2001) – Businessman Paul Marsh, his girlfriend, and two other friends are shipwrecked on Imboca, a remote island off the coast of Spain. The islanders worship Dagon, who demands human sacrifices. The clever (but not-so-happy) ending surprised me.

6. Grabbers (2012) – The inhabitants of a small island off the coast of Ireland are attacked by bloodsucking tentacled aliens. This movie didn't get very good reviews but I enjoyed it.

7. Mysterious Island (1961) – This movie is based on the novel of the same name by Jules Verne. There have been many adaptations, but this one with special effects by Ray Harryhausen is my favorite.

8. The Killer Shrews (1959) – A scientist is performing experiments on a remote island, but unintentionally creates dog-sized shrews. I loved how the people on the island get inside big cans and rope them together as a sort of homemade walking tank so they can creep down to their boat.

9. Island of Lost Souls (1932) – The brilliant actor Charles Laughton plays a mad scientist who experiments with humans and animals. I liked the panther woman and won't ever forget The House of Pain.

10. The Island (2005) – In a dystopian near future, Lincoln Six Echo and Jordan Two Delta (Ewan McGregor and Scarlett Johansson) are selected to go to an island where everything is better. Except for the organ harvesting.

Bonus Picks

11. Voodoo Island (1957) – The island in question has three, count ’em three, different kinds of man-eating plants. Boris Karloff starred and I think this is one of his best performances.

12. Matango (1963) – A group of castaways on an island who are unwittingly altered by a local species of mutating mushrooms. Japanese film directed by the great Ishiro Honda who also directed Godzilla and many other films.

13. From Hell It Came (1957) – This is the best film ever that features an evil walking tree on a South Pacific Island. Well, OK, it’s the only such film. Realistically this is a pretty bad movie but it has a certain charm.

14. Journey 2: The Mysterious Island (2012) – Starring Dwayne The Rock Johnson. This is sort-of a sequel to “Journey to the Center of the Earth” (2008). A group of characters look for a boy’s lost grandfather on an island that turns out to be Atlantis.

Posted in Top Ten | Leave a comment

Running the ML.NET Quick Start Tutorial

Before I write anything else, let me say Bravo! At last somebody (or group of people) created a quick start for a new technology, and the quick start is perfect — straight to the point, and it works the first time.

OK, so what exactly impressed me? I took a look at the brand new ML.NET, which is a machine learning code library for software developers who use .NET technologies. The ML.NET library is based on an internal Microsoft library named TLC. TLC has been around for years (the current version inside Microsoft is 3.9) and TLC code is used in many existing Microsoft products and services.

I decided I'd take a look at the early ML.NET documentation. Most documentation is horrible, so I was mentally prepared for a bad experience, but as I mentioned, the documentation was excellent.

The quick start uses the new .NET Core which is a software ecosystem similar to the .NET Framework, but .NET Core is ideal for console (shell) applications including ML.NET applications. First, I downloaded and installed .NET Core onto my machine.

Next I opened a command shell and created a new console application.

Next, I created the data file.

Next, I copy-pasted the ML.NET C# program.

And last, I ran the program.

OK, so there’s a lot I don’t understand yet, but the point of a quick start is to just get started. The rest is relatively easy.

The start of the 1966 Le Mans race where drivers sprint to their cars. Ford GT40s took first, second, and third places, ending five years of Ferrari wins. The golden age of motor sports.

Posted in Machine Learning | Leave a comment

Accuracy, Precision, Recall, and F1 Score

If you have a binary classification problem, four fundamental metrics are accuracy, precision, recall, and F1 score. They’re best explained by example. Suppose the problem is to predict if a sports team will win or lose. There are four possible scenarios:

1. you predict the team will win and they do ("true positive")
2. you predict the team will win but they don't ("false positive")
3. you predict the team will lose and they do ("true negative")
4. you predict the team will lose but they don't ("false negative")

Suppose you make 100 predictions for different games and your results are:

TP = 40 (correctly predicted a win)
FP = 20 (incorrectly predict a win)
TN = 30 (correctly predicted a loss)
FN = 10 (incorrectly predict a loss)

The four metrics are:

1. accuracy = num correct / (num correct + num wrong)
            = (TP + TN) / (TP + FP + TN + FN)
            = 70 / 100
            = 0.70

2. precision = TP / (TP + FP)
             = 40 / (40 + 20)
             = 40 / 60
             = 0.67

3. recall = TP / (TP + FN)
          = 40 / (40 + 10)
          = 40 / 50
          = 0.80

4. F1 score = 1 / [ ((1 / 0.67) + (1 / 0.80)) / 2 ]
            = 1 / [ (1.50 + 1.25) / 2 ]
            = 1 / (2.75 / 2)
            = 1 / 1.375
            = 0.73

Accuracy is intuitive, and in my opinion, the single most important metric. Precision and recall are very difficult for me to interpret intuitively, so I just think of them only as metrics where higher values are better. There is usually a tradeoff: as precision increases, recall tends to decrease, and vice versa. The F1 score is the harmonic average of precision and recall, the idea being that it gives you a single combined metric. As with precision and recall, larger F1 values are better. Notice that the F1 score of 0.73 is between the precision (0.67) and recall (0.80). You could use a regular average instead of a harmonic average, but because precision and recall are both proportions, a harmonic average is more principled.
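The arithmetic above is easy to package as a short Python function. This sketch uses the counts from the example (TP=40, FP=20, TN=30, FN=10); the function name is just for illustration:

```python
def binary_metrics(tp, fp, tn, fn):
  # accuracy: fraction of all predictions that were correct
  accuracy = (tp + tn) / (tp + fp + tn + fn)
  # precision: of the predicted wins, how many were real wins
  precision = tp / (tp + fp)
  # recall: of the real wins, how many were predicted
  recall = tp / (tp + fn)
  # F1: harmonic mean of precision and recall
  f1 = 2 * precision * recall / (precision + recall)
  return accuracy, precision, recall, f1

acc, prec, rec, f1 = binary_metrics(40, 20, 30, 10)
print("acc = %0.2f  prec = %0.2f  rec = %0.2f  F1 = %0.2f" \
  % (acc, prec, rec, f1))  # acc = 0.70  prec = 0.67  rec = 0.80  F1 = 0.73
```

The `2PR / (P + R)` form of F1 is algebraically the same as the reciprocal-average form shown above.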

The movie “Total Recall” (1990) starring Arnold Schwarzenegger and Sharon Stone, had fantastic special effects for the time in which the movie was made. But the plot had me very confused — I never really knew exactly who was good and who was bad, even at the end of the movie. I don’t like ambiguous movie endings. The remake in 2012 was just plain bad, bad, bad.

Posted in Machine Learning | Leave a comment

The Keras MNIST Example using Model Instead of Sequential

Just for fun, I decided to code up the classic MNIST image recognition example using Keras. I started by doing an Internet search. I found the EXACT same code repeated over and over by multiple people. The original code comes from the Keras documentation. I was stunned that nobody made even the slightest effort to add something new.

So, I figured I’d refactor the code to use the Model() approach rather than the Sequential() approach. The Sequential() approach creates a model like this:

model = Sequential()
model.add(Conv2D(32, kernel_size=(3,3), input_shape=(28,28,1)))
etc, etc, for 8 layers

The Model() approach looks like:

X = Input(shape=(28,28,1))
layer1 = Conv2D(32)(X)
layer2 = Conv2D(64)(layer1)
etc, etc
model = Model(X, layer8)

The exercise allowed me to get insights into exactly how CNN image classification works using Keras. Like everyone I know, I learn by starting with some working code, often from documentation, but then the key is to experiment with the code.


from __future__ import print_function
import keras
import numpy as np
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

bat_size = 128
epochs = 3

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 28, 28, 1)
x_test = x_test.reshape(10000, 28, 28, 1)

x_train = x_train.astype(np.float32)
x_test = x_test.astype(np.float32)
x_train /= 255
x_test /= 255

y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# model = Sequential()
# model.add(Conv2D(32, kernel_size=(3, 3),
#                  activation='relu',
#                  input_shape=input_shape))
# model.add(Conv2D(64, (3, 3), activation='relu'))
# model.add(MaxPooling2D(pool_size=(2, 2)))
# model.add(Dropout(0.25))
# model.add(Flatten())
# model.add(Dense(128, activation='relu'))
# model.add(Dropout(0.5))
# model.add(Dense(num_classes, activation='softmax'))

X = keras.layers.Input(shape=(28,28,1))
layer1 = Conv2D(filters=32, kernel_size=(3, 3),
  activation='relu', padding='valid')(X)
layer2 = Conv2D(filters=64, kernel_size=(3, 3),
  activation='relu', padding='valid')(layer1)
layer3 = MaxPooling2D(pool_size=(2, 2))(layer2)
layer4 = Dropout(0.25)(layer3)
layer5 = Flatten()(layer4)
layer6 = Dense(128, activation='relu')(layer5)
layer7 = Dropout(0.5)(layer6)
layer8 = Dense(10, activation='softmax')(layer7)

model = keras.models.Model(X, layer8)

model.compile(loss='categorical_crossentropy',
  optimizer='adadelta', metrics=['accuracy'])

model.fit(x_train, y_train, batch_size=bat_size,
  epochs=epochs, verbose=1)

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Hogwarts school made entirely from matchsticks.

Posted in Keras, Machine Learning | 1 Comment

Introduction to DNN Image Classification Using CNTK

I wrote an article titled "Introduction to DNN Image Classification Using CNTK" in the July 2018 issue of Microsoft MSDN Magazine.

Image classification is a standard problem in machine learning. If you have an image (typically a photograph), the goal is to classify it as, for example, "dog", "cat", or "squirrel".

The standard way to perform image classification is to use an exotic type of neural network called a convolutional neural network (CNN). But until just a few years ago, the most common approach was to use a standard deep neural network (DNN). In my article I showed how to use the older DNN approach.

For my example, I used the well-known MNIST (modified National Institute of Standards and Technology) dataset. It consists of 60,000 training images and 10,000 test images. Each image is a picture of a handwritten digit, '0' through '9'. Each image is small — 28×28 grayscale pixels (each pixel value is 0 to 255). So the goal is to accept a 28×28 image and classify it as a "zero" or a "one" or a "two", etc.
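The key difference from a CNN is that a DNN just treats each 28×28 image as a flat vector of 784 pixel values. Here's a minimal NumPy sketch of that forward pass; the hidden layer size (100) and the random weights are illustrative assumptions, not the trained CNTK network from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28))  # stand-in for an MNIST digit

x = image.reshape(784) / 255.0  # flatten 28x28 pixels, scale to [0, 1]

# illustrative (untrained) weights: 784 -> 100 hidden -> 10 output nodes
W1 = rng.standard_normal((784, 100)) * 0.01
b1 = np.zeros(100)
W2 = rng.standard_normal((100, 10)) * 0.01
b2 = np.zeros(10)

h = np.tanh(x @ W1 + b1)        # hidden layer activations
logits = h @ W2 + b2            # one value per digit class
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax
print("predicted digit =", np.argmax(probs))
```

With trained weights, `probs` would be the predicted probability of each of the ten digits; a CNN instead keeps the 28×28 spatial structure and applies convolution filters before the dense layers.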

Instead of creating a custom image classification system using a DNN or a CNN, you can use pre-trained image models from companies such as Google or Microsoft.

Using a generic image classification model may not give you optimal results. Google found this out when their image classification system was classifying some people as gorillas. According to Wired Magazine, Google’s image search function now refuses to return any results for “gorilla” or “monkey”.

Posted in CNTK, Machine Learning | Leave a comment