My Top 10 Favorite Tintin Adventures

I’m a huge fan of the 24 Tintin comic albums by Belgian artist Georges Remi, who worked under the pen name “Hergé”. I vividly remember buying my first book when I was a very young man. My father had loaded us into the family station wagon and we stopped at the Sav-On drug store to buy supplies for a fairly long drive. I was fascinated by the cover art and bought “King Ottokar’s Sceptre”. From the very first page I was absolutely mesmerized.

Here are my top 10 favorite Tintin adventures. (And I refuse to acknowledge the dreadful 2011 film “The Adventures of Tintin”. Ugh. What a huge disappointment.)

1. King Ottokar’s Sceptre (1939) – The young Belgian reporter and his dog Snowy travel to Syldavia, where they foil a plot to overthrow the monarchy that involves an ancient sceptre.

2. The Crab with the Golden Claws (1941) – Tintin solves a mystery involving opium smuggling and a brand of tinned crab meat. The action takes place on the high seas and in the Sahara Desert.

3. The Cigars of the Pharaoh (1934) – Another drug smuggling mystery, this time in Egypt. And this time the opium is being smuggled in cigars.

4. The Blue Lotus (1936) – This continues the Cigars drug smuggling story. Tintin goes to China during the 1931 invasion by Japan and battles spies in addition to the smugglers.

5. Destination Moon (1953) – Tintin, Snowy, Captain Haddock, Professor Calculus, Thomson & Thompson, Frank Wolff, and a stowaway travel to the moon. A remarkably accurate anticipation of the actual moon flights of the late 1960s.

6. Explorers on the Moon (1954) – This story picks up with Tintin and six others on the moon. There’s a stowaway and one of the crew is an enemy spy. Who is it? Will the adventurers make it back to earth?

7. The Calculus Affair (1956) – The plot involves enemy spies and the plans for a secret ultrasonic weapon. The story echoes the cold war at the time between the Soviets and the West.

8. The Secret of the Unicorn (1943) – Tintin discovers a riddle left by Captain Haddock’s ancestor, which could lead to the hidden treasure of the pirate Red Rackham. Tintin and Haddock must obtain three identical models of Sir Francis’s ship, the Unicorn.

9. Red Rackham’s Treasure (1944) – Picking up where The Secret of the Unicorn left off, Tintin and Captain Haddock go to the West Indies, find the wreck of the treasure ship, but don’t find any treasure there. Where did it go?

10. The Castafiore Emerald (1963) – This story is quite unlike the other adventures; there is almost no travel. Someone has stolen the huge emerald from opera diva Bianca Castafiore. Can Tintin recover it?

Posted in Top Ten

The Multinomial Distribution

The multinomial probability distribution occurs surprisingly (to me) often in machine learning scenarios. Suppose you have an American roulette wheel. There are a total of 38 slots for the ball to land in: 18 red, 18 black, 2 green (the 0 and 00). If you spin the wheel, the probability of getting red = the probability of getting black = 18/38 = 0.474. The probability of getting green is 2/38 = 0.053.

The Python language’s NumPy library has a nice multinomial() function to sample from a multinomial distribution. The only mildly confusing part is that you can call the np.random.multinomial() function in two ways. A call like x = np.random.multinomial(1, pvals) returns something like [0, 1, 0], which you can interpret as a one-hot vector identifying a single outcome, so you can use it to select an item from a list (the second item in this example).

But a call like x = np.random.multinomial(100, pvals) returns something like [38, 53, 9], which you can interpret as counts of the number of times each event occurred over 100 trials of a simulation.
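Here’s a short sketch of both calling styles, using the roulette probabilities above (the seed value is arbitrary, just to make the demo reproducible):

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for reproducibility
pvals = [18/38, 18/38, 2/38]  # P(red), P(black), P(green)

# n=1: the result is a one-hot vector identifying a single outcome
one_spin = np.random.multinomial(1, pvals)
idx = int(np.argmax(one_spin))  # index usable to select from a list

# n=100: the result holds counts of each outcome over 100 spins
counts = np.random.multinomial(100, pvals)
print(one_spin, idx, counts)
```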

My most recent encounter with the multinomial distribution was in the context of neural network classification — an idea I call “stochastic classification”. I’ll explain that idea in a future post.

“Enchanted Tiki Room” – Josh Agle. I worked at Disneyland when I was a college student, and spent many hours at the Tiki Room.

Posted in CNTK, Machine Learning

Normalizing Numeric Predictor Values using Python

Even if you don’t work with machine learning code, you may have heard something along the lines of, “It’s not uncommon to spend 90%, or more, of your time and effort getting your data ready.” To prepare data you typically have to normalize numeric predictor values, encode non-numeric predictor variables, encode class labels (if you’re making a classifier), add tags specific to your ML system’s data readers, and apply various formatting such as different field separators and column positioning. Data prep is time-consuming and just not very much fun.

Here’s a demo of how you might go about programmatically performing min-max normalization, as one part of the overall data preparation phase. My dummy demo source data is:

pink,   15.5,  0.38,  rose
white,  12.3,  0.57,  rose
yellow, 14.7,  0.33,  rose
yellow, 11.9,  0.68,  iris
white,  10.5,  0.71,  iris
white,  12.0,  0.69,  iris
pink,    9.9,  0.25,  aster
white,   8.9,  0.28,  aster
yellow,  8.0,  0.30,  aster

The idea is to predict the kind of flower (rose, iris, aster) from color, stem length, and petal width. The length and width predictors should be normalized. The min-max technique is best explained by example. For the length values, the min is 8.0 and the max is 15.5. Each length value x is normalized as (x – min) / (max – min). For example, for x = 12.0, the normalized x = (12.0 – 8.0) / (15.5 – 8.0) = 4.0/7.5 = 0.533.

With min-max normalization, all normalized values will be between 0.0 and 1.0.
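As an illustration, here is a minimal Python sketch that applies the min-max formula to the length and width columns of the dummy data:

```python
# Min-max normalize selected numeric columns of the demo flower data.
data = [
    ["pink",   15.5, 0.38, "rose"],
    ["white",  12.3, 0.57, "rose"],
    ["yellow", 14.7, 0.33, "rose"],
    ["yellow", 11.9, 0.68, "iris"],
    ["white",  10.5, 0.71, "iris"],
    ["white",  12.0, 0.69, "iris"],
    ["pink",    9.9, 0.25, "aster"],
    ["white",   8.9, 0.28, "aster"],
    ["yellow",  8.0, 0.30, "aster"],
]

def min_max_normalize(rows, col):
    vals = [row[col] for row in rows]
    lo, hi = min(vals), max(vals)
    for row in rows:
        row[col] = (row[col] - lo) / (hi - lo)

min_max_normalize(data, 1)  # stem length column
min_max_normalize(data, 2)  # petal width column
print(data[5][1])  # length 12.0 maps to (12.0 - 8.0) / (15.5 - 8.0) = 0.533...
```

A real utility program would read and write files rather than hard-code the data, but the core arithmetic is the same.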

For reasonably sized data sets, a good way to normalize is to plop the data into Excel, then normalize there, then export to a file. But for large data sets, you are probably better off writing a utility program.

In addition to min-max normalization, the two other techniques I use are z-score normalization and divide-by-power-of-10 normalization.

Posted in Machine Learning

The Mahalanobis Distance Between Two Vectors

There are many different ways to measure the distance between two vectors. The most common is Euclidean Distance, which is the square root of the sum of the squared differences between corresponding vector component values. A more sophisticated technique is the Mahalanobis Distance, which takes into account the variability in dimensions.

Suppose you have data for five people, and each person vector has an X = Height, Y = Score on some test, and Z = Age:

	X	Y	Z
	Height	Score	Age
	64.0	580.0	29.0
	66.0	570.0	33.0
	68.0	590.0	37.0
	69.0	660.0	46.0
	73.0	600.0	55.0
m =	68.0	600.0	40.0

The mean of the data is (68.0, 600.0, 40.0). Now suppose you want to know how far person v1 = (66, 570, 33) is from person v2 = (69, 660, 46). It turns out the Mahalanobis Distance between the two is 2.55.

The MD uses the covariance matrix of the dataset – that’s a somewhat complicated side topic. The covariance matrix summarizes the variability of the dataset. It has the X, Y, Z variances on the diagonal and the XY, XZ, YZ covariances off the diagonal.

Mathematically, the MD is defined as:

md(x) = sqrt( (x - m)T * S^-1 * (x - m) )
md(v1, v2) = sqrt( (v1 - v2)T * S^-1 * (v1 - v2) )

The top equation is the base definition for the distance between an arbitrary vector x and the mean m of the entire dataset. The bottom equation is the variation of MD between two vectors from the dataset, instead of one vector and a dataset. Here S is the covariance matrix and T indicates transpose.

In the Excel spreadsheet shown below, I show an example. First you calculate the covariance matrix (S in the equation, “covar mat” in the image). Then you find the inverse of S (“inv-covar” in the image). If each vector has d dimensions (3 in the example), then the covariance matrix and its inverse will be d x d square matrices.

First you subtract v1 - v2 to get (-3.0, -90.0, -13.0). Then you matrix-multiply that 1×3 vector by the 3×3 inverse covariance matrix to get an intermediate 1×3 result tmp = (-0.0436, -0.0765, 0.0382). Then you multiply the 1×3 intermediate result by the 3×1 transpose of v1 - v2 to get the squared distance result = 6.5211. The last step is to take the square root, giving the final Mahalanobis Distance = 2.55.

The Wikipedia entry on Mahalanobis Distance can fill you in with all the theoretical details.
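The whole calculation can be checked in a few lines of NumPy, assuming the sample (n-1 divisor) covariance matrix of the five-person dataset:

```python
import numpy as np

data = np.array([[64.0, 580.0, 29.0],
                 [66.0, 570.0, 33.0],
                 [68.0, 590.0, 37.0],
                 [69.0, 660.0, 46.0],
                 [73.0, 600.0, 55.0]])

S = np.cov(data, rowvar=False)  # 3x3 covariance matrix ("covar mat")
S_inv = np.linalg.inv(S)        # its inverse ("inv-covar")

v1 = np.array([66.0, 570.0, 33.0])
v2 = np.array([69.0, 660.0, 46.0])
diff = v1 - v2                  # difference vector

md = np.sqrt(diff @ S_inv @ diff)  # Mahalanobis Distance
print(md)
```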

The Tarantula Nebula is 170,000 Light Years Distant

Posted in Machine Learning

NFL 2017 Week 11 Predictions – Zoltar Goes Crazy Again

Zoltar is my NFL football machine learning prediction system. It’s a hybrid system that uses a custom “reinforcement learning” algorithm plus a neural network. Here are Zoltar’s predictions for week #11 of the 2017 NFL season:

Zoltar:    steelers  by    6  dog =      titans    Vegas:    steelers  by    7
Zoltar:     packers  by    6  dog =      ravens    Vegas:      ravens  by    2
Zoltar:      saints  by    6  dog =    redskins    Vegas:      saints  by    8
Zoltar:     vikings  by    6  dog =        rams    Vegas:     vikings  by  2.5
Zoltar:    dolphins  by    6  dog =  buccaneers    Vegas:    dolphins  by  2.5
Zoltar:       lions  by    4  dog =       bears    Vegas:       lions  by    3
Zoltar:      chiefs  by    5  dog =      giants    Vegas:      chiefs  by 10.5
Zoltar:      texans  by    2  dog =   cardinals    Vegas:   cardinals  by  1.5
Zoltar:     jaguars  by    5  dog =      browns    Vegas:     jaguars  by  7.5
Zoltar:       bills  by    0  dog =    chargers    Vegas:    chargers  by    4
Zoltar:     broncos  by    6  dog =     bengals    Vegas:     broncos  by  2.5
Zoltar:    patriots  by    3  dog =     raiders    Vegas:    patriots  by    0
Zoltar:     cowboys  by    4  dog =      eagles    Vegas:      eagles  by    3
Zoltar:    seahawks  by    4  dog =     falcons    Vegas:    seahawks  by    3

Zoltar theoretically suggests betting when the Vegas line is more than 3.0 points different from Zoltar’s prediction. Typically, Zoltar has only three or four hypothetical recommendations, but for week #11, Zoltar has eight suggestions — just like last week. This is probably due to the very large number of injuries in mid-season.
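The recommendation rule is simple to express in code. Here’s a minimal sketch using a few of the week #11 games above (margins are from the Zoltar favorite’s point of view; a negative Vegas value means Vegas favors the other team):

```python
# Zoltar suggests a hypothetical bet when its predicted margin differs
# from the Vegas line by more than 3.0 points.
games = [
    ("steelers", "titans",  6.0,  7.0),
    ("packers",  "ravens",  6.0, -2.0),
    ("vikings",  "rams",    6.0,  2.5),
    ("chiefs",   "giants",  5.0, 10.5),
]

suggestions = []
for favorite, dog, zoltar, vegas in games:
    if abs(zoltar - vegas) > 3.0:
        # bet the Zoltar favorite if Zoltar's margin is higher,
        # otherwise bet the other team (the Vegas-inflated side)
        suggestions.append(favorite if zoltar > vegas else dog)
print(suggestions)  # ['packers', 'vikings', 'giants']
```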

1. Zoltar likes the Vegas underdog Packers against the Ravens. Vegas has the Ravens as a slight 2.0-point favorite but Zoltar thinks the Packers are 6 points better than the Ravens. Therefore, a bet on the Packers will pay off if the Packers win by any score, or if the Ravens win but by less than 2.0 points (in other words, 1 point — if the Ravens win by exactly 2 points, the bet is a push).

2. Zoltar likes the Vegas favorite Vikings against the Rams. Vegas lists the Vikings as 2.5 points better than the Rams but Zoltar thinks the Vikings are 6 points better. So if you bet on the Vikings, you will win your bet only if the Vikings win by more than 2.5 points (in other words, 3 points or more — notice that when the Vegas point spread has a “.5”, no push is possible).

3. Zoltar likes the Vegas favorite Dolphins against the Buccaneers.

4. Zoltar likes the Vegas underdog Giants against the Chiefs.

5. Zoltar likes the Vegas underdog Texans against the Cardinals.

6. Zoltar likes the Vegas underdog Bills against the Chargers.

7. Zoltar likes the Vegas favorite Broncos over the Bengals.

8. Zoltar likes the Vegas underdog Cowboys against the Eagles.

I actually watched many of the games last week, and I notice that this week Zoltar likes six teams that looked absolutely terrible last week — Dolphins, Giants, Texans, Bills, Broncos, Cowboys. This means Zoltar believes that bettors are overreacting to those teams’ terrible play.

Zoltar had a so-so showing last week. Against the Vegas spread, which is what Zoltar is designed to predict, Zoltar went only 5-3 (a last-minute point spread change on the Seahawks – Cardinals game made the difference between a 4-4 and a 5-3 record).

For the 2017 season so far, against the Vegas point spread, Zoltar is a pretty good 29-14 (67% accuracy). If you must bet $110 to win $100 (typical in Vegas) then you must theoretically predict with 53% or better accuracy to make money, but realistically you must predict at 60% or better accuracy.

Just for fun, I also track how well Zoltar does when only predicting which team will win. This isn’t really useful except for parlay betting. For week #10, Zoltar was a good 13-1 just predicting winners.

For comparison purposes, I also track how well Bing and the Vegas line do when just predicting who will win. In week #10, Bing was a good 11-3, and Vegas was also good at 9-3 (two pushes) when just predicting winners.

For the 2017 season so far, just predicting the winning team, Zoltar is 96-50 (65% accuracy), Bing is about the same at 94-52 (64% accuracy), and Vegas is 88-52 (63% accuracy). The best humans are typically about 67% accurate predicting winners, so neither Zoltar nor Bing nor Vegas is as good as the best humans when just predicting which team will win.

Zoltar from the 1988 Movie “Big”

Posted in Machine Learning, Zoltar

Exploring Neural Network Input-Output using CNTK

The CNTK machine learning code library has become one of my most-used tools. I set out to explore the neural network input-output mechanism using CNTK. My goal was to completely understand how to create a network, assign values to the network’s weights and biases, feed input to the network, and examine the hidden node and output node values.

It took a bit of experimentation, but I am now confident I completely understand how to work with a basic feed-forward, single-hidden-layer neural network.

I’ve been thinking about putting together a set of materials for programmers who are new to CNTK. If I get around to that task, the demo program I just wrote will definitely be the first code presented.
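The input-output mechanism itself is plain matrix arithmetic. This isn’t CNTK code — just a NumPy sketch of what a tiny 2-3-2 network with tanh hidden activation computes, using made-up weights and biases:

```python
import numpy as np

ih_w = np.array([[0.01, 0.02, 0.03],
                 [0.04, 0.05, 0.06]])  # input-to-hidden weights
h_b  = np.array([0.13, 0.14, 0.15])   # hidden node biases
ho_w = np.array([[0.07, 0.08],
                 [0.09, 0.10],
                 [0.11, 0.12]])        # hidden-to-output weights
o_b  = np.array([0.16, 0.17])         # output node biases

x = np.array([1.0, 2.0])              # input node values
hidden = np.tanh(x @ ih_w + h_b)      # hidden node values
output = hidden @ ho_w + o_b          # raw (pre-softmax) output values
print(hidden, output)
```

Verifying this arithmetic by hand against what a library produces is exactly the kind of exercise that builds confidence in the input-output mechanism.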

“Input-Output” – unknown artist

Posted in CNTK, Machine Learning

When to Apply Softmax on a Neural Network

Suppose you want to predict the political party affiliation (democrat, republican, or other) of a person based on age, income, and education. A training data set for this problem might look like:

32  48  14  0 1 0
24  28  12  1 0 0
. . .

The first line represents a 32-year-old person who makes $48,000 a year, has 14 years of education, and is a republican (0 1 0).

A neural network would have 3 input nodes (age, income, education), some number of hidden nodes (perhaps 10) determined by trial and error, and 3 output nodes where (1 0 0) is democrat, (0 1 0) is republican, and (0 0 1) is other.

The neural network will generate three output values that could be anything, such as (3.0, -1.0, 4.0). Some neural libraries have two error functions for training: cross-entropy and cross-entropy-with-softmax. You can apply softmax to the raw output node values, then apply regular cross-entropy, and then use that error to adjust the network’s weights and biases. Or you can skip the explicit softmax and apply cross-entropy-with-softmax. The result will be the same.

But a subtle mistake (one that I made recently) is to apply softmax to the raw output nodes, and then use cross-entropy-with-softmax during training. The result is to apply softmax twice, which sometimes isn’t a good idea. I’ll try to explain with an example.

Suppose raw output node values are (3.0, -1.0, 4.0) and the target values in the training data are (0, 0, 1). If you apply softmax, the output node values become (0.268, 0.005, 0.727), and then with regular cross-entropy you’d be comparing 0.727 with the target 1, and you have nice separation between the three probabilities.

But if you apply cross-entropy-with-softmax, the output node values are re-softmaxed and become (0.298, 0.229, 0.472). The probabilities are now much closer together, and so training will likely be slower.
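The double-softmax squashing effect is easy to reproduce:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # shift by max for numerical stability
    return e / np.sum(e)

raw = np.array([3.0, -1.0, 4.0])
once = softmax(raw)    # about (0.268, 0.005, 0.727) -- good separation
twice = softmax(once)  # about (0.298, 0.229, 0.472) -- squashed together
print(once, twice)
```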

Very interesting stuff! (Well for geeks anyway).

“The Sequence” – Rick Eskridge

Posted in CNTK, Machine Learning