Machine Learning and eSports

I know what eSports are but I don’t fully understand all of the nuances. I’ve always been a big fan of games, but only classical games like chess and poker. I’ve only ever played video games a handful of times – the original Doom many years ago, StarCraft when it was first released, and a couple of others. I get bored with video games quite quickly, probably because I’m not very good at them.

I have a good friend, KL, who lives and breathes video games. He’s a very senior guy at Microsoft’s Xbox. I was chatting with him recently and my first question was related to the similarities between watching games like baseball and football, and watching experts play video games. KL explained to me that even though there are thousands of video games, only a few are used for eSports and that they’re popular games played by millions of people. OK, that makes sense to me.


I will be presenting on a panel about machine learning at the upcoming CEC conference, Sept. 4-5, 2019. I plan to pick the brains of the academics there from the University of Nevada.

KL also explained to me that most of the video games used in eSports require roughly equal amounts of strategy and physical skill. This, in principle, makes the idea of watching eSports more interesting and plausible.

Perhaps the idea is similar to most other sports. People who play a sport or game (or used to play when they were young) like to watch professionals and experts play those games. Men who grew up playing ice hockey (or whatever) like to watch professional ice hockey, but people who didn’t play ice hockey (or whatever) as young men likely have limited interest.

This could explain why eSports might become wildly popular. It could also explain why the abomination that’s women’s pro basketball has no chance of becoming popular among normal women. But I still have a lot of questions in my mind. Professional sports are monetized by TV advertising, which means the audience has to have money and be inclined to buy what is being advertised (typically things like beer, automobiles, etc.). It’s not clear to me what an eSports audience can support in terms of monetization.

Well, what’s the connection between eSports and machine learning? I’m not 100% sure. Machine learning is all about prediction, and sports prediction is useful if there is wagering involved. I do know there is intense activity in state governments surrounding efforts to create all kinds of new wagering scenarios for professional sports. If eSports gains traction, then it’s possible that machine learning can be applied in much the same way that it can be applied to traditional sports wagering.

In the end, everything is about economics and money. But not really. In the end, money is all about what good it can do for others.



Women’s professional basketball attracts dozens of fans. Professional tuna tossing. Semi-pro toe wrestling. Unicycles plus basketball equals strange.

Posted in Conferences, Miscellaneous | 1 Comment

A Big Step Closer to the IMDB Movie Sentiment Example Using PyTorch

I took a big step toward my goal of creating a PyTorch LSTM prediction system for the IMDB movie review data. The IMDB dataset has 50,000 real movie reviews: 25,000 training reviews (12,500 positive, 12,500 negative) and 25,000 test reviews.

I’m slowly coming to the realization that working with PyTorch LSTM networks is very, very tricky. You have to have a solid grasp of PyTorch tensors, near-expert-level skill with Python, a deep understanding of LSTM cells, awareness of PyTorch strangenesses (such as the implicitly invoked forward() method), and advanced knowledge of machine learning concepts such as cross entropy loss and dropout.

All that said, I finally put together a working demo that’s halfway to a solution for the IMDB problem. For simplicity, I use only 10 hard-coded dummy movie reviews, such as “the movie was excellent”. I use variable-length reviews (no padding), which means I’m pretty much restricted to online training (i.e., processing one review at a time rather than batch-processing multiple reviews). I use three classes (“negative”, “average/neutral”, “positive”) instead of just two because multiclass classification is (surprisingly, if you’re new to ML) somewhat easier than binary classification.
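To make the moving parts concrete, here’s a minimal sketch of this kind of setup. It is not my exact demo code; the vocabulary size, embedding and hidden dimensions, word IDs, and learning rate below are placeholder values for illustration only.

import torch
import torch.nn as nn

class ReviewClassifier(nn.Module):
    def __init__(self, vocab_size=20, embed_dim=8, hidden_dim=16, n_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, n_classes)

    def forward(self, word_ids):               # word_ids: (seq_len,)
        z = self.embed(word_ids).unsqueeze(1)  # (seq_len, 1, embed_dim)
        _, (ht, _) = self.lstm(z)              # ht: (1, 1, hidden_dim)
        return self.fc(ht[-1])                 # (1, n_classes) raw logits

# one (review, label) pair; the word IDs and label are made up
review = torch.tensor([3, 7, 2, 11])           # e.g. "the movie was excellent"
label = torch.tensor([2])                      # 0=negative, 1=average, 2=positive

model = ReviewClassifier()
loss_fn = nn.CrossEntropyLoss()                # expects raw logits
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

optimizer.zero_grad()
loss = loss_fn(model(review), label)           # online: one review at a time
loss.backward()
optimizer.step()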

But I’m satisfied the system works, and more than that, I completely understand exactly how the system works. With this information in my head, I’m confident I can tackle the IMDB problem.



Random images of Japanese TV commercials. Interesting strangenesses. Sometimes I’m glad that I don’t watch TV.

Posted in Machine Learning, PyTorch | 1 Comment

The Normal Distribution Probability Density Function and Machine Learning

There are many examples in machine learning where an idea is very complex because it involves many (a dozen or more) key concepts. Even if each individual concept is relatively simple, when you combine a lot of them you get something that’s very difficult for beginners (and sometimes even experts) to understand.

An example of this complexity idea is Gaussian Mixture Model (GMM) data clustering. To understand GMM clustering you must have a solid grasp of the multivariate normal (MV) probability density function (PDF), covariance, matrix multiplication, determinants and inverses, the expectation-maximization algorithm, and a handful of other topics.

To understand the multivariate normal probability density function, you need to understand the simpler (univariate) normal distribution PDF. A full explanation would take dozens of pages, but let me take a stab at the quickest possible explanation, with the ultimate goal of understanding GMM clustering.


A simple PDF and the PDF for a normal probability distribution with mean = 0.0 and sd = 1.0. The area under each curve is 1.0. The PDF equation is in the upper right.

Many kinds of real-life data, when graphed as a histogram, exhibit a bell-shaped curve. For example, if you made a bar graph of the heights of 100 men who work at a tech company in the U.S., you’d probably see the tallest bar (the most common heights) at about 70 inches, with shorter bars for 70-74 inches, shorter still for 74-78 inches, and so on, with a mirror-image pattern for heights below 70 inches.

The normal probability distribution is a purely mathematical idea. There isn’t just one normal probability distribution; there are zillions, each characterized by a mean (middle value) and a standard deviation (sd, a measure of spread). But all are bell-shaped curves that extend to positive and negative infinity on the x-axis and have a total area under the curve equal to 1.0.

The math function that defines the bell-shaped curve is called the probability density function (PDF). To see what qualifies as a PDF, consider a much simpler function: y = f(x) = 1/2 * x, where x must be between [0,2] only. The area under the “curve” is a triangle with area 1/2 * base * height = 1/2 * 2 * 1 = 1.0, so this simple function is a legitimate probability density function.

One of the most common normal probability distributions is the one with mean = 0.0 and standard deviation = 1.0. It is called the standard normal (probability) distribution. Let me emphasize that it’s just an abstract math thing, not real data.

The probability density function for a normal distribution looks quite intimidating, but it’s just a function. If you feed the function an x value, a mean, and a standard deviation, you get a y value. For example, if you feed the PDF x = 2.0, mean = 0.0, and sd = 1.0, then you get y = 0.0540.
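In code, the formula is tiny. This is just the textbook equation for the univariate normal PDF written as a plain Python function, not tied to any particular library:

import math

def normal_pdf(x, mean=0.0, sd=1.0):
    # f(x) = (1 / (sd * sqrt(2*pi))) * exp(-(x - mean)^2 / (2 * sd^2))
    coef = 1.0 / (sd * math.sqrt(2.0 * math.pi))
    return coef * math.exp(-((x - mean) ** 2) / (2.0 * sd * sd))

print(normal_pdf(2.0, 0.0, 1.0))  # 0.05399..., which rounds to 0.0540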

On the path to understanding GMM clustering, after you understand the probability density function for a normal distribution where the input has a single value, like x = 2.0, the next step is to understand the PDF for a normal distribution where x is a vector of multiple values, like x = [2.33, -0.87, 1.54]. This is called the multivariate normal probability distribution. I’ll explain that in a future post.

Eventually, I’ll cover the roughly half a dozen key ideas for GMM clustering and then I’ll explain how to implement it. None of the ideas is beyond understanding, but there are a lot of ideas to deal with.



Complicated wedding dresses. Thank you Internet for just being there.

Posted in Machine Learning | Leave a comment

I Simulate a PyTorch LSTM from Scratch

I’ve been investigating LSTM (long short-term memory) networks for quite a long time. LSTM networks are very, very complex. As part of my path to knowledge, I simulated a PyTorch version of an LSTM cell (there are many slight variations of LSTMs) using nothing but raw Python. Doing this was the only way for me to be sure that I absolutely understand LSTMs.

So, first I set up a PyTorch LSTM with 3 inputs and 5 outputs. This means it’s an LSTM cell designed to accept one word at a time, where each word is a vector of three values, like (0.98, 1.32, 0.87), and the cell emits five output values for each word. A complete LSTM network also has an embedding layer to convert words to their numeric vectors, and a dense layer to convert the LSTM output values into a form useful for the problem at hand.



Top image: A PyTorch LSTM cell. Bottom image: My from-scratch version of the same LSTM cell.

Then I initialized the PyTorch LSTM cell’s 200 weight and bias values to 0.01, 0.02, 0.03, . . . 1.99, 2.00.

Next, I set up a dummy micro-sentence with two words where each word has three values — (1.0, 2.0, 3.0) and (4.0, 5.0, 6.0).

I fed the two words to the PyTorch LSTM and captured the final outputs (ht) and the final internal cell state (ct) after the second word:

Final ht:
0.9618  0.9623  0.9626  0.9629  0.9631

Final ct:
1.9700  1.9760  1.9807  1.9843  1.9872 
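For reference, here’s a rough sketch of how the PyTorch side of this experiment can be set up. It is not my exact code, and the order in which the 200 values get assigned to the four parameter tensors (weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0) is an assumption that affects the exact output numbers:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=3, hidden_size=5)   # one layer, uni-directional

# 200 parameter values total: weight_ih_l0 (20x3), weight_hh_l0 (20x5),
# bias_ih_l0 (20), bias_hh_l0 (20)
vals = torch.arange(1, 201, dtype=torch.float32) / 100.0  # 0.01, 0.02, ..., 2.00
offset = 0
with torch.no_grad():
    for p in (lstm.weight_ih_l0, lstm.weight_hh_l0,
              lstm.bias_ih_l0, lstm.bias_hh_l0):
        n = p.numel()
        p.copy_(vals[offset:offset + n].reshape(p.shape))
        offset += n

# the two-word micro-sentence; default geometry is (seq_len, batch, n_input)
words = torch.tensor([[[1.0, 2.0, 3.0]],
                      [[4.0, 5.0, 6.0]]])
output, (ht, ct) = lstm(words)
print("Final ht:", ht)   # outputs after the second word
print("Final ct:", ct)   # internal cell state after the second word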

Then I ran my from-scratch simulated LSTM cell. I initialized it with the same 200 values, fed it the same inputs, and . . . drum roll please . . . got identical output values.


The outputs of the PyTorch version and the from-scratch version are identical. Success.

My simulated PyTorch LSTM is simplified in the sense that it doesn’t do sentence batching, bi-directional processing, or cell stacking. Even so, my simulated LSTM cell is very complex.
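For readers who want the gist of what the from-scratch version has to compute, here’s a compressed sketch of a single LSTM time step using the gate equations that the PyTorch documentation gives for nn.LSTM. This is an illustration only (it uses NumPy for brevity rather than raw Python) and is not my actual simulation code:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W_ih, W_hh, b_ih, b_hh):
    # x: (n_input,)   h, c: (n_hidden,)
    # W_ih: (4*n_hidden, n_input)   W_hh: (4*n_hidden, n_hidden)   biases: (4*n_hidden,)
    # PyTorch stacks the four gates in the order i, f, g, o
    gates = W_ih @ x + b_ih + W_hh @ h + b_hh
    i, f, g, o = np.split(gates, 4)
    i = sigmoid(i)                 # input gate
    f = sigmoid(f)                 # forget gate
    g = np.tanh(g)                 # candidate cell state
    o = sigmoid(o)                 # output gate
    c_new = f * c + i * g          # new cell state (ct)
    h_new = o * np.tanh(c_new)     # new output (ht)
    return h_new, c_new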

I am now satisfied that I understand exactly how PyTorch LSTM cells work.

One of my character flaws is that once a technical problem enters my brain, I can’t rest until I solve the problem to my satisfaction. This is often a good thing, but it has a downside too because some problems will stick in my head for months or even years. Such problems are continuously floating around in my head and surface in my subconscious when I’m sleeping. But this is just how my brain works, so I don’t worry about it one way or another — it’s beyond my control for the most part.

I’ve never seen a really good but simple explanation, with code, of exactly how LSTM cells work. So I intend to tidy up my demo code a bit, write up a (hopefully) good explanation, and then publish the code and explanation in Visual Studio Magazine, where I write a monthly column on data science: https://visualstudiomagazine.com/Articles/List/Neural-Network-Lab.aspx



Research suggests that men and women have different causes of sleeplessness. Women tend to worry about family and interpersonal relationships. Men tend to worry about work and money. What’s not clear is the extent to which these differences are biological. The consensus seems to be that most of the difference is biological, but there’s no way to come up with a definitive answer.

Posted in Machine Learning, PyTorch | Leave a comment

I Give a Talk About Fuzzy C-Means Data Clustering

Quite some time ago, I was working with data clustering algorithms. Data clustering is the process of grouping data so that similar items are in the same group/cluster and dissimilar items are in different clusters.

There are many different clustering algorithms. The two most common are k-means clustering (and a minor variation called k-means++) and Gaussian mixture model clustering. These two algorithms work only with strictly numeric data, so you can’t have variables like sex = (male, female) or hair_color = (brown, blonde, black, red, gray).

Note that clustering non-numeric data is surprisingly tricky. I devised several clustering algorithms for non-numeric data and wrote up technical articles about them:

Data Clustering Using Category Utility:
https://msdn.microsoft.com/en-us/magazine/dn198247.aspx

Data Clustering Using Entropy Minimization:
https://visualstudiomagazine.com/articles/2013/02/01/data-clustering-using-entropy-minimization.aspx

Data Clustering Using Naive Bayes Inference:
https://msdn.microsoft.com/en-us/magazine/jj991980.aspx

Anyway, one of the lesser-known clustering algorithms for numeric data is called fuzzy C-means clustering. The key idea of fuzzy clustering is that instead of assigning a data item definitively to one of the k clusters, each data item gets one membership value for each possible cluster, where the membership values indicate the degree to which the item belongs to each cluster.

Suppose you set k = 3, and each data item represents a person’s height and weight. Then the result of fuzzy C-means clustering might look something like:

Height  Weight   k=0   k=1   k=2
=================================
 65.0    120.0   0.82  0.08  0.10
 72.0    185.0   0.10  0.30  0.60
. . .

Here, the person whose (height, weight) is (65.0, 120.0) has mostly membership in the k=0 cluster.
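For the curious, here’s a short sketch of where membership values like those come from: the standard fuzzy C-means membership update, given a set of cluster centers. The heights, weights, and centers below are made-up values for illustration, not data from the talk:

import numpy as np

def fcm_memberships(data, centers, m=2.0):
    # data: (n, dim), centers: (k, dim); m > 1 is the "fuzzifier" (m = 2 is typical)
    # membership u[i,j] = 1 / sum_k (d[i,j] / d[i,k]) ** (2 / (m-1)); each row sums to 1
    d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)  # (n, k) distances
    d = np.maximum(d, 1.0e-10)           # avoid divide-by-zero for items on a center
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))        # (n, k, k)
    return 1.0 / ratio.sum(axis=2)

data = np.array([[65.0, 120.0], [72.0, 185.0]])                    # (height, weight)
centers = np.array([[64.0, 125.0], [70.0, 160.0], [73.0, 190.0]])  # hypothetical centers
print(fcm_memberships(data, centers))    # one row of k membership values per item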

Well, in the end, fuzzy C-means data clustering isn’t used very much because the additional information you get is somewhat difficult to interpret and use.



Fuzzy hats. Sometimes nice on women, sometimes not. But never, ever nice on guys.

Posted in Machine Learning | Leave a comment

Why Isn’t batch_first the Default Geometry for PyTorch LSTM Modules?

I’ve been working for many weeks on dissecting PyTorch LSTM modules. An LSTM module is a very complex object that can be used to analyze natural language. The classic example is movie review sentiment analysis.

I’m never happy unless I completely understand a software module. By completely, I mean well enough to implement the module in question from scratch, using Notepad and the relevant programming language (Python in the case of a PyTorch LSTM).

After a lot of experimentation, I was satisfied that I understood how a PyTorch LSTM deals with a single sequence input. What do I mean by a single sequence? One of the problems with understanding LSTMs is that the vocabulary is very inconsistent, and in many cases, including official documentation, the vocabulary is blatantly incorrect.

In my mind, an LSTM batch is a collection of sentences, a sentence is a collection of words, and a word is made of several numeric values (called a word embedding). Almost all of the very few examples, and the PyTorch documentation, use terms like “input”, which can have roughly a dozen different meanings, and “hidden”, which usually means “output”, and “output”, which really means “all outputs”, and on and on. Anyway, the point is I prefer the terms “batch”, “sentence”, “word”, and “values” (embedding values).

The problem I was examining was how to batch together two or more sentences so that a PyTorch LSTM can understand them. Put another way, what is the geometry of a PyTorch LSTM batch?

My first experiment was to set up two separate sentences. Each sentence has four words, and each word is represented by three embedding values. Therefore, each sentence has 12 numeric values:

# ex:        the                 movie               was                  good
sent1 = [[0.01, 0.02, 0.03], [0.04, 0.05, 0.06], [0.07, 0.08, 0.09], [0.10, 0.11, 0.12]]
sent2 = [[0.13, 0.14, 0.15], [0.16, 0.17, 0.18], [0.19, 0.20, 0.21], [0.22, 0.23, 0.24]]

After setting up the 2 sentences, I fed them in turn to an LSTM module and displayed the two outputs. OK.

Next, I placed the values for the two sentences in a batch. My first attempt was to use the intuitive approach (spoiler: it didn’t work):

batch = [[0.01, 0.02, . . . 0.12],  # 1st sentence? (no)
         [0.13, 0.14, . . . 0.24]]  # 2nd sentence? (no)

Then I reset the LSTM object and fed the batch to the module and . . . got completely different results.

After much experimentation, I figured out the correct geometry for an LSTM batch of two or more sentences, but it is completely unintuitive:

batch = [[[0.01, 0.02, 0.03],   # 1st word of 1st sentence
          [0.13, 0.14, 0.15]],  # 1st word of 2nd sentence
         [[0.04, 0.05, 0.06],   # 2nd word of 1st sentence
          [0.16, 0.17, 0.18]],  # 2nd word of 2nd sentence
         etc.

Ugh. Just ugh. But lurking in the back of my memory was the recollection of a mysterious LSTM constructor parameter named batch_first, which defaults to False. I’d never seen it used and the documentation description was utterly unhelpful (something like “put the batch first”). On a hunch I created the LSTM object with batch_first=True and voila! The intuitive batch geometry now worked (meaning it gave the same outputs as feeding the sentences individually).
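Here’s a compressed sketch of that experiment. It isn’t my exact code (the hidden size of 5 is arbitrary), but it shows the batch_first=True geometry and the check that the batched outputs match the outputs from feeding each sentence by itself:

import torch
import torch.nn as nn

sent1 = [[0.01, 0.02, 0.03], [0.04, 0.05, 0.06],
         [0.07, 0.08, 0.09], [0.10, 0.11, 0.12]]
sent2 = [[0.13, 0.14, 0.15], [0.16, 0.17, 0.18],
         [0.19, 0.20, 0.21], [0.22, 0.23, 0.24]]

# batch_first=True expects (batch, seq_len, embed) -- the intuitive geometry
lstm = nn.LSTM(input_size=3, hidden_size=5, batch_first=True)
batch = torch.tensor([sent1, sent2])            # shape (2, 4, 3)
out_batched, (ht, ct) = lstm(batch)             # out_batched: (2, 4, 5)

# feeding each sentence separately gives the same per-sentence outputs
out1, _ = lstm(torch.tensor([sent1]))           # shape (1, 4, 5)
out2, _ = lstm(torch.tensor([sent2]))
print(torch.allclose(out_batched[0], out1[0]))  # True
print(torch.allclose(out_batched[1], out2[0]))  # True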

Well, that was fun. Actually, I’m not being sarcastic. I get a nice sense of satisfaction in figuring things like this out.



“My psychiatrist told me I was crazy and I said I wanted a second opinion. He said, OK, you’re ugly too.” – Rodney Dangerfield. “I have never let my schooling interfere with my education.” – Mark Twain. “Some people cause happiness wherever they go; others whenever they go.” – Oscar Wilde. “Even if you are on the right track, you’ll get run over if you just sit there.” – Will Rogers. “Sometimes the road less traveled is less traveled for a good reason.” – Jerry Seinfeld.

Posted in Machine Learning, PyTorch | 1 Comment

My Top Ten Favorite Giant Cephalopod Movies

I’d always assumed there were dozens of movies that feature giant squids and octopuses (OK, octopi). But there really aren’t very many giant cephalopod movies. Here are my ten favorites.


1. It Came from Beneath the Sea (1955) – Hydrogen bomb tests make a really big octopus from the Mindanao Deep head towards San Francisco. Low-budget but effective and entertaining movie. Special effects by the famous Ray Harryhausen.


2. 20,000 Leagues Under the Sea (1954) – Rather loose adaptation of Jules Verne’s 19th-century novel. Captain Nemo and the crew of the submarine Nautilus try to stop war by sinking warships. From Disney. An excellent film in every respect, and the battle with the giant squid during a storm is a highlight.


3. Mysterious Island (1961) – Another Jules Verne adaptation. Captain Nemo saves shipwrecked Civil War soldiers on an island that has large animals. The crew of the Nautilus encounters a giant unfriendly chambered nautilus. Special effects by Ray Harryhausen.


4. The Fellowship of the Ring (2001) – A huge octopus-like creature, the Watcher in the Water, guards the entrance to Moria. Gandalf comments, “There are older and fouler things than orcs in the deep places of the world.” The scene was filmed with so little light that it’s very difficult to see the creature.


5. Voyage to the Bottom of the Sea (1961) – A giant octopus attacks the submarine Seaview which is on a mission to save the Earth. Luckily the Seaview is equipped with a way to electrify its hull. Huh? This movie gave rise to a 1960s TV series of the same name.


6. Deep Rising (1998) – This movie is more or less a remake of “Alien” on a cruise ship instead of a spaceship. The monster is like a huge octopus with nasty sharp teeth. Most reviewers didn’t like this movie but I think it’s pretty good — it scared me.


7. The Lost Continent (1968) – This is a strange British film where a tramp steamer becomes disabled in fog-enshrouded carnivorous seaweed. There’s a colony of descendants of pirates and a giant one-green-eyed octopus. One of my favorite movies of the 1960s.


8. The Beast (1996) – This movie is similar to “Jaws” (1975) but with a giant squid instead of a giant shark. It was actually a made-for-TV film. This movie got terrible reviews but I think it’s OK, if a bit long. Based on a novel by Peter Benchley, who also wrote “Jaws”.


9. Tentacles (1977) – The title pretty much tells you what you need to know. Giant octopus on a rampage. Intended to capitalize on the huge success of “Jaws” (1975) two years earlier. The film had big stars (John Huston, Shelley Winters, Henry Fonda) who were nearing the ends of their careers. Not a good movie but it makes my top ten.


10. Reap the Wild Wind (1942) – Starring John Wayne and directed by Cecil B. DeMille. Set in the 1840s, the story is about marine salvage operations. Wayne is a diver exploring a wreck when he is attacked by a giant squid.



Honorable Mention


The Meg (2018) – A giant squid makes a very brief appearance when it attacks a minisub — and then the squid is promptly eaten by the megalodon shark.


Wake of the Red Witch (1948) – John Wayne (again!) battles a big octopus (or it might be a squid) for a chest of pearls.


Monster from the Ocean Floor (1954) – Biologists battle a giant one-eyed alien octopus off the Mexico coast.


Pearl of the South Pacific (1955) – Black pearls are stored underwater by natives in a lagoon guarded by a giant octopus.


Bride of the Monster (1955) – A mad scientist has a pet killer octopus.


Sh! The Octopus (1937) – The title is not a typo. A comedy-mystery film that’s rather famous among fans (all three of us) of giant octopus movies.


King Kong vs. Godzilla (1962) – The ape battles a giant octopus on an island in the South Pacific.


War of the Gargantuas (1966) – There is a bad big green gargantua (a hairy giant man) and a good big brown gargantua. The good one fights a giant octopus that looks a lot like the octopus in “King Kong vs. Godzilla” from a few years earlier.


The Rift (1990) – Also known as “Endless Descent”. A team of scientists descends into a deep marine trench in a medium size yellow submarine. The sub is briefly attacked by what looks like a giant marine nudibranch or flatworm. OK, so it’s not a cephalopod but it’s an invertebrate so it’s close. This movie has very good monsters and very bad acting.


Posted in Top Ten | Leave a comment