Gibbs Sampling Part One of Lots

I ran into the topic of Gibbs sampling recently and realized I hadn’t used the technique in a long time. So I figured I’d post a brief explanation of exactly what Gibbs sampling is. I quickly remembered that, contrary to the impression you might get by browsing the Internet for information about Gibbs, Gibbs sampling is extremely complex and even a basic description would require many pages.

So, instead, I just did a quick demo. Gibbs sampling is a Markov chain Monte Carlo technique (MCMC, a huge topic) for approximately drawing samples from a multivariate joint probability distribution (huge topic) when you know the individual conditional distributions (huge topic).

You can see right away that even explaining the type of problem that Gibbs sampling solves requires a fantastic amount of background so I won’t try.

Here’s a concrete example. Suppose you want to draw samples from a bivariate Gaussian distribution (huge topic) with means and covariance matrix (huge topic):

u1 = 0.0, u2 = 0.0

Covar =  1.0   0.6
         0.6   1.0

As it turns out, the true graph (huge topic) of this joint bivariate probability distribution looks like (from Wikipedia):

Gibbs sampling can be used to approximate this distribution (gigantic topic), if you know the individual conditional distributions. For this example, each conditional is itself Gaussian, with mean 0.6 times the other variable and variance 1 - 0.6^2 = 0.64; that is, p(x1 | x2) = Normal(0.6 * x2, 0.64) and p(x2 | x1) = Normal(0.6 * x1, 0.64) (from: theclevermachine.wordpress.com/2012/11/05/mcmc-the-gibbs-sampler/).

I wrote a Python program (big topic) to approximate the true distribution, dropped the results into Excel, and made a graph. You can see the approximation is close to the true distribution.
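My exact program isn't worth reproducing here, but below is a minimal sketch of the same idea (the sample count, burn-in, and seed are arbitrary illustration values, not necessarily the ones I used):

# gibbs_demo.py - sketch of Gibbs sampling for the bivariate
# Gaussian above: means (0, 0), unit variances, correlation 0.6
import numpy as np

rho = 0.6
sd = np.sqrt(1.0 - rho * rho)   # conditional std dev = sqrt(0.64)
rng = np.random.default_rng(0)
x1, x2 = 0.0, 0.0               # arbitrary starting point
burn_in, n_samples = 500, 5000
samples = []

for i in range(burn_in + n_samples):
  # draw from p(x1 | x2), then from p(x2 | x1)
  x1 = rng.normal(rho * x2, sd)
  x2 = rng.normal(rho * x1, sd)
  if i >= burn_in:
    samples.append((x1, x2))

samples = np.array(samples)
print(samples.mean(axis=0))   # close to (0, 0)
print(np.cov(samples.T))      # close to [[1.0, 0.6], [0.6, 1.0]]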

As I wrote this blog post, I was stunned by how many complex, interrelated ideas there were. I guess the moral of the story is that if you want to learn what Gibbs sampling is, buckle up, because you'll have to spend many days just learning the background information, and then many more days learning about Gibbs.



Four abstract paintings related to fortune telling. Artist unknown.

Posted in Miscellaneous | Leave a comment

NFL 2018 Week 7 Predictions – Zoltar Likes Underdogs Cardinals, Titans, and Bills

Zoltar is my NFL prediction computer program. It uses a deep neural network and reinforcement learning. Here are Zoltar's predictions for week #7 of the 2018 NFL season:

Zoltar:   cardinals  by    5  dog =     broncos    Vegas:     broncos  by  2.5
Zoltar:    chargers  by    3  dog =      titans    Vegas:    chargers  by  6.5
Zoltar:    patriots  by    4  dog =       bears    Vegas:    patriots  by  3.5
Zoltar:       bills  by    2  dog =       colts    Vegas:       colts  by  6.5
Zoltar:     jaguars  by    6  dog =      texans    Vegas:     jaguars  by  4.5
Zoltar:      chiefs  by    6  dog =     bengals    Vegas:      chiefs  by    6
Zoltar:    dolphins  by    2  dog =       lions    Vegas:       lions  by    1
Zoltar:     vikings  by    4  dog =        jets    Vegas:     vikings  by    3
Zoltar:      eagles  by    5  dog =    panthers    Vegas:      eagles  by  4.5
Zoltar:  buccaneers  by    6  dog =      browns    Vegas:  buccaneers  by    3
Zoltar:      saints  by    0  dog =      ravens    Vegas:      ravens  by  2.5
Zoltar:    redskins  by    2  dog =     cowboys    Vegas:    redskins  by  1.5
Zoltar:        rams  by    9  dog = fortyniners    Vegas:        rams  by   11
Zoltar:     falcons  by   10  dog =      giants    Vegas:     falcons  by    6

Zoltar theoretically suggests betting when the Vegas line is more than 3.0 points different from Zoltar’s prediction. For week #7 Zoltar has four hypothetical suggestions.

1. Zoltar likes the Vegas underdog Cardinals against the Broncos. Zoltar thinks the Cardinals are 5 points better than the Broncos but Vegas has the Broncos as favorites by 2.5 points. A bet on the Cardinals will pay off if the Cardinals win, or if the Broncos win by less than 2.5 points (in other words, by 1 or 2 points).

2. Zoltar likes the Vegas underdog Titans against the Chargers. Zoltar thinks the Chargers are 3 points better than the Titans but Vegas has the Chargers favored by 6.5 points. Classic human vs. computer example here: the Titans looked absolutely terrible last week, but Zoltar believes the market has overreacted.

3. Zoltar likes the Vegas underdog Bills against the Colts. Zoltar thinks the Bills are 2 points better than the Colts, but Vegas has the Colts favored by 6.5 points. The big difference of opinion is due to an injury to the Bills' quarterback. I have an advanced version of Zoltar that takes injuries into account, but I haven't run those numbers yet.

4. Zoltar likes the Vegas favorite Falcons against the Giants. Zoltar thinks the Falcons are 10 points better than the Giants but Vegas thinks the Falcons are only 6.0 points better than the Giants. A bet on the Falcons will pay off only if the Falcons “cover the spread” and win by more than 6 points (i.e., 7 or more). If the Falcons win by exactly 6 points the bet is a push.
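In code, the suggestion rule boils down to a simple threshold check. Here's a minimal sketch using the four flagged week #7 games (the sign convention, negative when Vegas favors the other team, is my own illustration):

# suggestion rule: flag a game when Zoltar's margin differs
# from the Vegas line by more than 3.0 points
games = [
  # (zoltar_favorite, zoltar_margin, vegas_line_for_same_team)
  ("cardinals",  5.0, -2.5),   # Vegas: broncos by 2.5
  ("chargers",   3.0,  6.5),
  ("bills",      2.0, -6.5),   # Vegas: colts by 6.5
  ("falcons",   10.0,  6.0),
]
for team, zoltar, vegas in games:
  if abs(zoltar - vegas) > 3.0:
    side = team if zoltar > vegas else "the underdog against the " + team
    print("hypothetical suggestion:", side)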



Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.
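The 53% figure is just break-even arithmetic:

# break-even win rate when risking $110 to win $100:
# need p * 100 >= (1 - p) * 110, so p >= 110 / 210
p = 110.0 / (110.0 + 100.0)
print(round(p, 3))   # 0.524 -- hence "53% or better"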

Just for fun, I track how well Zoltar does when trying to predict just which team will win a game (not by how many points). This isn't useful except for parlay betting.

Zoltar sometimes predicts a 0-point margin of victory. There is one such game in week #7: Saints vs. Ravens. In the first four weeks of the season, Zoltar picks the home team to win. After week #4, Zoltar uses historical data for the current season (which usually, but not always, ends up producing a prediction that the home team will win).

==



Zoltar did very well in week #6. Against the Vegas point spread, Zoltar was 5-1. For the season so far, against the Vegas spread Zoltar is 22-11 which is 67% accuracy.

Just predicting winners, Zoltar was a good 13-2. Vegas went 12-3 just predicting which team would win. For the season, Zoltar is 65-26 (71% accuracy) and Vegas is 63-27 (70% accuracy).



My system is named after the Zoltar fortune teller machine you can find in arcades. The arcade machine is named after the machine that appeared in the 1988 movie “Big” starring Tom Hanks. This is the movie Zoltar.

Posted in Machine Learning, Zoltar | Leave a comment

A Recap of the 2018 G2E Conference

I gave a short talk at the G2E conference. G2E used to be called the Global Gaming Expo but now the event covers all aspects of hotels and gaming technology so most people refer to it just as G2E. See http://www.globalgamingexpo.com/.

The event ran from October 8-11, 2018, in Las Vegas, and drew about 35,000 attendees. My talk was "How Deep Neural Systems Will Disrupt Sports Prediction" and I described recent advances in LSTMs (long short-term memory networks) and RL (reinforcement learning) and how they can be applied to sports prediction.


The G2E Expo was huge. These photos don’t give any indication of the gigantic size of the Expo, or the incredible energy level.

The logistics for my talk were a bit unusual. Let me explain. The main activity for G2E is a huge, and I mean really huge, Expo. There were close to 400 exhibitors and some of the “booths” were well over 10,000 square feet in area. I took a few photos but they don’t capture the immense size of the expo area, or the crazy energy level. There were marching bands, Chinese dragons with drums, rock bands, tens of thousands of people walking around and talking, lights of every kind — all going on at once — it was incredible.


My talk was sponsored by the American Gaming Association.

I usually speak in a normal room. But at G2E I was set up on an elevated stage right in the middle of the Expo! There was seating for about 60 people, but most of the audience of 100-200 people just stumbled by and listened to me from the Expo floor. I felt like a carnival barker, "Step right up ladies and gentlemen! See the Mystifying LSTM Network! Thrill to the Amazing RL System!"

The companies there represented many billions of dollars of revenue. Most of the big technology infrastructure companies, such as Oracle and Amazon, had a presence at G2E. After I finished delivering my talk, I looked around the Expo. I could have spent days there. Almost every booth had a fascinating product or service.

Many of the exhibitors had things directly related to gambling games. Those were interesting, but I was much more fascinated by things like data analytics companies, various optimization companies, and so on. Interestingly, I saw almost no booths that featured advanced AI or ML. Because AI and ML apply to just about anything, I expect there to be an explosion of activity in this field over the next few months.




Most booths had hostesses and some had mascots too. The hostesses I talked to were bright, well-educated, and articulate. Not the “booth babes” stereotype at all. They all said they actually enjoyed chatting with me — they told me most attendees are either too intimidated to approach them or are too creepy. Booth hostesses can make a lot of money. Not sure about the mascots though.




G2E also has a conference in Macao but I’ve never been to that event — much too far away for me to travel.

Posted in Conferences, Machine Learning | 1 Comment

Support Vector Machine using Raw C#

Well, I did it to myself again. Once a technical problem gets lodged into my brain, I literally can’t sleep until I solve the problem.

This time it was support vector machines (SVMs). An SVM is a machine learning system that can make binary classification predictions (such as patient lives or patient dies). SVMs are very tricky to implement from scratch, and so most people, including me, usually use a machine learning library of some sort.

But I'm never 100% satisfied that I truly understand an ML system unless I can implement it from scratch. So, after a sleepless night, tossing and turning and reviewing the SVM sequential minimal optimization (SMO) algorithm, I came into work uber-early and started writing code.

A few hours later, I was tired but intellectually happy.

I cheated a bit by hard-coding a particular kernel function, a polynomial kernel, into the SVM implementation. But it would be an easy matter to pass a kernel function in as an object or a delegate or an interface.
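For illustration, here's roughly what that parameterization might look like (sketched in Python rather than C# for brevity; in C# the kernel parameter would be a delegate such as Func<double[], double[], double>, and the parameter values below are made up):

# passing the kernel in as a function instead of hard-coding it
import numpy as np

def poly_kernel(a, b, gamma=1.0, coef0=0.0, degree=2):
  # polynomial kernel: K(a, b) = (gamma * a.b + coef0)^degree
  return (gamma * np.dot(a, b) + coef0) ** degree

def svm_decision(x, support_vectors, alphas, labels, bias,
                 kernel=poly_kernel):
  # SVM decision function: f(x) = sum_i alpha_i * y_i * K(sv_i, x) + b
  total = sum(a * y * kernel(sv, x)
              for sv, a, y in zip(support_vectors, alphas, labels))
  return total + bias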

To be sure, I leveraged a lot of resources. In particular I looked at one of the early research papers by John Platt, who is a former work colleague of mine but who now works at Google; the Accord.NET library SVM code which was written mostly, I believe, by a guy named Cesar Souza; the LibSvm C++ library code written by some researchers in Taiwan; and Python code from a guy named Alexandre Kowalczyk who wrote an entire book on SVMs that I tech-reviewed.

So, now I’ll be able to sleep again. At least until the next technical challenge lodges itself into my brain.



Software developers and their wives and girlfriends.

Posted in Machine Learning | 2 Comments

Free “Keras Succinctly” Book

I wrote a short book titled “Keras Succinctly” that was published a few days ago. You can get the book in PDF format for free. See https://www.syncfusion.com/ebooks/cntk-succinctly.


Book Web site

The book has seven chapters, and each chapter describes how to solve a particular kind of problem:

Chapter 1 - Getting Started (Installation)
Chapter 2 - Multiclass Classification
Chapter 3 - Regression 
Chapter 4 - Binary Classification 
Chapter 5 - Image Classification (CNN)
Chapter 6 - Sentiment Analysis (LSTM) 
Chapter 7 - Autoencoders

The first paragraph of the book is:

Keras is an open source neural network library written in the Python language. Keras requires a backend engine and can use TensorFlow, CNTK (Microsoft Cognitive Toolkit), Theano, or MXNet. The motivation for Keras is that, although it’s possible to create deep neural systems using TensorFlow directly (or CNTK, Theano, MXNet), because TensorFlow works at a relatively low level of abstraction, coding TensorFlow directly is quite challenging. Keras adds a relatively easy-to-use layer of abstraction over TensorFlow.
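To give a feel for that layer of abstraction, here's a minimal sketch in the style of code the book presents (a 4-(5)-3 network for a multiclass problem; the layer sizes and hyperparameters here are illustrative, not taken from the book):

# a tiny Keras model: 4 inputs, one hidden layer, 3 output classes
import keras as K

model = K.models.Sequential()
model.add(K.layers.Dense(units=5, input_dim=4, activation='tanh'))
model.add(K.layers.Dense(units=3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
model.summary()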


Sample page

The e-book is published by a company called Syncfusion. Syncfusion has over 150 free titles. Most of the free e-books are very good, but, like anything else, there are a couple of clunkers in the catalogue. But only a few, and hey, they’re free! I can recommend the Syncfusion company — otherwise I wouldn’t have written a book for them. The people I’ve worked with at Syncfusion — Tres, Graham, Darren, Jacqueline — and all the authors, have been great: smart, articulate, and people of integrity. No clunkers with regards to the people.

To get any of the free e-books, Syncfusion requires you to register with an email address, so that they can send you marketing messages — after all, they’re a business and ultimately need to make money. But the Syncfusion messaging is very restrained (maybe one or two a month), and in fact, often quite interesting.


Posted in Keras, Machine Learning | 2 Comments

NFL 2018 Week 6 Predictions – Zoltar Likes Four Underdogs and Two Favorites

Zoltar is my NFL prediction computer program. It uses a deep neural network and reinforcement learning. Here are Zoltar's predictions for week #6 of the 2018 NFL season:

Zoltar:      eagles  by    5  dog =      giants    Vegas:      eagles  by    3
Zoltar:     falcons  by    6  dog =  buccaneers    Vegas:     falcons  by  3.5
Zoltar:    steelers  by    0  dog =     bengals    Vegas:     bengals  by  2.5
Zoltar:    chargers  by    5  dog =      browns    Vegas:    chargers  by    1
Zoltar:       bills  by    0  dog =      texans    Vegas:      texans  by    8
Zoltar:    dolphins  by    2  dog =       bears    Vegas:       bears  by    3
Zoltar:     vikings  by    9  dog =   cardinals    Vegas:     vikings  by 10.5
Zoltar:        jets  by    6  dog =       colts    Vegas:        jets  by  2.5
Zoltar:    seahawks  by    3  dog =     raiders    Vegas:    seahawks  by    3
Zoltar:    panthers  by    1  dog =    redskins    Vegas:    redskins  by    1
Zoltar:        rams  by    5  dog =     broncos    Vegas:        rams  by    7
Zoltar:     jaguars  by    0  dog =     cowboys    Vegas:     jaguars  by    3
Zoltar:      titans  by    2  dog =      ravens    Vegas:      ravens  by    3
Zoltar:    patriots  by    1  dog =      chiefs    Vegas:    patriots  by  3.5
Zoltar:     packers  by    6  dog = fortyniners    Vegas:     packers  by  9.5

Zoltar theoretically suggests betting when the Vegas line is more than 3.0 points different from Zoltar’s prediction. For week #6 Zoltar has six hypothetical suggestions.

1. Zoltar likes the Vegas favorite Chargers over the Browns. Zoltar thinks the Chargers are 5 points better than the Browns but Vegas has the Chargers favored by only 1.0 point, so Zoltar thinks the Chargers will cover the spread (win by 2 points or more; if they win by exactly 1 point the bet is a push).

2. Zoltar likes the Vegas underdog Bills against the Texans. Zoltar thinks the two teams are evenly matched but Vegas has the Texans as a big 8.0-point favorite. A bet on the Bills will pay if the Bills win outright, or if the Texans win by 7 points or less.

3. Zoltar likes the Vegas underdog Dolphins against the Bears. Zoltar thinks the Dolphins are 2 points better than the Bears, but Vegas has the Bears favored by 3.0 points.

4. Zoltar likes the Vegas favorite Jets against the Colts. Zoltar thinks the Jets are 6 points better than the Colts but Vegas thinks the Jets are only 2.5 points better than the Colts.

5. Zoltar likes the Vegas underdog Titans against the Ravens. Zoltar thinks the Titans are 2 points better than the Ravens, but Vegas thinks the Ravens are 3.0 points better than the Titans.

6. Zoltar likes the Vegas underdog 49ers against the Packers. Zoltar thinks the Packers are 6 points better than the 49ers but Vegas has the Packers as 9.5-point favorites. So Zoltar thinks the Packers will win, but by fewer than 10 points.

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.

Just for fun, I track how well Zoltar does when trying to predict just which team will win a game (not by how many points). This isn't useful except for parlay betting.

Zoltar sometimes predicts a 0-point margin of victory. There are three such games in week #6: Steelers-Bengals, Bills-Texans, and Jaguars-Cowboys. In the first four weeks of the season, Zoltar picks the home team to win. After week #4, Zoltar uses historical data for the current season (which usually, but not always, ends up producing a prediction that the home team will win).

==

Zoltar did quite well in week #5. Against the Vegas point spread, Zoltar was 3-1. For the season so far, against the Vegas spread Zoltar is 17-10 which is not quite 63% accuracy.

Just predicting winners, Zoltar was a decent 11-4. Vegas also went 11-4 just predicting which team would win. For the season, Zoltar is 52-24 (68% accuracy) and Vegas is 51-24 (68% accuracy).



My system is named after the Zoltar fortune teller machine you can see in arcades.

Posted in Machine Learning, Zoltar | 2 Comments

A Look at CNTK v2.6 and the Iris Dataset

Version 2.6 of CNTK was released a few weeks ago so I figured I'd update my system and give it a try. CNTK (Microsoft Cognitive Toolkit) is Microsoft's neural network code library. Primary alternatives include Google's TensorFlow and Keras (a library that makes TF easier to use), and Facebook's PyTorch.

To cut to the chase, I deleted my existing CNTK installation and then installed v2.6 using the pip utility, and then . . .

As I write this, I think back about all the effort that was required to figure out how to install CNTK (and TF and Keras and PyTorch). It's easy for me now, but if you're new to using neural network code libraries, trust me, there's a lot to learn, mostly about all the many things that can go wrong with an installation, how to interpret the error messages, and how to resolve them.
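For what it's worth, a quick post-install sanity check looks like this (assuming the pip package name cntk):

# verify the install: print the version, then do a trivial computation
import cntk as C

print(C.__version__)   # expect 2.6
print(C.minus([1., 2., 3.], [4., 5., 6.]).eval())   # expect [-3. -3. -3.]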

OK, back to my post. I ran my favorite demo, classification on the Iris Dataset. My old (written for v2.5) CNTK code ran as expected. Excellent!

The real moral of the story is that deep learning with neural network libraries is new and still in a state of constant flux. This makes it tremendously difficult to stay abreast of changes. New releases of these libraries emerge not every few months, or even every few weeks, but often every few days. The pace of development is unlike anything I've ever seen in computer science.

Additionally, the NN libraries are just the tip of the technology pyramid. There are dozens and dozens of supporting systems, and they are being developed with blazing speed too. For example, I did an Internet search for "auto ML" and found many systems that are wrappers over CNTK or TF/Keras or PyTorch, and that are intended to automate pipeline tasks like hyperparameter tuning, data preprocessing, and so on.

The blistering pace of development of neural network code libraries and supporting software will eventually slow down (maybe 18 months as a wild guess), but for now it’s an incredibly exciting time to be working with deep learning systems.



I suspect that an artist’s style doesn’t change too quickly over time (well, after their formative years anyway). Three paintings by an unknown (to me) artist with similar compositions but slightly different styles.

Posted in CNTK, Keras, Machine Learning, PyTorch | Leave a comment