Neural Binary Classification Using PyTorch

Of the neural network code libraries I use most often (TensorFlow, Keras, CNTK, PyTorch), PyTorch is by far the least mature. The Windows version of PyTorch was released only a few weeks ago.

So, there are almost no good PyTorch examples available, and learning PyTorch is a slow process. I took a big step forward recently when I created a binary classifier using PyTorch. The code was surprisingly difficult to get working; there were many tricky details.

My demo program uses the Banknote Authentication dataset. The goal is to use four predictor variables, derived from digital images of banknotes, to predict which banknotes are authentic and which are forgeries.

Because PyTorch operates at a very low level, there are a huge number of design decisions to make. My demo uses a 4-(8-8)-1 deep neural network with tanh activation on the hidden layers and the standard-for-binary-classification sigmoid activation on the output node. I used explicit Glorot initialization on all weights, and initialized all biases to zero.
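To make this concrete, here is a minimal sketch of that architecture in PyTorch. This is just an illustration of the design described above, not my exact demo code, and the class and layer names are my own:

import torch

class Net(torch.nn.Module):
  def __init__(self):
    super(Net, self).__init__()
    self.hid1 = torch.nn.Linear(4, 8)   # 4 predictors -> 8 hidden
    self.hid2 = torch.nn.Linear(8, 8)   # 8 hidden -> 8 hidden
    self.oupt = torch.nn.Linear(8, 1)   # 8 hidden -> 1 output
    # explicit Glorot initialization; all biases to zero
    for lyr in (self.hid1, self.hid2, self.oupt):
      torch.nn.init.xavier_uniform_(lyr.weight)
      torch.nn.init.constant_(lyr.bias, 0.0)

  def forward(self, x):
    z = torch.tanh(self.hid1(x))
    z = torch.tanh(self.hid2(z))
    return torch.sigmoid(self.oupt(z))  # pseudo-probability of "authentic"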

PyTorch currently doesn’t have any built-in classification accuracy functions, so I wrote my own. And there’s no built-in mechanism to generate training mini-batches, so I wrote a custom class to do that.
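To give you an idea, an accuracy function and a mini-batch class might look something like the sketch below. This shows the general pattern, not my exact demo code, and the names (accuracy, Batcher) are my own:

import numpy as np
import torch

def accuracy(model, data_x, data_y):
  # threshold the sigmoid outputs at 0.5, count matches with the 0/1 targets
  with torch.no_grad():
    oupt = model(data_x)
    preds = (oupt >= 0.5).float()
    n_correct = (preds == data_y).sum().item()
  return n_correct / len(data_y)

class Batcher:
  # serve up shuffled batches of row indices, one epoch at a time
  def __init__(self, num_items, batch_size, seed=0):
    self.indices = np.arange(num_items)
    self.batch_size = batch_size
    self.rnd = np.random.RandomState(seed)

  def __iter__(self):
    self.rnd.shuffle(self.indices)
    self.ptr = 0
    return self

  def __next__(self):
    if self.ptr + self.batch_size > len(self.indices):
      raise StopIteration
    batch = self.indices[self.ptr : self.ptr + self.batch_size]
    self.ptr += self.batch_size
    return batch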

It’s clear that PyTorch is very immature and will change greatly over the next year or so. There’s a strong temptation to just wait until PyTorch stabilizes. But I know from previous experience that it’s better to man up and dive in now and learn as much as possible, even at the expense of a lot of extra effort.

In a weird way, struggling to get models created using PyTorch is fun in spite of the intellectual pain. If you’re a software guy like me, you know exactly what I’m talking about. And if you’re not a software guy, I’ll bet there’s a similarly difficult activity that you’re passionate about, where you enjoy the challenge.



The definition of “manly” is “having qualities such as strength and courage that are expected in a man”. But a manly approach to software development may not always be optimal.

Posted in Machine Learning, PyTorch

NFL 2018 Week 3 Predictions – Zoltar Likes Underdogs Jets, Bills, Redskins, and Cardinals

Zoltar is my NFL prediction computer program. It uses a deep neural network and Reinforcement Learning. Here are Zoltar’s predictions for week #3 of the 2018 NFL season:

Zoltar:        jets  by    1  dog =      browns    Vegas:      browns  by    3
Zoltar:     falcons  by    4  dog =      saints    Vegas:     falcons  by    3
Zoltar:    panthers  by    6  dog =     bengals    Vegas:    panthers  by    3
Zoltar:      texans  by    6  dog =      giants    Vegas:      texans  by    6
Zoltar:     jaguars  by    6  dog =      titans    Vegas:     jaguars  by    7
Zoltar:      chiefs  by    8  dog = fortyniners    Vegas:      chiefs  by    6
Zoltar:    dolphins  by    6  dog =     raiders    Vegas:    dolphins  by    3
Zoltar:     vikings  by    9  dog =       bills    Vegas:     vikings  by 16.5
Zoltar:      eagles  by   11  dog =       colts    Vegas:      eagles  by  6.5
Zoltar:      ravens  by    6  dog =     broncos    Vegas:      ravens  by    5
Zoltar:    redskins  by    1  dog =     packers    Vegas:     packers  by    3
Zoltar:        rams  by    6  dog =    chargers    Vegas:        rams  by    7
Zoltar:   cardinals  by    5  dog =       bears    Vegas:       bears  by  5.5
Zoltar:     cowboys  by    0  dog =    seahawks    Vegas:    seahawks  by  1.5
Zoltar:    patriots  by    2  dog =       lions    Vegas:    patriots  by  6.5
Zoltar:    steelers  by    0  dog =  buccaneers    Vegas:    steelers  by    2

Zoltar theoretically suggests betting when the Vegas line is more than 3.0 points different from Zoltar’s prediction. For week #3 Zoltar has six hypothetical suggestions.

1. Zoltar likes the Vegas underdog Jets against the Browns. Zoltar thinks the Jets are better than the Browns by 1 point but Vegas has the Browns as 3-point favorites. The Browns haven’t won a game in two years but looked very good in their two losses so far this season, while the Jets looked bad in a loss last week. Classic example of human vs. computer analysis.

2. Zoltar likes the Vegas underdog Bills against the Vikings. Zoltar believes the Vikings are a big 9 points better than the Bills, but Vegas has the Vikings as huge 16.5-point favorites. Such situations are a weakness of Zoltar; he doesn’t handle enormous point spreads well because they happen so rarely in the NFL.

3. Zoltar likes the Vegas favorite Eagles against the Colts. Zoltar thinks the Eagles are 11 points better than the Colts, but Vegas has the Eagles as favorites by only 6.5 points. Therefore, Zoltar thinks the Eagles will cover the spread.

4. Zoltar likes the Vegas underdog Redskins against the Packers. Zoltar believes the Redskins are a slim 1 point better than the Packers but Vegas has the Packers favored by 3 points. So, a bet on the Redskins will pay if the Redskins win outright or if the Packers win, but by less than 3 points (in other words, by 2 points or by 1 point).

5. Zoltar likes the Vegas underdog Cardinals against the Bears. Zoltar thinks the Cardinals are 5 points better than the Bears but Vegas thinks the Bears are 5.5 points better than the Cardinals. This is a huge 10.5-point difference in opinion — the largest I can ever remember seeing. I need to check my data and Zoltar’s logic to make sure I didn’t mess something up.

6. Zoltar likes the Vegas underdog Lions against the Patriots. Zoltar thinks the Patriots are just 2 points better than the Lions but Vegas has the Patriots as 6.5 points better. Historically, Zoltar has done very, very poorly when picking Vegas underdogs against the Patriots (0-9 over the past three years) so my advanced version of Zoltar doesn’t recommend a bet.

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.
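(The arithmetic: if p is the fraction of bets you win, you break even when 100p = 110(1 - p), so p = 110/210, which is about 52.4%; rounding up gives the 53% figure.)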

Just for fun, I track how well Zoltar does when trying to predict just which team will win a game. This isn’t useful except for parlay betting.

Zoltar sometimes predicts a 0-point margin of victory. There are two such games in week #3: Cowboys vs. Seahawks and Steelers vs. Buccaneers. In the first four weeks of the season, Zoltar just picks the home team to win such games. So, Zoltar “sort of” picks the Seahawks and the Buccaneers to win.

After week 4, Zoltar uses historical data for the current season (which usually, but not always, ends up in a prediction that the home team will win).

==

Zoltar did only so-so in week #2. Against the Vegas point spread, Zoltar was 3-3. Zoltar correctly predicted two underdogs and one favorite, but incorrectly predicted that underdogs Panthers, Bills, and Cardinals would prevent the favorites from covering the spread.

Just predicting winners, Zoltar was a decent 10-5 (the Vikings vs. Packers game was a tie). Vegas went 9-5 (Vegas had Ravens vs. Bengals as a “pick-em” game).



My prediction system is named after the arcade fortune teller machine

Posted in Machine Learning, Zoltar

Career Explore-Exploit and eSports

I spoke at a large conference in Las Vegas recently. On the taxi ride from the airport to my hotel, I noticed something odd about the Luxor hotel. The Luxor is the giant pyramid building at the south end of The Strip.

The Luxor had a huge banner-like sign on it that read, “eSports Arena”. What?


Huge “eSports Arena” banner sign on the Luxor Hotel in Las Vegas.

The term eSports refers to organized multiplayer video game competition, in which teams or individuals compete against each other. In some scenarios the competition is televised, with announcers and analysts describing the action much like a professional football or basketball game.

I almost never play video games. I just don’t enjoy them very much even though I greatly enjoy games like chess, backgammon, and poker. My initial mental reaction to the existence of the eSports Arena was, “Who on earth would want to do this, and why would anyone want to watch?”

But after a few moments I realized that guys like to watch professionals compete in sports that they played in their younger days. Guys who played golf will watch professional golfers. Guys who played baseball will watch professional baseball. And so on. So it’s not a stretch to imagine that the current generation of young men, who grew up playing video games, would want to watch experts play those video games.

This may be a large part of the reason why women’s pro basketball (WNBA) has tiny attendance and almost no TV viewers — there just aren’t that many women who played competitive basketball when they were young. Well, that, and the fact that women’s basketball just isn’t very interesting.


Entrance to the eSports Arena inside the Luxor Hotel.

After I checked into my hotel, on a break between my conference sessions, I walked over to the eSports Arena inside the Luxor. I was quite impressed. The arena must have cost many hundreds of thousands of dollars, maybe more, to construct. There were gaming stations and spectator seats and a big stage and even a bar.


A small part of the interior of the eSports Arena.

The larger point of this blog post has to do with career explore-exploit. As your career moves along, most of the time you’re in exploit mode, where you’re basically doing your job or learning how to do your current job better.

But I think it’s important to switch into explore mode every now and then. By this I mean you should investigate areas just outside your comfort zone. The idea is that every now and then a random exploration will reveal a great new career opportunity. This is why I went to check out the eSports Arena.

Now, most of the time explore mode doesn’t lead anywhere. But if you never enter explore mode, the number of opportunities that come your way will be greatly reduced, maybe even to zero. I’ve seen young people at my company fail to ever explore, and their careers often suffer for it in the long run.

As a final note, there’s an additional factor: being able to recognize an opportunity when it appears (they’re often not obvious) and then having the courage to jump at it. Not too long ago, the group I work in at my company made a job offer to a woman who had applied for a mid-level management-type position. It was a fantastic opportunity for anyone. Instead of replying with a “yes” or “no”, she asked for a few days to think it over. What?! I was stunned.

This can only mean one of three things, none of them good. Either she really wasn’t excited about the job, or she has an indecisive personality, or she is just using the job offer as bargaining leverage for a different job. If I had been the manager of this position, I would have certainly retracted the job offer.

In the end, managing your career is tricky. A lot depends on being in the right place at the right time. But, as the saying goes, “Good luck is when opportunity meets preparation.”



Being in the wrong place at the wrong time.

Posted in Miscellaneous

The Kendall tau Distance Metric

Suppose a group of people each rank their preference of a set of options, from best to worst. The Kendall tau distance is a metric that measures how close any two rankings are to each other. If the K-t distance for two rankings is 0.0, the two rankings agree exactly. If the K-t distance is 1.0, the two rankings have maximum disagreement (one is the exact reverse of the other).

Here’s an example. Suppose Joe and Ken list their preferences of five cities, from best to worst. The raw lists are:

   Joe's    Ken's
===================
1. Camden   Austin 
2. Austin   Eureka
3. Denver   Boston
4. Boston   Camden
5. Eureka   Denver

And so the rankings (1 = best, 2 = second best, etc.) are:

         Joe  Ken
==================
Austin    2    1
Boston    4    3
Camden    1    4
Denver    3    5
Eureka    5    2

To compute the Kendall tau distance you look at each possible pair of options, count the number of pairs where the two rankings differ in order, then divide by the total number of pairs:

        Joe   Ken   Differ?
============================
AB      >     >     
AC      <     >     x
AD      >     >
AE      >     >
BC      <     >     x
BD      <     >     x
BE      >     <     x
CD      >     >
CE      >     <     x
DE      >     <     x

The AB > > entry means both Joe and Ken agree that A(ustin) is better than B(oston). The AC < > entry means Joe thinks A(ustin) is worse than C(amden) but Ken thinks A(ustin) is better than C(amden). There are a total of ten comparison pairs, and Joe and Ken disagree on 6 of them, so the Kendall tau distance between their rankings is 6 / 10 = 0.60.

Notice that if Joe and Ken agreed perfectly, there would be no pairs where they differ, and so the K-t distance would be 0 / 10 = 0.0.
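Here is a short Python sketch of the computation, using dictionaries that map each city to its rank (the function name is my own):

from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
  # rank_a and rank_b map each item to its rank (1 = best);
  # returns the fraction of item pairs whose order the rankings disagree on
  pairs = list(combinations(rank_a.keys(), 2))
  n_disagree = 0
  for x, y in pairs:
    # a disagreement: one ranking puts x above y, the other puts y above x
    if (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y]) < 0:
      n_disagree += 1
  return n_disagree / len(pairs)

joe = {"Austin": 2, "Boston": 4, "Camden": 1, "Denver": 3, "Eureka": 5}
ken = {"Austin": 1, "Boston": 3, "Camden": 4, "Denver": 5, "Eureka": 2}
print(kendall_tau_distance(joe, ken))  # 0.6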

I don’t use Kendall tau distance very often but it’s a nice metric to remember whenever you have rankings of items.



Many rankings are subjective: the Miss America 1946 contest and the Miss America 1985 contest. I wonder what the judges’ Kendall tau distances were.

Posted in Miscellaneous

Managing Neural Network Library Versions

I regularly use four neural network code libraries: TensorFlow, Keras, PyTorch, and CNTK. The libraries require Python. Managing all the different versions of these libraries is an annoying detail that has to be taken care of very carefully.

I recently did a significant update to my Python installation, which required updates to all my libraries. For the past year and a half, I had been using Python version 3.5.2, contained in the Anaconda3 4.1.1 distribution. (Anaconda has become the clear default Python distribution, at least among my colleagues.) I upgraded to Python 3.6. Summary: as I’m writing this blog post, my libraries are:

Anaconda3 5.2.0 (Python 3.6.5)
TensorFlow 1.10.0
Keras 2.2.2
PyTorch 0.4.1
CNTK 2.5.1

My first step was to uninstall my existing Python and libraries. Next I installed the Anaconda3 5.2.0 distribution, which contains Python 3.6.5 and over 400 compatible libraries such as NumPy, SciPy, and Matplotlib. The installer executable is available at https://repo.continuum.io/archive/.

Next I installed TensorFlow 1.10.0 by going to https://pypi.org/project/tensorflow/1.10.0/ then saving file tensorflow-1.10.0-cp36-cp36m-win_amd64.whl to my machine. I installed using the command:

> pip install tensorflow-1.10.0-cp36-cp36m-win_amd64.whl

Next I installed Keras 2.2.2 by going to https://pypi.org/project/Keras/2.2.2/ then saving file Keras-2.2.2-py2.py3-none-any.whl to my machine. I installed Keras using the command:

> pip install Keras-2.2.2-py2.py3-none-any.whl

Next I installed PyTorch 0.4.1 by going to https://pytorch.org/ and selecting Windows – pip – 3.6 – None, which showed the URL of the .whl file as:

http://download.pytorch.org/whl/cpu/torch-0.4.1-cp36-cp36m-win_amd64.whl

This is the latest version of PyTorch. If I need to reinstall later, after a newer version of PyTorch is released, the Web page should have a link to “older versions”. Next I pasted the URL into a browser and did a Save As when prompted by the browser. I installed PyTorch using the command:

> pip install torch-0.4.1-cp36-cp36m-win_amd64.whl

Next I installed CNTK 2.5.1 by going to https://pypi.org/project/cntk/2.5.1/ then saving file cntk-2.5.1-cp36-cp36m-win_amd64.whl to my machine. I installed CNTK using the command:

> pip install cntk-2.5.1-cp36-cp36m-win_amd64.whl
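After everything is installed, a quick sanity check is to print each library’s version from a Python shell. This is just a verification idea of my own, not part of the official installation instructions:

> python
>>> import tensorflow as tf; print(tf.__version__)   # expect 1.10.0
>>> import keras; print(keras.__version__)           # expect 2.2.2
>>> import torch; print(torch.__version__)           # expect 0.4.1
>>> import cntk; print(cntk.__version__)             # expect 2.5.1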

Neural network code libraries are all relatively new and are updated frequently. This can make managing different versions somewhat of a headache, but that’s just a price that must be paid to be on the leading edge.



The library at St. Catherine’s Monastery in Egypt. Established in roughly 550 AD. Still in operation. Amazing.

Posted in CNTK, Keras, Machine Learning, PyTorch

Top Ten Women Scientists in Movies

The Grace Hopper Conference is a for-profit, women-only event. It gives women an alternative to technology conferences that only have solid tech content. Example of a GH talk last year: “Five Ways to Tap Into Your Power: Take The Lead!” According to the Web site, one of the goals is to highlight the contributions of women and foster role models. In honor of the 2018 GH event, I did some Internet searches for women role-model scientists in movies. Here are ten of the most commonly cited examples. Note: I haven’t seen most of these films.


1. Dr. Lara Croft (Angelina Jolie) – Croft is a professor of archaeology and has all kinds of adventures in several films, including “Tomb Raider” (2001). She is by far the most frequently mentioned woman scientist in film.


2. Dr. Christmas Jones (Denise Richards) – Jones is a nuclear physicist in the James Bond movie “The World Is Not Enough” (1999). She helps Bond uncover a sinister plot involving Elektra King, the daughter of one of M’s old friends.


3. Dr. Sue Storm (Jessica Alba) – Storm’s scientific background varies a bit according to whether she’s in a movie or comic book but she’s usually an astrophysicist and definitely a scientist. She appeared in “Fantastic Four” (2005) and gained the ability to turn invisible.


4. Cora Peterson (Raquel Welch) – Peterson is a scientist who gets shrunk along with several colleagues to microscopic size in “Fantastic Voyage” (1966). The tiny team, along with their tiny submarine, is injected into a sick man to operate on his brain.


5. Dr. Carol Marcus (Alice Eve) – Marcus is a medical doctor who helps McCoy in “Star Trek Into Darkness” (2013). She is the daughter of Admiral Marcus who turns out to be a bad guy.


6. Dr. Sydney Fox (Tia Carrere) – Fox is a professor of ancient history at a fictitious Trinity College in the TV series “Relic Hunter” (1999-2002). Fox is a Croft-like character. I’ve seen a few episodes, and the show is surprisingly good.


7. Dr. Sheila Gamble (Eva Mendes) – Gamble is a doctor/wife of the character played by comedian Will Ferrell in “The Other Guys” (2010). She is twice as likeable as other characters in the film.


8. Alex Munday (Lucy Liu) – Munday impersonates a scientist while infiltrating Redstar Technologies in “Charlie’s Angels” (2000). She delivers a talk to a room full of software engineers and gains their cooperation on her mission.


9. Dr. Anne Babish (Carmen Electra) – Babish is a professor of marine biology in “Two-Headed Shark Attack” (2012). The movie title says it all. Where the shark comes from is never explained.


10. Dr. Susan Harris (Anitra Ford) – Harris is a mad scientist who studies bees in “Invasion of the Bee Girls” (1973), and if you’re a man, she’s best avoided. This is an obscure, bad movie, so I really don’t know why it appears in so many Internet search results for women scientists.



Honorable Mention – Women Scientists in TV Ads


Megan Fox as a scientist for Acer Computers.


Paris Hilton as a scientist in a TV ad.


Posted in Top Ten

NFL 2018 Week 2 Predictions – Zoltar Likes Five Vegas Underdogs

Zoltar is my NFL prediction computer program. It uses a deep neural network and Reinforcement Learning. Here are Zoltar’s predictions for week #2 of the 2018 NFL season:

Zoltar:      ravens  by    0  dog =     bengals    Vegas:      ravens  by    0
Zoltar:    panthers  by    0  dog =     falcons    Vegas:     falcons  by  5.5
Zoltar:       bills  by    2  dog =    chargers    Vegas:    chargers  by  7.5
Zoltar:     vikings  by    3  dog =     packers    Vegas:     vikings  by    1
Zoltar:      saints  by   11  dog =      browns    Vegas:      saints  by  8.5
Zoltar:        jets  by    2  dog =    dolphins    Vegas:    dolphins  by    1
Zoltar:      titans  by    6  dog =      texans    Vegas:      texans  by  2.5
Zoltar:    steelers  by    6  dog =      chiefs    Vegas:    steelers  by  5.5
Zoltar:      eagles  by    4  dog =  buccaneers    Vegas:      eagles  by    3
Zoltar:    redskins  by    6  dog =       colts    Vegas:    redskins  by  5.5
Zoltar:        rams  by    6  dog =   cardinals    Vegas:        rams  by 10.5
Zoltar:       lions  by    0  dog = fortyniners    Vegas: fortyniners  by  3.5
Zoltar:     broncos  by    4  dog =     raiders    Vegas:     broncos  by    5
Zoltar:    patriots  by    0  dog =     jaguars    Vegas:    patriots  by    2
Zoltar:     cowboys  by    9  dog =      giants    Vegas:     cowboys  by    3
Zoltar:    seahawks  by    0  dog =       bears    Vegas:       bears  by    3

Zoltar theoretically suggests betting when the Vegas line is more than 3.0 points different from Zoltar’s prediction. For week #2 Zoltar has six hypothetical suggestions.

1. Zoltar likes the Vegas underdog Panthers against the Falcons. Zoltar thinks the two teams are evenly matched but Vegas has the Falcons as 5.5-point favorites. So, Zoltar believes that the Falcons will not cover the spread.

2. Zoltar likes the Vegas underdog Bills against the Chargers. Zoltar believes the Bills are 2 points better than the Chargers but Vegas has the Chargers as the favorite by 7.5 points. The Bills got obliterated in week #1 but Zoltar thinks Vegas has overreacted.

3. Zoltar likes the Vegas underdog Titans against the Texans. Zoltar thinks the Titans are 6 points better than the Texans but Vegas has the Texans favored by 2.5 points (mostly because the Titans QB was injured in week #1).

4. Zoltar likes the Vegas underdog Cardinals against the Rams. Zoltar believes the Rams are 6 points better than the Cardinals but Vegas has the Rams as a huge favorite by 10.5 points. Zoltar thinks the Rams won’t cover that spread.

5. Zoltar likes the Vegas underdog Lions against the 49ers. Zoltar thinks the teams are evenly matched but Vegas has the 49ers as favorites by 3.5 points.

6. Zoltar likes the Vegas favorite Cowboys against the Giants. Zoltar thinks the Cowboys are much better (9 points) than the Giants but Vegas says the Cowboys are only 3 points better than the Giants.

Theoretically, if you must bet $110 to win $100 (typical in Vegas) then you’ll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.

Just for fun, I track how well Zoltar does when trying to predict just which team will win a game. This isn’t useful except for parlay betting.

Zoltar sometimes predicts a 0-point margin of victory. There are five such games in week #2:

Ravens vs. Bengals
Panthers vs. Falcons
Lions vs. 49ers
Patriots vs. Jaguars
Seahawks vs. Bears

In these situations, just to pick a winner so I can track the raw number of correct predictions, Zoltar picks the home team to win (during the first four weeks of the season). Therefore, Zoltar picks the

Bengals, Falcons, 49ers, Jaguars, Bears

to “sort of win”. After week 4, Zoltar uses historical data for the current season (which usually, but not always, ends up in a prediction that the home team will win).

==

Zoltar did OK in week #1. Against the Vegas point spread, Zoltar was 4-2. Zoltar incorrectly predicted the Steelers would cover the 6-point spread against the Browns (the game was a tie), and incorrectly predicted the Cardinals over the Redskins (the Redskins won easily, 24-6).

Just predicting winners, Zoltar was a decent 10-5, which is pretty good for the first week.



My system is named after the arcade fortune teller machine

Posted in Machine Learning, Zoltar