What is the Best Score Prediction for an NFL Football Game?

The 2016 Super Bowl will be played in two days. My Zoltar system (named after the arcade fortune teller machine) says that the Carolina Panthers are 1 point better than the Denver Broncos. Because the Las Vegas point spread has the Panthers 5.5 points better than the Broncos, Zoltar would recommend betting on the underdog Broncos. You’ll win your bet if the Broncos win outright, or if the Panthers win but by 5 points or less (as Zoltar predicts).


Suppose you’re one of a group of 10 friends sitting around on Sunday morning, getting ready to watch the Super Bowl. To make things interesting, you all decide to predict the score of the game (rather than the point spread). Everyone contributes $100 and whoever has the best prediction wins the $1000 pot.

An interesting question is, “What exactly do we mean by the best predicted score?” A pure math answer is to compute the squared error of each prediction and declare the one with the smallest error the best. For example, if one person predicts the Panthers will win 31-27 and the actual result is that the Broncos win 30-24, then the squared error is (31-24)^2 + (27-30)^2 = 49 + 9 = 58.

But this approach makes no sense because the person making the prediction didn’t even get the correct winner of the game. Anyway, there is no definitive way to define what the best score prediction is; you just have to come up with one of many reasonable alternatives and agree to it before the game starts.

For example, one possible approach is:

1.) Any predicted score that has the correct winner of the game is better than any predicted score that doesn’t get the correct winner.

2.) Define error as abs(winner actual score – winner predicted score) + abs(loser actual score – loser predicted score). Here abs is absolute value, meaning use the size of the difference between predicted and actual scores and ignore its sign.

3.) If errors of two predictions are equal, tie break with closest predicted score of the winner.

4.) If still tied, split pot.

Suppose the actual result of the game turns out to be that the Broncos win 35-32 and suppose four of the predictions were:

a.) Broncos 30-28
b.) Broncos 31-30
c.) Broncos 32-29
d.) Panthers 32-31

Person d.) is immediately eliminated because they didn’t get the correct winner. The errors for the other three predictions are:

a.) 5 + 4 = 9
b.) 4 + 2 = 6
c.) 3 + 3 = 6

So persons b.) and c.) are tied with the lowest error. Person b.)’s prediction for the winner’s score is off by 4 points, but person c.)’s prediction is off by only 3 points, so person c.) has the best prediction and wins the pot.
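The four rules can be sketched in a few lines of Python. This is only one illustrative reading of the rules: the names rank_key and best_predictions are hypothetical, each score is stored as a team-to-points dictionary, and the sketch assumes the game does not end in a tie. For a prediction with the wrong winner, the error is measured team by team against the actual winner and loser.

```python
def rank_key(pred: dict, actual: dict) -> tuple:
    """Sort key for rules 1-3; a smaller tuple means a better prediction."""
    actual_winner = max(actual, key=actual.get)
    actual_loser = min(actual, key=actual.get)
    pred_winner = max(pred, key=pred.get)

    wrong_winner = 0 if pred_winner == actual_winner else 1          # rule 1
    winner_miss = abs(actual[actual_winner] - pred[actual_winner])
    loser_miss = abs(actual[actual_loser] - pred[actual_loser])
    return (wrong_winner, winner_miss + loser_miss, winner_miss)     # rules 2, 3


def best_predictions(preds: dict, actual: dict) -> list:
    """Names of everyone sharing the best key (rule 4: they split the pot)."""
    keys = {name: rank_key(p, actual) for name, p in preds.items()}
    best = min(keys.values())
    return [name for name, key in keys.items() if key == best]


# The example from the post: Broncos win 35-32.
actual = {"Broncos": 35, "Panthers": 32}
preds = {
    "a": {"Broncos": 30, "Panthers": 28},
    "b": {"Broncos": 31, "Panthers": 30},
    "c": {"Broncos": 32, "Panthers": 29},
    "d": {"Broncos": 31, "Panthers": 32},  # picked the Panthers to win
}
winners = best_predictions(preds, actual)
print(winners)
```

Putting the wrong-winner flag first in the tuple is what enforces rule 1. Person d.) actually has the smallest raw error (4 + 0 = 4) but still ranks behind everyone who picked the Broncos.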

The moral of the story is: don’t watch the Super Bowl with a bunch of math and computer geeks.

Posted in Machine Learning, Miscellaneous

Recap of the 2016 Big Data Innovations Summit in Las Vegas

I spoke at the Big Data Innovations Summit, which ran from January 28-29, 2016, in Las Vegas. The event was different from most I speak at. I estimate there were about 250 attendees. Attendees were a mix of technical (developers, engineers) and business (sales, business development) people.


My talk was Making Predictions using Neural Networks. I explained what types of problems can be solved using NNs, explained how NNs work, and commented about the current state of NNs and Big Data.

There were people from government (for example, the U.S. Census Bureau), technology (Microsoft, Google), commerce (Amazon), manufacturing (Boeing, BMW), media (CBS, NPR), and other realms.

The event was organized by a U.K.-based company called IE Innovations, Inc. They put on a lot of these summit-type events in Europe, the U.S., and Asia.

The event was held in the upscale Four Seasons Hotel, which is a hotel-within-a-hotel inside the Mandalay Bay Hotel. I actually stayed at the Delano Hotel which is a hotel-attached-to-a-hotel connected to the Mandalay Bay.


Overall, I give the event a solid four stars out of five. The talks were good, but not great. I really liked the small expo (about 15 companies) in the sense that all the companies were interesting and talking to their representatives gave me insights into what’s going on with companies who target Big Data. And I learned a lot from my conversations with attendees and speakers.

I think the event is somewhat more valuable to people who are in hybrid business+technology roles rather than to people who are in pure technical roles. The bottom line is that I thought the Big Data Innovations Summit was a good use of my time and I’m likely to go back next year.

(Note: Thanks to the event technical staff for taking some photos of me before, during, and after my talk.)

Posted in Conferences

Microsoft R Server Released

Microsoft released “Microsoft R Server” last week. It’s very difficult for me to explain what R Server is, because I’m not sure exactly what it is and isn’t — mostly because the documentation I’ve been able to find has been bloated with marketing-speak.

Here’s what I think Microsoft R Server is. R is an open source language and large library that can perform a wide range of classical statistics techniques. R is often used interactively by typing commands in a shell, but you can write R scripts too.


A company called Revolution Analytics was created in 2007 to wrap the free R language in value-add stuff so they could charge users money for something that’s free. At some point, the company released Revolution R which essentially put the R language inside the Visual Studio IDE tool.

Then in January 2015, Microsoft bought this company that sells technology you can get for free. The name was changed to Microsoft R.

Then, last week, in January 2016, Microsoft R Server was announced. As far as I can tell, Microsoft R Server continues down the path of adding extra stuff to the free R so that companies can be charged money. The current value-add, again, as far as I can tell by wading through the marketing verbiage, is improved performance (OK, that’s nice) and connections to huge data storage capabilities in Microsoft Azure (that’s very nice) — that you have to pay for (OK, not so nice).

When I went to download and install Microsoft R Server, the documentation was atrocious. Too much blah blah blah. As a developer, I want a one-page Quick Start document so I can install and evaluate the technology myself, rather than be assaulted by annoying marketing text like, “Microsoft R Server is your flexible choice for analyzing data at scale, building intelligent apps, and discovering valuable insights across your business.” Seriously? I found a “Getting Started” PDF, opened it up — and it was 66 pages long. Seriously?

After I finally installed Microsoft R Server, I coded up a custom neural network in R to try things out. It was quite nice. I like it. Thumbs up.

As you can tell, I like R a lot. But I don’t like intrusive marketing messaging when I’m trying to install and understand a new product or technology.

Posted in Machine Learning

2016 Super Bowl Prediction – Zoltar Picks the Panthers by 1 Point

For the upcoming Super Bowl on Feb. 7, 2016, Zoltar predicts the Carolina Panthers will beat the Denver Broncos by 1 point. Because the Las Vegas betting line has the Panthers favored by 5 points, Zoltar theoretically recommends betting on the underdog Broncos. You’ll win your bet if the Broncos win outright, or if the Panthers win but by 4 points or less.
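The win condition for an underdog bet can be written as a small Python function. This is a sketch, not part of Zoltar; the name underdog_bet_result is hypothetical, and treating an exact match with the line as a "push" (stake returned) is an assumed sportsbook convention that the post doesn't spell out.

```python
def underdog_bet_result(favorite_margin: float, spread: float) -> str:
    """Outcome of a bet on the underdog.

    favorite_margin: favorite's points minus underdog's points
                     (negative if the underdog wins outright).
    spread:          points by which the betting line favors the favorite.
    """
    if favorite_margin < spread:
        return "win"
    if favorite_margin == spread:
        return "push"  # assumed convention: the stake is refunded
    return "lose"


# With the Panthers favored by 5 over the Broncos:
for margin in (-3, 4, 5, 6):
    print(margin, underdog_bet_result(margin, spread=5))
```

With a whole-number line such as 5, a Panthers win by exactly 5 is neither a win nor a loss, which is why the winning range stops at 4 points. With a half-point line such as 5.5, a push can't happen.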

Predicting the results of NFL football games is a fascinating problem. I wrote a computer program called Zoltar (named after the arcade fortune telling machine) to do just that. Unlike many prediction programs that use classical statistics techniques, Zoltar uses an information-theoretic approach.

For last week’s predictions, Zoltar was correct on both games when predicting just the winner. The Broncos beat the Patriots and the Panthers beat the Cardinals.

Last week against the Vegas point spread, Zoltar was 1-0, correctly advising a bet on the Vegas underdog Broncos (Zoltar did not recommend a bet on the Panthers-Cardinals game).

For the 2015 season, against the Vegas point spread, Zoltar went a pretty decent 56 correct out of 88 for 64% accuracy.


For the 2015 season, through week 20 (third and final round of playoffs before Super Bowl), in head-to-head predictions, meaning just predicting the winner of a game (which isn’t particularly useful except for parlay betting), Zoltar has gone 194-72 for 73% accuracy. I couldn’t find any week 18 or 19 head-to-head predictions for Bing Predictions. For week 20, Bing got one correct (Panthers) and one wrong (Patriots). For the season Bing Predictions is 162-96 for 63% accuracy.
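The head-to-head percentages are just correct picks divided by total picks. A quick Python check, using only the win-loss records quoted above (the helper name accuracy is made up for illustration):

```python
def accuracy(correct: int, wrong: int) -> float:
    """Fraction of picks that were correct."""
    return correct / (correct + wrong)


zoltar_acc = accuracy(194, 72)
bing_acc = accuracy(162, 96)
print(f"Zoltar: {zoltar_acc:.1%}")  # 72.9%
print(f"Bing:   {bing_acc:.1%}")    # 62.8%
```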


If you just picked the Las Vegas point spread favorite to win each game, for the regular season you’d have gotten about 63% accuracy.

Posted in Machine Learning

12 Conferences for Software Developers in 2016

Here are a few 2016 conferences for software developers that you might want to investigate. Because I work mostly with the Microsoft technologies stack, and I live on the West Coast (Seattle area), my list is biased towards those kinds of conferences (and events that I speak at).



1. Big Data Innovation Summit, January 28-29, Las Vegas. This event is really more about data analysis than software development, but because data has become increasingly important, you might want to check this event out. Co-located with a data analytics event. See http://theinnovationenterprise.com/summits/.


2. Visual Studio Live, March 7-11, Las Vegas. VS Live is one of my three favorite events. It’s relatively small (maybe about 400 attendees) which gives you a chance to connect with people. VS Live also has 2016 events in Austin, Boston, Redmond, Anaheim, Washington DC, and Orlando. Highly recommended. See http://www.vslive.com.


3. Microsoft Build, March 30 – April 1, San Francisco. The Build conference is a very large event that is mostly forward-looking (read “marketing”) rather than nuts-and-bolts information. Build has a lot of speakers who are managers instead of developers. Very glitzy. See http://build.microsoft.com/.


4. Interop, May 2-6, Las Vegas. Interop is a very big event aimed at both IT engineers and software developers. The event is aligned with InformationWeek magazine. I often speak at Interop. Recommended. See http://www.interop.com.


5. OSCON (Open Source Convention), May 16-19, Austin, Texas. I really enjoy the quirky OSCON event but am a bit annoyed they moved to Austin after being in Portland, OR for as long as I can remember. Recommended. See http://www.oscon.com.


6. There are several Python conferences. PyCon is scheduled for May 28 – June 5, in Portland. See https://us.pycon.org/2016/. SciPy is scheduled for July 11-17 in Austin. See http://scipy2015.scipy.org. PyData hasn’t announced their date for a 2016 U.S. event yet.


7. Future Insights Live, June 20-23, Las Vegas. This conference is mostly for Web designers and developers. I have never attended this event but I’ve heard good things about it. See http://www.futureinsightslive.com.


8. The R User Conference (aka useR!), June 27-30, Stanford, CA. R is a mathematical statistics language, so it isn’t intended for most software developers. But it may be worth checking out. See http://user2016.org/.


9. JavaOne, September 18-22, San Francisco. A very large event for Java developers. Recommended if you use Java. I like Java but I’m not happy with the way Oracle is directing the language. See http://www.oracle.com/javaone/.


10. Microsoft Ignite, September 26-30, Atlanta. Ignite is a large conference aimed mostly at IT engineers who wrestle with systems like Exchange, Lync, and SharePoint. Atlanta is one of my least favorite cities for conferences; I have no idea why Microsoft would choose such a hot-humid, high-crime city. See http://ignite.microsoft.com.


11. DevConnections, October 10-13, Las Vegas. DevConnections is one of the longest running events for developers and IT engineers who use Microsoft technologies. It also has strong support from IBM. Recommended. See http://www.devconnections.com. (DevConnections is not affiliated with the similar sounding DevIntersection).


12. DevIntersection, October 24-28, Las Vegas. DevIntersection is one of my three favorite events for developers who use Microsoft technologies. Includes a co-located IT Edge conference for IT engineers. Highly recommended. See http://www.devintersection.com/. (DevIntersection is not affiliated with the similar sounding DevConnections).



Posted in Conferences

Linear Regression with R

I wrote an article titled “Linear Regression with R” in the January 2016 issue of Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2016/01/01/linear-regression-with-r.aspx.


R is an old scripting language and execution environment used for statistical data analyses (R itself dates to the early 1990s, and its predecessor, the S language, to the 1970s). Linear regression is perhaps the most fundamental technique of classical statistics. In my article I demonstrate how to perform LR on some fake data to predict a person’s annual income from their age, political leaning (conservative, moderate, liberal), and years of education.

The result is an equation:

I = -74.9 + (0.09 * age) +
            (-33.11 * isLiberal) +
            (-18.22 * isModerate) +
            (10.00 * education)

So, for a person who is 30 years old, is a moderate politically, and who has 16 years of education, the predicted income is:

I = -74.9 + (0.09 * 30) +
            (-33.11 * 0) +
            (-18.22 * 1) +
            (10.00 * 16)

  = -74.9 + 2.7 + 0 + -18.22 + 160.0
  = 69.58 ($69,580.00 per year)
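The same calculation can be sketched as a small Python function. The coefficients come straight from the equation above; the function name predict_income and the string labels for political leaning are hypothetical. Since there are three leanings but only two indicator variables, the sketch assumes conservative is the baseline case where both indicators are 0.

```python
def predict_income(age: float, politics: str, education_years: float) -> float:
    """Predicted annual income in thousands of dollars."""
    if politics not in ("conservative", "moderate", "liberal"):
        raise ValueError(f"unknown political leaning: {politics}")

    # Indicator (dummy) variables; conservative is the assumed baseline.
    is_liberal = 1 if politics == "liberal" else 0
    is_moderate = 1 if politics == "moderate" else 0

    return (-74.9
            + 0.09 * age
            - 33.11 * is_liberal
            - 18.22 * is_moderate
            + 10.00 * education_years)


income = predict_income(30, "moderate", 16)
print(f"{income:.2f}")  # 69.58, i.e. about $69,580 per year
```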

R is a strange language and in my article I explain R from the viewpoint of a software developer rather than from the viewpoint of a college statistics student.

Posted in Machine Learning

NFL 2015 Conference Championship Predictions – Zoltar Likes the Underdog Broncos vs. the Patriots

Predicting the results of NFL football games is a fascinating problem. I wrote a computer program called Zoltar (named after the arcade fortune telling machine) to do just that. Unlike many prediction programs that use classical statistics techniques, Zoltar uses an information-theoretic approach.

Zoltar’s point spread predictions for the third round of the playoffs (week 20, conference championships) of the 2015 NFL season are:

Zoltar favorite =     broncos  by    5  underdog =    patriots
Zoltar favorite =    panthers  by    2  underdog =   cardinals

The Las Vegas point spreads are:

Vegas favorite =    patriots  by    3   underdog =     broncos
Vegas favorite =    panthers  by    3   underdog =   cardinals

So, Zoltar strongly disagrees with the Las Vegas prediction for the Broncos vs. Patriots game, and very closely agrees with Vegas on the Panthers vs. Cardinals game.

Zoltar theoretically recommends a bet on the Vegas underdog Broncos, thinking that the Broncos will win outright, or that the Patriots won’t cover the 3 point spread. I’m a bit dubious but Zoltar generally knows what he’s doing.
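One way to put a number on how much Zoltar and Vegas disagree is to express both lines as a margin for the same team and subtract. The Python sketch below does that for each Vegas underdog, using only the figures listed above. The helper names are made up, and the post doesn't say how large a gap Zoltar needs before recommending a bet, so no cutoff is applied here.

```python
# Week 20 lines from the post: (favorite, points, underdog).
# Both lists must describe the games in the same order.
zoltar = [("broncos", 5, "patriots"), ("panthers", 2, "cardinals")]
vegas = [("patriots", 3, "broncos"), ("panthers", 3, "cardinals")]


def margin_for(team: str, line: tuple) -> int:
    """Predicted margin from one team's view (negative = expected to lose)."""
    favorite, points, underdog = line
    if team == favorite:
        return points
    if team == underdog:
        return -points
    raise ValueError(f"{team} is not in this game")


def gaps(zoltar_lines: list, vegas_lines: list) -> dict:
    """Points by which Zoltar rates each Vegas underdog above the Vegas line."""
    result = {}
    for z_line, v_line in zip(zoltar_lines, vegas_lines):
        vegas_underdog = v_line[2]
        result[vegas_underdog] = (margin_for(vegas_underdog, z_line)
                                  - margin_for(vegas_underdog, v_line))
    return result


disagreement = gaps(zoltar, vegas)
print(disagreement)
```

Zoltar has the Broncos at +5 and Vegas has them at -3, an 8-point gap. For the Cardinals the two lines are -2 and -3, a gap of just 1 point.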

For last week’s predictions, Zoltar was correct in all four games when predicting just the winner.

Last week against the Vegas point spread, Zoltar was 0-0, because he didn’t recommend any bets (all Zoltar predictions were very close to the Las Vegas point spreads).

For the 2015 season, through week 19 (second round of playoffs), in head-to-head predictions, meaning just predicting the winner of a game (which isn’t particularly useful except for parlay betting), Zoltar has gone 192-72 for 73% accuracy. I couldn’t find any week 18 or 19 predictions for Bing Predictions. For the regular season Bing Predictions was 161-95 for 63% accuracy.

If you just picked the Las Vegas point spread favorite to win each game, for the regular season you’d have gotten about 63% accuracy.


Update, Monday, Jan. 26, 2016: Zoltar correctly predicted both winners (Broncos beat the Patriots 20-18, and the Panthers beat the Cardinals 49-15). Zoltar was 1-0 against the Vegas point spread, correctly advising a bet on the Vegas underdog Broncos.

Posted in Machine Learning