R Language OOP using S3

I wrote an article titled “R Language OOP using S3” in the January 2017 issue of Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2017/01/01/r-language-oop-using-s3.aspx.

[Image: banner for the R Language OOP using S3 article]

The R language has been around for decades and is used mostly for interactive data analysis. For example, you could do a linear regression analysis with just two commands — one to load data from a file using the read.table() function, and one to run the analysis using the lm() function.
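
As a concrete sketch (the file name and column names here are hypothetical):

# load data from a hypothetical tab-delimited file with columns x and y
md <- read.table("mydata.txt", header=TRUE)
model <- lm(y ~ x, data=md)  # fit a linear regression of y on x

After that, summary(model) displays the fitted coefficients.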

Because R has evolved over time, there are three different object-oriented programming models: S3, S4, and RC. Even though S3 is the oldest and technically weakest OOP model, it's widely used because many of the built-in functions, including lm(), were written using S3.

S3 is very, very different from OOP in any other language. So much so that the first time you look at S3, it doesn’t resemble OOP at all (assuming you’ve seen OOP in Java or Python or C# or almost any other language).

Anyway, in my article I walk readers through an example of creating and using a Person object — the standard beginning OOP example. A side effect of learning S3 OOP is that it gives you some interesting insights into OOP in other languages.
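
To give a flavor of S3 (a minimal sketch, not necessarily the exact code from the article):

# an S3 "class" is just an attribute attached to an ordinary list
make_person <- function(name, age) {
  p <- list(name=name, age=age)
  class(p) <- "Person"
  return(p)
}

# S3 method dispatch works by naming convention: print.Person
print.Person <- function(x, ...) {
  cat("Person:", x$name, "- age:", x$age, "\n")
}

p1 <- make_person("Ted", 28)
print(p1)  # the generic print() dispatches to print.Person()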

[Image: screenshot of the OOP-using-S3 demo]


The Vowpal Wabbit Machine Learning Tool

Vowpal Wabbit (VW) is a command-line, open source, machine learning tool. I ran into the creator of VW, John Langford, at work the other day, which inspired me to take a look at VW because I hadn't used it in several years. (Note: the weird name Vowpal Wabbit is an indirect reference to a nonsense word, meaning "fast", coined by Lewis Carroll, author of "Alice in Wonderland".)

My goal was to run a simple binary classification, using the tutorial at https://github.com/JohnLangford/vowpal_wabbit/wiki/Tutorial as a guide, on a Windows machine.

[Image: screenshot of the Vowpal Wabbit demo]

I decided to use part of the Iris dataset. The full dataset has three classes ("setosa", "versicolor", and "virginica" species), but I used just the first two. So the first step was to get the data into a format that VW can understand. I wrote a script that took the raw data I copied from Wikipedia and generated:

0 'setosa | seplen:5.1 sepwid:3.5 petlen:1.4 petwid:0.2
0 'setosa | seplen:4.9 sepwid:3.0 petlen:1.4 petwid:0.2
. . .
1 'versicolor | seplen:5.1 sepwid:2.5 petlen:3.0 petwid:1.1
1 'versicolor | seplen:5.7 sepwid:2.8 petlen:4.1 petwid:1.3

The first column is the class (0 or 1), then comes an optional class label, then the four features (sepal length, sepal width, petal length, petal width). There are 50 lines of setosa data and 50 lines of versicolor data. I saved the file as iris_vw.txt.
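
Here's a sketch of one way the conversion script could look in R (the raw-data file name iris_raw.csv and its column names are my assumptions):

# read raw data and write it out in VW's "label 'tag | feature:value" format
iris_raw <- read.csv("iris_raw.csv")  # columns: seplen, sepwid, petlen, petwid, species
vw_lines <- sprintf("%d '%s | seplen:%.1f sepwid:%.1f petlen:%.1f petwid:%.1f",
  ifelse(iris_raw$species == "setosa", 0L, 1L), iris_raw$species,
  iris_raw$seplen, iris_raw$sepwid, iris_raw$petlen, iris_raw$petwid)
writeLines(vw_lines, "iris_vw.txt")  # 100 lines: 50 setosa, 50 versicolor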

VW was designed strictly for a Linux environment. Luckily John hooked me up with Markus Cozowicz who had built a set of Windows binaries, complete with an MSI installer file, at https://github.com/eisber/vowpal_wabbit/releases. Nice!

I installed VW at the default C:\Program Files\VowpalWabbit and added the location to my System PATH.

I trained a model using the command:

> vw.exe -d iris_vw.txt -l 0.05 -c --passes 5 --holdout_off -f iris.model

This came from the tutorial. The "-d" is the training data file. The "-l" is the learning rate. The "-c" means use a cache. The "--passes" is the number of training passes. The "--holdout_off" means don't hold out any data. The "-f" is the filename to write the trained model to.

Then I created a tiny three-case test data set:

| seplen:5.1 sepwid:3.5 petlen:1.4 petwid:0.2
| seplen:7.0 sepwid:3.2 petlen:4.7 petwid:1.4
| seplen:6.0 sepwid:3.0 petlen:3.0 petwid:0.8

The first item was copied from a “setosa” line of training data. The second item was copied from a “versicolor” line. The third line is sort of a mix of features.

I made predictions on the test data:

> vw.exe -i iris.model -t -d iris_test_vw.txt -p stdout --quiet

The "-i" is the trained model to use. The "-t" means the data is test data (no learning). The "-d" is the test data file. The "-p" is where to send prediction results (the screen). The "--quiet" suppresses diagnostic output.

The result was:

0.358822
0.936860
0.642465

I believe these are probabilities of class "1", so the predictions are "setosa", "versicolor", "versicolor".
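
If that interpretation is correct, mapping the scores to class labels is just a 0.5 threshold (my assumption, matching the 0/1 labels in the training data):

scores <- c(0.358822, 0.936860, 0.642465)
preds <- ifelse(scores < 0.5, "setosa", "versicolor")  # 0 = setosa, 1 = versicolor
print(preds)  # "setosa" "versicolor" "versicolor"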

Conclusion: Vowpal Wabbit is blazingly fast, but has a very steep learning curve, and is not intended for casual users.

========= Data file:

0 'setosa | seplen:5.1 sepwid:3.5 petlen:1.4 petwid:0.2
0 'setosa | seplen:4.9 sepwid:3.0 petlen:1.4 petwid:0.2
0 'setosa | seplen:4.7 sepwid:3.2 petlen:1.3 petwid:0.2
0 'setosa | seplen:4.6 sepwid:3.1 petlen:1.5 petwid:0.2
0 'setosa | seplen:5.0 sepwid:3.6 petlen:1.4 petwid:0.2
0 'setosa | seplen:5.4 sepwid:3.9 petlen:1.7 petwid:0.4
0 'setosa | seplen:4.6 sepwid:3.4 petlen:1.4 petwid:0.3
0 'setosa | seplen:5.0 sepwid:3.4 petlen:1.5 petwid:0.2
0 'setosa | seplen:4.4 sepwid:2.9 petlen:1.4 petwid:0.2
0 'setosa | seplen:4.9 sepwid:3.1 petlen:1.5 petwid:0.1
0 'setosa | seplen:5.4 sepwid:3.7 petlen:1.5 petwid:0.2
0 'setosa | seplen:4.8 sepwid:3.4 petlen:1.6 petwid:0.2
0 'setosa | seplen:4.8 sepwid:3.0 petlen:1.4 petwid:0.1
0 'setosa | seplen:4.3 sepwid:3.0 petlen:1.1 petwid:0.1
0 'setosa | seplen:5.8 sepwid:4.0 petlen:1.2 petwid:0.2
0 'setosa | seplen:5.7 sepwid:4.4 petlen:1.5 petwid:0.4
0 'setosa | seplen:5.4 sepwid:3.9 petlen:1.3 petwid:0.4
0 'setosa | seplen:5.1 sepwid:3.5 petlen:1.4 petwid:0.3
0 'setosa | seplen:5.7 sepwid:3.8 petlen:1.7 petwid:0.3
0 'setosa | seplen:5.1 sepwid:3.8 petlen:1.5 petwid:0.3
0 'setosa | seplen:5.4 sepwid:3.4 petlen:1.7 petwid:0.2
0 'setosa | seplen:5.1 sepwid:3.7 petlen:1.5 petwid:0.4
0 'setosa | seplen:4.6 sepwid:3.6 petlen:1.0 petwid:0.2
0 'setosa | seplen:5.1 sepwid:3.3 petlen:1.7 petwid:0.5
0 'setosa | seplen:4.8 sepwid:3.4 petlen:1.9 petwid:0.2
0 'setosa | seplen:5.0 sepwid:3.0 petlen:1.6 petwid:0.2
0 'setosa | seplen:5.0 sepwid:3.4 petlen:1.6 petwid:0.4
0 'setosa | seplen:5.2 sepwid:3.5 petlen:1.5 petwid:0.2
0 'setosa | seplen:5.2 sepwid:3.4 petlen:1.4 petwid:0.2
0 'setosa | seplen:4.7 sepwid:3.2 petlen:1.6 petwid:0.2
0 'setosa | seplen:4.8 sepwid:3.1 petlen:1.6 petwid:0.2
0 'setosa | seplen:5.4 sepwid:3.4 petlen:1.5 petwid:0.4
0 'setosa | seplen:5.2 sepwid:4.1 petlen:1.5 petwid:0.1
0 'setosa | seplen:5.5 sepwid:4.2 petlen:1.4 petwid:0.2
0 'setosa | seplen:4.9 sepwid:3.1 petlen:1.5 petwid:0.2
0 'setosa | seplen:5.0 sepwid:3.2 petlen:1.2 petwid:0.2
0 'setosa | seplen:5.5 sepwid:3.5 petlen:1.3 petwid:0.2
0 'setosa | seplen:4.9 sepwid:3.6 petlen:1.4 petwid:0.1
0 'setosa | seplen:4.4 sepwid:3.0 petlen:1.3 petwid:0.2
0 'setosa | seplen:5.1 sepwid:3.4 petlen:1.5 petwid:0.2
0 'setosa | seplen:5.0 sepwid:3.5 petlen:1.3 petwid:0.3
0 'setosa | seplen:4.5 sepwid:2.3 petlen:1.3 petwid:0.3
0 'setosa | seplen:4.4 sepwid:3.2 petlen:1.3 petwid:0.2
0 'setosa | seplen:5.0 sepwid:3.5 petlen:1.6 petwid:0.6
0 'setosa | seplen:5.1 sepwid:3.8 petlen:1.9 petwid:0.4
0 'setosa | seplen:4.8 sepwid:3.0 petlen:1.4 petwid:0.3
0 'setosa | seplen:5.1 sepwid:3.8 petlen:1.6 petwid:0.2
0 'setosa | seplen:4.6 sepwid:3.2 petlen:1.4 petwid:0.2
0 'setosa | seplen:5.3 sepwid:3.7 petlen:1.5 petwid:0.2
0 'setosa | seplen:5.0 sepwid:3.3 petlen:1.4 petwid:0.2
1 'versicolor | seplen:7.0 sepwid:3.2 petlen:4.7 petwid:1.4
1 'versicolor | seplen:6.4 sepwid:3.2 petlen:4.5 petwid:1.5
1 'versicolor | seplen:6.9 sepwid:3.1 petlen:4.9 petwid:1.5
1 'versicolor | seplen:5.5 sepwid:2.3 petlen:4.0 petwid:1.3
1 'versicolor | seplen:6.5 sepwid:2.8 petlen:4.6 petwid:1.5
1 'versicolor | seplen:5.7 sepwid:2.8 petlen:4.5 petwid:1.3
1 'versicolor | seplen:6.3 sepwid:3.3 petlen:4.7 petwid:1.6
1 'versicolor | seplen:4.9 sepwid:2.4 petlen:3.3 petwid:1.0
1 'versicolor | seplen:6.6 sepwid:2.9 petlen:4.6 petwid:1.3
1 'versicolor | seplen:5.2 sepwid:2.7 petlen:3.9 petwid:1.4
1 'versicolor | seplen:5.0 sepwid:2.0 petlen:3.5 petwid:1.0
1 'versicolor | seplen:5.9 sepwid:3.0 petlen:4.2 petwid:1.5
1 'versicolor | seplen:6.0 sepwid:2.2 petlen:4.0 petwid:1.0
1 'versicolor | seplen:6.1 sepwid:2.9 petlen:4.7 petwid:1.4
1 'versicolor | seplen:5.6 sepwid:2.9 petlen:3.6 petwid:1.3
1 'versicolor | seplen:6.7 sepwid:3.1 petlen:4.4 petwid:1.4
1 'versicolor | seplen:5.6 sepwid:3.0 petlen:4.5 petwid:1.5
1 'versicolor | seplen:5.8 sepwid:2.7 petlen:4.1 petwid:1.0
1 'versicolor | seplen:6.2 sepwid:2.2 petlen:4.5 petwid:1.5
1 'versicolor | seplen:5.6 sepwid:2.5 petlen:3.9 petwid:1.1
1 'versicolor | seplen:5.9 sepwid:3.2 petlen:4.8 petwid:1.8
1 'versicolor | seplen:6.1 sepwid:2.8 petlen:4.0 petwid:1.3
1 'versicolor | seplen:6.3 sepwid:2.5 petlen:4.9 petwid:1.5
1 'versicolor | seplen:6.1 sepwid:2.8 petlen:4.7 petwid:1.2
1 'versicolor | seplen:6.4 sepwid:2.9 petlen:4.3 petwid:1.3
1 'versicolor | seplen:6.6 sepwid:3.0 petlen:4.4 petwid:1.4
1 'versicolor | seplen:6.8 sepwid:2.8 petlen:4.8 petwid:1.4
1 'versicolor | seplen:6.7 sepwid:3.0 petlen:5.0 petwid:1.7
1 'versicolor | seplen:6.0 sepwid:2.9 petlen:4.5 petwid:1.5
1 'versicolor | seplen:5.7 sepwid:2.6 petlen:3.5 petwid:1.0
1 'versicolor | seplen:5.5 sepwid:2.4 petlen:3.8 petwid:1.1
1 'versicolor | seplen:5.5 sepwid:2.4 petlen:3.7 petwid:1.0
1 'versicolor | seplen:5.8 sepwid:2.7 petlen:3.9 petwid:1.2
1 'versicolor | seplen:6.0 sepwid:2.7 petlen:5.1 petwid:1.6
1 'versicolor | seplen:5.4 sepwid:3.0 petlen:4.5 petwid:1.5
1 'versicolor | seplen:6.0 sepwid:3.4 petlen:4.5 petwid:1.6
1 'versicolor | seplen:6.7 sepwid:3.1 petlen:4.7 petwid:1.5
1 'versicolor | seplen:6.3 sepwid:2.3 petlen:4.4 petwid:1.3
1 'versicolor | seplen:5.6 sepwid:3.0 petlen:4.1 petwid:1.3
1 'versicolor | seplen:5.5 sepwid:2.5 petlen:4.0 petwid:1.3
1 'versicolor | seplen:5.5 sepwid:2.6 petlen:4.4 petwid:1.2
1 'versicolor | seplen:6.1 sepwid:3.0 petlen:4.6 petwid:1.4
1 'versicolor | seplen:5.8 sepwid:2.6 petlen:4.0 petwid:1.2
1 'versicolor | seplen:5.0 sepwid:2.3 petlen:3.3 petwid:1.0
1 'versicolor | seplen:5.6 sepwid:2.7 petlen:4.2 petwid:1.3
1 'versicolor | seplen:5.7 sepwid:3.0 petlen:4.2 petwid:1.2
1 'versicolor | seplen:5.7 sepwid:2.9 petlen:4.2 petwid:1.3
1 'versicolor | seplen:6.2 sepwid:2.9 petlen:4.3 petwid:1.3
1 'versicolor | seplen:5.1 sepwid:2.5 petlen:3.0 petwid:1.1
1 'versicolor | seplen:5.7 sepwid:2.8 petlen:4.1 petwid:1.3

Tricking Image Recognition Software

I was intrigued by the 2013 research paper "Intriguing Properties of Neural Networks" by C. Szegedy, W. Zaremba, et al. One of the "intriguing properties" is that it's possible to trick image recognition systems. See https://cs.nyu.edu/~zaremba/docs/understanding.pdf.

There are several examples of this you can find on the Internet. My favorite is the bus-ostrich example:

[Image: the bus-to-ostrich adversarial example]

The image on the left is obviously a bus, and you can train a deep convolutional neural network to recognize that image as a bus. However, if you slightly alter the image in a clever way, the altered image on the right is classified by the neural network as an ostrich!

I think there are two morals to the story. First, this intriguing property could lead to security problems. Second, it suggests that convolutional neural networks may have some inherent weakness, and that new approaches for image recognition may be needed.


NFL 2016 Week 20 Predictions – Zoltar and Vegas Agree Exactly

Zoltar is my NFL prediction computer program. Here are Zoltar’s predictions for week 20 (conference championships) of the 2016 NFL season:


Zoltar:     falcons  by    4  dog =    packers     Vegas:     falcons  by    4
Zoltar:    patriots  by    6  dog =   steelers     Vegas:    patriots  by    6

Zoltar theoretically suggests betting when the Vegas line is more than 3.0 points different from Zoltar’s prediction. For week 20, Zoltar has no suggestions because Zoltar’s predictions and the Vegas point spread line are exactly the same (this is the first time that’s ever happened).

Zoltar’s early Super Bowl LI prediction is the Patriots by 3 points over the Atlanta Falcons.


In week 19, Zoltar went 2-1 against the Vegas point spread. Zoltar correctly recommended a bet on the Vegas favorite Falcons (who covered their 4.5 points against the Seahawks), and correctly recommended a bet on the Vegas favorite Patriots (who covered their 14.0 points against the Texans). Zoltar incorrectly recommended a bet on the Vegas favorite Chiefs (who didn't even win, much less cover their 1.0 point, against the Pittsburgh Steelers).

For the 2016 regular season plus the eight playoff games so far, Zoltar is 45-29 against the Vegas spread, for 61% accuracy. Historically, Zoltar is usually between 62% and 72% accuracy against the Vegas spread over the course of an entire season, so for 2016 Zoltar is doing only OK.

Theoretically, if you must bet $110 to win $100 (typical), then you'll make money if you predict at 53% accuracy or better. But realistically, you need to predict at 60% accuracy or better.
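
The 53% figure is just breakeven arithmetic:

# win $100 with probability p, lose $110 with probability (1 - p)
# breakeven when 100*p - 110*(1 - p) = 0, which gives p = 110/210
p <- 110 / 210
cat(p)  # 0.5238, roughly 53%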

Just for fun, I track how well Zoltar and Cortana/Bing Predictions do when trying to predict just which team will win a game. This isn't useful except for parlay betting.

In week 19, just predicting winners, Zoltar was a poor 2-2. Cortana/Bing was also 2-2. (And so was the Las Vegas point spread.)

For the 2016 season, just predicting winners, Zoltar is 180-82 (69% accuracy). Cortana/Bing is 167-95 (64% accuracy). There were two tie games in the season, which I didn’t include.

Note: Zoltar sometimes predicts a 0-point margin of victory. In those situations, to pick a winner so I can compare against Cortana/Bing, Zoltar picks the home team to win during the first four weeks of the season. After week 4, Zoltar uses historical data for the current season (which usually results in a prediction that the home team will win).
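
A hypothetical sketch of that tie-break rule (not Zoltar's actual code):

# pick a winner when the predicted margin of victory is 0 points
pick_on_tie <- function(week, home_team, road_team, home_win_rate) {
  if (week <= 4) return(home_team)  # early season: just take the home team
  # later: use the current season's home-field results so far
  if (home_win_rate >= 0.5) return(home_team) else return(road_team)
}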

[Image: Zoltar week 20 predictions run]


Using the Microsoft TLC Machine Learning Tool

Microsoft has a machine learning tool called TLC, which I think used to stand for The Learning Code, but now it's just an acronym. TLC is essentially a very large library of machine learning code that can be invoked in several ways (from a command line, as an API called from code, from Python, and from a couple of internal Microsoft tools).

I believe that TLC is the core code used by Microsoft’s Azure Machine Learning service, but I could be wrong.

The TLC ML library has been around for many years. I hadn’t worked with TLC for a long time so I thought I’d refresh my memory.

I decided to do a classification analysis on the classic 150-item Iris dataset, using TLC from the command line. I downloaded the TLC tool and extracted it.

The first step was to create a 120-item training file and a 30-item test file. I did so using the default tab-delimited format. For example, the first part of the training file was:

# file: iris_train.txt
#Label  SepLen  SepWid  PetLen  PetWid
0	5.1	3.5	1.4	0.2
0	4.9	3.0	1.4	0.2
(etc.)
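
Here is a sketch of one way to make the 120/30 split in R (the full-data file name and the seed are my choices; the 150-item file is assumed to be tab-delimited with the label first):

all_data <- read.table("iris_all.txt", sep="\t")  # all 150 items
set.seed(0)
idx <- sample(1:nrow(all_data), 120)  # 120 random rows for training
write.table(all_data[idx, ], "iris_train.txt", sep="\t",
  row.names=FALSE, col.names=FALSE)
write.table(all_data[-idx, ], "iris_test.txt", sep="\t",
  row.names=FALSE, col.names=FALSE)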

Next I created a configuration file that had all the instructions:

# file: iris_traintest.rsp
# desc: TLC train-test on the Iris dataset
# call: > maml.exe @iris_traintest.rsp

TrainTest
data=iris_train.txt
test=iris_test.txt
loader=text{
  hasheader=-
  col=Label:Num:0
  col=Features:Num:1-*}
trainer=MultiClassNeuralNetwork{
  hidden=10
  iter=1000
  lr=0.001
  loss=crossentropy
  refresh=200
  momentum=0.0
  weightdecay=0.0
  initwts=0.1
  shuf=+}
seed=12

# end-file

This is really a minimal config file; I estimate TLC probably has well over 200 options. If you understand neural networks, most of the config file should make sense, but if you don't know NNs, the config probably looks incomprehensible.

[Image: TLC demo run, part 1]

[Image: TLC demo run, part 2]

I ran the config file and, after fixing a few errors, finally got some output.

Bottom line: The TLC tool is very complex and has a steep learning curve, but for someone who works with ML on a daily basis, learning TLC might be a good investment.


Workforce Diversity in Tech Companies

The other day, my news feed popped up an article that referenced one on bloomberg.com titled “Facebook’s Hiring Process Hinders its Efforts to Create a Diverse Workforce”. See https://www.bloomberg.com/news/articles/2017-01-09/facebook-s-hiring-process-hinders-its-effort-to-create-a-diverse-workforce. Because I work in a tech environment and I’ve helped my company’s HR people understand how to recruit tech employees, I was intrigued.

Briefly, starting in 2015, Facebook recruiters received incentives to place minority candidates in engineering positions. After two years there was essentially no effect; the percentages of various minorities hadn't changed much. The article blamed this lack of change on the fact that technical managers at Facebook had hiring approval authority.

[Image: the Facebook diversity article]

So, I scrolled down to the comments section of the aggregation site, knowing exactly what I’d find. As expected, commenters were outraged. A typical mild comment was something like, “So Facebook is surprised that managers hire based on ability instead of race or gender!?” Most comments (48 out of the first 50 I scanned) were a lot harsher.

This article led me to look up research (as opposed to opinion) on the topic of workforce diversity. The central research paper appears to be "Demography and Diversity in Organizations: A Review of 40 Years of Research" by K. Williams and C. O'Reilly III. The review's key findings are:

1. The research suggests that dissimilarity of individuals’ general backgrounds (things other than race, gender, age) may improve creativity due to conflict, but does not improve process implementation.

2. In general, gender diversity has negative effects on process and is associated with high turnover rates, especially when men are in a female-dominated group.

3. Ethnic diversity, unless successfully managed, has negative effects on group process.

OK, none of those research results is surprising. But what was interesting to me is that I had to dig deeply to find actual research results, and the results I found didn’t seem to be well known or published in general media.

In the end, every tech company is different, and diversity in a particular group may or may not be advantageous. I work in a really, really diverse workplace. I don’t place too much stock in either the research or spewed opinions. I just like to write algorithms. But tech company diversity efforts are a mildly interesting topic anyway.


An Angular 2 Hello World Web Site

Angular 2 is a YAJF — yet another JavaScript framework. I’ve worked with the older AngularJS framework but hadn’t worked with Angular 2 so I thought I’d take a look.

[Image: the Angular 2 hello world demo]

From my perspective, Angular 2 is really an entire programming language that ultimately gets translated down to ordinary HTML and JavaScript. So why not just write HTML and JavaScript directly? That’s a very good question.

In my opinion, Angular 2 is only useful when you have a seriously complex Web application. The learning curve for Angular 2 is enormous, the code base is highly volatile (so there's a high likelihood of incompatible code), and there are a huge number of dependencies.

In short, I am not a fan of Angular 2, but I don’t have any arguments for people who want to use it.

You don’t download and install Angular 2 directly. Everything is done using the NodeJS framework, so that’s what you install. Then, to manage Angular 2 projects, I installed the angular-cli command line interface program.
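
If I recall correctly, the install is a single npm command (the -g flag makes it global):

> npm install -g angular-cli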

I created a simple Angular 2 project using angular-cli:

> ng new HelloWorld

Then I fired up a local development Web server:

> ng serve

And then I launched a browser and navigated to the home page of the project at http://localhost:4200 and everything worked.

However, when I looked at the HelloWorld project I found there were 22,171 files using 137 MB of storage. What!? Well, almost all the files are libraries that aren’t being used by the project, but still, this is just crazy.

[Image: the Angular 2 project files]
