The Vowpal Wabbit Machine Learning Tool

Vowpal Wabbit (VW) is an open source, command-line machine learning tool. I ran into the creator of VW, John Langford, at work the other day, which inspired me to take another look at VW because I hadn’t used it in several years. (Note: the weird name Vowpal Wabbit is an indirect reference to a Lewis Carroll (author of “Alice in Wonderland”) nonsense word meaning “fast”.)

My goal was to run a simple binary classification, using the tutorial at https://github.com/JohnLangford/vowpal_wabbit/wiki/Tutorial as a guide, on a Windows machine.

[Screenshot: Vowpal Wabbit demo screen]

I decided to use part of the Iris Data Set. The full data set has three classes (the “setosa”, “versicolor”, and “virginica” species), so I kept only the first two to get a binary problem. The first step was to get the data into a format that VW can understand. I wrote a script which took the raw data I copied from Wikipedia and generated:

0 'setosa | seplen:5.1 sepwid:3.5 petlen:1.4 petwid:0.2
0 'setosa | seplen:4.9 sepwid:3.0 petlen:1.4 petwid:0.2
. . .
1 'versicolor | seplen:5.1 sepwid:2.5 petlen:3.0 petwid:1.1
1 'versicolor | seplen:5.7 sepwid:2.8 petlen:4.1 petwid:1.3

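The first column is the class (0 or 1). Next comes an optional tag, prefixed with a single quote, which I used to hold the species name. After the vertical bar come the four features (sepal length, sepal width, petal length, petal width), each written as name:value. There are 50 lines of setosa data and 50 lines of versicolor data. I saved the result as iris_vw.txt.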
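The conversion script itself isn't shown in this post. As a rough sketch of the idea only, assuming each raw row is four numbers plus a species name, something like the following Python would produce lines in the format above. The helper name to_vw_line and the LABELS mapping are my own illustration, not the original script.

```python
# Hypothetical helper for illustration; not the original conversion script.
LABELS = {"setosa": 0, "versicolor": 1}  # assumed 0/1 encoding, matching the file above
NAMES = ("seplen", "sepwid", "petlen", "petwid")

def to_vw_line(values, species):
    """Format one Iris row as: <label> '<tag> | name:value name:value ..."""
    feats = " ".join(f"{name}:{value}" for name, value in zip(NAMES, values))
    return f"{LABELS[species]} '{species} | {feats}"

rows = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
]
for values, species in rows:
    print(to_vw_line(values, species))
```

Writing the printed lines to a text file, one per row, would give a file like iris_vw.txt.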

VW was designed strictly for a Linux environment. Luckily John hooked me up with Markus Cozowicz who had built a set of Windows binaries, complete with an MSI installer file, at https://github.com/eisber/vowpal_wabbit/releases. Nice!

I installed VW at the default C:\Program Files\VowpalWabbit and added the location to my System PATH.

I trained a model using the command:

> vw.exe -d iris_vw.txt -l 0.05 -c --passes 5 --holdout_off -f iris.model

This came from the tutorial. The “d” is the training data file. The “l” is the learning rate. The “c” means “use a cache”, which VW needs when making more than one pass. The “passes” is the number of training passes. The “holdout_off” means no holdout data is set aside. The “f” is the filename to write the trained model to.
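If you would rather drive the training run from a script than type the command by hand, one option is to build the argument list in Python. This is only a sketch: the function name build_train_command is made up for illustration, and it uses only the flags from the command above.

```python
def build_train_command(data_file, model_file, learning_rate=0.05, passes=5):
    """Return the argument list for the training command shown above."""
    return [
        "vw.exe",
        "-d", data_file,            # training data file
        "-l", str(learning_rate),   # learning rate
        "-c",                       # use a cache file
        "--passes", str(passes),    # number of passes over the data
        "--holdout_off",            # do not set aside holdout data
        "-f", model_file,           # where to write the trained model
    ]

cmd = build_train_command("iris_vw.txt", "iris.model")
print(" ".join(cmd))

# To actually run it (assumes vw.exe is on the PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Keeping the arguments in a list rather than one long string avoids quoting problems if a file path contains spaces.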

Then I created a tiny three-case test data set:

| seplen:5.1 sepwid:3.5 petlen:1.4 petwid:0.2
| seplen:7.0 sepwid:3.2 petlen:4.7 petwid:1.4
| seplen:6.0 sepwid:3.0 petlen:3.0 petwid:0.8

The first item was copied from a “setosa” line of training data. The second item was copied from a “versicolor” line. The third line is sort of a mix of features.
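Since the first two test items are just training lines with the label and tag removed, that step can be sketched in a few lines of Python. The helper name strip_label is hypothetical; it simply keeps everything from the vertical bar onward.

```python
def strip_label(vw_line):
    """Drop the label and tag before the first '|', keeping only the features."""
    _, _, features = vw_line.partition("|")
    return "| " + features.strip()

train_line = "0 'setosa | seplen:5.1 sepwid:3.5 petlen:1.4 petwid:0.2"
print(strip_label(train_line))
```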

I made predictions on the test data:

> vw.exe -i iris.model -t -d iris_test_vw.txt -p stdout --quiet

The “i” is the trained model to use. The “t” means test only, so VW makes predictions without learning from the data. The “d” is the test data file. The “p” is where to send the predictions (here, the screen). The “--quiet” means not verbose.

The result was:

0.358822
0.936860
0.642465

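I read these as scores for class “1”. Because I didn’t specify a loss function, VW used its default squared loss, so strictly speaking these are regression estimates of the 0/1 label rather than calibrated probabilities. Using 0.5 as the cutoff, the predictions are “setosa”, “versicolor”, “versicolor”.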
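To make the mapping from scores to species explicit, here is a small Python sketch that applies a 0.5 cutoff to the three outputs above. The cutoff and the helper name to_species are my own choices for illustration; VW itself just emits the raw numbers.

```python
CLASS_NAMES = {0: "setosa", 1: "versicolor"}  # same 0/1 encoding as the training file

def to_species(score, threshold=0.5):
    """Map a VW output score to a species name using a simple cutoff."""
    return CLASS_NAMES[1 if score >= threshold else 0]

scores = [0.358822, 0.936860, 0.642465]
predictions = [to_species(s) for s in scores]
print(predictions)
```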

Conclusion: Vowpal Wabbit is blazingly fast, but has a very steep learning curve, and is not intended for casual users.

========= Data file:

0 'setosa | seplen:5.1 sepwid:3.5 petlen:1.4 petwid:0.2
0 'setosa | seplen:4.9 sepwid:3.0 petlen:1.4 petwid:0.2
0 'setosa | seplen:4.7 sepwid:3.2 petlen:1.3 petwid:0.2
0 'setosa | seplen:4.6 sepwid:3.1 petlen:1.5 petwid:0.2
0 'setosa | seplen:5.0 sepwid:3.6 petlen:1.4 petwid:0.2
0 'setosa | seplen:5.4 sepwid:3.9 petlen:1.7 petwid:0.4
0 'setosa | seplen:4.6 sepwid:3.4 petlen:1.4 petwid:0.3
0 'setosa | seplen:5.0 sepwid:3.4 petlen:1.5 petwid:0.2
0 'setosa | seplen:4.4 sepwid:2.9 petlen:1.4 petwid:0.2
0 'setosa | seplen:4.9 sepwid:3.1 petlen:1.5 petwid:0.1
0 'setosa | seplen:5.4 sepwid:3.7 petlen:1.5 petwid:0.2
0 'setosa | seplen:4.8 sepwid:3.4 petlen:1.6 petwid:0.2
0 'setosa | seplen:4.8 sepwid:3.0 petlen:1.4 petwid:0.1
0 'setosa | seplen:4.3 sepwid:3.0 petlen:1.1 petwid:0.1
0 'setosa | seplen:5.8 sepwid:4.0 petlen:1.2 petwid:0.2
0 'setosa | seplen:5.7 sepwid:4.4 petlen:1.5 petwid:0.4
0 'setosa | seplen:5.4 sepwid:3.9 petlen:1.3 petwid:0.4
0 'setosa | seplen:5.1 sepwid:3.5 petlen:1.4 petwid:0.3
0 'setosa | seplen:5.7 sepwid:3.8 petlen:1.7 petwid:0.3
0 'setosa | seplen:5.1 sepwid:3.8 petlen:1.5 petwid:0.3
0 'setosa | seplen:5.4 sepwid:3.4 petlen:1.7 petwid:0.2
0 'setosa | seplen:5.1 sepwid:3.7 petlen:1.5 petwid:0.4
0 'setosa | seplen:4.6 sepwid:3.6 petlen:1.0 petwid:0.2
0 'setosa | seplen:5.1 sepwid:3.3 petlen:1.7 petwid:0.5
0 'setosa | seplen:4.8 sepwid:3.4 petlen:1.9 petwid:0.2
0 'setosa | seplen:5.0 sepwid:3.0 petlen:1.6 petwid:0.2
0 'setosa | seplen:5.0 sepwid:3.4 petlen:1.6 petwid:0.4
0 'setosa | seplen:5.2 sepwid:3.5 petlen:1.5 petwid:0.2
0 'setosa | seplen:5.2 sepwid:3.4 petlen:1.4 petwid:0.2
0 'setosa | seplen:4.7 sepwid:3.2 petlen:1.6 petwid:0.2
0 'setosa | seplen:4.8 sepwid:3.1 petlen:1.6 petwid:0.2
0 'setosa | seplen:5.4 sepwid:3.4 petlen:1.5 petwid:0.4
0 'setosa | seplen:5.2 sepwid:4.1 petlen:1.5 petwid:0.1
0 'setosa | seplen:5.5 sepwid:4.2 petlen:1.4 petwid:0.2
0 'setosa | seplen:4.9 sepwid:3.1 petlen:1.5 petwid:0.2
0 'setosa | seplen:5.0 sepwid:3.2 petlen:1.2 petwid:0.2
0 'setosa | seplen:5.5 sepwid:3.5 petlen:1.3 petwid:0.2
0 'setosa | seplen:4.9 sepwid:3.6 petlen:1.4 petwid:0.1
0 'setosa | seplen:4.4 sepwid:3.0 petlen:1.3 petwid:0.2
0 'setosa | seplen:5.1 sepwid:3.4 petlen:1.5 petwid:0.2
0 'setosa | seplen:5.0 sepwid:3.5 petlen:1.3 petwid:0.3
0 'setosa | seplen:4.5 sepwid:2.3 petlen:1.3 petwid:0.3
0 'setosa | seplen:4.4 sepwid:3.2 petlen:1.3 petwid:0.2
0 'setosa | seplen:5.0 sepwid:3.5 petlen:1.6 petwid:0.6
0 'setosa | seplen:5.1 sepwid:3.8 petlen:1.9 petwid:0.4
0 'setosa | seplen:4.8 sepwid:3.0 petlen:1.4 petwid:0.3
0 'setosa | seplen:5.1 sepwid:3.8 petlen:1.6 petwid:0.2
0 'setosa | seplen:4.6 sepwid:3.2 petlen:1.4 petwid:0.2
0 'setosa | seplen:5.3 sepwid:3.7 petlen:1.5 petwid:0.2
0 'setosa | seplen:5.0 sepwid:3.3 petlen:1.4 petwid:0.2
1 'versicolor | seplen:7.0 sepwid:3.2 petlen:4.7 petwid:1.4
1 'versicolor | seplen:6.4 sepwid:3.2 petlen:4.5 petwid:1.5
1 'versicolor | seplen:6.9 sepwid:3.1 petlen:4.9 petwid:1.5
1 'versicolor | seplen:5.5 sepwid:2.3 petlen:4.0 petwid:1.3
1 'versicolor | seplen:6.5 sepwid:2.8 petlen:4.6 petwid:1.5
1 'versicolor | seplen:5.7 sepwid:2.8 petlen:4.5 petwid:1.3
1 'versicolor | seplen:6.3 sepwid:3.3 petlen:4.7 petwid:1.6
1 'versicolor | seplen:4.9 sepwid:2.4 petlen:3.3 petwid:1.0
1 'versicolor | seplen:6.6 sepwid:2.9 petlen:4.6 petwid:1.3
1 'versicolor | seplen:5.2 sepwid:2.7 petlen:3.9 petwid:1.4
1 'versicolor | seplen:5.0 sepwid:2.0 petlen:3.5 petwid:1.0
1 'versicolor | seplen:5.9 sepwid:3.0 petlen:4.2 petwid:1.5
1 'versicolor | seplen:6.0 sepwid:2.2 petlen:4.0 petwid:1.0
1 'versicolor | seplen:6.1 sepwid:2.9 petlen:4.7 petwid:1.4
1 'versicolor | seplen:5.6 sepwid:2.9 petlen:3.6 petwid:1.3
1 'versicolor | seplen:6.7 sepwid:3.1 petlen:4.4 petwid:1.4
1 'versicolor | seplen:5.6 sepwid:3.0 petlen:4.5 petwid:1.5
1 'versicolor | seplen:5.8 sepwid:2.7 petlen:4.1 petwid:1.0
1 'versicolor | seplen:6.2 sepwid:2.2 petlen:4.5 petwid:1.5
1 'versicolor | seplen:5.6 sepwid:2.5 petlen:3.9 petwid:1.1
1 'versicolor | seplen:5.9 sepwid:3.2 petlen:4.8 petwid:1.8
1 'versicolor | seplen:6.1 sepwid:2.8 petlen:4.0 petwid:1.3
1 'versicolor | seplen:6.3 sepwid:2.5 petlen:4.9 petwid:1.5
1 'versicolor | seplen:6.1 sepwid:2.8 petlen:4.7 petwid:1.2
1 'versicolor | seplen:6.4 sepwid:2.9 petlen:4.3 petwid:1.3
1 'versicolor | seplen:6.6 sepwid:3.0 petlen:4.4 petwid:1.4
1 'versicolor | seplen:6.8 sepwid:2.8 petlen:4.8 petwid:1.4
1 'versicolor | seplen:6.7 sepwid:3.0 petlen:5.0 petwid:1.7
1 'versicolor | seplen:6.0 sepwid:2.9 petlen:4.5 petwid:1.5
1 'versicolor | seplen:5.7 sepwid:2.6 petlen:3.5 petwid:1.0
1 'versicolor | seplen:5.5 sepwid:2.4 petlen:3.8 petwid:1.1
1 'versicolor | seplen:5.5 sepwid:2.4 petlen:3.7 petwid:1.0
1 'versicolor | seplen:5.8 sepwid:2.7 petlen:3.9 petwid:1.2
1 'versicolor | seplen:6.0 sepwid:2.7 petlen:5.1 petwid:1.6
1 'versicolor | seplen:5.4 sepwid:3.0 petlen:4.5 petwid:1.5
1 'versicolor | seplen:6.0 sepwid:3.4 petlen:4.5 petwid:1.6
1 'versicolor | seplen:6.7 sepwid:3.1 petlen:4.7 petwid:1.5
1 'versicolor | seplen:6.3 sepwid:2.3 petlen:4.4 petwid:1.3
1 'versicolor | seplen:5.6 sepwid:3.0 petlen:4.1 petwid:1.3
1 'versicolor | seplen:5.5 sepwid:2.5 petlen:4.0 petwid:1.3
1 'versicolor | seplen:5.5 sepwid:2.6 petlen:4.4 petwid:1.2
1 'versicolor | seplen:6.1 sepwid:3.0 petlen:4.6 petwid:1.4
1 'versicolor | seplen:5.8 sepwid:2.6 petlen:4.0 petwid:1.2
1 'versicolor | seplen:5.0 sepwid:2.3 petlen:3.3 petwid:1.0
1 'versicolor | seplen:5.6 sepwid:2.7 petlen:4.2 petwid:1.3
1 'versicolor | seplen:5.7 sepwid:3.0 petlen:4.2 petwid:1.2
1 'versicolor | seplen:5.7 sepwid:2.9 petlen:4.2 petwid:1.3
1 'versicolor | seplen:6.2 sepwid:2.9 petlen:4.3 petwid:1.3
1 'versicolor | seplen:5.1 sepwid:2.5 petlen:3.0 petwid:1.1
1 'versicolor | seplen:5.7 sepwid:2.8 petlen:4.1 petwid:1.3