Recap of the 2015 Seattle PyData Conference

I spoke at the 2015 PyData Conference from July 24-26 in Redmond, Washington (near Seattle). I estimate there were about 1,000 attendees. The PyData Conference is all about the use of the Python programming language for data analysis. See http://seattle.pydata.org/.

The conference was hosted by Microsoft and the event was held at the Microsoft conference center. I speak at quite a few conferences and overall, I was quite impressed by PyData in terms of the quality of the speakers, the range of session topics, the conference organization and logistics, and the number of attendees.

WelcomeBannerSmall

My talk was “Swarm Intelligence Optimization using Python”. I described and demonstrated the three main types of swarm optimization algorithms. About 120 people attended the session. Before I started speaking, I asked the attendees where they worked. From a show of hands I estimate about half worked at Microsoft and the other half were from other companies.

HoodBakerRoomSmall

The swarm optimization talk was very well received, at least based on the enthusiastic questions during and after the session. To be honest, this didn’t really surprise me, because when I speak at conferences I make a point of trying to speak on topics that are interesting and understandable regardless of an attendee’s background.

MeTallkingToShahrokhSmall

The 2015 PyData Conference was fairly business-like and serious, but there was a nice little mini-Expo area where companies like O’Reilly Media, Continuum Analytics, and Dato had information booths. And there was a social event on Saturday night in nearby downtown Bellevue, Wash.

My bottom line is that if you use Python, or are interested in the field of Data Science, you should consider attending PyData 2016 (if there is one).

Posted in Conferences

Mini-Batch Neural Network Training

I wrote an article titled “Variation on Back-Propagation: Mini-Batch Neural Network Training” in the July 2015 issue of Visual Studio Magazine. See https://visualstudiomagazine.com/articles/2015/07/01/variation-on-back-propagation.aspx.

A neural network is a complicated math function that has many constant values called weights that, along with the input values, determine the output values. Training a neural network is the process of finding the values of the weights. This is accomplished by using a set of training data that has known input values and known, correct output values.

MiniBatchTraining

There are many algorithms to train a neural network. By far the most common is the back-propagation algorithm. Back-propagation works by calculating a set of values called the gradients. Gradients are calculus derivatives that indicate how to adjust the current set of weight values so that when the NN is fed the training data input values, the calculated output values get closer to the known correct output values. There is one gradient value for each NN weight.

There are three variations of back-propagation. The first variation is called batch training. In this variation, all the training items are used to calculate the weight gradients, and then each weight value is adjusted.

The second variation is called online, or stochastic, training. In this variation, the gradients are calculated for each individual training item (giving an estimate of the gradients for the entire data set), and then each weight value is adjusted using the estimated gradients.

The third variation is called mini-batch training. In this variation, a batch of training items is used to compute the estimated gradients, and then each weight value is adjusted using the estimated gradients.

For example, suppose a NN has 86 weights. If there are 500 training items then batch training computes the 86 gradients using all 500 training items and then updates each weight, so there’d be one set of updates for one pass through the data set.

In online (stochastic) training, the 86 gradients are estimated using one of the 500 data items at a time, and after each estimate, all 86 weights are updated, and so there’d be 500 sets of updates for one pass through the training data.

For mini-batch training, if the batch size is set to 100, then 100 training items are used to estimate the 86 gradients, and then the 86 weights would be updated. This would happen 500 / 100 = 5 times, so there’d be 5 sets of updates for one pass through the training data.
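To make the difference in update schedules concrete, here is a minimal, hypothetical C# sketch (not code from the article) that makes one pass through 500 training items for a trivial one-weight model y = w * x. Only the control flow matters: batch makes 1 weight update per pass, online makes 500, and mini-batch with a batch size of 100 makes 5.

// Minimal, hypothetical demo (not the article's code): one pass (epoch) of
// batch, online, and mini-batch training for a trivial one-weight model
// y = w * x with squared error. Gradient for one item: 2 * x * (w*x - y).
using System;
using System.Linq;

class MiniBatchDemo
{
  static void Main()
  {
    // 500 training items where the true relationship is y = 3x
    double[][] data = Enumerable.Range(1, 500)
      .Select(i => new[] { i / 500.0, 3.0 * i / 500.0 }).ToArray();
    double lr = 0.10;  // learning rate

    // batch: gradients estimated from all 500 items, then one weight update
    double wBatch = 0.0;
    wBatch -= lr * data.Average(it => Grad(wBatch, it));

    // online (stochastic): 500 weight updates, one training item at a time
    double wOnline = 0.0;
    foreach (double[] it in data)
      wOnline -= lr * Grad(wOnline, it);

    // mini-batch: batch size 100 gives 500 / 100 = 5 updates per pass
    double wMini = 0.0;
    for (int i = 0; i < data.Length; i += 100)
      wMini -= lr * data.Skip(i).Take(100).Average(it => Grad(wMini, it));

    Console.WriteLine("batch = {0:F3}  online = {1:F3}  mini-batch = {2:F3}",
      wBatch, wOnline, wMini);
  }

  // gradient of (w*x - y)^2 with respect to w, for a single training item
  static double Grad(double w, double[] item)
  {
    double x = item[0], y = item[1];
    return 2.0 * x * (w * x - y);
  }
}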

There has been much research and discussion about which form of back-propagation works best. In my opinion, based on the research I’ve seen, there’s no one best approach and so the best approach, batch, online, or mini-batch, depends on the problem under investigation.

Posted in Machine Learning

Recap of the 2015 OSCON (Open Source Conference) in Portland

I spoke at the 2015 OSCON (Open Source Conference) in Portland, Oregon, the week of July 20 – 24. I’d estimate there were about 4000 people at OSCON (attendees, speakers, exhibitors). I gave a talk titled, “Solve Optimization Problems using Swarm Intelligence” where I described some clever algorithms based on biological systems such as flocks of birds and colonies of honeybees. About 100 people attended my session. I demoed the algorithms using the Python language.

03_PreTalkPalmsUp

My biggest impression of OSCON was that it had a ton of energy. The talk topics were wildly diverse, which makes sense because open source spans programming languages, data, IT, and everything else. There was a lot of energy and excitement in the conference Expo too. I’d guess there were at least 100 booths and tables featuring everything from the largest companies (Microsoft, HP, IBM, etc.) down to tiny user groups.

ExpoArea

Another impression from OSCON was that, as I expected, I saw many stereotypical open source people. By that I mean there were far more tattoos, piercings, dyed hair, strange clothing choices, and so on, than I normally see in my day-to-day working world. But there were a lot of people like me at OSCON too — boring khaki pants and a button-down collared shirt from Sears.

05_ViewOfRoomAndAttendees

Sometimes I wonder about the motivation of people who clearly go out of their way to dress in a non-conformist way. In the end, I decided that any generalization would be useless. But I love to people watch and believe me, OSCON was a people watcher’s gold mine.

06_PreviewingSlides

I talked to quite a few attendees and other speakers and everyone I talked to was enjoying themselves and felt the event was a good use of time. I haven’t spoken at OSCON for several years. At this 2015 edition of the conference, the event seemed a bit more mainstream — and I mean that in a good way. Not many angry hippy anarchists. I liked the bulletin boards where people posted job listings and announcements, and I liked the self-organizing birds-of-a-feather feature where attendees set up their own discussions in the evenings.

I was somewhat surprised at how many non-technical talks there were. I’m not a fan of listening to things like “Build Your Open Source Resume” and “Selling Open Source 101”, but I’ll bet many attendees do like such topics and they served to increase the wild range of talk topics at OSCON.

JamesDrawingMicrosoftLogo

The bottom line is that I highly recommend OSCON. If you’re in the open source community, OSCON is almost a must-attend event. But even if you’re not an active user of open source software, I can recommend OSCON because it’s wildly interesting and you’ll likely learn something you can apply anywhere.

Posted in Conferences

The Java Substring Function vs. the C# Substring Function

It’s not uncommon for me to port C# code to Java, or Java code to C#. The two languages have many similarities but there are quite a few differences to deal with. For example, both languages have sub-string functions to extract a portion of a string, but the functions are slightly different.

The Java version of sub-string takes two int parameters that indicate the inclusive start index and the exclusive end index. For example, suppose string s has 10 characters and is “0123456789”. The Java call to String t = s.substring(0,3) means extract the characters at indices [0, 3) = [0, 2] so t = “012”. And the Java call to String u = s.substring(2,5) means extract the characters at [2, 5) = [2, 4] so u = “234”.

The C# version of sub-string takes two int parameters that indicate the inclusive start index and the total length of the resulting string. The C# call to string t = s.Substring(0,3) means extract 3 characters starting at index 0 so t = “012”. The C# call to string u = s.Substring(2,5) means extract 5 characters starting at index 2 so u = “23456”.

JavaVsCSharpSubstring

Notice that when the first parameter is 0, the Java and C# functions give the same result. But if the first parameter is not 0, the Java and C# functions give different results.

When I’m porting Java code to C#, I often use a program-defined C# method that simulates how the Java substring method works so I don’t have to deal with indices:

static string JavaStyleSubstring(string s, int beginIndex,
  int endIndex)
{
  // simulates Java substring function
  int len = endIndex - beginIndex;
  return s.Substring(beginIndex, len);
}
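For example, with the “0123456789” string from above, a quick check (a hypothetical snippet, not from the original post) shows the helper reproduces the Java result while the native C# call does not:

string s = "0123456789";
Console.WriteLine(JavaStyleSubstring(s, 2, 5));  // "234"  -- same as Java s.substring(2,5)
Console.WriteLine(s.Substring(2, 5));            // "23456" -- native C# Substring behavior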

This approach isn’t very efficient but is useful if performance isn’t a major concern.

Posted in Machine Learning, Miscellaneous

Neural Networks with WEKA Quick Start Tutorial

Here’s a quick (should take you about 15 minutes) tutorial that describes how to install the WEKA machine learning tool and create a neural network that classifies the famous Iris Data set.


1. Go to the WEKA Web site by doing an Internet search or navigating directly to http://www.cs.waikato.ac.nz/ml/weka/. Click on the Download link.

01_WekaWebSite


2. Your machine almost certainly has Java installed, so click on the Windows x64 self-extracting executable (without the Java VM) link. You can use the x86 version too. If, for some reason, your machine does not have Java, either install Java or click one of the with-Java-VM install links.

02_DownloadWeka


3. You will be directed to sourceforge.net. Click on the Run option in the IE pop-up dialog to run the install program.

03_AllowWekaToInstall


4. The install has 7 mini-dialog boxes. You can accept all defaults and just click through the screens.

04_InstallBegin

05_LicenseAgreement

06_FullInstall

07_InstallLocation

08_StartMenuFolder

09_InstallComplete

10_FinishAndLaunch


5. Assuming you left the “Start Weka” checkbox checked, the Weka GUI Chooser mini-program will launch. If Weka doesn’t automatically launch, you can find it in the Start Menu or do a search for “Weka”. On the GUI Chooser, click on the Explorer button to get to the actual WEKA program. (This process is kind of strange and confuses many people who are new to WEKA).

11_Explorer


6. Create a data file to analyze: launch Notepad, copy-paste the data at the bottom of this blog post into Notepad, and save it on your machine (I used location C:\Data\WekaData\) as file IrisData.arff. Alternatively, you can find this data on the Internet. You may run into trouble with invisible control characters; if so, try again or try using WordPad or Word.

The goal is to create a neural network that classifies an iris flower as one of three species (setosa, versicolor, or virginica) based on four numeric values (sepal length and width, and petal length and width). (A sepal is a leaf-like structure).

12_CreateDataFile


7. Click on the “Open file…” button, then navigate to your data file and click the Open button.

13_OpenDataFile


8. If the data file loads correctly, WEKA will automatically show you summary information about your data. Click on the Classify tab to start creating a neural network.

14_DataFileHasBeenLoaded


9. Click on the Choose button — WEKA has many tools. Under the “functions” folder, select the “MultilayerPerceptron” item. This is what WEKA calls a neural network.

15_SelectClassifier


10. In the Test Options area, select the “Percentage split” option and set it to 80%. You are telling WEKA to use 80% of your 150-item data set (120 items) to create the neural network and to use the remaining 20% (30 items) to evaluate its accuracy. Click the “Start” button and WEKA will create a neural network. The resulting “Classifier output” area has all kinds of information; the most important is the “Correctly Classified Instances” value.

16_SplitAndStartRun


== IrisData.arff file:

@RELATION Iris

@ATTRIBUTE sepallength	NUMERIC
@ATTRIBUTE sepalwidth 	NUMERIC
@ATTRIBUTE petallength 	NUMERIC
@ATTRIBUTE petalwidth	NUMERIC
@ATTRIBUTE class 	{Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
5.2,3.4,1.4,0.2,Iris-setosa
4.7,3.2,1.6,0.2,Iris-setosa
4.8,3.1,1.6,0.2,Iris-setosa
5.4,3.4,1.5,0.4,Iris-setosa
5.2,4.1,1.5,0.1,Iris-setosa
5.5,4.2,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.0,3.2,1.2,0.2,Iris-setosa
5.5,3.5,1.3,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
4.4,3.0,1.3,0.2,Iris-setosa
5.1,3.4,1.5,0.2,Iris-setosa
5.0,3.5,1.3,0.3,Iris-setosa
4.5,2.3,1.3,0.3,Iris-setosa
4.4,3.2,1.3,0.2,Iris-setosa
5.0,3.5,1.6,0.6,Iris-setosa
5.1,3.8,1.9,0.4,Iris-setosa
4.8,3.0,1.4,0.3,Iris-setosa
5.1,3.8,1.6,0.2,Iris-setosa
4.6,3.2,1.4,0.2,Iris-setosa
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
6.3,3.3,4.7,1.6,Iris-versicolor
4.9,2.4,3.3,1.0,Iris-versicolor
6.6,2.9,4.6,1.3,Iris-versicolor
5.2,2.7,3.9,1.4,Iris-versicolor
5.0,2.0,3.5,1.0,Iris-versicolor
5.9,3.0,4.2,1.5,Iris-versicolor
6.0,2.2,4.0,1.0,Iris-versicolor
6.1,2.9,4.7,1.4,Iris-versicolor
5.6,2.9,3.6,1.3,Iris-versicolor
6.7,3.1,4.4,1.4,Iris-versicolor
5.6,3.0,4.5,1.5,Iris-versicolor
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
5.6,2.5,3.9,1.1,Iris-versicolor
5.9,3.2,4.8,1.8,Iris-versicolor
6.1,2.8,4.0,1.3,Iris-versicolor
6.3,2.5,4.9,1.5,Iris-versicolor
6.1,2.8,4.7,1.2,Iris-versicolor
6.4,2.9,4.3,1.3,Iris-versicolor
6.6,3.0,4.4,1.4,Iris-versicolor
6.8,2.8,4.8,1.4,Iris-versicolor
6.7,3.0,5.0,1.7,Iris-versicolor
6.0,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1.0,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1.0,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
6.0,3.4,4.5,1.6,Iris-versicolor
6.7,3.1,4.7,1.5,Iris-versicolor
6.3,2.3,4.4,1.3,Iris-versicolor
5.6,3.0,4.1,1.3,Iris-versicolor
5.5,2.5,4.0,1.3,Iris-versicolor
5.5,2.6,4.4,1.2,Iris-versicolor
6.1,3.0,4.6,1.4,Iris-versicolor
5.8,2.6,4.0,1.2,Iris-versicolor
5.0,2.3,3.3,1.0,Iris-versicolor
5.6,2.7,4.2,1.3,Iris-versicolor
5.7,3.0,4.2,1.2,Iris-versicolor
5.7,2.9,4.2,1.3,Iris-versicolor
6.2,2.9,4.3,1.3,Iris-versicolor
5.1,2.5,3.0,1.1,Iris-versicolor
5.7,2.8,4.1,1.3,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
7.6,3.0,6.6,2.1,Iris-virginica
4.9,2.5,4.5,1.7,Iris-virginica
7.3,2.9,6.3,1.8,Iris-virginica
6.7,2.5,5.8,1.8,Iris-virginica
7.2,3.6,6.1,2.5,Iris-virginica
6.5,3.2,5.1,2.0,Iris-virginica
6.4,2.7,5.3,1.9,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica
6.4,3.2,5.3,2.3,Iris-virginica
6.5,3.0,5.5,1.8,Iris-virginica
7.7,3.8,6.7,2.2,Iris-virginica
7.7,2.6,6.9,2.3,Iris-virginica
6.0,2.2,5.0,1.5,Iris-virginica
6.9,3.2,5.7,2.3,Iris-virginica
5.6,2.8,4.9,2.0,Iris-virginica
7.7,2.8,6.7,2.0,Iris-virginica
6.3,2.7,4.9,1.8,Iris-virginica
6.7,3.3,5.7,2.1,Iris-virginica
7.2,3.2,6.0,1.8,Iris-virginica
6.2,2.8,4.8,1.8,Iris-virginica
6.1,3.0,4.9,1.8,Iris-virginica
6.4,2.8,5.6,2.1,Iris-virginica
7.2,3.0,5.8,1.6,Iris-virginica
7.4,2.8,6.1,1.9,Iris-virginica
7.9,3.8,6.4,2.0,Iris-virginica
6.4,2.8,5.6,2.2,Iris-virginica
6.3,2.8,5.1,1.5,Iris-virginica
6.1,2.6,5.6,1.4,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica
6.3,3.4,5.6,2.4,Iris-virginica
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
6.9,3.1,5.4,2.1,Iris-virginica
6.7,3.1,5.6,2.4,Iris-virginica
6.9,3.1,5.1,2.3,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
%
Posted in Machine Learning

Linear Regression using C#

I wrote an article titled “Linear Regression using C#” in the July 2015 issue of MSDN Magazine. See https://msdn.microsoft.com/en-us/magazine/mt238410.aspx.

Linear regression (LR) is one of the most fundamental and important types of statistical analysis. In LR, the goal is to analyze the relationship between a single numeric variable and one or more predictor variables (which can be either numeric or categorical). For example, my MSDN article creates a dummy data set in order to analyze the relationship between a person’s annual income (the variable to predict is called the dependent variable in LR terminology) and the person’s education level, work experience, and sex (the predictor variables are called the independent variables).
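The article’s demo handles multiple predictor variables, but the core idea is easy to see with a single predictor. Here is a minimal, hypothetical C# sketch (my own illustration with made-up numbers, not the article’s demo code) that fits y = b0 + b1 * x using the standard closed-form least-squares formulas:

// Hypothetical illustration (not the article's demo code): ordinary
// least-squares fit of a single-predictor model y = b0 + b1 * x using
// b1 = sum((x-xbar)(y-ybar)) / sum((x-xbar)^2) and b0 = ybar - b1 * xbar.
using System;
using System.Linq;

class SimpleLinearRegression
{
  static void Main()
  {
    // made-up data: x = years of education, y = annual income (in thousands)
    double[] x = { 12, 14, 16, 16, 18, 20 };
    double[] y = { 40, 48, 57, 60, 70, 82 };

    double xBar = x.Average();
    double yBar = y.Average();

    double num = 0.0, den = 0.0;
    for (int i = 0; i < x.Length; ++i)
    {
      num += (x[i] - xBar) * (y[i] - yBar);
      den += (x[i] - xBar) * (x[i] - xBar);
    }

    double b1 = num / den;          // slope (effect of one extra year of education)
    double b0 = yBar - b1 * xBar;   // intercept

    Console.WriteLine("predicted income = {0:F2} + {1:F2} * education", b0, b1);
  }
}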

LinearRegressionGraphWithBorder

There are surprisingly few examples available on the Internet that show how to code linear regression using a general purpose programming language like C#. I think this is due to the fact that coding LR is somewhat tricky, requiring specialized knowledge of statistics, and that there are many canned functions available (such as in Excel, the R language, SAS, SPSS, and so on).

Knowing how to implement LR analysis in code can be useful to a programmer in at least two ways. First, coding LR into a software system is sometimes necessary and external tools or libraries might not be feasible. Second, by knowing how to code LR, a developer gains full understanding of exactly how LR works and its strengths and limitations.

LinearRegressionDemo

Posted in Machine Learning

Finding the Inverse of a Matrix using Swarm Optimization

Just for fun, I thought I’d see if I could find the inverse of a matrix using swarm optimization. Finding the inverse of a matrix is a very common task in many algorithms and there are many proven techniques, so finding an inverse using swarm optimization isn’t really useful. But I wanted to see if it could be done.

The answer is yes, it is possible. Swarm optimization algorithms loosely mimic the coordinated behavior of groups of simple individuals, such as flocks of birds. Each individual, called a particle, represents a possible solution to the problem at hand. There are many particles, and they move, virtually, towards better solutions.

SwarmMatrixInverse

I wrote a demo program. The demo sets up a 5×5 matrix with random values between -10 and +10. I computed the true 5×5 inverse of the random matrix using standard techniques. Then I applied a particle swarm technique I wrote. The inverse calculated by the particle swarm technique was quite close to the true inverse.
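The key piece in this kind of setup is the error value that each particle tries to minimize. One natural choice is the sum of squared differences between A * X and the identity matrix, where X is a particle’s candidate inverse; a perfect inverse scores 0.0. Here is a minimal C# sketch of that kind of error function (a simplified illustration of the idea, not the full demo code):

// Sketch of an error function for swarm-based matrix inversion: the error
// of a candidate inverse X for matrix A is the sum of squared differences
// between A * X and the identity matrix. A perfect inverse gives 0.0, so
// the swarm tries to drive this value towards zero.
static double InverseError(double[][] A, double[][] X)
{
  int n = A.Length;
  double err = 0.0;
  for (int i = 0; i < n; ++i)
  {
    for (int j = 0; j < n; ++j)
    {
      double prod = 0.0;                       // cell [i][j] of A * X
      for (int k = 0; k < n; ++k)
        prod += A[i][k] * X[k][j];
      double target = (i == j) ? 1.0 : 0.0;    // corresponding identity cell
      err += (prod - target) * (prod - target);
    }
  }
  return err;
}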

There’s only one scenario where I can imagine the swarm matrix inverse technique might be useful. As it turns out, some matrices have no inverse, and for some matrices standard techniques just don’t work. In these situations swarm optimization could be used to estimate the inverse. Anyway, it was an interesting exploration.

Posted in Machine Learning