Graphing the Michalewicz Function

The Michalewicz function is a strange math function that is sometimes used to test the effectiveness of numerical optimization algorithms. In math terms, the function is:
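The commonly cited form of the function, written here in LaTeX, is given below. It is a reconstruction rather than the original figure, with d as the number of input variables and m as a steepness parameter. Taking m = 10 is an assumption chosen to match the exponent of 20 in the SciLab statement later in this post.

```latex
f(x_1, \dots, x_d) = -\sum_{i=1}^{d} \sin(x_i)\,
    \left[ \sin\!\left( \frac{i \, x_i^{2}}{\pi} \right) \right]^{2m},
\qquad m = 10

% Two-variable case (d = 2, with x_1 = x and x_2 = y):
z = -\left( \sin(x) \left[ \sin\!\left( \frac{x^{2}}{\pi} \right) \right]^{20}
          + \sin(y) \left[ \sin\!\left( \frac{2 y^{2}}{\pi} \right) \right]^{20} \right)
```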


The function can accept one or more input values. The function is tricky because it has several local minimum values and several flat areas, which make the one global minimum value hard for algorithms to find. I decided to graph the Michalewicz function with two variables, x and y; therefore the dimension of the function is two, but the graph appears in 3D.

I used SciLab, which is a free alternative to the very pricey MatLab. The SciLab statements I used to create the graph were:

-->z=-( (sin(x) .* (sin((1 * x.^2)/%pi)).^20) +
         (sin(y) .* (sin((2 * y.^2)/%pi)).^20) );

It took me quite some time to correctly translate the math version of the function into the corresponding SciLab statement, mostly because it was hard to get the parentheses right. The first statement above sets up x and y values from 0 to 3.2 (just a bit more than pi) in steps of 0.05. The second statement defines the Michalewicz function for two variables. Statements three, four, and five create the graph shown below. For two variables, x and y, the global minimum value is approximately z = -1.8013 when x = 2.20319 and y = 1.57049.
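As a quick cross-check of the reported minimum, here is a minimal Python sketch of the same function. The function name michalewicz and the parameter m are illustrative choices, not part of the original SciLab session; m = 10 is assumed so that the exponent matches the 20 used above.

```python
import math

def michalewicz(xs, m=10):
    """Michalewicz function for any number of inputs (m=10 is assumed)."""
    return -sum(
        math.sin(x) * math.sin((i * x * x) / math.pi) ** (2 * m)
        for i, x in enumerate(xs, start=1)  # i = 1 for x, i = 2 for y, ...
    )

# Evaluate at the point reported as the two-variable global minimum.
z = michalewicz([2.20319, 1.57049])
print(z)
```

The printed value should agree with the z value quoted above to roughly three decimal places.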


Posted in Machine Learning

My Top Ten Favorite Awkward Family Photos

Most of my blog posts are pretty technical, but these photos from the Awkward Family Photos Web site are too hilarious not to re-post. Here are my top 10 favorites from their Hall of Fame section. Click on each image to get a larger view for the full, disturbing effect. Be sure to read the captions too.

1. Is her name Tippi?

2. Party hat boy

3. Sparkler child with exposed propane tank.

4. This will end badly.

5. The Mona Lisa of disturbing photos.

6. It took me a moment to see the other one . . .

7. Cue Twilight Zone theme music. . .

8. This left me speechless.

9. Best Easter Bunny ever. Ever.

10. He brought his best friend to the photography studio.

Posted in Miscellaneous, Top Ten

Training Neural Networks using Multi-Swarm Optimization

I wrote an article titled “Using Multi-Swarm Training on Your Neural Networks” in the February 2015 issue of Visual Studio Magazine.

You can think of a neural network as a complicated mathematical equation that has some numeric constants, called weights and biases, that must be determined so that the network can make predictions. Determining the values of the weights and biases is called training the network.


Training is done by using a set of data that has known input and output values. Training tries different values for the weights and biases, trying to find values so that the neural network’s computed output values are very close to the known, correct output values in the training data.
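To make the idea concrete, here is a minimal Python sketch of a tiny network treated as an equation, plus the error that training tries to drive down. Everything here is an illustrative assumption rather than code from the article: the function names, the single hidden layer, the tanh activation, and the use of mean squared error.

```python
import math

def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Compute one output value from inputs x using weights and biases."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

def mean_squared_error(data, params):
    """Average squared gap between computed and known outputs."""
    total = 0.0
    for x, target in data:
        total += (forward(x, *params) - target) ** 2
    return total / len(data)

# Made-up training items: (input values, known correct output).
train = [([1.0, 2.0], 0.5), ([0.0, -1.0], 0.1)]
# Candidate weights and biases; training would try many such candidates.
params = ([[0.1, -0.2], [0.3, 0.4]], [0.0, 0.1], [0.5, -0.5], 0.2)
print(mean_squared_error(train, params))
```

A training algorithm, whatever its type, is essentially a strategy for choosing the next candidate params so that this error shrinks.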

There are several algorithms that can be used to train a neural network. The most common is a calculus-based technique called the back-propagation algorithm. An alternative is particle swarm optimization (PSO). PSO loosely models the behavior of groups, such as schools of fish and flocks of birds.

Multi-swarm optimization (MSO) extends PSO by using several swarms of particles instead of just a single swarm. Using multiple swarms helps prevent the training process from getting stuck at a good, but not optimal, solution for the values of the weights and biases.
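The multi-swarm idea can be sketched in a few dozen lines of Python. This is one possible variant under stated assumptions, not the code from the article. Each particle is pulled toward its own best position, its swarm's best position, and the best position found by any swarm. The coefficient values, the function names, and the simple sphere function standing in for neural network error are all illustrative.

```python
import random

def sphere(position):
    """Stand-in error function; a real trainer would return network error."""
    return sum(v * v for v in position)

def multi_swarm_minimize(error_fn, dim, num_swarms=3, swarm_size=5,
                         epochs=300, lo=-10.0, hi=10.0, seed=0):
    rnd = random.Random(seed)
    w, c1, c2, c3 = 0.7, 1.4, 1.4, 0.4  # assumed inertia and pull weights

    swarms = []
    global_pos, global_err = None, float("inf")
    for _ in range(num_swarms):
        swarm = {"particles": [], "best_pos": None, "best_err": float("inf")}
        for _ in range(swarm_size):
            pos = [rnd.uniform(lo, hi) for _ in range(dim)]
            vel = [rnd.uniform(-1.0, 1.0) for _ in range(dim)]
            err = error_fn(pos)
            swarm["particles"].append(
                {"pos": pos, "vel": vel, "best_pos": pos[:], "best_err": err})
            if err < swarm["best_err"]:
                swarm["best_pos"], swarm["best_err"] = pos[:], err
            if err < global_err:
                global_pos, global_err = pos[:], err
        swarms.append(swarm)

    for _ in range(epochs):
        for swarm in swarms:
            for p in swarm["particles"]:
                for d in range(dim):
                    r1, r2, r3 = rnd.random(), rnd.random(), rnd.random()
                    p["vel"][d] = (
                        w * p["vel"][d]
                        + c1 * r1 * (p["best_pos"][d] - p["pos"][d])      # own best
                        + c2 * r2 * (swarm["best_pos"][d] - p["pos"][d])  # swarm best
                        + c3 * r3 * (global_pos[d] - p["pos"][d]))        # all swarms
                    # Move the particle, keeping it inside the search range.
                    p["pos"][d] = min(hi, max(lo, p["pos"][d] + p["vel"][d]))
                err = error_fn(p["pos"])
                if err < p["best_err"]:
                    p["best_pos"], p["best_err"] = p["pos"][:], err
                if err < swarm["best_err"]:
                    swarm["best_pos"], swarm["best_err"] = p["pos"][:], err
                if err < global_err:
                    global_pos, global_err = p["pos"][:], err
    return global_pos, global_err

best_pos, best_err = multi_swarm_minimize(sphere, dim=2)
print(best_pos, best_err)
```

To train a network, the position vector would hold the weights and biases, and the error function would measure how far the computed outputs are from the training data outputs.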

Posted in Machine Learning

Workshop on Neural Networks at the 2015 Visual Studio Live Conference

Visual Studio Live is one of my favorite conferences for software developers who use Microsoft technologies. I will be giving an all-day workshop on neural networks at the upcoming VS Live event, March 16-20, 2015, in Las Vegas. VS Live is run by 1105 Media, one of the most respected names in technology media.


In the workshop, I will teach attendees everything they need to know in order to create neural networks using the C# language and the Visual Studio tool. I’ll assume that attendees have intermediate-level coding skills, but I won’t assume they know anything at all about neural networks.

Some of the workshop topics are:

Overview – What is a neural network?
Feed-Forward – The NN input-process-output mechanism
Normalization – Preparing data for NN analysis
Back-Propagation – Training a NN to make predictions
Particle Swarms – An advanced training technique

Each topic will have a complete, production-quality demo program to experiment with, and a short quiz to test understanding of the topic. Here are a couple of example quiz questions:

1. Which of the following is the best way to encode a binary
 predictor variable?

A. False = 0, True = 1
B. False = 1, True = 0
C. False = -1, True = +1
D. False = 0, True = 9

3. Suppose you are trying to predict a single numeric value, such
 as a person's credit score, based on predictor variables such as
 annual income, age, and so on. This type of problem is called
 neural network regression (instead of classification).
 Why is softmax not used for NN regression?

A. Because there's only one output node,
 softmax would always return 0.
B. Because there's only one output node,
 softmax would always try to divide by 0.
C. Because there's only one output node,
 softmax would always return 1.
D. None of the above - softmax works fine for NN regression
 problems with one output node.  
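For readers who want to explore the softmax question by experiment rather than by reasoning, here is a minimal Python sketch. The softmax function below is a common textbook formulation, assumed here and not taken from the workshop materials. Try it on lists that hold just one value.

```python
import math

def softmax(values):
    """Scale raw values so the results are positive and sum to 1."""
    largest = max(values)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Three output nodes, as in a typical classification problem.
print(softmax([1.0, 2.0, 3.0]))

# One output node, as in the regression scenario from the quiz.
for raw in (-5.0, 0.0, 3.7, 850.0):
    print(raw, softmax([raw]))
```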

If you want to get up to speed with neural networks, consider attending the 2015 Visual Studio Live conference in Las Vegas.

Posted in Conferences, Machine Learning

L1 and L2 Regularization for Machine Learning

I wrote an article titled “L1 and L2 Regularization for Machine Learning” in the January 2015 issue of Microsoft MSDN Magazine.

The most difficult part of L1 and L2 regularization is understanding what they are, as opposed to understanding how to write code that implements them. Briefly, many forms of machine learning are essentially math equations that can be used to make predictions. Two prominent examples are neural network classification and logistic regression classification. The underlying math equations have numeric constants, like 3.45, that are called weights.


Training a classifier is the process of finding the values of the weights. This is done by using a set of training data that has known input values and output values. Training tries different values for the weights so that, for the training data, the computed outputs closely match the known correct outputs.

Unfortunately, if you train long enough it’s almost always possible to find a set of values for the weights so that the computed outputs match the training outputs almost perfectly. But when you use the weights on new, previously unseen data with unknown output values, to make predictions, the predictions are very poor. This is called over-fitting – the weights fit the training data too well.

One characteristic of weight values that are over-fitted is that the values tend to be large in magnitude. L1 and L2 regularization limit the magnitudes of the weights by adding a penalty to the error that training tries to minimize. L1 regularization penalizes the sum of the absolute values of the weights. L2 regularization penalizes the sum of the squared values of the weights.

L1 regularization sometimes has a nice side effect of pruning out unneeded features by setting their associated weights to 0.0, but L1 regularization doesn’t easily work with all forms of training. L2 regularization works with all forms of training, but doesn’t give you implicit feature selection. In practice, you must use trial and error to determine which form of regularization (or neither) is better for a particular problem.
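The two penalties are simple to compute. Here is a minimal Python sketch, assuming the penalty is added to a base error value and scaled by a small constant (called lam here). The function names and the example numbers are illustrative; some formulations also scale the L2 term by one half, which is omitted in this sketch.

```python
def l1_penalty(weights):
    """Sum of the absolute values of the weights."""
    return sum(abs(w) for w in weights)

def l2_penalty(weights):
    """Sum of the squared values of the weights."""
    return sum(w * w for w in weights)

def regularized_error(base_error, weights, lam, kind="L2"):
    """Base error plus a weight penalty scaled by lam (an assumed constant)."""
    if kind == "L1":
        return base_error + lam * l1_penalty(weights)
    if kind == "L2":
        return base_error + lam * l2_penalty(weights)
    return base_error  # no regularization

weights = [3.0, -4.0]
print(l1_penalty(weights))  # 7.0
print(l2_penalty(weights))  # 25.0
print(regularized_error(0.5, weights, lam=0.01, kind="L1"))
print(regularized_error(0.5, weights, lam=0.01, kind="L2"))
```

Because large weights now increase the error, training is nudged toward smaller weight values. Note how the squared penalty grows much faster than the absolute-value penalty as a weight gets larger.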

Posted in Machine Learning

Graphing Rastrigin’s Function using the R Language

Rastrigin’s function is a standard benchmark used to test the effectiveness of numerical optimization algorithms. The R language is used most often for statistical kinds of work. The R language and environment are somewhat similar to MatLab; however, R is open source and MatLab is very pricey (in my opinion).

R has been around since the mid-1990s. I don’t use R very often. I was surprised when Microsoft acquired the Revolution Analytics company in January 2015. RA focuses mainly on integrating R with Hadoop. Anyway, the acquisition means that R is now an informal part of the Microsoft technologies ecosystem.


So, I decided to dust off my R and take a look at the most recent version. I installed R and decided to generate a 3D surface plot of Rastrigin’s function. The R commands I used were:

1. x<-seq(-5.12,5.12,length=100)

2. y<-x

3. f<-function(x,y) { 20+(x^2-10*cos(2*pi*x))+
   (y^2-10*cos(2*pi*y)) }

4. z<-outer(x,y,f)

5. z[is.na(z)]<-1

6. persp(x,y,z,theta=30,phi=30,expand=0.5,col="red",

The first command sets up an array named x with 100 values ranging from -5.12 to +5.12. The second command makes a copy of the x values in a new array named y. The third statement defines a function f which implements Rastrigin’s function. The fourth statement uses the built-in outer function, which accepts two arrays and a function, to generate a matrix named z. The fifth statement means, loosely, “if any value in matrix z is not available, put a value of 1 there.”

The sixth statement calls the persp (“perspective plot”) function to generate the 3D graph. The persp function has a zillion parameters and I only used a few.
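For comparison, the first four R statements can be mirrored in plain Python. This is a rough sketch that only builds the grid of z values and does not draw a plot. The names rastrigin and linspace are illustrative helpers, not part of any library.

```python
import math

def rastrigin(x, y):
    """Two-variable Rastrigin function."""
    return (20
            + (x ** 2 - 10 * math.cos(2 * math.pi * x))
            + (y ** 2 - 10 * math.cos(2 * math.pi * y)))

def linspace(start, stop, count):
    """Evenly spaced values, similar in spirit to R's seq(..., length=count)."""
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]

xs = linspace(-5.12, 5.12, 100)
ys = xs[:]  # copy of the x values, like y<-x

# z[i][j] holds the function value at (xs[i], ys[j]), like outer(x,y,f).
z = [[rastrigin(x, y) for y in ys] for x in xs]

print(len(z), len(z[0]))
print(rastrigin(0.0, 0.0))  # 0.0, the global minimum at the origin
```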

I can’t say I’m a huge fan of either R or MatLab. However, they are both very nice when I need to create a 3D graph. The main weakness with R graphs is that it takes a little extra effort to make a color gradient. Notice the graph is all red, which makes some of the hills and valleys hard to see. I’ll show how to do a color gradient surface plot version of Rastrigin’s function in another post.

Posted in Machine Learning

The 2015 Interop Conference in Las Vegas

The Interop conference is one of the largest IT conferences in the world. I’ll be speaking at the 2015 event, April 27 through May 1, in Las Vegas. If you are an IT person, you should consider trying to get your organization to send you. Use the code SPEAKERVIP and you can get 25% off the regular price. That’s a huge discount.

The Interop conference has been running since 1988. That’s an incredibly long time for a technology conference, which tells me Interop is doing something right. This year, I bet there’ll be well over 10,000 attendees, and well over 170 exhibitor companies.


My talk is titled, “Solving Business Problems with Neural Networks”. I’ll describe what neural networks are, the kinds of problems they can solve, and discuss some practical aspects of how companies can actually use neural networks.

I am primarily a software researcher and developer so the IT world is slightly out of my area. But I often speak at Microsoft IT events like the Microsoft Management Summit (MMS) and TechEd. Both of these conferences, and a few others, have been combined into the new Microsoft Ignite conference.


IT conferences like Interop have a very different feel to them than conferences intended more for software developers (such as Visual Studio Live and DevIntersection). It’s hard for me to articulate the difference but IT events tend to be more serious, in a funny kind of way. I think this is related to the notion that IT guys tend to have a defensive mindset (“we must protect the network!”) but developers tend to have a creative streak (“let’s make a cool application!”).

On the other hand, the Expos at IT events are always a lot more fun than those at developer conferences.

Interop will be at the Mandalay Bay in Las Vegas — one of my favorite places for conferences. If you go to the 2015 Interop conference, be sure to seek me out and say hello. Interop 2015 – Highly recommended!

Posted in Conferences