A radial basis function (RBF) network is similar to a neural network. Both accept numeric inputs, generate numeric outputs, and can be used to make predictions. Training an RBF network involves three major steps. In the first step, a set of centroids is determined, one centroid for every hidden node. In the second step, a set of widths is determined, one width value for each hidden node. In the third step, a set of weights and bias values is determined.

Suppose you want to create an RBF network that performs classification, where the independent predictor variables (the x-values) are a person’s age, annual income, sex (male or female), and political party affiliation (democrat, republican, other), and the y-value to be predicted is credit-worthiness (low, medium, high). A single raw training data item for a 45-year-old male who makes $48,000.00 per year, is a Republican, and has medium credit-worthiness might look like:

45 48000.00 Male R -> Medium

RBF networks work only with numeric data. A 10-item set of training data where the numeric x-data has been normalized, the categorical x-data has been 1-of-(C-1) effects-encoded, and the categorical y-value has been 1-of-C encoded could look something like this:

 age   income   sex      politics       credit
====================================================
 1.0    -0.5   -1.0     0.0   1.0    0.0  1.0  0.0
-0.8     1.2   +1.0     1.0   0.0    1.0  0.0  0.0
 2.1    -0.3   -1.0    -1.0  -1.0    0.0  1.0  0.0
-1.7    -2.0   -1.0     1.0   0.0    1.0  0.0  1.0
 0.5     0.7   +1.0     0.0   1.0    0.0  0.0  1.0
 1.5     0.6   -1.0     0.0   1.0    0.0  1.0  0.0
-1.3    -1.1   +1.0     1.0   0.0    1.0  0.0  0.0
 1.1     0.5   -1.0    -1.0  -1.0    0.0  1.0  0.0
-1.3     2.2   -1.0     1.0   0.0    1.0  0.0  1.0
 0.9    -0.4   +1.0     0.0   1.0    0.0  0.0  1.0
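The two categorical encodings mentioned above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function names and the category orderings (female/male, democrat/republican/other, low/medium/high) are assumptions that the text does not specify, and normalization of the numeric columns is left out.

```python
# Hypothetical category orderings; the text does not state them.
SEX = ["female", "male"]
PARTY = ["democrat", "republican", "other"]
CREDIT = ["low", "medium", "high"]

def effects_encode(value, categories):
    """1-of-(C-1) effects encoding: C-1 values, last category is all -1.0."""
    c = len(categories)
    idx = categories.index(value)
    if idx == c - 1:
        return [-1.0] * (c - 1)
    return [1.0 if i == idx else 0.0 for i in range(c - 1)]

def one_of_c_encode(value, categories):
    """1-of-C encoding: C values, a single 1.0 marks the category."""
    return [1.0 if cat == value else 0.0 for cat in categories]

# Encode the categorical parts of a male Republican with medium credit.
sex_code = effects_encode("male", SEX)
party_code = effects_encode("republican", PARTY)
credit_code = one_of_c_encode("medium", CREDIT)
print(sex_code, party_code, credit_code)
```

With these assumed orderings, a two-category predictor such as sex collapses to a single value, and a three-category predictor such as politics needs two values.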

Suppose M represents the number of training data items, each input data item has NI x-values, each output has NO values, and the number of hidden nodes is NH. For this example, M = 10, NI = 5 (age, income, one value for sex, and two values for politics), and NO = 3. The number of hidden nodes does not directly depend on the training data. Suppose the number of hidden nodes is set to four, so NH = 4.

The first step in RBF network training is to find NH centroids. A centroid can be thought of as a set of representative x-values. There are many ways to find centroids. Two common techniques are simple random selection and k-means (or k-medoids) clustering. In simple random selection you randomly select four of the training items and extract their x-values. For example, if the random selection process selected training items [0], [2], [4], and [6], then the x-values from those training items would be used as centroids: (1.0, -0.5, -1.0, 0.0, 1.0) would be the first of the four centroids and (-1.3, -1.1, 1.0, 1.0, 0.0) would be the fourth centroid.
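Simple random selection of centroids might be sketched like this. The function name pick_centroids and the seed parameter are hypothetical, and the x-values are copied from the 10 training items above; which items get picked depends on the random seed.

```python
import random

# x-values (age, income, sex, politics x2) of the 10 training items
train_x = [
    [ 1.0, -0.5, -1.0,  0.0,  1.0],
    [-0.8,  1.2,  1.0,  1.0,  0.0],
    [ 2.1, -0.3, -1.0, -1.0, -1.0],
    [-1.7, -2.0, -1.0,  1.0,  0.0],
    [ 0.5,  0.7,  1.0,  0.0,  1.0],
    [ 1.5,  0.6, -1.0,  0.0,  1.0],
    [-1.3, -1.1,  1.0,  1.0,  0.0],
    [ 1.1,  0.5, -1.0, -1.0, -1.0],
    [-1.3,  2.2, -1.0,  1.0,  0.0],
    [ 0.9, -0.4,  1.0,  0.0,  1.0],
]

def pick_centroids(x_data, num_hidden, seed=0):
    """Pick num_hidden distinct training items and copy their x-values."""
    rng = random.Random(seed)
    indices = sorted(rng.sample(range(len(x_data)), num_hidden))
    return indices, [list(x_data[i]) for i in indices]

indices, centroids = pick_centroids(train_x, num_hidden=4)
print("selected items:", indices)
for c in centroids:
    print(c)
```

Sampling without replacement guarantees four distinct training items, although two distinct items could still have identical x-values in a larger data set.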

The second step is to determine the widths. The widths are values that represent how far the centroids are from each other. One simple approach is to compute the average Euclidean distance (square root of the sum of squared differences between vector components) between all pairs of centroids and use this as a common width value for all four hidden nodes.
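The average-distance approach can be sketched as below. The function name common_width is hypothetical, and the sketch assumes that "between all centroids" means averaging over every distinct pair. The centroids are the x-values of training items [0], [2], [4], and [6] from the table.

```python
import itertools
import math

def common_width(centroids):
    """Average Euclidean distance over all distinct pairs of centroids."""
    pairs = list(itertools.combinations(centroids, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# x-values of training items [0], [2], [4], [6]
centroids = [
    [ 1.0, -0.5, -1.0,  0.0,  1.0],
    [ 2.1, -0.3, -1.0, -1.0, -1.0],
    [ 0.5,  0.7,  1.0,  0.0,  1.0],
    [-1.3, -1.1,  1.0,  1.0,  0.0],
]

width = common_width(centroids)
widths = [width] * len(centroids)  # same width for every hidden node
print(widths)
```

Four centroids give six distinct pairs, so the common width here is the mean of six distances.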

The third step in training an RBF network is to find the set of weights and bias values that create a network whose outputs best match those of the training data. In theory this can be done quickly and easily because there are, loosely speaking, n equations with n unknowns, so classical matrix-based methods can be used. Unfortunately, in practice matrix-based techniques often fail and so are rarely used. There are many possible techniques to estimate the best weights and bias values, including gradient descent and particle swarm optimization.
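As a rough illustration of the gradient descent option, the sketch below fits weights and bias values with the centroids and width held fixed. Several things here are assumptions rather than anything stated above: the Gaussian form of the hidden-node function, the squared-error measure, the learning rate and epoch count, the width value of 2.0, the use of only two hidden nodes, and all function names. The data is the first three training items from the table.

```python
import math
import random

def hidden_outputs(x, centroids, width):
    """Gaussian activations (an assumed, commonly used RBF form)."""
    return [math.exp(-math.dist(x, c) ** 2 / (2.0 * width ** 2))
            for c in centroids]

def predict(x, centroids, width, weights, biases):
    """Each output is a weighted sum of hidden outputs plus a bias."""
    h = hidden_outputs(x, centroids, width)
    return [sum(weights[j][k] * h[j] for j in range(len(h))) + biases[k]
            for k in range(len(biases))]

def mean_squared_error(xs, ys, centroids, width, weights, biases):
    total = 0.0
    for x, y in zip(xs, ys):
        out = predict(x, centroids, width, weights, biases)
        total += sum((o - t) ** 2 for o, t in zip(out, y))
    return total / len(xs)

def train_weights(xs, ys, centroids, width, epochs=200, lr=0.05, seed=0):
    """Stochastic gradient descent on squared error; centroids stay fixed."""
    rng = random.Random(seed)
    nh, no = len(centroids), len(ys[0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(no)] for _ in range(nh)]
    biases = [0.0] * no
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            h = hidden_outputs(x, centroids, width)
            out = [sum(weights[j][k] * h[j] for j in range(nh)) + biases[k]
                   for k in range(no)]
            for k in range(no):
                err = out[k] - y[k]  # gradient of 0.5 * err^2 w.r.t. out[k]
                for j in range(nh):
                    weights[j][k] -= lr * err * h[j]
                biases[k] -= lr * err
    return weights, biases

# First three training items: x-values and 1-of-C credit targets.
xs = [[1.0, -0.5, -1.0, 0.0, 1.0],
      [-0.8, 1.2, 1.0, 1.0, 0.0],
      [2.1, -0.3, -1.0, -1.0, -1.0]]
ys = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
centroids = [xs[0], xs[1]]  # two hidden nodes, just for this sketch
width = 2.0                 # hypothetical common width

weights, biases = train_weights(xs, ys, centroids, width)
print(mean_squared_error(xs, ys, centroids, width, weights, biases))
```

Because the centroids and width are fixed, the outputs are linear in the weights and biases, which is what makes the matrix-based approach possible in theory and keeps the gradient simple here.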

To summarize, there are many options available to implement RBF network training. There may be too many options, and this may be part of the reason why RBF networks don’t seem to be used as much as other classification techniques such as neural networks and support vector machines.

I am writing an article that presents a complete RBF network training example. It is tentatively scheduled to appear in MSDN Magazine in December 2013.