Neural Network Data Normalization and Encoding

Even though neural networks have been studied for decades, many fundamental topics do not seem to be well understood by the software developer community. One of these is neural network data normalization and encoding. I wrote an article on the topic, which appears in the July 2013 issue of Visual Studio Magazine. See

Normalization and encoding are two related but different topics. Data normalization is applied to numeric input data. For example, if you are trying to predict a person’s political party affiliation from their age, sex, annual income, and religious affiliation, you might want to normalize the age and annual income. The idea is that ages are values like 25 and 42 but annual incomes are values like 45,000.00 and 30,000.00, so without normalization the much larger income values could swamp the age values. Normalizing data rescales all numeric values so that they have similar magnitudes, typically between -10.0 and +10.0. The most common forms of data normalization are min-max and Gaussian.
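The post names min-max normalization without defining it. Here is a minimal sketch, assuming the usual definition (x - min) / (max - min), which maps the smallest value to 0.0 and the largest to 1.0. The function name and the third value in each list are hypothetical, added only so the example has a middle value.

```python
def min_max_normalize(values):
    """Scale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale by.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# 25 and 42 come from the text; 59 is a made-up third age.
ages = [25, 42, 59]
# 45,000 and 30,000 come from the text; 60,000 is a made-up third income.
incomes = [45000.00, 30000.00, 60000.00]

print(min_max_normalize(ages))     # [0.0, 0.5, 1.0]
print(min_max_normalize(incomes))  # [0.5, 0.0, 1.0]
```

After normalization, age and income are on the same scale, so neither dominates simply because of its units.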

I generally use Gaussian normalization, which first computes the mean and standard deviation of a data set, then normalizes each data value by subtracting the mean and dividing by the standard deviation. For example, suppose you are working with men’s heights, in inches. Suppose the mean of the heights is 67.0 inches and the standard deviation of the heights is 4.0 inches. A man whose raw height is 69.0 inches would have a Gaussian-normalized height of (69.0 – 67.0) / 4.0 = 0.5. The normalized value has no units: it means the height is half a standard deviation above the mean.
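The height calculation can be sketched in a few lines of Python. The function names are hypothetical, and the two-value data set is made up so that its mean is 67.0 and its standard deviation is 4.0. The sketch also assumes the population standard deviation (`statistics.pstdev`); the text does not say whether the population or sample version is intended.

```python
import statistics

def gaussian_normalize(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    return (x - mean) / std_dev

def gaussian_normalize_all(values):
    """Normalize every value using the data set's own mean and standard deviation."""
    mean = statistics.mean(values)
    # Assumption: population standard deviation. Use statistics.stdev for the sample version.
    std_dev = statistics.pstdev(values)
    # Note: a data set whose values are all equal has std_dev 0 and cannot be normalized this way.
    return [gaussian_normalize(v, mean, std_dev) for v in values]

# The example from the text: mean 67.0 inches, standard deviation 4.0 inches.
print(gaussian_normalize(69.0, 67.0, 4.0))   # 0.5

# Made-up data set with mean 67.0 and population standard deviation 4.0.
print(gaussian_normalize_all([63.0, 71.0]))  # [-1.0, 1.0]
```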

Data encoding deals with categorical and binary data. For example, political party affiliation might have possible values (democrat, republican, other) and sex might have values (male, female). Encoding converts categorical and binary data into numeric values because neural networks work only with numeric values. The most common forms of encoding are minus-one-plus-one, 1-of-C, 1-of-(C-1), and effects. Encoding is surprisingly tricky, and even among my colleagues at Microsoft Research there are some differences of opinion on how to encode categorical and binary data.
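The post does not define these encodings, so the sketch below relies on commonly used definitions, which are my assumptions rather than anything stated here. Minus-one-plus-one maps the two values of a binary variable to -1.0 and +1.0. 1-of-C uses C values with a single 1.0. Effects coding uses C-1 values, with the last category encoded as all -1.0. Which category gets which code is arbitrary, and the function names are hypothetical.

```python
def minus_one_plus_one(value, categories):
    """Binary encoding: first category -> -1.0, second category -> +1.0."""
    if len(categories) != 2:
        raise ValueError("minus-one-plus-one encoding needs exactly two categories")
    # list.index raises ValueError if value is not a known category.
    return -1.0 if categories.index(value) == 0 else 1.0

def one_of_c(value, categories):
    """1-of-C encoding: C values, with 1.0 in the position of the category."""
    idx = categories.index(value)
    return [1.0 if i == idx else 0.0 for i in range(len(categories))]

def effects(value, categories):
    """Effects encoding: C-1 values; the last category is encoded as all -1.0."""
    idx = categories.index(value)
    width = len(categories) - 1
    if idx == width:
        return [-1.0] * width
    return [1.0 if i == idx else 0.0 for i in range(width)]

parties = ["democrat", "republican", "other"]
sexes = ["male", "female"]

print(minus_one_plus_one("male", sexes))  # -1.0
print(one_of_c("republican", parties))    # [0.0, 1.0, 0.0]
print(effects("democrat", parties))       # [1.0, 0.0]
print(effects("other", parties))          # [-1.0, -1.0]
```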

