When I was first learning about neural networks, one topic that gave me difficulty was understanding how and when to manipulate the training data. I found many individual examples, but each one stood alone, and it took me some time to piece all the information together.
I wrote an article, “How to Standardize Data for Neural Networks,” for the January 2014 issue of Visual Studio Magazine. It summarizes all the guidelines for normalizing numeric data and encoding categorical data. See http://visualstudiomagazine.com/articles/2014/01/01/how-to-standardize-data-for-neural-networks.aspx.
For example, suppose you are trying to predict a person’s political party affiliation (democrat, republican, independent, other) from age (a number between 18 and 120), sex (male or female), annual income (a number like 45,000.00), and location (urban, suburban, rural). So, the first line of training data might be:
30 male $38,000.00 urban democrat
Because neural networks work only with numbers, the training data must be converted to give something like:
-1.23 -1.0 -1.34 (0.0 1.0) (0.0 0.0 0.0 1.0)
Here the age of 30 has been normalized to -1.23, the sex of male has been encoded as -1.0, the income of $38,000.00 has been normalized to -1.34, location of urban has been encoded as (0.0, 1.0), and the y-value of democrat has been encoded as (0.0, 0.0, 0.0, 1.0).
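The conversion described above can be sketched in a few lines of Python. This is only an illustration under several assumptions, not the code from the article. The numeric columns are normalized with (x - mean) / standard deviation, using the population standard deviation. Every age and income other than the first row is made up, chosen only so the first row lands near the values quoted above. The post specifies just three encodings (male as -1.0, urban as (0.0, 1.0), democrat as (0.0, 0.0, 0.0, 1.0)); every other entry in the lookup tables, and the helper names themselves, are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical training columns. Only the first entry of each (30 and 38,000.00)
# comes from the example row; the rest are invented for illustration.
ages = [30.0, 36.0, 52.0, 42.0]
incomes = [38000.00, 42000.00, 40000.00, 44000.00]

def gaussian_normalize(values):
    """Return (x - mean) / std for each value (population std is an assumption)."""
    m = mean(values)
    sd = pstdev(values)
    return [(v - m) / sd for v in values]

# Lookup tables. Only "male", "urban" and "democrat" match the post;
# the remaining entries are assumed so that every category has a code.
SEX = {"male": -1.0, "female": 1.0}
LOCATION = {
    "urban": (0.0, 1.0),
    "suburban": (1.0, 0.0),
    "rural": (-1.0, -1.0),
}
PARTY = {
    "democrat": (0.0, 0.0, 0.0, 1.0),
    "republican": (0.0, 0.0, 1.0, 0.0),
    "independent": (0.0, 1.0, 0.0, 0.0),
    "other": (1.0, 0.0, 0.0, 0.0),
}

def encode_row(age_z, sex, income_z, location, party):
    """Flatten one training row into a list of numbers."""
    return [age_z, SEX[sex], income_z, *LOCATION[location], *PARTY[party]]

age_z = gaussian_normalize(ages)
income_z = gaussian_normalize(incomes)

# First training row: 30 male $38,000.00 urban democrat
first_row = encode_row(age_z[0], "male", income_z[0], "urban", "democrat")
print([round(v, 2) for v in first_row])
# prints [-1.23, -1.0, -1.34, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
```

Note that the mean and standard deviation are computed over the whole column, so the normalized value of a single row depends on all the other rows in the training data.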