Suppose you want to create a neural network that can perform binary classification. A standard benchmark example is the Banknote Data Set where the goal is to predict if a banknote is authentic or a forgery based on four statistical properties of an image of the banknote.
You can use either the one-node technique or the two-node technique. With the one-node technique, you set up the training data and a neural network with a single output node so that 0 represents one class (authentic) and 1 represents the other class (forgery). The network outputs a single value between 0.0 and 1.0: if the computed output is less than 0.5, the prediction is class 0; otherwise, the prediction is class 1.
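The one-node decision rule can be sketched as follows. This is a minimal illustration, not a trained model: the weight values, bias, and input vector are made-up numbers chosen only to show the thresholding logic.

```python
import numpy as np

def sigmoid(z):
    # squash a raw score into the range (0.0, 1.0)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical trained weights and bias for the four banknote
# image features (variance, skewness, kurtosis, entropy);
# these values are assumptions for illustration only
w = np.array([0.8, -0.4, 0.3, 0.1])
b = -0.2

def predict_one_node(x):
    p = sigmoid(np.dot(w, x) + b)   # single output between 0.0 and 1.0
    return 0 if p < 0.5 else 1      # 0 = authentic, 1 = forgery

x = np.array([3.6, 8.6, -2.8, -0.4])  # made-up feature values
print(predict_one_node(x))
```

The key point is the single threshold at 0.5: everything about the prediction is carried by one value.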
Using the two-node technique, you encode one class as (0, 1) (authentic) and the other class as (1, 0) (forgery). The neural network has two output nodes whose values sum to 1.0, for example (0.65, 0.35). If the first value is smaller than the second, the prediction maps to (0, 1), which is class 0, authentic. Otherwise, the prediction maps to (1, 0), which is class 1, forgery.
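The two-node decision rule can be sketched like this. The raw output scores are made-up values for illustration; a softmax turns them into two values that sum to 1.0, and the larger one picks the class.

```python
import numpy as np

def softmax(z):
    # convert raw scores into values that sum to 1.0;
    # subtracting the max improves numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

raw = np.array([0.9, 0.3])   # hypothetical raw output-node scores
probs = softmax(raw)         # two values summing to 1.0, roughly (0.65, 0.35)

if probs[0] < probs[1]:
    pred = (0, 1)            # class 0 = authentic
else:
    pred = (1, 0)            # class 1 = forgery
print(pred)
```

Note that the comparison of the two output values is equivalent to the 0.5 threshold in the one-node scheme, since the values are constrained to sum to 1.0.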
So, which technique is better? I ran some experiments and concluded there's no significant technical advantage to either technique. The two-node technique requires you to train twice as many hidden-to-output weights, but each training iteration improves the model more than an iteration of the one-node technique does. Put another way, training the one-node model is faster per iteration, but you must train for more iterations to get comparable results.
Bottom line: Both the one-node and two-node techniques for binary classification work fine, and picking one over the other is largely a matter of personal preference.