I wrote an article titled “Consensus Classification using C#” in the November 2014 issue of Microsoft MSDN Magazine. See http://msdn.microsoft.com/en-us/magazine/dn818500.aspx. Classification and its partner, prediction, are the most fundamental forms of machine learning.
A classification problem is one where you create a system that predicts a value belonging to one of two or more discrete classes. For example, you might want to predict whether a person is a political Democrat or a Republican, based on the person’s previous voting record.
There are many different machine learning classification techniques, including neural network classification, logistic regression classification, and decision tree classification. In the MSDN Magazine article I present a classification technique that isn’t one of the standard ones.
The idea of consensus classification is to take existing data, with known input and output values, and then instead of creating one very complex rule to determine the output (as in neural networks and logistic regression), generate many very short and simple rules. To predict the output for new input data, the system uses all the applicable simple rules and then the final prediction is the consensus.
For example, suppose there are 100 simple rules similar to, “If a person voted yes on issue #3 and no on issue #12, then the person is a Democrat.” Then, for some voting record, if 60 of the rules predict Republican, 30 of the rules predict Democrat, and 10 of the rules aren’t relevant, the final prediction is Republican because that is the consensus (majority) opinion.
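The majority-vote idea can be sketched in a few lines. This is a minimal illustration in Python rather than the article’s C# code, and the rule representation, the `predict` function, and the sample data are all hypothetical, invented here for clarity:

```python
from collections import Counter

# Each simple rule is a (conditions, label) pair: the rule predicts `label`
# only when the voting record matches every (issue, vote) condition.
# These rules and the record below are made-up examples, not data or code
# from the article.
rules = [
    ({3: "yes", 12: "no"}, "Democrat"),
    ({7: "yes"}, "Democrat"),            # won't apply to the record below
    ({5: "no"}, "Republican"),
    ({5: "no", 12: "no"}, "Republican"),
]

def predict(record, rules):
    """Return the consensus (majority) prediction, or None if no rule applies."""
    votes = Counter()
    for conditions, label in rules:
        # A rule is relevant only if the record satisfies all its conditions.
        if all(record.get(issue) == vote for issue, vote in conditions.items()):
            votes[label] += 1
    if not votes:
        return None  # no applicable rules, so no prediction
    return votes.most_common(1)[0][0]

# A voting record maps an issue number to that person's vote.
record = {3: "yes", 5: "no", 12: "no"}
print(predict(record, rules))  # one rule says Democrat, two say Republican
```

Here two applicable rules predict Republican and one predicts Democrat, so the consensus prediction is Republican; the rule about issue #7 simply doesn’t apply to this record.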
The consensus classification system I present is an example of what is sometimes called “ensemble learning”. I note in the article that there is no single best classification/prediction technique. Consensus classification has worked well for me in some rare situations where standard techniques just aren’t very effective.