I wrote an article in the March 2017 issue of Visual Studio Magazine titled “Revealing Secrets with R and Factor Analysis”. See https://visualstudiomagazine.com/articles/2017/03/01/revealing-secrets-r-factor-statistics.aspx.
Factor analysis is a classical statistics technique that determines whether a set of observed variables can be explained by a smaller set of “latent variables.” The idea is rather subtle and is best explained by example. In my article I create a set of fake movie-preference data for 20 people. Each person rates how much they like each of seven movies: “Forbidden Planet”, “Dark City”, “The Hangover”, “Meet the Parents”, “Ben Hur”, “Gladiator”, and “Galaxy Quest”.
The first two movies (“Forbidden Planet” and “Dark City”) are science fiction. The next two are comedies. The two after that are historical. The last movie, “Galaxy Quest”, is both science fiction and comedy.
A factor analysis can tell you whether people’s movie preferences are related to the latent variable, genre. If they are, you could use that information to predict how the people in the data would rate a new movie.
Performing factor analysis with R is very easy. The harder part is interpreting the results. For my dummy data, the key part of the R results is:
Loadings:
Factor1 Factor2 Factor3
ForbiddenPlanet -0.141 0.987
TheHangover 0.930 -0.205
MeetTheParents 0.798 -0.174 -0.226
BenHur -0.216 -0.142 0.964
Gladiator -0.484 -0.182 0.665
GalaxyQuest 0.591 0.557 -0.488
DarkCity 0.761 -0.273
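Notice that “Forbidden Planet” (0.987) and “Dark City” (0.761) have high loadings on the Factor2 latent variable, which we know to be science fiction. Similarly, “The Hangover” (0.930) and “Meet the Parents” (0.798) load heavily on Factor1, which corresponds to comedy, and “Ben Hur” (0.964) and “Gladiator” (0.665) load heavily on Factor3, which corresponds to historical. R leaves a cell blank when a loading’s magnitude is below its print cutoff (0.1 by default), which is why some rows show only two values.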
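The reading of the loadings described above can be done mechanically. The sketch below is only an illustration: it retypes the loadings from the output, uses NA for the cells R left blank, and picks the factor with the largest absolute loading for each movie. The variable names and the 0.5 threshold for a "strong" loading are my own choices for this example. The printed output does not show which column the small -0.205 value for TheHangover belongs to, so its placement under Factor2 is an assumption. It does not change the result.

```r
# Loadings retyped from the factor analysis output.
# NA marks a cell that R left blank (magnitude below the print cutoff).
movies <- c("ForbiddenPlanet", "TheHangover", "MeetTheParents",
            "BenHur", "Gladiator", "GalaxyQuest", "DarkCity")
factors <- c("Factor1", "Factor2", "Factor3")

load_mat <- matrix(c(
  -0.141,  0.987,     NA,
   0.930, -0.205,     NA,   # ASSUMPTION: column of -0.205 is not recoverable
   0.798, -0.174, -0.226,
  -0.216, -0.142,  0.964,
  -0.484, -0.182,  0.665,
   0.591,  0.557, -0.488,
      NA,  0.761, -0.273
), ncol = 3, byrow = TRUE, dimnames = list(movies, factors))

# Work with magnitudes; treat suppressed cells as zero.
magnitude <- abs(load_mat)
magnitude[is.na(magnitude)] <- 0

# For each movie, the factor with the largest absolute loading.
dominant <- factors[apply(magnitude, 1, which.max)]
names(dominant) <- movies
print(dominant)

# Movies with a strong loading (>= 0.5, an illustrative threshold)
# on more than one factor.
cross <- movies[rowSums(magnitude >= 0.5) > 1]
print(cross)
```

With these numbers, the only movie flagged as loading strongly on two factors is GalaxyQuest, on Factor1 and Factor2. That matches its description as both a comedy and a science fiction movie.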
Factor analysis isn’t too common in the hard sciences, but it’s used fairly often in fields such as psychology and marketing.
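For readers who want to produce output of this general shape, here is a minimal sketch using the factanal() function from R's built-in stats package. It does not use the article's data or necessarily the article's exact call. The ratings are simulated from three made-up genre scores, the 200 simulated viewers (rather than 20), the noise level, the weights, and all variable names are assumptions chosen so the example is self-contained. The loadings it prints will not match the numbers shown above.

```r
# Simulated stand-in for the movie ratings (NOT the article's data).
set.seed(1)
n <- 200  # hypothetical number of viewers; the article used 20

# One latent "taste" score per genre for each simulated viewer.
scifi      <- rnorm(n)
comedy     <- rnorm(n)
historical <- rnorm(n)
noise <- function() rnorm(n, sd = 0.6)  # per-movie random variation

# Each movie's score is driven by its genre(s) plus noise.
ratings <- data.frame(
  ForbiddenPlanet = scifi + noise(),
  DarkCity        = scifi + noise(),
  TheHangover     = comedy + noise(),
  MeetTheParents  = comedy + noise(),
  BenHur          = historical + noise(),
  Gladiator       = historical + noise(),
  # Both science fiction and comedy; the negative historical weight
  # is an illustrative assumption.
  GalaxyQuest     = 0.7 * scifi + 0.7 * comedy - 0.4 * historical + noise()
)

# Maximum-likelihood factor analysis with three factors.
# Varimax is factanal's default rotation; it is spelled out for clarity.
fit <- factanal(ratings, factors = 3, rotation = "varimax")

# Print the loadings, hiding values with magnitude below 0.1.
print(fit$loadings, cutoff = 0.1)
```

The order in which factanal() reports the factors depends on the data, so which column ends up representing which genre has to be read off the loadings, as in the discussion above.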