Suppose you want to predict a person’s annual income based on their number of years of experience, age, number of years of education, and so on. In classical statistics it’s common to spend a lot of time on feature engineering: deciding which predictors to use and which to discard, and creating derived predictors from raw predictors. One example might be creating an “age-education” variable which is the square root of the age times the years of education.
But in neural prediction systems it’s quite rare to perform lots of feature engineering. The idea is that during training, the neural system will figure out which predictors aren’t important and assign them very small weights, and because of the non-linear activation functions, non-linear combinations of predictor values are created automatically.
This morning (as I write this post) I decided to do some feature engineering on the airline passenger dataset to verify that it doesn’t work well. This is a time series regression problem where the goal is to predict the number of airline passengers. A data setup for the straightforward approach looks like:
|curr 1.12 1.18 1.32 1.29 |next 1.21
|curr 1.18 1.32 1.29 1.21 |next 1.35
|curr 1.32 1.29 1.21 1.35 |next 1.48
. . .
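A sliding-window setup like the one above can be sketched in a few lines of plain Python. This is my own minimal version, not the exact code used for the experiment; it assumes the counts have already been scaled by 100,000 (so 112,000 becomes 1.12) and uses a window size of 4 to match the example lines.

```python
def make_windows(series, window=4):
    """Slide a window over the series, returning (curr, next) training pairs."""
    items = []
    for i in range(len(series) - window):
        items.append((series[i:i + window], series[i + window]))
    return items

# first few scaled monthly passenger counts
scaled = [1.12, 1.18, 1.32, 1.29, 1.21, 1.35, 1.48]
for curr, nxt in make_windows(scaled):
    print("|curr", *curr, "|next", nxt)
```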
The first line means there were (112,000, 118,000, 132,000, 129,000) passengers in months 1-4 and 121,000 passengers in month 5. Using this approach gives pretty good results with a standard neural network, and not-as-good results using a more sophisticated LSTM recurrent network. I created a derived dataset using feature engineering:
|curr 1.12 1.18 1.32 1.29 |next_pct 1.0804 |next_raw 1.21
|curr 1.18 1.32 1.29 1.21 |next_pct 1.1441 |next_raw 1.35
|curr 1.32 1.29 1.21 1.35 |next_pct 1.1212 |next_raw 1.48
. . .
Instead of predicting the raw passenger count, I predicted the percentage increase relative to the first value in the sequence. The first line means that in month 5, the passenger count was 1.0804 times 1.12, which is 1.21.
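The derived target can be computed directly from each window. Again this is just a sketch of the idea, not the experiment's actual code: the next_pct value is the next raw value divided by the first value in the window, rounded to four decimals as in the example lines above.

```python
def make_pct_windows(series, window=4):
    """Return (curr, next_pct, next_raw) triples, where next_pct is the
    ratio of the next value to the first value in the window."""
    items = []
    for i in range(len(series) - window):
        curr = series[i:i + window]
        nxt = series[i + window]
        pct = round(nxt / curr[0], 4)
        items.append((curr, pct, nxt))
    return items

scaled = [1.12, 1.18, 1.32, 1.29, 1.21, 1.35, 1.48]
for curr, pct, nxt in make_pct_windows(scaled):
    print("|curr", *curr, "|next_pct", pct, "|next_raw", nxt)
```

At prediction time the raw count is recovered by multiplying the predicted ratio by the window's first value, e.g. 1.0804 * 1.12 gives roughly 1.21.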
Anyway, after thrashing around a bit with a PyTorch LSTM network I got some results. The results are a bit difficult to interpret, but overall the feature engineering approach I tried doesn’t appear to be promising, which is what I expected.
Things like this happen all the time. In the field of machine learning, you spend a lot of time creating systems that just don’t work well. An important mindset for success is being able to deal with failures, which are much more common than successes.
Dealing with failure is common across many fields. My friends in sales have learned not to let their form of failure (not making a sale) affect them. Good baseball players fail more than half the time when batting but don’t dwell on the failures. And so on.
There is a lot of research evidence that indicates that women fear failure much more than men. For example, see “Gender Differences in Fear of Failure amongst Engineering Students” by Nelson, Newman, McDaniel, and Buboltz. This fear causes women to quickly drop out of computer science and engineering classes. On the other hand, fashion models seem to have little fear of fashion failure.