Suppose you have some historical data that shows the total number of airline passengers each month for several years. The goal of time series regression is to predict the number of passengers in the next month.
Time series regression is typically very difficult. My usual approach is to use a rolling window of data with a regular neural network. But I’ve been exploring the use of “current-next” data, instead of rolling data, with an LSTM (long short-term memory) network, instead of a regular neural network.
The idea is that rolling window data explicitly retains a memory of the previous values used to predict the next value, while LSTM networks have a sort of memory that’s automatically built into their architecture.
Put more concretely, a rolling window data file might resemble:
|predictors 1.12 1.18 1.32 1.29 |passengers 1.21
|predictors 1.18 1.32 1.29 1.21 |passengers 1.35
|predictors 1.32 1.29 1.21 1.35 |passengers 1.48
. . .
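Generating rolling window records from a raw series is straightforward. Here's a minimal sketch — the window size of 4 and the `make_rolling` name are my choices for illustration, and the sample values are the ones shown above, not the full airline data set:

```python
# Sketch: build rolling-window (predictors, target) pairs from a series.
def make_rolling(series, window=4):
    """Each record is (previous `window` values, next value)."""
    pairs = []
    for i in range(len(series) - window):
        pairs.append((series[i:i+window], series[i+window]))
    return pairs

series = [1.12, 1.18, 1.32, 1.29, 1.21, 1.35, 1.48]
for preds, target in make_rolling(series):
    print("|predictors", *preds, "|passengers", target)
```

Each successive record slides the window forward by one month, so consecutive records share all but one predictor value.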
And current-next data for an LSTM could look like:
|curr 1.12 |next 1.18
|curr 1.18 |next 1.32
|curr 1.32 |next 1.29
|curr 1.29 |next 1.21
|curr 1.21 |next 1.35
|curr 1.35 |next 1.48
. . .
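Current-next data is effectively a rolling window of size one. A sketch of generating it (the `make_current_next` name is mine, for illustration):

```python
# Sketch: build current-next pairs from a series -- each value
# is paired with the value that immediately follows it.
def make_current_next(series):
    return [(series[i], series[i+1]) for i in range(len(series) - 1)]

series = [1.12, 1.18, 1.32, 1.29, 1.21, 1.35, 1.48]
for curr, nxt in make_current_next(series):
    print("|curr", curr, "|next", nxt)
```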
Coding an LSTM network is very, very tricky. I first tried using plain CNTK, my preferred deep neural network library. However, I gave up after a few hours and switched to using the Keras library running on top of CNTK. Keras is easier to use than CNTK directly.
My initial results were hard to interpret. I used the first 120 months to train the model. The model does very well early on, but fades noticeably on the last 24 months of test data. There may be over-fitting going on. When I get some time, I’ll experiment by reducing the number of LSTM memory cells (I used 12) and adding in a drop-out layer (a standard anti-over-fitting technique).
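A minimal sketch of what such a model might look like in Keras. The 12 memory cells match what I used; the drop-out rate of 0.2 and the input shape of one time step with one feature are assumptions for illustration, not settings from the actual experiment:

```python
from keras.models import Sequential
from keras.layers import Input, LSTM, Dense, Dropout

model = Sequential([
    Input(shape=(1, 1)),  # one time step, one feature (current-next data)
    LSTM(12),             # 12 memory cells, as in my experiment
    Dropout(0.2),         # hypothetical drop-out rate, anti-over-fitting
    Dense(1),             # predict the next month's passenger count
])
model.compile(loss="mse", optimizer="adam")
```

Training would then use the current values as inputs and the next values as targets, via `model.fit()`.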
Another thing I want to try is using windowed data with an LSTM, the idea being that more explicit predictor information might augment the implicit predictor information stored in an LSTM’s memory state.
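Feeding windowed data to an LSTM is mostly a reshaping exercise, because Keras LSTMs expect input of shape (samples, time steps, features). A sketch, assuming a window size of 4 treated as four time steps of one feature each (the values are the illustrative ones from earlier):

```python
import numpy as np

# Sketch: reshape rolling-window records for an LSTM, which expects
# input of shape (samples, time_steps, features).
windows = np.array([
    [1.12, 1.18, 1.32, 1.29],
    [1.18, 1.32, 1.29, 1.21],
    [1.32, 1.29, 1.21, 1.35],
])
# 3 samples, 4 time steps per sample, 1 feature per time step
x = windows.reshape(windows.shape[0], windows.shape[1], 1)
print(x.shape)  # (3, 4, 1)
```

The alternative design choice is to treat the whole window as one time step with four features, which gives the network the window explicitly but makes less use of the LSTM's sequential memory.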